Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")

2017-05-10 Thread Anton Ivanov
On 11/05/17 03:43, Jason Wang wrote:
>
>
> On 2017年05月10日 17:42, Anton Ivanov wrote:
>> On 10/05/17 09:56, Jason Wang wrote:
>>>
>>>
>>> On 2017年05月10日 13:28, Anton Ivanov wrote:
 On 10/05/17 03:18, Jason Wang wrote:
>
> On 2017年05月09日 23:11, Stefan Hajnoczi wrote:
>> On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:
>>> I have figured it out. Two issues.
>>>
>>> 1) skb->xmit_more is hardly ever set under virtualization because
>>> the qdisc
>>> is usually bypassed because of TCQ_F_CAN_BYPASS. Once
>>> TCQ_F_CAN_BYPASS is
>>> set a virtual NIC driver is not likely see skb->xmit_more (this
>>> answers my
>>> "how does this work at all" question).
>>>
>>> 2) If that flag is turned off (I patched sched_generic to turn it
>>> off in
>>> pfifo_fast while testing), DQL keeps xmit_more from being set. If
>>> the driver
>>> is not DQL enabled xmit_more is never ever set. If the driver is
>>> DQL
>>> enabled
>>> the queue is adjusted to ensure xmit_more stops happening within
>>> 10-15 xmit
>>> cycles.
>>>
>>> That is plain *wrong* for virtual NICs - virtio, emulated NICs,
>>> etc.
>>> There,
>>> the BIG cost is telling the hypervisor that it needs to "kick" the
>>> packets.
>>> The cost of putting them into the vNIC buffers is negligible.
>>> You want
>>> xmit_more to happen - it makes between 50% and 300% (depending
>>> on vNIC
>>> design) difference. If there is no xmit_more the vNIC will
>>> immediately
>>> "kick" the hypervisor and try to signal that  the packet needs
>>> to move
>>> straight away (as for example in virtio_net).
> How do you measure the performance? TCP or just measure pps?
 In this particular case - tcp from guest. I have a couple of other
 benchmarks (forwarding, etc).
>>>
>>> One more question, is the number for virtio-net or other emulated vNIC?
>>
>> Other for now - you are cc-ed to keep you in the loop.
>>
>> Virtio is next on my list - I am revisiting the l2tpv3.c driver in
>> QEMU and looking at how to preserve bulking by adding back sendmmsg
>> (as well as a list of other features/transports).
>>
>> We had sendmmsg removed for the final inclusion in QEMU 2.1, it
>> presently uses only recvmmsg so for the time being it does not care.
>> That will most likely change once it starts using sendmmsg as well.
>
> An issue is that qemu net API does not support bulking, do you plan to
> add it?

Yes :)

A.

>
> Thanks
>


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661



[PATCH net] net: sched: optimize class dumps

2017-05-10 Thread Eric Dumazet
From: Eric Dumazet 

In commit 59cc1f61f09c ("net: sched: convert qdisc linked list to
hashtable") we missed the opportunity to considerably speed up
tc_dump_tclass_root() if a qdisc handle is provided by user.

Instead of iterating all the qdiscs, use qdisc_match_from_root()
to directly get the one we look for.

Signed-off-by: Eric Dumazet 
Cc: Jiri Kosina 
Cc: Jamal Hadi Salim 
Cc: Cong Wang 
Cc: Jiri Pirko 
---
 net/sched/sch_api.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 
bbe57d57b67fd498692bd41db49147511f1bb091..e88342fde1bc409aed6a3c86e7a628030eaac66f
 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1831,6 +1831,12 @@ static int tc_dump_tclass_root(struct Qdisc *root, 
struct sk_buff *skb,
if (!qdisc_dev(root))
return 0;
 
+   if (tcm->tcm_parent) {
+   q = qdisc_match_from_root(root, TC_H_MAJ(tcm->tcm_parent));
+   if (q && tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
+   return -1;
+   return 0;
+   }
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
return -1;




RE: [PATCH] net: fec: select queue depending on VLAN priority

2017-05-10 Thread Andy Duan
From: Stefan Agner  Sent: Thursday, May 11, 2017 12:08 PM
>To: Andy Duan 
>Cc: David Miller ; and...@lunn.ch;
>feste...@gmail.com; netdev@vger.kernel.org; linux-
>ker...@vger.kernel.org
>Subject: Re: [PATCH] net: fec: select queue depending on VLAN priority
>
>On 2017-05-09 19:42, Andy Duan wrote:
>> From: David Miller  Sent: Tuesday, May 09, 2017
>> 9:39 PM
>>>To: ste...@agner.ch
>>>Cc: Andy Duan ; and...@lunn.ch;
>>>feste...@gmail.com; netdev@vger.kernel.org; linux-
>>>ker...@vger.kernel.org
>>>Subject: Re: [PATCH] net: fec: select queue depending on VLAN priority
>>>
>>>From: Stefan Agner 
>>>Date: Mon,  8 May 2017 22:37:08 -0700
>>>
 Since the addition of the multi queue code with commit 59d0f7465644
 ("net: fec: init multi queue date structure") the queue selection
 has been handelt by the default transmit queue selection
 implementation which tries to evenly distribute the traffic across
 all available queues. This selection presumes that the queues are
 using an equal priority, however, the queues 1 and 2 are actually of
 higher priority (the classification of the queues is enabled in
>fec_enet_enable_ring).

 This can lead to net scheduler warnings and continuous TX ring dumps
 when exercising the system with iperf.

 Use only queue 0 for all common traffic (no VLAN and P802.1p
 priority
 0 and 1) and route level 2-7 through queue 1 and 2.

 Signed-off-by: Fugang Duan 
 Fixes: 59d0f7465644 ("net: fec: init multi queue date structure")
>>>
>>>If the queues are used for prioritization, and it does not have
>>>multiple normal priority level queues, multiqueue is not what the
>>>driver should have implemented.
>> Firstly, HW multiple queues support:
>>  - Traffic-shaping bandwidth distribution supports credit-based and
>> round-robin-based policies. Either policy can be combined with
>> time-based shaping.
>>  - AVB (Audio Video Bridging, IEEE 802.1Qav) features:
>>  * Credit-based bandwidth distribution policy can be combined
>with
>> time-based shaping
>>  * AVB endpoint talker and listener support
>>  * Support for arbitration between different priority traffic 
>> (for
>> example, AVB class A, AVB class B, and non-AVB) Round-robin-based
>> policies:
>>  It has the same priority for three queues: In the round-robin QoS
>> scheme, each queue is given an equal opportunity to transmit one
>> frame. For example, if queue n has a frame to transmit, the queue
>> transmits its frame. After queue n has transmitted its frame, or if
>> queue n does not have a frame to transmit, queue n+1 is then allowed
>> to transmit its frame, and so on.
>>
>> Credit-based policies:
>>  The AVB credit based shaper acts independently, per class, to control
>> the bandwidth distribution between normal traffic and time-sensitive
>> traffic with respect to the total link bandwidth available.
>>  Credit based shaper conbined with time-based shaping:
>>  - priority: ClassA queue > ClassB queue > best effort
>>  - ensure the queue bandwidth as user set based on time-
>based shaping
>> algorithms (transmitter transmit frame from three queue in turn based
>> on time-based shaping algorithms)
>>  And in real AVB case,  each streaming can be independent, and are
>> fixed on related queue. Then driver level should implement
>> .ndo_select_queue() to put the streaming into related queue. That is
>> what the patch did.
>>
>> The current driver config the three queue to credit-based policies
>> (AVB), the patch seems no problem for the implementation. Do you have
>> any suggestion ?
>>
>
>I tried using the round robin mode by adding this:
>
>+   /* Set Round-Robin policy */
>+   writel(1, fep->hwp + FEC_QOS_SCHEME);
>
>After a while I got the warning non the less:
>
>WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316
>dev_watchdog+0x248/0x24c NETDEV WATCHDOG: eth0 (fec): transmit queue
>2 timed out Modules linked in:
>CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-00058-g56d22eced8bc-
>dirty #377 Hardware name: Freescale i.MX7 Dual (Device Tree) []
>(unwind_backtrace) from [] (show_stack+0x10/0x14) []
>(show_stack) from [] (dump_stack+0x78/0x8c) []
>(dump_stack) from [] (__warn+0xe8/0x100) []
>(__warn) from [] (warn_slowpath_fmt+0x38/0x48) []
>(warn_slowpath_fmt) from []
>(dev_watchdog+0x248/0x24c)
>[] (dev_watchdog) from [] (call_timer_fn+0x28/0x98)
>[] (call_timer_fn) from [] (expire_timers+0xa0/0xac)
>[] (expire_timers) from []
>(run_timer_softirq+0x9c/0x194)
>[] (run_timer_softirq) from []
>(__do_softirq+0x114/0x234)
>[] (__do_softirq) from [] (irq_exit+0xcc/0x108)
>[] (irq_exit) from []
>(__handle_domain_irq+0x80/0xec)
>[] (__handle_domain_irq) from []
>(gic_handle_irq+0x48/0x8c)
>[] (gic_handle_irq) from [] (__irq_svc+0x58/0x8c)

Re: [PATCH] net: fec: select queue depending on VLAN priority

2017-05-10 Thread Stefan Agner
On 2017-05-09 19:42, Andy Duan wrote:
> From: David Miller  Sent: Tuesday, May 09, 2017 9:39 PM
>>To: ste...@agner.ch
>>Cc: Andy Duan ; and...@lunn.ch;
>>feste...@gmail.com; netdev@vger.kernel.org; linux-
>>ker...@vger.kernel.org
>>Subject: Re: [PATCH] net: fec: select queue depending on VLAN priority
>>
>>From: Stefan Agner 
>>Date: Mon,  8 May 2017 22:37:08 -0700
>>
>>> Since the addition of the multi queue code with commit 59d0f7465644
>>> ("net: fec: init multi queue date structure") the queue selection has
>>> been handelt by the default transmit queue selection implementation
>>> which tries to evenly distribute the traffic across all available
>>> queues. This selection presumes that the queues are using an equal
>>> priority, however, the queues 1 and 2 are actually of higher priority
>>> (the classification of the queues is enabled in fec_enet_enable_ring).
>>>
>>> This can lead to net scheduler warnings and continuous TX ring dumps
>>> when exercising the system with iperf.
>>>
>>> Use only queue 0 for all common traffic (no VLAN and P802.1p priority
>>> 0 and 1) and route level 2-7 through queue 1 and 2.
>>>
>>> Signed-off-by: Fugang Duan 
>>> Fixes: 59d0f7465644 ("net: fec: init multi queue date structure")
>>
>>If the queues are used for prioritization, and it does not have multiple 
>>normal
>>priority level queues, multiqueue is not what the driver should have
>>implemented.
> Firstly, HW multiple queues support:
>   - Traffic-shaping bandwidth distribution supports credit-based and
> round-robin-based policies. Either policy can be combined with
> time-based shaping.
>   - AVB (Audio Video Bridging, IEEE 802.1Qav) features:
>   * Credit-based bandwidth distribution policy can be combined 
> with
> time-based shaping
>   * AVB endpoint talker and listener support
>   * Support for arbitration between different priority traffic 
> (for
> example, AVB class A, AVB class B, and non-AVB)
> Round-robin-based policies:
>   It has the same priority for three queues: In the round-robin QoS
> scheme, each queue is given an equal opportunity to transmit one
> frame. For example, if queue n has a frame to transmit, the queue
> transmits its frame. After queue n has transmitted its frame, or if
> queue n does not have a frame to transmit, queue n+1 is then allowed
> to transmit its frame, and so on.
> 
> Credit-based policies:
>   The AVB credit based shaper acts independently, per class, to control
> the bandwidth distribution between normal traffic and time-sensitive
> traffic with respect to the total link bandwidth available.
>   Credit based shaper conbined with time-based shaping:  
>   - priority: ClassA queue > ClassB queue > best effort
>   - ensure the queue bandwidth as user set based on time-based 
> shaping
> algorithms (transmitter transmit frame from three queue in turn based
> on time-based shaping algorithms)
>   And in real AVB case,  each streaming can be independent, and are
> fixed on related queue. Then driver level should implement
> .ndo_select_queue() to put the streaming into related queue. That is
> what the patch did.
> 
> The current driver config the three queue to credit-based policies
> (AVB), the patch seems no problem for the implementation. Do you have
> any suggestion ?
> 

I tried using the round robin mode by adding this:

+   /* Set Round-Robin policy */
+   writel(1, fep->hwp + FEC_QOS_SCHEME);

After a while I got the warning non the less:

WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316
dev_watchdog+0x248/0x24c
NETDEV WATCHDOG: eth0 (fec): transmit queue 2 timed out
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.11.0-rc1-00058-g56d22eced8bc-dirty #377
Hardware name: Freescale i.MX7 Dual (Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x78/0x8c)
[] (dump_stack) from [] (__warn+0xe8/0x100)
[] (__warn) from [] (warn_slowpath_fmt+0x38/0x48)
[] (warn_slowpath_fmt) from []
(dev_watchdog+0x248/0x24c)
[] (dev_watchdog) from [] (call_timer_fn+0x28/0x98)
[] (call_timer_fn) from [] (expire_timers+0xa0/0xac)
[] (expire_timers) from []
(run_timer_softirq+0x9c/0x194)
[] (run_timer_softirq) from []
(__do_softirq+0x114/0x234)
[] (__do_softirq) from [] (irq_exit+0xcc/0x108)
[] (irq_exit) from []
(__handle_domain_irq+0x80/0xec)
[] (__handle_domain_irq) from []
(gic_handle_irq+0x48/0x8c)
[] (gic_handle_irq) from [] (__irq_svc+0x58/0x8c)
Exception stack(0xc1001f28 to 0xc1001f70)
1f20:   0001   c022fdc0 c100
c1003d80
1f40: c1003d34 c0e72ed0 c0bd9b04 c1001f80   
c1001f78
1f60: c022048c c0220490 6013 
[] (__irq_svc) from [] (arch_cpu_idle+0x38/0x3c)
[] (arch_cpu_idle) from [] (do_idle+0x170/0x204)
[] (do_idle) from [] (cpu_startup_entry+0x18/0x1c)
[] 

Re: [PATCH] libertas: Avoid reading past end of buffer

2017-05-10 Thread Kalle Valo
Joe Perches  writes:

> unrelated trivia:
>
> lbs_deb_enter is used incorrectly here at
> function exit as both enter and leave calls.
>
> That type of copy/paste defect may be common.
>
> $ git grep -w lbs_deb_enter | wc -l
> 148
> $ git grep -w lbs_deb_leave | wc -l
> 71
>
> One would expect these numbers to be the same.
>
> Another option would be to delete all these
> calls as ftrace function tracing works well.

Yeah, deleting all the enter/exit calls would be better.

-- 
Kalle Valo


Re: [PATCH net-next V4 10/10] vhost_net: try batch dequing from skb array

2017-05-10 Thread Jason Wang



On 2017年05月10日 20:34, Michael S. Tsirkin wrote:

On Wed, May 10, 2017 at 11:36:22AM +0800, Jason Wang wrote:

We used to dequeue one skb during recvmsg() from skb_array, this could
be inefficient because of the bad cache utilization and spinlock
touching for each packet. This patch tries to batch them by calling
batch dequeuing helpers explicitly on the exported skb array and pass
the skb back through msg_control for underlayer socket to finish the
userspace copying.

Batch dequeuing is also the requirement for more batching improvement
on rx.

Tests were done by pktgen on tap with XDP1 in guest on top of batch
zeroing:

rx batch | pps

2562.41Mpps (+6.16%)
1282.48Mpps (+8.80%)
64 2.38Mpps (+3.96%) <- Default
16 2.31Mpps (+1.76%)
4  2.31Mpps (+1.76%)
1  2.30Mpps (+1.32%)
0  2.27Mpps (+7.48%)

Signed-off-by: Jason Wang
---
  drivers/vhost/net.c | 117 +---
  1 file changed, 111 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9b51989..fbaecf3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -28,6 +28,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  
  #include 
  
@@ -85,6 +87,13 @@ struct vhost_net_ubuf_ref {

struct vhost_virtqueue *vq;
  };
  
+#define VHOST_RX_BATCH 64

+struct vhost_net_buf {
+   struct sk_buff *queue[VHOST_RX_BATCH];
+   int tail;
+   int head;
+};
+

Do you strictly need to put this inline? This structure is quite big
already. Do you see a measureabe difference if you make it

struct sk_buff **queue;
int tail;
int head;

?


I don't.



Will also make it easier to play with the size in the future
should someone want to see how does it work e.g. for different
ring sizes.



Ok, will do this in next version

Thanks


Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")

2017-05-10 Thread Jason Wang



On 2017年05月10日 17:42, Anton Ivanov wrote:

On 10/05/17 09:56, Jason Wang wrote:



On 2017年05月10日 13:28, Anton Ivanov wrote:

On 10/05/17 03:18, Jason Wang wrote:


On 2017年05月09日 23:11, Stefan Hajnoczi wrote:

On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:

I have figured it out. Two issues.

1) skb->xmit_more is hardly ever set under virtualization because
the qdisc
is usually bypassed because of TCQ_F_CAN_BYPASS. Once
TCQ_F_CAN_BYPASS is
set a virtual NIC driver is not likely see skb->xmit_more (this
answers my
"how does this work at all" question).

2) If that flag is turned off (I patched sched_generic to turn it
off in
pfifo_fast while testing), DQL keeps xmit_more from being set. If
the driver
is not DQL enabled xmit_more is never ever set. If the driver is DQL
enabled
the queue is adjusted to ensure xmit_more stops happening within
10-15 xmit
cycles.

That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
There,
the BIG cost is telling the hypervisor that it needs to "kick" the
packets.
The cost of putting them into the vNIC buffers is negligible. You 
want
xmit_more to happen - it makes between 50% and 300% (depending on 
vNIC
design) difference. If there is no xmit_more the vNIC will 
immediately
"kick" the hypervisor and try to signal that  the packet needs to 
move

straight away (as for example in virtio_net).

How do you measure the performance? TCP or just measure pps?

In this particular case - tcp from guest. I have a couple of other
benchmarks (forwarding, etc).


One more question, is the number for virtio-net or other emulated vNIC?


Other for now - you are cc-ed to keep you in the loop.

Virtio is next on my list - I am revisiting the l2tpv3.c driver in 
QEMU and looking at how to preserve bulking by adding back sendmmsg 
(as well as a list of other features/transports).


We had sendmmsg removed for the final inclusion in QEMU 2.1, it 
presently uses only recvmmsg so for the time being it does not care. 
That will most likely change once it starts using sendmmsg as well.


An issue is that qemu net API does not support bulking, do you plan to 
add it?


Thanks


Re: openvswitch MTU patch needed in 4.10 stable

2017-05-10 Thread David Miller
From: Stephen Hemminger 
Date: Tue, 9 May 2017 08:46:56 -0700

> Could you queue the patch to stable?

It speeds things along if you actually specify the SHA1 ID of the
specific commit being requested for a -stable backport.

I did this work, but you could have taken a brief amount of time so
that this would not have been necessary.

Thanks.




Re: [PATCH net] tcp: avoid fragmenting peculiar skbs in SACK

2017-05-10 Thread Neal Cardwell
On Wed, May 10, 2017 at 8:01 PM, Yuchung Cheng  wrote:
>
> This patch fixes a bug in splitting an SKB during SACK
> processing. Specifically if an skb contains multiple
> packets and is only partially sacked in the higher sequences,
> tcp_match_sack_to_skb() splits the skb and marks the second fragment
> as SACKed.
>
> The current code further attempts rounding up the first fragment
> to MSS boundaries. But it misses a boundary condition when the
> rounded-up fragment size (pkt_len) is exactly skb size.  Spliting
> such an skb is pointless and causses a kernel warning and aborts
> the SACK processing. This patch universally checks such over-split
> before calling tcp_fragment to prevent these unnecessary warnings.
>
> Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Eric Dumazet 
> Signed-off-by: Soheil Hassas Yeganeh 

Acked-by: Neal Cardwell 

neal


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-10 Thread Ding Tianhong


On 2017/5/9 8:48, Casey Leedom wrote:
> 
> | From: Alexander Duyck 
> | Date: Saturday, May 6, 2017 11:07 AM
> | 
> | | From: Ding Tianhong 
> | | Date: Fri, May 5, 2017 at 8:08 PM
> | | 
> | | According the suggestion, I could only think of this code:
> | | ..
> | 
> | This is a bit simplistic but it is a start.
> 
>   Yes, something tells me that this is going to be more complicated than any
> of us like ...
> 
> | The other bit I was getting at is that we need to update the core PCIe
> | code so that when we configure devices and the root complex reports no
> | support for relaxed ordering it should be clearing the relaxed
> | ordering bits in the PCIe configuration registers on the upstream
> | facing devices.
> 
>   Of course, this can be written to by the Driver at any time ... and is in
> the case of the cxgb4 Driver ...
> 
>   After a lot of rummaging around, it does look like KVM prohibits writes to
> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
> conf_space_capability.c and conf_space.c simply because writes aren't
> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
> 
> | The last bit we need in all this is a way to allow for setups where
> | peer-to-peer wants to perform relaxed ordering but for writes to the
> | host we have to not use relaxed ordering. For that we need to enable a
> | special case and that isn't handled right now in any of the solutions
> | we have coded up so far.
> 
>   Yes, we do need this.
> 
> 
> | From: Alexander Duyck 
> | Date: Saturday, May 8, 2017 08:22 AM
> |
> | The problem is we need to have something that can be communicated
> | through a VM. Your change doesn't work in that regard. That was why I
> | suggested just updating the code so that we when we initialized PCIe
> | devices what we do is either set or clear the relaxed ordering bit in
> | the PCIe device control register. That way when we direct assign an
> | interface it could know just based on the bits int the PCIe
> | configuration if it could use relaxed ordering or not.
> | 
> | At that point the driver code itself becomes very simple since you
> | could just enable the relaxed ordering by default in the igb/ixgbe
> | driver and if the bit is set or cleared in the PCIe configuration then
> | we are either sending with relaxed ordering requests or not and don't
> | have to try and locate the root complex.
> | 
> | So from the sound of it Casey has a special use case where he doesn't
> | want to send relaxed ordering frames to the root complex, but instead
> | would like to send them to another PCIe device. To do that he needs to
> | have a way to enable the relaxed ordering bit in the PCIe
> | configuration but then not send any to the root complex. Odds are that
> | is something he might be able to just implement in the driver, but is
> | something that may become a more general case in the future. I don't
> | see our change here impacting it as long as we keep the solution
> | generic and mostly confined to when we instantiate the devices as the
> | driver could likely make the decision to change the behavior later.
> 
>   It's not just me.  Intel has said that while RO directed at the Root
> Complex Host Coherent Memory has a performance bug (not Data Corruption),
> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
> interested in hearing what the bug is if we get that much detail.  The very
> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
> good performance.  So this is essentially a bug in the hardware that was
> ~trying~ to implement a performance win.)
> 
>   Meanwhile, I currently only know of a single PCIe End Point which causes
> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
> clear that product is even alive anymore since I haven't been able to get
> any responses from them for several months.
> 
>   What I'm saying is: let's try to architect a solution which doesn't throw
> the baby out with the bath water ...
> 
>   I think that if a Device's Root Complex Port has problems with Relaxed
> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
> Machine since the Device Driver can no longer query the Relaxed Ordering
> Support of the Root Complex Port.  The only down side of this would be if we
> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
> transfers.  But that seems like a hard application to support in any case
> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
> match the actual values.
> 
>   For Devices running in the base OS/Hypervisor, their Drivers can query the
> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
> simple 

[PATCH net] tcp: avoid fragmenting peculiar skbs in SACK

2017-05-10 Thread Yuchung Cheng
This patch fixes a bug in splitting an SKB during SACK
processing. Specifically if an skb contains multiple
packets and is only partially sacked in the higher sequences,
tcp_match_sack_to_skb() splits the skb and marks the second fragment
as SACKed.

The current code further attempts rounding up the first fragment
to MSS boundaries. But it misses a boundary condition when the
rounded-up fragment size (pkt_len) is exactly skb size.  Spliting
such an skb is pointless and causses a kernel warning and aborts
the SACK processing. This patch universally checks such over-split
before calling tcp_fragment to prevent these unnecessary warnings.

Fixes: adb92db857ee ("tcp: Make SACK code to split only at mss boundaries")
Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_input.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5a3ad09e2786..06e2dbc2b4a2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1179,13 +1179,14 @@ static int tcp_match_skb_to_sack(struct sock *sk, 
struct sk_buff *skb,
 */
if (pkt_len > mss) {
unsigned int new_len = (pkt_len / mss) * mss;
-   if (!in_sack && new_len < pkt_len) {
+   if (!in_sack && new_len < pkt_len)
new_len += mss;
-   if (new_len >= skb->len)
-   return 0;
-   }
pkt_len = new_len;
}
+
+   if (pkt_len >= skb->len && !in_sack)
+   return 0;
+
err = tcp_fragment(sk, skb, pkt_len, mss, GFP_ATOMIC);
if (err < 0)
return err;
-- 
2.13.0.rc2.291.g57267f2277-goog



[PATCH net] bpf, arm64: fix faulty emission of map access in tail calls

2017-05-10 Thread Daniel Borkmann
Shubham was recently asking on netdev why in arm64 JIT we don't multiply
the index for accessing the tail call map by 8. That led me into testing
out arm64 JIT wrt tail calls and it turned out I got a NULL pointer
dereference on the tail call.

The buggy access is at:

  prog = array->ptrs[index];
  if (prog == NULL)
  goto out;

  [...]
  0060:  d2800e0a  mov x10, #0x70 // #112
  0064:  f86a682a  ldr x10, [x1,x10]
  0068:  f862694b  ldr x11, [x10,x2]
  006c:  b4ab  cbz x11, 0x0080
  [...]

The code triggering the crash is f862694b. x1 at the time contains the
address of the bpf array, x10 offsetof(struct bpf_array, ptrs). Meaning,
above we load the pointer to the program at map slot 0 into x10. x10
can then be NULL if the slot is not occupied, which we later on try to
access with a user given offset in x2 that is the map index.

Fix this by emitting the following instead:

  [...]
  0060:  d2800e0a  mov x10, #0x70 // #112
  0064:  8b0a002a  add x10, x1, x10
  0068:  d37df04b  lsl x11, x2, #3
  006c:  f86b694b  ldr x11, [x10,x11]
  0070:  b4ab  cbz x11, 0x0084
  [...]

This basically adds the offset to ptrs to the base address of the bpf
array we got and we later on access the map with an index * 8 offset
relative to that. The tail call map itself is basically one large area
with meta data at the head followed by the array of prog pointers.
This makes tail calls working again, tested on Cavium ThunderX ARMv8.

Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper")
Reported-by: Shubham Bansal 
Signed-off-by: Daniel Borkmann 
---
 arch/arm64/net/bpf_jit_comp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index c6e5358..71f9305 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -253,8 +253,9 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
 */
off = offsetof(struct bpf_array, ptrs);
emit_a64_mov_i64(tmp, off, ctx);
-   emit(A64_LDR64(tmp, r2, tmp), ctx);
-   emit(A64_LDR64(prg, tmp, r3), ctx);
+   emit(A64_ADD(1, tmp, r2, tmp), ctx);
+   emit(A64_LSL(1, prg, r3, 3), ctx);
+   emit(A64_LDR64(prg, tmp, prg), ctx);
emit(A64_CBZ(1, prg, jmp_offset), ctx);
 
/* goto *(prog->bpf_func + prologue_size); */
-- 
1.9.3



Re: [PATCH] libertas: Avoid reading past end of buffer

2017-05-10 Thread Joe Perches
On Wed, 2017-05-10 at 12:24 -0700, Kees Cook wrote:
> Using memcpy() from a string that is shorter than the length copied means
> the destination buffer is being filled with arbitrary data from the kernel
> rodata segment. 

another bit of trivia:

> diff --git a/drivers/net/wireless/marvell/libertas/mesh.c 
> b/drivers/net/wireless/marvell/libertas/mesh.c
[]
> @@ -1170,17 +1170,11 @@ int lbs_mesh_ethtool_get_sset_count(struct net_device 
> *dev, int sset)
[]
> + memcpy(s, *mesh_stat_strings, sizeof(mesh_stat_strings));

That * isn't necessary.


Re: [PATCH] libertas: Avoid reading past end of buffer

2017-05-10 Thread Joe Perches
On Wed, 2017-05-10 at 12:24 -0700, Kees Cook wrote:
> Using memcpy() from a string that is shorter than the length copied means
[]
> diff --git a/drivers/net/wireless/marvell/libertas/mesh.c 
> b/drivers/net/wireless/marvell/libertas/mesh.c
[]
> @@ -1170,17 +1170,11 @@ int lbs_mesh_ethtool_get_sset_count(struct net_device 
> *dev, int sset)
>  void lbs_mesh_ethtool_get_strings(struct net_device *dev,
>   uint32_t stringset, uint8_t *s)
>  {
> - int i;
> -
>   lbs_deb_enter(LBS_DEB_ETHTOOL);
>  
>   switch (stringset) {
>   case ETH_SS_STATS:
> - for (i = 0; i < MESH_STATS_NUM; i++) {
> - memcpy(s + i * ETH_GSTRING_LEN,
> - mesh_stat_strings[i],
> - ETH_GSTRING_LEN);
> - }
> + memcpy(s, *mesh_stat_strings, sizeof(mesh_stat_strings));
>   break;
>   }
>   lbs_deb_enter(LBS_DEB_ETHTOOL);

unrelated trivia:

lbs_deb_enter is used incorrectly here at
function exit as both enter and leave calls.

That type of copy/paste defect may be common.

$ git grep -w lbs_deb_enter | wc -l
148
$ git grep -w lbs_deb_leave | wc -l
71

One would expect these numbers to be the same.

Another option would be to delete all these
calls as ftrace function tracing works well.




Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once

2017-05-10 Thread Daniel Borkmann

On 05/11/2017 12:46 AM, Jakub Kicinski wrote:

On Thu, 11 May 2017 00:24:56 +0200, Daniel Borkmann wrote:

I understand the counter argument that from user space perspective it
would make things slightly more complicated because there would be two
conditions in which driver hook is used:
   1) DRV_MODE set on dump;
   2) flags attribute not present (old kernel).

I'm concerned about number 2).  We can't simply depend on SKB_MODE
not being set because we may add more *_MODE flags in the future.  So
doing:

if (flags & SKB_MODE)
printf("skb-mode");
else
printf("drv-mode");

is not correct.  The flags attribute must not be present at all (think
HW_MODE flag).  But going further there can also be non-MODE flags,
like, say.. NEVER_TX, and then there may be flags present in dump,
and if SKB_MODE isn't be set, the mode could be some new MODE user space
doesn't understand, or it could be DRV_MODE+a new non-MODE flag... no
way to tell :S


Yep, I see your point. Additionally, if we use XDP_FLAGS_* again for
dumping we're wasting bit space for flags we would never dump back
such as mentioned XDP_FLAGS_UPDATE_IF_NOEXIST (or any other future
flags that are only relevant for loading, but never for dumping).
Given dumping IFLA_XDP_FLAGS was added due to XDP_FLAGS_SKB_MODE,
we can still change this, since it's not too late.

How about the following proposal: IFLA_XDP_ATTACHED we have as-is
(need to keep that anyway), if that is true, it means "something
is attached at XDP layer". Then, we add a new attribute IFLA_XDP_MODE
(enum as type), which can contain XDP_DRV_MODE (0), XDP_SKB_MODE (1).
I don't think there's a strict requirement to really dump IFLA_XDP_FLAGS
back, separating both attrs would avoid this hassle of what current
or future load flag fits for dump as well and which not. Wdyt?


I really like the idea of not reusing IFLA_XDP_FLAGS for dumps!  New
enum sounds good, but perhaps reusing IFLA_XDP_ATTACHED isn't 100%
off-limits either?  4.11 would report (0) - driver supports XDP but
nothing is attached, (1) - something attached at the driver, could we
make (2) mean - something attached at generic XDP?  Would that be
considered breaking the ABI, are we bound to boolean semantics?


I like the idea, it would also render IFLA_XDP_ATTACHED not useless
or redundant then. Older binaries check !rta_getattr_u8(tb[IFLA_XDP_ATTACHED])
whether something is attached or not at XDP layer. So they won't know
IFLA_XDP_FLAGS attr either to make a more fine-grained distinction about
what mode. That seems actually cleaner to me, will give it a try.

Thanks,
Daniel


Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once

2017-05-10 Thread Jakub Kicinski
On Thu, 11 May 2017 00:24:56 +0200, Daniel Borkmann wrote:
> > I understand the counter argument that from user space perspective it
> > would make things slightly more complicated because there would be two
> > conditions in which driver hook is used:
> >   1) DRV_MODE set on dump;
> >   2) flags attribute not present (old kernel).
> >
> > I'm concerned about number 2).  We can't simply depend on SKB_MODE
> > not being set because we may add more *_MODE flags in the future.  So
> > doing:
> >
> > if (flags & SKB_MODE)
> > printf("skb-mode");
> > else
> > printf("drv-mode");
> >
> > is not correct.  The flags attribute must not be present at all (think
> > HW_MODE flag).  But going further there can also be non-MODE flags,
> > like, say.. NEVER_TX, and then there may be flags present in dump,
> > and if SKB_MODE isn't be set, the mode could be some new MODE user space
> > doesn't understand, or it could be DRV_MODE+a new non-MODE flag... no
> > way to tell :S  
> 
> Yep, I see your point. Additionally, if we use XDP_FLAGS_* again for
> dumping we're wasting bit space for flags we would never dump back
> such as mentioned XDP_FLAGS_UPDATE_IF_NOEXIST (or any other future
> flags that are only relevant for loading, but never for dumping).
> Given dumping IFLA_XDP_FLAGS was added due to XDP_FLAGS_SKB_MODE,
> we can still change this, since it's not too late.
> 
> How about the following proposal: IFLA_XDP_ATTACHED we have as-is
> (need to keep that anyway), if that is true, it means "something
> is attached at XDP layer". Then, we add a new attribute IFLA_XDP_MODE
> (enum as type), which can contain XDP_DRV_MODE (0), XDP_SKB_MODE (1).
> I don't think there's a strict requirement to really dump IFLA_XDP_FLAGS
> back, separating both attrs would avoid this hassle of what current
> or future load flag fits for dump as well and which not. Wdyt?

I really like the idea of not reusing IFLA_XDP_FLAGS for dumps!  New
enum sounds good, but perhaps reusing IFLA_XDP_ATTACHED isn't 100%
off-limits either?  4.11 would report (0) - driver supports XDP but
nothing is attached, (1) - something attached at the driver, could we
make (2) mean - something attached at generic XDP?  Would that be
considered breaking the ABI, are we bound to boolean semantics?


Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once

2017-05-10 Thread Daniel Borkmann

On 05/10/2017 11:07 PM, Jakub Kicinski wrote:

On Wed, 10 May 2017 11:36:22 +0200, Daniel Borkmann wrote:

On 05/10/2017 05:18 AM, Jakub Kicinski wrote:

On Wed, 10 May 2017 03:31:31 +0200, Daniel Borkmann wrote:

[...]

xdp = nla_nest_start(skb, IFLA_XDP);
if (!xdp)
return -EMSGSIZE;
+
if (rcu_access_pointer(dev->xdp_prog)) {
xdp_flags = XDP_FLAGS_SKB_MODE;
val = 1;
-   } else if (dev->netdev_ops->ndo_xdp) {
-   struct netdev_xdp xdp_op = {};
-
-   xdp_op.command = XDP_QUERY_PROG;
-   err = dev->netdev_ops->ndo_xdp(dev, _op);
-   if (err)
-   goto err_cancel;
-   val = xdp_op.prog_attached;
+   } else {
+   val = dev_xdp_attached(dev);
}


Would it make sense to set xdp_flags to XDP_FLAGS_DRV_MODE here to keep
things symmetrical?  I know you are just preserving existing behaviour
but it may seem slightly arbitrary to a new comer to report one of the
very similarly named flags in the dump but not the other.


I thought about it, it's kind of redundant information since
IFLA_XDP_ATTACHED attribute w/o IFLA_XDP_FLAGS attribute today
says that it's native already. It might look strange if we add
also XDP_FLAGS_DRV_MODE there, since it doesn't give anything
new. I rather see it similar to XDP_FLAGS_UPDATE_IF_NOEXIST flag
that is for updating fd only, but I don't really have a strong
opinion on this though. I could add it to the respin if preferred.


XDP_FLAGS_UPDATE_IF_NOEXIST is indeed the precedent which makes things
a bit murky.  There are no reasonably useful semantics for IF_NOEXIST
on dump :(  Note that meaning of SKB_MODE flag shifts slightly between
set and dump IIUC.  At set time it means:
  "force installation at the generic hook",
at dump time it means:
  "installed at generic hook - regardless of whether the flag was set at
   installation time",

So I would argue that DRV_MODE flag is closer to SKB_MODE not only by
name but also by semantics, and it would be cool if we could keep the
semantics close on dump as well as set.


Right.


I understand the counter argument that from user space perspective it
would make things slightly more complicated because there would be two
conditions in which driver hook is used:
  1) DRV_MODE set on dump;
  2) flags attribute not present (old kernel).

I'm concerned about number 2).  We can't simply depend on SKB_MODE
not being set because we may add more *_MODE flags in the future.  So
doing:

if (flags & SKB_MODE)
printf("skb-mode");
else
printf("drv-mode");

is not correct.  The flags attribute must not be present at all (think
HW_MODE flag).  But going further there can also be non-MODE flags,
like, say.. NEVER_TX, and then there may be flags present in dump,
and if SKB_MODE isn't be set, the mode could be some new MODE user space
doesn't understand, or it could be DRV_MODE+a new non-MODE flag... no
way to tell :S


Yep, I see your point. Additionally, if we use XDP_FLAGS_* again for
dumping we're wasting bit space for flags we would never dump back
such as mentioned XDP_FLAGS_UPDATE_IF_NOEXIST (or any other future
flags that are only relevant for loading, but never for dumping).
Given dumping IFLA_XDP_FLAGS was added due to XDP_FLAGS_SKB_MODE,
we can still change this, since it's not too late.

How about the following proposal: IFLA_XDP_ATTACHED we have as-is
(need to keep that anyway), if that is true, it means "something
is attached at XDP layer". Then, we add a new attribute IFLA_XDP_MODE
(enum as type), which can contain XDP_DRV_MODE (0), XDP_SKB_MODE (1).
I don't think there's a strict requirement to really dump IFLA_XDP_FLAGS
back, separating both attrs would avoid this hassle of what current
or future load flag fits for dump as well and which not. Wdyt?

Thanks,
Daniel


Re: Requirements for a shutdown function?

2017-05-10 Thread Florian Fainelli
On 05/10/2017 03:11 PM, Timur Tabi wrote:
> On 05/10/2017 04:47 PM, Florian Fainelli wrote:
>> AFAIR kexec takes care of shutting down network devices explicitly
>> (unless instructed otherwise with -x/--no-ifdown) so this may be where
>> this is coming from.
>>
>> Reading through drivers/base/core.c it does not appear that ->remove()
>> is called and then ->shutdown() gets called, only ->shutdown() gets
>> called from device_shutdown() called from kernel/reboot.c. It seems to
>> me like if you want to be on the safe side you would want to implement a
>> shutdown function that is identical to what your remove function does.
> 
> I finally found a testcase where the shutdown function is useful.  If you do
> a "reboot -f", it will call shutdown but not close.

Correct yes. Sorry, I did not recall which one of kexec or reboot would
call it, but both would actually now that I looked at what happens on
one of my systems again.
-- 
Florian


Re: Requirements for a shutdown function?

2017-05-10 Thread Timur Tabi
On 05/10/2017 04:47 PM, Florian Fainelli wrote:
> AFAIR kexec takes care of shutting down network devices explicitly
> (unless instructed otherwise with -x/--no-ifdown) so this may be where
> this is coming from.
> 
> Reading through drivers/base/core.c it does not appear that ->remove()
> is called and then ->shutdown() gets called, only ->shutdown() gets
> called from device_shutdown() called from kernel/reboot.c. It seems to
> me like if you want to be on the safe side you would want to implement a
> shutdown function that is identical to what your remove function does.

I finally found a testcase where the shutdown function is useful.  If you do
a "reboot -f", it will call shutdown but not close.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [PATCH v2 net-next 03/12] drivers: net: xgene: Use rgmii mdio mac access

2017-05-10 Thread Florian Fainelli
On 05/10/2017 01:45 PM, Iyappan Subramanian wrote:
> From: Quan Nguyen 
> 
> This patch switches to use rgmii mdio mac access routines if available,
> as they share the same HW.
> 
> Signed-off-by: Quan Nguyen 
> Signed-off-by: Iyappan Subramanian 
> ---
>  drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
> b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> index 2050c58..47c5b75 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> @@ -277,6 +277,13 @@ void xgene_enet_wr_mac(struct xgene_enet_pdata *pdata, 
> u32 wr_addr, u32 wr_data)
>   u8 wait = 10;
>   u32 done;
>  
> + if (pdata->mdio_driver && ndev->phydev &&
> + pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) {

To be on the safe side you should check all 4 values of
PHY_INTERFACE_MODE_RGMII, not just this one.
-- 
Florian


Re: Requirements for a shutdown function?

2017-05-10 Thread Florian Fainelli
On 05/10/2017 01:17 PM, Timur Tabi wrote:
> On 05/09/2017 02:06 PM, Florian Fainelli wrote:
>> On 05/09/2017 11:51 AM, Timur Tabi wrote:
> 
>>> Is it possible that the network stack detects a kexec and automatically
>>> stops all network devices?
>>
>> No. why would it? However the device driver model does call into your
>> driver's remove function and that one does a right job already because
>> it does an network device unregister, and so on.
> 
> I ran some more tests.  When I launch kexec, the driver's
> net_device_ops.ndo_stop function is called, which already stops the interface.
> 
> So it looks to me as if the network stack does close the interface during a
> kexec.  With the interface closed, there's no point in having a shutdown
> function, is there?

AFAIR kexec takes care of shutting down network devices explicitly
(unless instructed otherwise with -x/--no-ifdown) so this may be where
this is coming from.

Reading through drivers/base/core.c it does not appear that ->remove()
is called and then ->shutdown() gets called, only ->shutdown() gets
called from device_shutdown() called from kernel/reboot.c. It seems to
me like if you want to be on the safe side you would want to implement a
shutdown function that is identical to what your remove function does.

> 
>>> My in-house driver stops the RX and TX queues.  I'm guessing that's good
>>> enough, but I don't have a failing test case to prove it.
>>>
>>
>> That's probably good enough, yes.
> 
> Except that it turns out that the queues are already stopped by then because
> the emac_close() function has already been called.

-- 
Florian


[PATCH net-next] ipv6: Implement limits on hop by hop and destination options

2017-05-10 Thread Tom Herbert
RFC 2460 (IPv6) defines hop by hop options and destination options
extension headers. Both of these carry a list of TLVs which is
only limited by the maximum length of the extension header (2048
bytes). By the spec a host must process all the TLVs in these
options, however these could be used as a fairly obvious
denial of service attack. I think this could in fact be
a significant DOS vector on the Internet, one mitigating
factor might be that many FWs drop all packets with EH (and
obviously this is only IPv6) so an Internet wide attack might not
be so effective (yet!).

By my calculation, the worse case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
wrote a quick test program that floods a whole bunch of these
packets to a host and sure enough there is substantial time spent
in ip6_parse_tlv. These packets contain nothing but unknown TLVS
(that are ignored), TLV padding, and bogus UDP header with zero
payload length.

  25.38%  [kernel][k] __fib6_clean_all
  21.63%  [kernel][k] ip6_parse_tlv
   4.21%  [kernel][k] __local_bh_enable_ip
   2.18%  [kernel][k] ip6_pol_route.isra.39
   1.98%  [kernel][k] fib6_walk_continue
   1.88%  [kernel][k] _raw_write_lock_bh
   1.65%  [kernel][k] dst_release

This patches adds configurable limits to destination and hop by hop
options. There are three limits that may be set:
  - Limit the number of non-padding TLVs that may be in an extension header
  - Limit the length of a hop by hop or destination options extension header
  - Disallow unknown options

The limits are set in corresponding sysctls:

  ipv6.sysctl.max_dst_opts_cnt
  ipv6.sysctl.max_hbh_opts_cnt
  ipv6.sysctl.max_dst_opts_len
  ipv6.sysctl.max_hbh_opts_len

If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.

If a limit is exceeded when processing an extension header the packet is
dropped.

Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).

Tested: I've only complied this code, working on getting a test
environment set up which is why RFC. If anyone has resources and time
to do some testing or development, let me know!

Tested:

Martin Lau was kind enough to test this on real HW. He verified
that processing IPv6 TLVs CPU intensive and could very well be a
DDOS, and he verified that the patch drops packets that exceed the
limits to avoid the issue:

I tested out 1 thread (i.e. one raw_udp process).

I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
With sysctls setting to 2048, the softirq% is packed to 100%.
With 8, the softirq% is almost unnoticable from mpstat.
---
 Documentation/networking/ip-sysctl.txt | 22 +
 include/net/ipv6.h | 33 +
 include/net/netns/ipv6.h   |  4 
 net/ipv6/af_inet6.c|  4 
 net/ipv6/exthdrs.c | 44 ++
 net/ipv6/sysctl_net_ipv6.c | 32 +
 6 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 974ab47..476a5c5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1379,6 +1379,28 @@ mld_qrv - INTEGER
Default: 2 (as specified by RFC3810 9.1)
Minimum: 1 (as specified by RFC6636 4.5)
 
+max_dst_opts_cnt - INTEGER
+   Maximum number of non-padding TLVs allowed in a destination
+   options extension header. If this value is less than zero
+   then unknown options are disallowed and the number of known
+   TLVs allowed are the absolute value of this numer.
+
+   Default: 8
+
+max_hbh_opts_cnt - INTEGER
+   Maximum number of non-padding TLVs allowed in a hop by hop
+   options extension header. If this value is less than zero
+   then unknown options are disallowed and the number of known
+   TLVs allowed are the absolute value of this number.
+
+max dst_opts_len - INTEGER
+   Maximum length allowed for a destination options extension
+   header.
+
+max hbh_opts_len - INTEGER
+   Maximum length allowed for a hop by hop options extension
+   header.
+
 IPv6 Fragmentation:
 
 ip6frag_high_thresh - INTEGER
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index dbf0abb..9f724ae 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -50,6 +50,39 @@
 #define IPV6_DEFAULT_HOPLIMIT   64
 #define IPV6_DEFAULT_MCASTHOPS 1
 
+/* Limits on hop 

Re: Hello

2017-05-10 Thread Rechel Diarra
-- 
Hello, I have something important to discuss with you as soon as you reply back.
Regards.
Miss.Rechel


Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once

2017-05-10 Thread Jakub Kicinski
On Wed, 10 May 2017 11:36:22 +0200, Daniel Borkmann wrote:
> On 05/10/2017 05:18 AM, Jakub Kicinski wrote:
> > On Wed, 10 May 2017 03:31:31 +0200, Daniel Borkmann wrote:  
> >> @@ -6851,6 +6851,32 @@ int dev_change_proto_down(struct net_device *dev, 
> >> bool proto_down)
> >>   }
> >>   EXPORT_SYMBOL(dev_change_proto_down);
> >>
> >> +bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op)  
> >
> > Out of curiosity - the leading underscores refer to caller having to
> > hold rtnl?  I assume they are not needed in the function below because
> > it's static?  
> 
> I think I don't quite follow the last question, but it probably makes
> sense to add an ASSERT_RTNL() into dev_xdp_attached() inline helper to
> make it clearly visible to callers of this api.

Sorry, I missed you have a dev_xdp_attached() defined in the header,
hence the confusion.

> >> +{
> >> +  struct netdev_xdp xdp;
> >> +
> >> +  memset(, 0, sizeof(xdp));
> >> +  xdp.command = XDP_QUERY_PROG;  
> >
> > Probably personal preference, but seems like designated struct
> > initializer would do quite nicely here and save the memset :)  
> 
> I had that initially, but I recalled that gcc < 4.6 does not handle this
> style for the initialization of anonymous struct/union properly (e.g.,
> we fixed that in iproute2 as well). Andrew Morton still uses gcc 4.4.4
> and occasionally sends kernel fixes, so we might end up like this anyway.

Ah, good to know!

> >> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> >> index dda9f16..99320f0 100644
> >> --- a/net/core/rtnetlink.c
> >> +++ b/net/core/rtnetlink.c
> >> @@ -1251,24 +1251,20 @@ static int rtnl_xdp_fill(struct sk_buff *skb, 
> >> struct net_device *dev)
> >>   {
> >>struct nlattr *xdp;
> >>u32 xdp_flags = 0;
> >> -  u8 val = 0;
> >>int err;
> >> +  u8 val;
> >>
> >>xdp = nla_nest_start(skb, IFLA_XDP);
> >>if (!xdp)
> >>return -EMSGSIZE;
> >> +
> >>if (rcu_access_pointer(dev->xdp_prog)) {
> >>xdp_flags = XDP_FLAGS_SKB_MODE;
> >>val = 1;
> >> -  } else if (dev->netdev_ops->ndo_xdp) {
> >> -  struct netdev_xdp xdp_op = {};
> >> -
> >> -  xdp_op.command = XDP_QUERY_PROG;
> >> -  err = dev->netdev_ops->ndo_xdp(dev, _op);
> >> -  if (err)
> >> -  goto err_cancel;
> >> -  val = xdp_op.prog_attached;
> >> +  } else {
> >> +  val = dev_xdp_attached(dev);
> >>}  
> >
> > Would it make sense to set xdp_flags to XDP_FLAGS_DRV_MODE here to keep
> > things symmetrical?  I know you are just preserving existing behaviour
> > but it may seem slightly arbitrary to a new comer to report one of the
> > very similarly named flags in the dump but not the other.  
> 
> I thought about it, it's kind of redundant information since
> IFLA_XDP_ATTACHED attribute w/o IFLA_XDP_FLAGS attribute today
> says that it's native already. It might look strange if we add
> also XDP_FLAGS_DRV_MODE there, since it doesn't give anything
> new. I rather see it similar to XDP_FLAGS_UPDATE_IF_NOEXIST flag
> that is for updating fd only, but I don't really have a strong
> opinion on this though. I could add it to the respin if preferred.

XDP_FLAGS_UPDATE_IF_NOEXIST is indeed the precedent which makes things
a bit murky.  There are no reasonably useful semantics for IF_NOEXIST
on dump :(  Note that meaning of SKB_MODE flag shifts slightly between
set and dump IIUC.  At set time it means:
 "force installation at the generic hook",
at dump time it means:
 "installed at generic hook - regardless of whether the flag was set at
  installation time",

So I would argue that DRV_MODE flag is closer to SKB_MODE not only by
name but also by semantics, and it would be cool if we could keep the
semantics close on dump as well as set.

I understand the counter argument that from user space perspective it
would make things slightly more complicated because there would be two
conditions in which driver hook is used:
 1) DRV_MODE set on dump;
 2) flags attribute not present (old kernel).

I'm concerned about number 2).  We can't simply depend on SKB_MODE
not being set because we may add more *_MODE flags in the future.  So
doing:

if (flags & SKB_MODE)
printf("skb-mode");
else
printf("drv-mode");

is not correct.  The flags attribute must not be present at all (think
HW_MODE flag).  But going further there can also be non-MODE flags,
like, say.. NEVER_TX, and then there may be flags present in dump,
and if SKB_MODE isn't be set, the mode could be some new MODE user space
doesn't understand, or it could be DRV_MODE+a new non-MODE flag... no
way to tell :S


[PATCH v2 net-next 07/12] drivers: net: xgene: Extend ethtool statistics

2017-05-10 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds extended ethtool statistics support.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 89 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 29 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 51 +
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   |  8 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h   |  4 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h  |  1 +
 6 files changed, 181 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 217cde8..a7eed3b 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -23,9 +23,17 @@
 struct xgene_gstrings_stats {
char name[ETH_GSTRING_LEN];
int offset;
+   u32 addr;
+   u32 mask;
 };
 
 #define XGENE_STAT(m) { #m, offsetof(struct rtnl_link_stats64, m) }
+#define XGENE_EXTD_STAT(s, a, m)   \
+   {   \
+   .name = #s, \
+   .addr = a ## _ADDR, \
+   .mask = m   \
+   }
 
 static const struct xgene_gstrings_stats gstrings_stats[] = {
XGENE_STAT(rx_packets),
@@ -40,7 +48,51 @@ struct xgene_gstrings_stats {
XGENE_STAT(rx_fifo_errors)
 };
 
+static const struct xgene_gstrings_stats gstrings_extd_stats[] = {
+   XGENE_EXTD_STAT(tx_rx_64b_frame_cntr, TR64, 31),
+   XGENE_EXTD_STAT(tx_rx_127b_frame_cntr, TR127, 31),
+   XGENE_EXTD_STAT(tx_rx_255b_frame_cntr, TR255, 31),
+   XGENE_EXTD_STAT(tx_rx_511b_frame_cntr, TR511, 31),
+   XGENE_EXTD_STAT(tx_rx_1023b_frame_cntr, TR1K, 31),
+   XGENE_EXTD_STAT(tx_rx_1518b_frame_cntr, TRMAX, 31),
+   XGENE_EXTD_STAT(tx_rx_1522b_frame_cntr, TRMGV, 31),
+   XGENE_EXTD_STAT(rx_fcs_error_cntr, RFCS, 16),
+   XGENE_EXTD_STAT(rx_multicast_pkt_cntr, RMCA, 31),
+   XGENE_EXTD_STAT(rx_broadcast_pkt_cntr, RBCA, 31),
+   XGENE_EXTD_STAT(rx_ctrl_frame_pkt_cntr, RXCF, 16),
+   XGENE_EXTD_STAT(rx_pause_frame_pkt_cntr, RXPF, 16),
+   XGENE_EXTD_STAT(rx_unk_opcode_cntr, RXUO, 16),
+   XGENE_EXTD_STAT(rx_align_err_cntr, RALN, 16),
+   XGENE_EXTD_STAT(rx_frame_len_err_cntr, RFLR, 16),
+   XGENE_EXTD_STAT(rx_code_err_cntr, RCDE, 16),
+   XGENE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE, 16),
+   XGENE_EXTD_STAT(rx_undersize_pkt_cntr, RUND, 16),
+   XGENE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR, 16),
+   XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16),
+   XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16),
+   XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16),
+   XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31),
+   XGENE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA, 31),
+   XGENE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF, 16),
+   XGENE_EXTD_STAT(tx_defer_pkt_cntr, TDFR, 31),
+   XGENE_EXTD_STAT(tx_excv_defer_pkt_cntr, TEDF, 31),
+   XGENE_EXTD_STAT(tx_single_col_pkt_cntr, TSCL, 31),
+   XGENE_EXTD_STAT(tx_multi_col_pkt_cntr, TMCL, 31),
+   XGENE_EXTD_STAT(tx_late_col_pkt_cntr, TLCL, 31),
+   XGENE_EXTD_STAT(tx_excv_col_pkt_cntr, TXCL, 31),
+   XGENE_EXTD_STAT(tx_total_col_cntr, TNCL, 31),
+   XGENE_EXTD_STAT(tx_pause_frames_hnrd_cntr, TPFH, 16),
+   XGENE_EXTD_STAT(tx_drop_frame_cntr, TDRP, 16),
+   XGENE_EXTD_STAT(tx_jabber_frame_cntr, TJBR, 12),
+   XGENE_EXTD_STAT(tx_fcs_error_cntr, TFCS, 12),
+   XGENE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF, 12),
+   XGENE_EXTD_STAT(tx_oversize_frame_cntr, TOVR, 12),
+   XGENE_EXTD_STAT(tx_undersize_frame_cntr, TUND, 12),
+   XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12)
+};
+
 #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats)
+#define XGENE_EXTD_STATS_LEN   ARRAY_SIZE(gstrings_extd_stats)
 
 static void xgene_get_drvinfo(struct net_device *ndev,
  struct ethtool_drvinfo *info)
@@ -142,6 +194,11 @@ static void xgene_get_strings(struct net_device *ndev, u32 
stringset, u8 *data)
memcpy(p, gstrings_stats[i].name, ETH_GSTRING_LEN);
p += ETH_GSTRING_LEN;
}
+
+   for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) {
+   memcpy(p, gstrings_extd_stats[i].name, ETH_GSTRING_LEN);
+   p += ETH_GSTRING_LEN;
+   }
 }
 
 static int xgene_get_sset_count(struct net_device *ndev, int sset)
@@ -149,19 +206,49 @@ static int xgene_get_sset_count(struct net_device *ndev, 
int sset)
if (sset != ETH_SS_STATS)
return -EINVAL;
 
-   return XGENE_STATS_LEN;
+   return XGENE_STATS_LEN + XGENE_EXTD_STATS_LEN;
+}
+
+static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata)
+{
+   u32 tmp;
+   int i;
+
+   for (i 

[PATCH v2 net-next 06/12] drivers: net: xgene: Remove unused macros

2017-05-10 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch cleans up unused macros to improve readability.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index 3ce349f..ed51005 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -217,12 +217,6 @@ enum xgene_enet_rm {
 #define FULL_DUPLEX2   BIT(0)
 #define PAD_CRCBIT(2)
 #define LENGTH_CHK BIT(4)
-#define SCAN_AUTO_INCR BIT(5)
-#define TBYT_ADDR  0x38
-#define TPKT_ADDR  0x39
-#define TDRP_ADDR  0x45
-#define TFCS_ADDR  0x47
-#define TUND_ADDR  0x4a
 
 #define TSO_IPPROTO_TCP1
 
-- 
1.9.1



[PATCH v2 net-next 01/12] drivers: net: xgene: Protect indirect MAC access

2017-05-10 Thread Iyappan Subramanian
This patch,

 - refactors mac read/write functions
 - adds lock to protect indirect mac access

Signed-off-by: Iyappan Subramanian 
Signed-off-by: Quan Nguyen 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 119 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h|   3 +
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  |   1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |   1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c |  71 -
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c |  44 ++--
 6 files changed, 62 insertions(+), 177 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 2a835e0..2050c58 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -270,42 +270,32 @@ static void xgene_enet_wr_mcx_csr(struct xgene_enet_pdata 
*pdata,
iowrite32(val, addr);
 }
 
-static bool xgene_enet_wr_indirect(void __iomem *addr, void __iomem *wr,
-  void __iomem *cmd, void __iomem *cmd_done,
-  u32 wr_addr, u32 wr_data)
+void xgene_enet_wr_mac(struct xgene_enet_pdata *pdata, u32 wr_addr, u32 
wr_data)
 {
-   u32 done;
+   void __iomem *addr, *wr, *cmd, *cmd_done;
+   struct net_device *ndev = pdata->ndev;
u8 wait = 10;
+   u32 done;
 
+   addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET;
+   wr = pdata->mcx_mac_addr + MAC_WRITE_REG_OFFSET;
+   cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
+   cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET;
+
+   spin_lock(>mac_lock);
iowrite32(wr_addr, addr);
iowrite32(wr_data, wr);
iowrite32(XGENE_ENET_WR_CMD, cmd);
 
-   /* wait for write command to complete */
while (!(done = ioread32(cmd_done)) && wait--)
udelay(1);
 
if (!done)
-   return false;
+   netdev_err(ndev, "mac write failed, addr: %04x data: %08x\n",
+  wr_addr, wr_data);
 
iowrite32(0, cmd);
-
-   return true;
-}
-
-static void xgene_enet_wr_mcx_mac(struct xgene_enet_pdata *pdata,
- u32 wr_addr, u32 wr_data)
-{
-   void __iomem *addr, *wr, *cmd, *cmd_done;
-
-   addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET;
-   wr = pdata->mcx_mac_addr + MAC_WRITE_REG_OFFSET;
-   cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
-   cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET;
-
-   if (!xgene_enet_wr_indirect(addr, wr, cmd, cmd_done, wr_addr, wr_data))
-   netdev_err(pdata->ndev, "MCX mac write failed, addr: %04x\n",
-  wr_addr);
+   spin_unlock(>mac_lock);
 }
 
 static void xgene_enet_rd_csr(struct xgene_enet_pdata *pdata,
@@ -332,42 +322,33 @@ static void xgene_enet_rd_mcx_csr(struct xgene_enet_pdata 
*pdata,
*val = ioread32(addr);
 }
 
-static bool xgene_enet_rd_indirect(void __iomem *addr, void __iomem *rd,
-  void __iomem *cmd, void __iomem *cmd_done,
-  u32 rd_addr, u32 *rd_data)
+u32 xgene_enet_rd_mac(struct xgene_enet_pdata *pdata, u32 rd_addr)
 {
-   u32 done;
+   void __iomem *addr, *rd, *cmd, *cmd_done;
+   u32 done, rd_data;
u8 wait = 10;
 
+   addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET;
+   rd = pdata->mcx_mac_addr + MAC_READ_REG_OFFSET;
+   cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
+   cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET;
+
+   spin_lock(>mac_lock);
iowrite32(rd_addr, addr);
iowrite32(XGENE_ENET_RD_CMD, cmd);
 
-   /* wait for read command to complete */
while (!(done = ioread32(cmd_done)) && wait--)
udelay(1);
 
if (!done)
-   return false;
+   netdev_err(pdata->ndev, "mac read failed, addr: %04x\n",
+  rd_addr);
 
-   *rd_data = ioread32(rd);
+   rd_data = ioread32(rd);
iowrite32(0, cmd);
+   spin_unlock(>mac_lock);
 
-   return true;
-}
-
-static void xgene_enet_rd_mcx_mac(struct xgene_enet_pdata *pdata,
- u32 rd_addr, u32 *rd_data)
-{
-   void __iomem *addr, *rd, *cmd, *cmd_done;
-
-   addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET;
-   rd = pdata->mcx_mac_addr + MAC_READ_REG_OFFSET;
-   cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
-   cmd_done = pdata->mcx_mac_addr + MAC_COMMAND_DONE_REG_OFFSET;
-
-   if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data))
-   netdev_err(pdata->ndev, "MCX mac read failed, addr: %04x\n",
-  rd_addr);
+   return rd_data;
 }
 
 static void xgene_gmac_set_mac_addr(struct 

[PATCH v2 net-next 00/12] drivers: net: xgene: Add ethtool stats and bug fixes

2017-05-10 Thread Iyappan Subramanian
This patch set,

- adds ethtool extended statistics support
- addresses errata workarounds
- fixes bugs related to statistics

Signed-off-by: Iyappan Subramanian 
Signed-off-by: Quan Nguyen 
---
v2: Address review comments from v1
- Adds lock to protect mdio-xgene indirect MAC access
- Refactors xgene-enet indirect MAC read/write functions
- Uses mdio-xgene MAC access routines, if xgene-enet port
  use the same HW.
v1:
- Initial version
---

Iyappan Subramanian (3):
  drivers: net: xgene: Protect indirect MAC access
  drivers: net: xgene: Add rx_overrun/tx_underrun statistics
  drivers: net: xgene: Fix redundant prefetch buffer cleanup

Quan Nguyen (9):
  drivers: net: phy: xgene: Add lock to protect mac access
  drivers: net: xgene: Use rgmii mdio mac access
  drivers: net: xgene: Remove redundant local stats
  drivers: net: xgene: Refactor statistics error parsing code
  drivers: net: xgene: Remove unused macros
  drivers: net: xgene: Extend ethtool statistics
  drivers: net: xgene: Workaround for HW errata 10GE_4
  drivers: net: xgene: Add frame recovered statistics counter for errata
10GE_8/ENET_11
  drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15

 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 132 ++-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 188 +++--
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h |  70 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   |  59 +--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h   |  12 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c  | 110 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c  |  77 +++--
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h  |   5 +
 drivers/net/phy/mdio-xgene.c   |  74 
 drivers/net/phy/mdio-xgene.h   |   3 +
 10 files changed, 428 insertions(+), 302 deletions(-)

-- 
1.9.1



[PATCH v2 net-next 12/12] drivers: net: xgene: Fix redundant prefetch buffer cleanup

2017-05-10 Thread Iyappan Subramanian
Prefetch buffer cleanup code was called twice, causing EDAC to
report errors during reboot.

[ 1130.972475] xgene-edac 7880.edac: IOB bridge agent (BA) transaction
error
[ 1130.979584] xgene-edac 7880.edac: IOB BA write response error
[ 1130.985648] xgene-edac 7880.edac: IOB BA write access at 0x00.
()
[ 1130.993612] xgene-edac 7880.edac: IOB BA requestor ID 0x2400
[ 1131.000242] xgene-edac 7880.edac: IOB bridge agent (BA) transaction
error
...

This patch fixes the errors by,

- removing the redundant prefetch buffer cleanup from port_ops->shutdown()
- moving port_ops->shutdown() after delete_rings()

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 21 -
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  |  2 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 20 
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 20 
 4 files changed, 1 insertion(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 3235d0c..6ac27c7 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -763,27 +763,6 @@ static void xgene_enet_clear(struct xgene_enet_pdata 
*pdata,
 static void xgene_gport_shutdown(struct xgene_enet_pdata *pdata)
 {
struct device *dev = >pdev->dev;
-   struct xgene_enet_desc_ring *ring;
-   u32 pb;
-   int i;
-
-   pb = 0;
-   for (i = 0; i < pdata->rxq_cnt; i++) {
-   ring = pdata->rx_ring[i]->buf_pool;
-   pb |= BIT(xgene_enet_get_fpsel(ring->id));
-   ring = pdata->rx_ring[i]->page_pool;
-   if (ring)
-   pb |= BIT(xgene_enet_get_fpsel(ring->id));
-
-   }
-   xgene_enet_wr_ring_if(pdata, ENET_CFGSSQMIFPRESET_ADDR, pb);
-
-   pb = 0;
-   for (i = 0; i < pdata->txq_cnt; i++) {
-   ring = pdata->tx_ring[i];
-   pb |= BIT(xgene_enet_ring_bufnum(ring->id));
-   }
-   xgene_enet_wr_ring_if(pdata, ENET_CFGSSQMIWQRESET_ADDR, pb);
 
if (dev->of_node) {
if (!IS_ERR(pdata->clk))
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 01e389d..21cd4ef 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -2159,8 +2159,8 @@ static int xgene_enet_remove(struct platform_device *pdev)
xgene_enet_mdio_remove(pdata);
 
unregister_netdev(ndev);
-   pdata->port_ops->shutdown(pdata);
xgene_enet_delete_desc_rings(pdata);
+   pdata->port_ops->shutdown(pdata);
free_netdev(ndev);
 
return 0;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
index 31df8f3..b1a83fd 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
@@ -534,26 +534,6 @@ static void xgene_enet_clear(struct xgene_enet_pdata 
*pdata,
 static void xgene_enet_shutdown(struct xgene_enet_pdata *p)
 {
struct device *dev = >pdev->dev;
-   struct xgene_enet_desc_ring *ring;
-   u32 pb;
-   int i;
-
-   pb = 0;
-   for (i = 0; i < p->rxq_cnt; i++) {
-   ring = p->rx_ring[i]->buf_pool;
-   pb |= BIT(xgene_enet_get_fpsel(ring->id));
-   ring = p->rx_ring[i]->page_pool;
-   if (ring)
-   pb |= BIT(xgene_enet_get_fpsel(ring->id));
-   }
-   xgene_enet_wr_ring_if(p, ENET_CFGSSQMIFPRESET_ADDR, pb);
-
-   pb = 0;
-   for (i = 0; i < p->txq_cnt; i++) {
-   ring = p->tx_ring[i];
-   pb |= BIT(xgene_enet_ring_bufnum(ring->id));
-   }
-   xgene_enet_wr_ring_if(p, ENET_CFGSSQMIWQRESET_ADDR, pb);
 
if (dev->of_node) {
if (!IS_ERR(p->clk))
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
index eab6f1c..b7d75d0 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
@@ -446,26 +446,6 @@ static void xgene_enet_xgcle_bypass(struct 
xgene_enet_pdata *pdata,
 static void xgene_enet_shutdown(struct xgene_enet_pdata *pdata)
 {
struct device *dev = >pdev->dev;
-   struct xgene_enet_desc_ring *ring;
-   u32 pb;
-   int i;
-
-   pb = 0;
-   for (i = 0; i < pdata->rxq_cnt; i++) {
-   ring = pdata->rx_ring[i]->buf_pool;
-   pb |= BIT(xgene_enet_get_fpsel(ring->id));
-   ring = pdata->rx_ring[i]->page_pool;
-   if (ring)
-   pb |= BIT(xgene_enet_get_fpsel(ring->id));
-   }
-   xgene_enet_wr_ring_if(pdata, 

[PATCH v2 net-next 02/12] drivers: net: phy: xgene: Add lock to protect mac access

2017-05-10 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch,

- refactors mac access routine
- adds lock to protect mac indirect access

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/phy/mdio-xgene.c | 74 ++--
 drivers/net/phy/mdio-xgene.h |  3 ++
 2 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/drivers/net/phy/mdio-xgene.c b/drivers/net/phy/mdio-xgene.c
index 3e2ac07..bfd3090 100644
--- a/drivers/net/phy/mdio-xgene.c
+++ b/drivers/net/phy/mdio-xgene.c
@@ -34,76 +34,73 @@
 
 static bool xgene_mdio_status;
 
-static u32 xgene_enet_rd_mac(void __iomem *base_addr, u32 rd_addr)
+u32 xgene_mdio_rd_mac(struct xgene_mdio_pdata *pdata, u32 rd_addr)
 {
void __iomem *addr, *rd, *cmd, *cmd_done;
u32 done, rd_data = BUSY_MASK;
u8 wait = 10;
 
-   addr = base_addr + MAC_ADDR_REG_OFFSET;
-   rd = base_addr + MAC_READ_REG_OFFSET;
-   cmd = base_addr + MAC_COMMAND_REG_OFFSET;
-   cmd_done = base_addr + MAC_COMMAND_DONE_REG_OFFSET;
+   addr = pdata->mac_csr_addr + MAC_ADDR_REG_OFFSET;
+   rd = pdata->mac_csr_addr + MAC_READ_REG_OFFSET;
+   cmd = pdata->mac_csr_addr + MAC_COMMAND_REG_OFFSET;
+   cmd_done = pdata->mac_csr_addr + MAC_COMMAND_DONE_REG_OFFSET;
 
+   spin_lock(>mac_lock);
iowrite32(rd_addr, addr);
iowrite32(XGENE_ENET_RD_CMD, cmd);
 
-   while (wait--) {
-   done = ioread32(cmd_done);
-   if (done)
-   break;
+   while (!(done = ioread32(cmd_done)) && wait--)
udelay(1);
-   }
 
-   if (!done)
-   return rd_data;
+   if (done)
+   rd_data = ioread32(rd);
 
-   rd_data = ioread32(rd);
iowrite32(0, cmd);
+   spin_unlock(>mac_lock);
 
return rd_data;
 }
+EXPORT_SYMBOL(xgene_mdio_rd_mac);
 
-static void xgene_enet_wr_mac(void __iomem *base_addr, u32 wr_addr, u32 
wr_data)
+void xgene_mdio_wr_mac(struct xgene_mdio_pdata *pdata, u32 wr_addr, u32 data)
 {
void __iomem *addr, *wr, *cmd, *cmd_done;
u8 wait = 10;
u32 done;
 
-   addr = base_addr + MAC_ADDR_REG_OFFSET;
-   wr = base_addr + MAC_WRITE_REG_OFFSET;
-   cmd = base_addr + MAC_COMMAND_REG_OFFSET;
-   cmd_done = base_addr + MAC_COMMAND_DONE_REG_OFFSET;
+   addr = pdata->mac_csr_addr + MAC_ADDR_REG_OFFSET;
+   wr = pdata->mac_csr_addr + MAC_WRITE_REG_OFFSET;
+   cmd = pdata->mac_csr_addr + MAC_COMMAND_REG_OFFSET;
+   cmd_done = pdata->mac_csr_addr + MAC_COMMAND_DONE_REG_OFFSET;
 
+   spin_lock(>mac_lock);
iowrite32(wr_addr, addr);
-   iowrite32(wr_data, wr);
+   iowrite32(data, wr);
iowrite32(XGENE_ENET_WR_CMD, cmd);
 
-   while (wait--) {
-   done = ioread32(cmd_done);
-   if (done)
-   break;
+   while (!(done = ioread32(cmd_done)) && wait--)
udelay(1);
-   }
 
if (!done)
pr_err("MCX mac write failed, addr: 0x%04x\n", wr_addr);
 
iowrite32(0, cmd);
+   spin_unlock(>mac_lock);
 }
+EXPORT_SYMBOL(xgene_mdio_wr_mac);
 
 int xgene_mdio_rgmii_read(struct mii_bus *bus, int phy_id, int reg)
 {
-   void __iomem *addr = (void __iomem *)bus->priv;
+   struct xgene_mdio_pdata *pdata = (struct xgene_mdio_pdata *)bus->priv;
u32 data, done;
u8 wait = 10;
 
data = SET_VAL(PHY_ADDR, phy_id) | SET_VAL(REG_ADDR, reg);
-   xgene_enet_wr_mac(addr, MII_MGMT_ADDRESS_ADDR, data);
-   xgene_enet_wr_mac(addr, MII_MGMT_COMMAND_ADDR, READ_CYCLE_MASK);
+   xgene_mdio_wr_mac(pdata, MII_MGMT_ADDRESS_ADDR, data);
+   xgene_mdio_wr_mac(pdata, MII_MGMT_COMMAND_ADDR, READ_CYCLE_MASK);
do {
usleep_range(5, 10);
-   done = xgene_enet_rd_mac(addr, MII_MGMT_INDICATORS_ADDR);
+   done = xgene_mdio_rd_mac(pdata, MII_MGMT_INDICATORS_ADDR);
} while ((done & BUSY_MASK) && wait--);
 
if (done & BUSY_MASK) {
@@ -111,8 +108,8 @@ int xgene_mdio_rgmii_read(struct mii_bus *bus, int phy_id, 
int reg)
return -EBUSY;
}
 
-   data = xgene_enet_rd_mac(addr, MII_MGMT_STATUS_ADDR);
-   xgene_enet_wr_mac(addr, MII_MGMT_COMMAND_ADDR, 0);
+   data = xgene_mdio_rd_mac(pdata, MII_MGMT_STATUS_ADDR);
+   xgene_mdio_wr_mac(pdata, MII_MGMT_COMMAND_ADDR, 0);
 
return data;
 }
@@ -120,17 +117,17 @@ int xgene_mdio_rgmii_read(struct mii_bus *bus, int 
phy_id, int reg)
 
 int xgene_mdio_rgmii_write(struct mii_bus *bus, int phy_id, int reg, u16 data)
 {
-   void __iomem *addr = (void __iomem *)bus->priv;
+   struct xgene_mdio_pdata *pdata = (struct xgene_mdio_pdata *)bus->priv;
u32 val, done;
u8 wait = 10;
 
val = SET_VAL(PHY_ADDR, phy_id) | SET_VAL(REG_ADDR, reg);
-   xgene_enet_wr_mac(addr, MII_MGMT_ADDRESS_ADDR, val);
+   

[PATCH v2 net-next 11/12] drivers: net: xgene: Workaround for HW errata 10GE_10/ENET_15

2017-05-10 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds workaround for HW errata 10GE_10 and ENET_15:
"HW statistic counters value are duplicated".

- RFCS duplicates RALN counter
- RFLR duplicates RUND counter
- TFCS duplicates TFRG counter
- RALN should be intepreted as 0 in 10G mode

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 33 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   | 20 +++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h   |  2 ++
 3 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 5c2c84a..0fdec78 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -71,6 +71,7 @@ struct xgene_gstrings_stats {
XGENE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR, 16),
XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16),
XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16),
+   XGENE_EXTD_STAT(rx_jabber_recov_cntr, DUMP, 0),
XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16),
XGENE_EXTD_STAT(rx_overrun_cntr, DUMP, 0),
XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31),
@@ -96,9 +97,16 @@ struct xgene_gstrings_stats {
 
 #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats)
 #define XGENE_EXTD_STATS_LEN   ARRAY_SIZE(gstrings_extd_stats)
+#define RFCS_IDX   7
+#define RALN_IDX   13
+#define RFLR_IDX   14
 #define FALSE_RFLR_IDX 15
-#define RX_OVERRUN_IDX 23
-#define TX_UNDERRUN_IDX42
+#define RUND_IDX   18
+#define FALSE_RJBR_IDX 22
+#define RX_OVERRUN_IDX 24
+#define TFCS_IDX   38
+#define TFRG_IDX   42
+#define TX_UNDERRUN_IDX43
 
 static void xgene_get_drvinfo(struct net_device *ndev,
  struct ethtool_drvinfo *info)
@@ -218,14 +226,25 @@ static int xgene_get_sset_count(struct net_device *ndev, 
int sset)
 static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata)
 {
u32 rx_drop, tx_drop;
-   u32 tmp;
+   u32 mask, tmp;
int i;
 
for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) {
tmp = xgene_enet_rd_stat(pdata, gstrings_extd_stats[i].addr);
-   if (gstrings_extd_stats[i].mask)
-   pdata->extd_stats[i] += tmp &
-   GENMASK(gstrings_extd_stats[i].mask - 1, 0);
+   if (gstrings_extd_stats[i].mask) {
+   mask = GENMASK(gstrings_extd_stats[i].mask - 1, 0);
+   pdata->extd_stats[i] += (tmp & mask);
+   }
+   }
+
+   if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) {
+   /* Errata 10GE_10 - SW should intepret RALN as 0 */
+   pdata->extd_stats[RALN_IDX] = 0;
+   } else {
+   /* Errata ENET_15 - Fixes RFCS, RFLR, TFCS counter */
+   pdata->extd_stats[RFCS_IDX] -= pdata->extd_stats[RALN_IDX];
+   pdata->extd_stats[RFLR_IDX] -= pdata->extd_stats[RUND_IDX];
+   pdata->extd_stats[TFCS_IDX] -= pdata->extd_stats[TFRG_IDX];
}
 
pdata->mac_ops->get_drop_cnt(pdata, _drop, _drop);
@@ -234,6 +253,8 @@ static void xgene_get_extd_stats(struct xgene_enet_pdata 
*pdata)
 
/* Errata 10GE_8 -  Update Frame recovered from Errata 10GE_8/ENET_11 */
pdata->extd_stats[FALSE_RFLR_IDX] = pdata->false_rflr;
+   /* Errata ENET_15 - Jabber Frame recov'ed from Errata 10GE_10/ENET_15 */
+   pdata->extd_stats[FALSE_RJBR_IDX] = pdata->vlan_rjbr;
 }
 
 int xgene_extd_stats_init(struct xgene_enet_pdata *pdata)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index c2d38da..01e389d 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -656,6 +656,18 @@ static void xgene_enet_free_pagepool(struct 
xgene_enet_desc_ring *buf_pool,
buf_pool->head = head;
 }
 
+/* Errata 10GE_10 and ENET_15 - Fix duplicated HW statistic counters */
+static bool xgene_enet_errata_10GE_10(struct sk_buff *skb, u32 len, u8 status)
+{
+   if (status == INGRESS_CRC &&
+   len >= (ETHER_STD_PACKET + 1) &&
+   len <= (ETHER_STD_PACKET + 4) &&
+   skb->protocol == htons(ETH_P_8021Q))
+   return true;
+
+   return false;
+}
+
 /* Errata 10GE_8 and ENET_11 - allow packet with length <=64B */
 static bool xgene_enet_errata_10GE_8(struct sk_buff *skb, u32 len, u8 status)
 {
@@ -706,14 +718,16 @@ static int xgene_enet_rx_frame(struct 
xgene_enet_desc_ring *rx_ring,
status = (GET_VAL(ELERR, le64_to_cpu(raw_desc->m0)) << LERR_LEN) |
  GET_VAL(LERR, 

[PATCH v2 net-next 08/12] drivers: net: xgene: Add rx_overrun/tx_underrun statistics

2017-05-10 Thread Iyappan Subramanian
This patch adds rx_overrun and tx_underrun ethtool statistic counters.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 16 +---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c  | 11 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h  |  8 
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h|  1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c   | 14 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c   | 11 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h   |  4 
 7 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index a7eed3b..6b2a4b9 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -71,6 +71,7 @@ struct xgene_gstrings_stats {
XGENE_EXTD_STAT(rx_fragments_cntr, RFRG, 16),
XGENE_EXTD_STAT(rx_jabber_cntr, RJBR, 16),
XGENE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP, 16),
+   XGENE_EXTD_STAT(rx_overrun_cntr, DUMP, 0),
XGENE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA, 31),
XGENE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA, 31),
XGENE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF, 16),
@@ -88,11 +89,14 @@ struct xgene_gstrings_stats {
XGENE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF, 12),
XGENE_EXTD_STAT(tx_oversize_frame_cntr, TOVR, 12),
XGENE_EXTD_STAT(tx_undersize_frame_cntr, TUND, 12),
-   XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12)
+   XGENE_EXTD_STAT(tx_fragments_cntr, TFRG, 12),
+   XGENE_EXTD_STAT(tx_underrun_cntr, DUMP, 0)
 };
 
 #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats)
 #define XGENE_EXTD_STATS_LEN   ARRAY_SIZE(gstrings_extd_stats)
+#define RX_OVERRUN_IDX 22
+#define TX_UNDERRUN_IDX41
 
 static void xgene_get_drvinfo(struct net_device *ndev,
  struct ethtool_drvinfo *info)
@@ -211,14 +215,20 @@ static int xgene_get_sset_count(struct net_device *ndev, 
int sset)
 
 static void xgene_get_extd_stats(struct xgene_enet_pdata *pdata)
 {
+   u32 rx_drop, tx_drop;
u32 tmp;
int i;
 
for (i = 0; i < XGENE_EXTD_STATS_LEN; i++) {
tmp = xgene_enet_rd_stat(pdata, gstrings_extd_stats[i].addr);
-   pdata->extd_stats[i] += tmp &
-   GENMASK(gstrings_extd_stats[i].mask - 1, 0);
+   if (gstrings_extd_stats[i].mask)
+   pdata->extd_stats[i] += tmp &
+   GENMASK(gstrings_extd_stats[i].mask - 1, 0);
}
+
+   pdata->mac_ops->get_drop_cnt(pdata, _drop, _drop);
+   pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop;
+   pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop;
 }
 
 int xgene_extd_stats_init(struct xgene_enet_pdata *pdata)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 9e89193..5278d62 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -618,6 +618,16 @@ static void xgene_gmac_init(struct xgene_enet_pdata *pdata)
xgene_enet_wr_csr(pdata, CFG_BYPASS_ADDR, RESUME_TX);
 }
 
+static void xgene_gmac_get_drop_cnt(struct xgene_enet_pdata *pdata,
+   u32 *rx, u32 *tx)
+{
+   u32 count;
+
+   xgene_enet_rd_mcx_csr(pdata, ICM_ECM_DROP_COUNT_REG0_ADDR, );
+   *rx = ICM_DROP_COUNT(count);
+   *tx = ECM_DROP_COUNT(count);
+}
+
 static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata)
 {
u32 val = 0x;
@@ -1027,6 +1037,7 @@ void xgene_enet_mdio_remove(struct xgene_enet_pdata 
*pdata)
.tx_enable = xgene_gmac_tx_enable,
.rx_disable = xgene_gmac_rx_disable,
.tx_disable = xgene_gmac_tx_disable,
+   .get_drop_cnt = xgene_gmac_get_drop_cnt,
.set_speed = xgene_gmac_set_speed,
.set_mac_addr = xgene_gmac_set_mac_addr,
.set_framesize = xgene_enet_set_frame_size,
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index f1a4cfa..5d3e18d 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -192,6 +192,10 @@ enum xgene_enet_rm {
 #define CFG_CLE_NXTFPSEL0(val) (((val) << 20) & GENMASK(23, 20))
 #define ICM_CONFIG0_REG_0_ADDR 0x0400
 #define ICM_CONFIG2_REG_0_ADDR 0x0410
+#define ECM_CONFIG0_REG_0_ADDR 0x0500
+#define ECM_CONFIG0_REG_1_ADDR 0x0504
+#define ICM_ECM_DROP_COUNT_REG0_ADDR   0x0508
+#define ICM_ECM_DROP_COUNT_REG1_ADDR   0x050c
 #define RX_DV_GATE_REG_0_ADDR  0x05fc
 #define TX_DV_GATE_EN0 BIT(2)
 #define RX_DV_GATE_EN0   

[PATCH v2 net-next 03/12] drivers: net: xgene: Use rgmii mdio mac access

2017-05-10 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch switches to use rgmii mdio mac access routines if available,
as they share the same HW.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 2050c58..47c5b75 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -277,6 +277,13 @@ void xgene_enet_wr_mac(struct xgene_enet_pdata *pdata, u32 
wr_addr, u32 wr_data)
u8 wait = 10;
u32 done;
 
+   if (pdata->mdio_driver && ndev->phydev &&
+   pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) {
+   struct mii_bus *bus = ndev->phydev->mdio.bus;
+
+   return xgene_mdio_wr_mac(bus->priv, wr_addr, wr_data);
+   }
+
addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET;
wr = pdata->mcx_mac_addr + MAC_WRITE_REG_OFFSET;
cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
@@ -328,6 +335,13 @@ u32 xgene_enet_rd_mac(struct xgene_enet_pdata *pdata, u32 
rd_addr)
u32 done, rd_data;
u8 wait = 10;
 
+   if (pdata->mdio_driver && pdata->ndev->phydev &&
+   pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) {
+   struct mii_bus *bus = pdata->ndev->phydev->mdio.bus;
+
+   return xgene_mdio_rd_mac(bus->priv, rd_addr);
+   }
+
addr = pdata->mcx_mac_addr + MAC_ADDR_REG_OFFSET;
rd = pdata->mcx_mac_addr + MAC_READ_REG_OFFSET;
cmd = pdata->mcx_mac_addr + MAC_COMMAND_REG_OFFSET;
-- 
1.9.1



[PATCH v2 net-next 10/12] drivers: net: xgene: Add frame recovered statistics counter for errata 10GE_8/ENET_11

2017-05-10 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds statistic counter for frames recovered from HW errata
10GE_8 and ENET_11:
"HW reports Length error for valid 64 byte frames with len <46 bytes".

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 9 +++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 2 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 6b2a4b9..5c2c84a 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -64,6 +64,7 @@ struct xgene_gstrings_stats {
XGENE_EXTD_STAT(rx_unk_opcode_cntr, RXUO, 16),
XGENE_EXTD_STAT(rx_align_err_cntr, RALN, 16),
XGENE_EXTD_STAT(rx_frame_len_err_cntr, RFLR, 16),
+   XGENE_EXTD_STAT(rx_frame_len_err_recov_cntr, DUMP, 0),
XGENE_EXTD_STAT(rx_code_err_cntr, RCDE, 16),
XGENE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE, 16),
XGENE_EXTD_STAT(rx_undersize_pkt_cntr, RUND, 16),
@@ -95,8 +96,9 @@ struct xgene_gstrings_stats {
 
 #define XGENE_STATS_LENARRAY_SIZE(gstrings_stats)
 #define XGENE_EXTD_STATS_LEN   ARRAY_SIZE(gstrings_extd_stats)
-#define RX_OVERRUN_IDX 22
-#define TX_UNDERRUN_IDX41
+#define FALSE_RFLR_IDX 15
+#define RX_OVERRUN_IDX 23
+#define TX_UNDERRUN_IDX42
 
 static void xgene_get_drvinfo(struct net_device *ndev,
  struct ethtool_drvinfo *info)
@@ -229,6 +231,9 @@ static void xgene_get_extd_stats(struct xgene_enet_pdata 
*pdata)
pdata->mac_ops->get_drop_cnt(pdata, _drop, _drop);
pdata->extd_stats[RX_OVERRUN_IDX] += rx_drop;
pdata->extd_stats[TX_UNDERRUN_IDX] += tx_drop;
+
+   /* Errata 10GE_8 -  Update Frame recovered from Errata 10GE_8/ENET_11 */
+   pdata->extd_stats[FALSE_RFLR_IDX] = pdata->false_rflr;
 }
 
 int xgene_extd_stats_init(struct xgene_enet_pdata *pdata)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index bd2486e..c2d38da 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -712,6 +712,8 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring 
*rx_ring,
xgene_enet_parse_error(rx_ring, status);
rx_ring->rx_dropped++;
goto out;
+   } else {
+   pdata->false_rflr++;
}
}
 
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index cabe54e..0f5f0b0 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -224,6 +224,7 @@ struct xgene_enet_pdata {
enum xgene_enet_rm rm;
struct xgene_enet_cle cle;
u64 *extd_stats;
+   u64 false_rflr;
spinlock_t stats_lock; /* statistics lock */
const struct xgene_mac_ops *mac_ops;
spinlock_t mac_lock; /* mac lock */
-- 
1.9.1



[PATCH v2 net-next 05/12] drivers: net: xgene: Refactor statistics error parsing code

2017-05-10 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch fixes the tx error counters and adds more rx error counters.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c   |  6 --
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h   |  2 --
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 26 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h |  2 ++
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 47c5b75..02df577 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -205,30 +205,24 @@ static u32 xgene_enet_ring_len(struct 
xgene_enet_desc_ring *ring)
 }
 
 void xgene_enet_parse_error(struct xgene_enet_desc_ring *ring,
-   struct xgene_enet_pdata *pdata,
enum xgene_enet_err_code status)
 {
switch (status) {
case INGRESS_CRC:
ring->rx_crc_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_CHECKSUM:
case INGRESS_CHECKSUM_COMPUTE:
ring->rx_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_TRUNC_FRAME:
ring->rx_frame_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_PKT_LEN:
ring->rx_length_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_PKT_UNDER:
ring->rx_frame_errors++;
-   ring->rx_dropped++;
break;
case INGRESS_FIFO_OVERRUN:
ring->rx_fifo_errors++;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index 057f951..3ce349f 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -380,9 +380,7 @@ static inline u16 xgene_enet_get_numslots(u16 id, u32 size)
 }
 
 void xgene_enet_parse_error(struct xgene_enet_desc_ring *ring,
-   struct xgene_enet_pdata *pdata,
enum xgene_enet_err_code status);
-
 int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata);
 void xgene_enet_mdio_remove(struct xgene_enet_pdata *pdata);
 bool xgene_ring_mgr_init(struct xgene_enet_pdata *p);
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index e4f2ef2..3f24b83 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -246,9 +246,9 @@ static int xgene_enet_tx_completion(struct 
xgene_enet_desc_ring *cp_ring,
skb_frag_t *frag;
dma_addr_t *frag_dma_addr;
u16 skb_index;
-   u8 status;
-   int i, ret = 0;
u8 mss_index;
+   u8 status;
+   int i;
 
skb_index = GET_VAL(USERINFO, le64_to_cpu(raw_desc->m0));
skb = cp_ring->cp_skb[skb_index];
@@ -275,19 +275,17 @@ static int xgene_enet_tx_completion(struct 
xgene_enet_desc_ring *cp_ring,
/* Checking for error */
status = GET_VAL(LERR, le64_to_cpu(raw_desc->m0));
if (unlikely(status > 2)) {
-   xgene_enet_parse_error(cp_ring, netdev_priv(cp_ring->ndev),
-  status);
-   ret = -EIO;
+   cp_ring->tx_dropped++;
+   cp_ring->tx_errors++;
}
 
if (likely(skb)) {
dev_kfree_skb_any(skb);
} else {
netdev_err(cp_ring->ndev, "completion skb is NULL\n");
-   ret = -EIO;
}
 
-   return ret;
+   return 0;
 }
 
 static int xgene_enet_setup_mss(struct net_device *ndev, u32 mss)
@@ -711,7 +709,8 @@ static int xgene_enet_rx_frame(struct xgene_enet_desc_ring 
*rx_ring,
if (!xgene_enet_errata_10GE_8(skb, datalen, status)) {
dev_kfree_skb_any(skb);
xgene_enet_free_pagepool(page_pool, raw_desc, exp_desc);
-   xgene_enet_parse_error(rx_ring, pdata, status);
+   xgene_enet_parse_error(rx_ring, status);
+   rx_ring->rx_dropped++;
goto out;
}
}
@@ -1477,6 +1476,8 @@ static void xgene_enet_get_stats64(
if (ring) {
stats->tx_packets += ring->tx_packets;
stats->tx_bytes += ring->tx_bytes;
+   stats->tx_dropped += ring->tx_dropped;
+   stats->tx_errors += ring->tx_errors;
}
}
 
@@ -1485,11 +1486,16 @@ static void xgene_enet_get_stats64(
if (ring) {
stats->rx_packets += 

[PATCH v2 net-next 04/12] drivers: net: xgene: Remove redundant local stats

2017-05-10 Thread Iyappan Subramanian
From: Quan Nguyen 

Commit 5944701df90d ("net: remove useless memset's in drivers get_stats64")
makes the pdata->stats redundant. This patch removes pdata->stats and
updates get_stats64() callback accordingly.

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 7 ---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 4 +---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h| 1 -
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 28fdedc..217cde8 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -25,7 +25,7 @@ struct xgene_gstrings_stats {
int offset;
 };
 
-#define XGENE_STAT(m) { #m, offsetof(struct xgene_enet_pdata, stats.m) }
+#define XGENE_STAT(m) { #m, offsetof(struct rtnl_link_stats64, m) }
 
 static const struct xgene_gstrings_stats gstrings_stats[] = {
XGENE_STAT(rx_packets),
@@ -156,11 +156,12 @@ static void xgene_get_ethtool_stats(struct net_device 
*ndev,
struct ethtool_stats *dummy,
u64 *data)
 {
-   void *pdata = netdev_priv(ndev);
+   struct rtnl_link_stats64 stats;
int i;
 
+   dev_get_stats(ndev, );
for (i = 0; i < XGENE_STATS_LEN; i++)
-   *data++ = *(u64 *)(pdata + gstrings_stats[i].offset);
+   data[i] = *(u64 *)((char *) + gstrings_stats[i].offset);
 }
 
 static void xgene_get_pauseparam(struct net_device *ndev,
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 9a28ac3..e4f2ef2 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1466,10 +1466,9 @@ static int xgene_enet_create_desc_rings(struct 
net_device *ndev)
 
 static void xgene_enet_get_stats64(
struct net_device *ndev,
-   struct rtnl_link_stats64 *storage)
+   struct rtnl_link_stats64 *stats)
 {
struct xgene_enet_pdata *pdata = netdev_priv(ndev);
-   struct rtnl_link_stats64 *stats = >stats;
struct xgene_enet_desc_ring *ring;
int i;
 
@@ -1493,7 +1492,6 @@ static void xgene_enet_get_stats64(
stats->rx_dropped += ring->rx_dropped;
}
}
-   memcpy(storage, stats, sizeof(struct rtnl_link_stats64));
 }
 
 static int xgene_enet_set_mac_address(struct net_device *ndev, void *addr)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index 827b33d..5e6fd71 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -219,7 +219,6 @@ struct xgene_enet_pdata {
int phy_mode;
enum xgene_enet_rm rm;
struct xgene_enet_cle cle;
-   struct rtnl_link_stats64 stats;
const struct xgene_mac_ops *mac_ops;
spinlock_t mac_lock; /* mac lock */
const struct xgene_port_ops *port_ops;
-- 
1.9.1



[PATCH v2 net-next 09/12] drivers: net: xgene: Workaround for HW errata 10GE_4

2017-05-10 Thread Iyappan Subramanian
From: Quan Nguyen 

This patch adds workaround for HW errata 10GE_4:
"XGENET_ICM_ECM_DROP_COUNT_REG_0 reg not clear on read".

Signed-off-by: Quan Nguyen 
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 2 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 5 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 5278d62..3235d0c 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -626,6 +626,8 @@ static void xgene_gmac_get_drop_cnt(struct xgene_enet_pdata 
*pdata,
xgene_enet_rd_mcx_csr(pdata, ICM_ECM_DROP_COUNT_REG0_ADDR, );
*rx = ICM_DROP_COUNT(count);
*tx = ECM_DROP_COUNT(count);
+   /* Errata: 10GE_4 - Fix ICM_ECM_DROP_COUNT not clear-on-read */
+   xgene_enet_rd_mcx_csr(pdata, ECM_CONFIG0_REG_0_ADDR, );
 }
 
 static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
index 2919d3e..31df8f3 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
@@ -106,6 +106,11 @@ static void xgene_sgmac_get_drop_cnt(struct 
xgene_enet_pdata *pdata,
count = xgene_enet_rd_mcx_csr(pdata, addr);
*rx = ICM_DROP_COUNT(count);
*tx = ECM_DROP_COUNT(count);
+   /* Errata: 10GE_4 - ICM_ECM_DROP_COUNT not clear-on-read */
+   addr = (pdata->enet_id != XGENE_ENET1) ?
+   XG_MCX_ECM_CONFIG0_REG_0_ADDR :
+   ECM_CONFIG0_REG_0_ADDR + pdata->port_id * OFFSET_4;
+   xgene_enet_rd_mcx_csr(pdata, addr);
 }
 
 static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *p)
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
index 56dff38..eab6f1c 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
@@ -188,6 +188,8 @@ static void xgene_xgmac_get_drop_cnt(struct 
xgene_enet_pdata *pdata,
xgene_enet_rd_axg_csr(pdata, XGENET_ICM_ECM_DROP_COUNT_REG0, );
*rx = ICM_DROP_COUNT(count);
*tx = ECM_DROP_COUNT(count);
+   /* Errata: 10GE_4 - ICM_ECM_DROP_COUNT not clear-on-read */
+   xgene_enet_rd_axg_csr(pdata, XGENET_ECM_CONFIG0_REG_0, );
 }
 
 static void xgene_enet_config_ring_if_assoc(struct xgene_enet_pdata *pdata)
-- 
1.9.1



Re: Requirements for a shutdown function?

2017-05-10 Thread Timur Tabi
On 05/09/2017 02:06 PM, Florian Fainelli wrote:
> On 05/09/2017 11:51 AM, Timur Tabi wrote:

>> Is it possible that the network stack detects a kexec and automatically
>> stops all network devices?
> 
> No. why would it? However the device driver model does call into your
> driver's remove function and that one does a right job already because
> it does an network device unregister, and so on.

I ran some more tests.  When I launch kexec, the driver's
net_device_ops.ndo_stop function is called, which already stops the interface.

So it looks to me as if the network stack does close the interface during a
kexec.  With the interface closed, there's no point in having a shutdown
function, is there?

>> My in-house driver stops the RX and TX queues.  I'm guessing that's good
>> enough, but I don't have a failing test case to prove it.
>>
> 
> That's probably good enough, yes.

Except that it turns out that the queues are already stopped by then because
the emac_close() function has already been called.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [Patch net] ipv4: restore rt->fi for reference counting

2017-05-10 Thread Julian Anastasov

Hello,

On Wed, 10 May 2017, Cong Wang wrote:

> On Wed, May 10, 2017 at 12:38 AM, Julian Anastasov  wrote:
> >
> > During NETDEV_UNREGISTER packets for dev should not
> > be flying but packets for other devs can walk the nexthops
> > for multipath routes. It is the rcu_barrier before
> > NETDEV_UNREGISTER_FINAL that allows nh_dev to be set to NULL
> > during this grace period but there are many places to fix that
> > assume nh_dev!=NULL.
> 
> Excellent point. Unfortunately NETDEV_UNREGISTER_FINAL is not
> yet captured by the fib event notifier.
> 
> I think readers are still safe to assume nh_dev!=NULL since we wait
> for existing readers, readers coming later can't see it any more. Or
> am I missing anything?

It is not safe because other devs can deliver packets
that still can see the multipath fib info in the FIB nodes.

CPU 1 (unreg dev)   CPU 2 (packet to another dev2)
NETDEV_DOWN => set RTNH_F_DEAD

Now traffic sent via downed
dev should stop

flush_all_backlogs
synchronize_net

Now received traffic for dev
should stop on all CPUs

fib_sync_down_dev
nh_dev = NULL
fib_table_lookup: boom in
__in_dev_get_rcu(nh->nh_dev),
all nh_dev dereferences must be
checked. If checked, it should be
safe to use this nh_dev due to the
rcu_barrier in netdev_run_todo.
But only the structure can be accessed,
traffic should not go via unreged dev.

fib_flush: unlink fi
from fa_info in
fib_table_flush

Now other CPUs will not see this
fi on lookups (after fib_table_flush)

netdev_run_todo
 rcu_barrier
Now other CPUs should not send
packets via this fib info
 NETDEV_UNREGISTER_FINAL

Notes:

- when NETDEV_DOWN is delivered fib_sync_down_dev sets RTNH_F_DEAD
but only for link routes, not for host routes

- fib_table_lookup runs without any memory ordering barriers,
when CPU 2 delivers packet from different dev it can hit
a multipath route with nh_dev set to NULL. Not so often
if RTNH_F_DEAD was set early and properly checked before
dereferencing nh_dev.

> > But why we leak routes? Is there some place that holds
> > routes without listening for NETDEV_UNREGISTER? On fib_flush
> > the infos are unlinked from trees, so after a grace period packets
> > should not see/hold such infos. If we hold routes somewhere for
> > long time, problem can happen also for routes with single nexthop.
> 
> My theory is that dst which holds a refcnt to fib_info (of course, after
> this patch) is moved in GC after NETDEV_UNREGISTER but still
> referenced somewhere, it therefore holds these nh_dev's, which
> in turn hold back to the netdevice which is being unregistered, thus
> Eric saw these refcnt leak warnings.

Oh, well, the sockets can hold cached dst.
But if the promise is that rt->fi is used only as
reference to metrics we have to find a way to drop
the dev references at NETDEV_UNREGISTER time. If you
set nh_dev to NULL then all lookups should check it
for != NULL. The sockets will not walk NHs via rt->fi,
i.e. the route lookups will get valid res.fi from trees,
so it may work in this way.

Regards

--
Julian Anastasov 


[PATCH] net-procfs: Adjust buffer output in dev_mc_seq_show()

2017-05-10 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 10 May 2017 21:17:43 +0200

* Use a special format string specification for the desired output of the
  array "addr" into a sequence.

  This issue was detected by using the Coccinelle software.

* Delete the local variable "i" and a call of the function "seq_putc"
  which became unnecessary with this refactoring.

Signed-off-by: Markus Elfring 
---
 net/core/net-procfs.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index 14d09345f00d..6aa427a2fb98 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -363,15 +363,9 @@ static int dev_mc_seq_show(struct seq_file *seq, void *v)
 
netif_addr_lock_bh(dev);
netdev_for_each_mc_addr(ha, dev) {
-   int i;
-
seq_printf(seq, "%-4d %-15s %-5d %-5d ", dev->ifindex,
   dev->name, ha->refcount, ha->global_use);
-
-   for (i = 0; i < dev->addr_len; i++)
-   seq_printf(seq, "%02x", ha->addr[i]);
-
-   seq_putc(seq, '\n');
+   seq_printf(seq, "%*phN\n", dev->addr_len, ha->addr);
}
netif_addr_unlock_bh(dev);
return 0;
-- 
2.12.3



[PATCH] libertas: Avoid reading past end of buffer

2017-05-10 Thread Kees Cook
Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, redefine the stat strings to be ETH_GSTRING_LEN
sizes, like other drivers. This lets us use a single memcpy that does not
leak rodata contents. Additionally adjust indentation to keep checkpatch.pl
happy.

This was found with the future CONFIG_FORTIFY_SOURCE feature.

Cc: Daniel Micay 
Signed-off-by: Kees Cook 
---
v2: use ETH_GSTRING_LEN; joe
---
 drivers/net/wireless/marvell/libertas/mesh.c | 26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/net/wireless/marvell/libertas/mesh.c 
b/drivers/net/wireless/marvell/libertas/mesh.c
index d0c881dd5846..6076c83ce5ab 100644
--- a/drivers/net/wireless/marvell/libertas/mesh.c
+++ b/drivers/net/wireless/marvell/libertas/mesh.c
@@ -1108,15 +1108,15 @@ void lbs_mesh_set_txpd(struct lbs_private *priv,
  * Ethtool related
  */
 
-static const char * const mesh_stat_strings[] = {
-   "drop_duplicate_bcast",
-   "drop_ttl_zero",
-   "drop_no_fwd_route",
-   "drop_no_buffers",
-   "fwded_unicast_cnt",
-   "fwded_bcast_cnt",
-   "drop_blind_table",
-   "tx_failed_cnt"
+static const char mesh_stat_strings[MESH_STATS_NUM][ETH_GSTRING_LEN] = {
+   "drop_duplicate_bcast",
+   "drop_ttl_zero",
+   "drop_no_fwd_route",
+   "drop_no_buffers",
+   "fwded_unicast_cnt",
+   "fwded_bcast_cnt",
+   "drop_blind_table",
+   "tx_failed_cnt"
 };
 
 void lbs_mesh_ethtool_get_stats(struct net_device *dev,
@@ -1170,17 +1170,11 @@ int lbs_mesh_ethtool_get_sset_count(struct net_device 
*dev, int sset)
 void lbs_mesh_ethtool_get_strings(struct net_device *dev,
uint32_t stringset, uint8_t *s)
 {
-   int i;
-
lbs_deb_enter(LBS_DEB_ETHTOOL);
 
switch (stringset) {
case ETH_SS_STATS:
-   for (i = 0; i < MESH_STATS_NUM; i++) {
-   memcpy(s + i * ETH_GSTRING_LEN,
-   mesh_stat_strings[i],
-   ETH_GSTRING_LEN);
-   }
+   memcpy(s, *mesh_stat_strings, sizeof(mesh_stat_strings));
break;
}
lbs_deb_enter(LBS_DEB_ETHTOOL);
-- 
2.7.4


-- 
Kees Cook
Pixel Security


[PATCH 4/5] bpf: Add bpf_verify_program() to the library.

2017-05-10 Thread David Miller

This allows a test case to load a BPF program and unconditionally
acquire the verifier log.

It also allows specification of the strict alignment flag.

Signed-off-by: David S. Miller 
---
 tools/lib/bpf/bpf.c | 22 ++
 tools/lib/bpf/bpf.h |  4 
 2 files changed, 26 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 4fe444b80..6e17898 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -117,6 +117,28 @@ int bpf_load_program(enum bpf_prog_type type, const struct 
bpf_insn *insns,
return sys_bpf(BPF_PROG_LOAD, , sizeof(attr));
 }
 
+int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+  size_t insns_cnt, int strict_alignment,
+  const char *license, __u32 kern_version,
+  char *log_buf, size_t log_buf_sz)
+{
+   union bpf_attr attr;
+
+   bzero(, sizeof(attr));
+   attr.prog_type = type;
+   attr.insn_cnt = (__u32)insns_cnt;
+   attr.insns = ptr_to_u64(insns);
+   attr.license = ptr_to_u64(license);
+   attr.log_buf = ptr_to_u64(log_buf);
+   attr.log_size = log_buf_sz;
+   attr.log_level = 2;
+   log_buf[0] = 0;
+   attr.kern_version = kern_version;
+   attr.prog_flags = strict_alignment ? BPF_F_STRICT_ALIGNMENT : 0;
+
+   return sys_bpf(BPF_PROG_LOAD, , sizeof(attr));
+}
+
 int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags)
 {
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index edb4dae..972bd83 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -35,6 +35,10 @@ int bpf_load_program(enum bpf_prog_type type, const struct 
bpf_insn *insns,
 size_t insns_cnt, const char *license,
 __u32 kern_version, char *log_buf,
 size_t log_buf_sz);
+int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+  size_t insns_cnt, int strict_alignment,
+  const char *license, __u32 kern_version,
+  char *log_buf, size_t log_buf_sz);
 
 int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags);
-- 
2.1.2.532.g19b5d50



[PATCH 5/5] bpf: Add verifier test case for alignment.

2017-05-10 Thread David Miller

Signed-off-by: David S. Miller 
---
 tools/testing/selftests/bpf/Makefile |   3 +-
 tools/testing/selftests/bpf/test_align.c | 417 +++
 2 files changed, 419 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_align.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 91edd05..f92f27d 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -11,7 +11,8 @@ endif
 CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) 
-I../../../include
 LDLIBS += -lcap -lelf
 
-TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs
+TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
+   test_align
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o
 
diff --git a/tools/testing/selftests/bpf/test_align.c 
b/tools/testing/selftests/bpf/test_align.c
new file mode 100644
index 000..9dd9794
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_align.c
@@ -0,0 +1,417 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "../../../include/linux/filter.h"
+
+#ifndef ARRAY_SIZE
+# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+#define MAX_INSNS  512
+#define MAX_MATCHES16
+
+struct bpf_align_test {
+   const char *descr;
+   struct bpf_insn insns[MAX_INSNS];
+   enum {
+   UNDEF,
+   ACCEPT,
+   REJECT
+   } result;
+   enum bpf_prog_type prog_type;
+   const char *matches[MAX_MATCHES];
+};
+
+static struct bpf_align_test tests[] = {
+   {
+   .descr = "mov",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_3, 2),
+   BPF_MOV64_IMM(BPF_REG_3, 4),
+   BPF_MOV64_IMM(BPF_REG_3, 8),
+   BPF_MOV64_IMM(BPF_REG_3, 16),
+   BPF_MOV64_IMM(BPF_REG_3, 32),
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+   .matches = {
+   "1: R1=ctx R3=imm2,min_value=2,max_value=2,min_align=2 
R10=fp",
+   "2: R1=ctx R3=imm4,min_value=4,max_value=4,min_align=4 
R10=fp",
+   "3: R1=ctx R3=imm8,min_value=8,max_value=8,min_align=8 
R10=fp",
+   "4: R1=ctx 
R3=imm16,min_value=16,max_value=16,min_align=16 R10=fp",
+   "5: R1=ctx 
R3=imm32,min_value=32,max_value=32,min_align=32 R10=fp",
+   },
+   },
+   {
+   .descr = "shift",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_3, 1),
+   BPF_ALU64_IMM(BPF_LSH, BPF_REG_3, 1),
+   BPF_ALU64_IMM(BPF_LSH, BPF_REG_3, 1),
+   BPF_ALU64_IMM(BPF_LSH, BPF_REG_3, 1),
+   BPF_ALU64_IMM(BPF_LSH, BPF_REG_3, 1),
+   BPF_ALU64_IMM(BPF_RSH, BPF_REG_3, 4),
+   BPF_MOV64_IMM(BPF_REG_4, 32),
+   BPF_ALU64_IMM(BPF_RSH, BPF_REG_4, 1),
+   BPF_ALU64_IMM(BPF_RSH, BPF_REG_4, 1),
+   BPF_ALU64_IMM(BPF_RSH, BPF_REG_4, 1),
+   BPF_ALU64_IMM(BPF_RSH, BPF_REG_4, 1),
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+   .matches = {
+   "1: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 
R10=fp",
+   "2: R1=ctx R3=imm2,min_value=2,max_value=2,min_align=2 
R10=fp",
+   "3: R1=ctx R3=imm4,min_value=4,max_value=4,min_align=4 
R10=fp",
+   "4: R1=ctx R3=imm8,min_value=8,max_value=8,min_align=8 
R10=fp",
+   "5: R1=ctx 
R3=imm16,min_value=16,max_value=16,min_align=16 R10=fp",
+   "6: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 
R10=fp",
+   "7: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 
R4=imm32,min_value=32,max_value=32,min_align=32 R10=fp",
+   "8: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 
R4=imm16,min_value=16,max_value=16,min_align=16 R10=fp",
+   "9: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 
R4=imm8,min_value=8,max_value=8,min_align=8 R10=fp",
+   "10: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 
R4=imm4,min_value=4,max_value=4,min_align=4 R10=fp",
+   "11: R1=ctx R3=imm1,min_value=1,max_value=1,min_align=1 
R4=imm2,min_value=2,max_value=2,min_align=2 R10=fp",
+ 

[PATCH 3/5] bpf: Add strict alignment flag for BPF_PROG_LOAD.

2017-05-10 Thread David Miller

Add a new field, "prog_flags", and an initial flag value
BPF_F_STRCIT_ALIGNMENT.

When set, the verifier will enforce strict pointer alignment
regardless of the setting of CONFIG_EFFICIENT_UNALIGNED_ACCESS.

The verifier, in this mode, will also use a fixed value of "2" in
place of NET_IP_ALIGN.

This facilitates test cases that will exercise and validate this part
of the verifier even when run on architectures where alignment doesn't
matter.

Signed-off-by: David S. Miller 
---
 include/uapi/linux/bpf.h   |  8 
 kernel/bpf/syscall.c   |  5 -
 kernel/bpf/verifier.c  | 20 
 tools/build/feature/test-bpf.c |  1 +
 tools/include/uapi/linux/bpf.h | 11 +--
 5 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 945a1f5..94dfa9d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -132,6 +132,13 @@ enum bpf_attach_type {
  */
 #define BPF_F_ALLOW_OVERRIDE   (1U << 0)
 
+/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
+ * verifier will perform strict alignment checking as if the kernel
+ * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set,
+ * and NET_IP_ALIGN defined to 2.
+ */
+#define BPF_F_STRICT_ALIGNMENT (1U << 0)
+
 #define BPF_PSEUDO_MAP_FD  1
 
 /* flags for BPF_MAP_UPDATE_ELEM command */
@@ -177,6 +184,7 @@ union bpf_attr {
__u32   log_size;   /* size of user buffer */
__aligned_u64   log_buf;/* user supplied buffer */
__u32   kern_version;   /* checked when 
prog_type=kprobe */
+   __u32   prog_flags;
};
 
struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fd2411f..265a0d8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -783,7 +783,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum 
bpf_prog_type type)
 EXPORT_SYMBOL_GPL(bpf_prog_get_type);
 
 /* last field in 'union bpf_attr' used by this command */
-#defineBPF_PROG_LOAD_LAST_FIELD kern_version
+#defineBPF_PROG_LOAD_LAST_FIELD prog_flags
 
 static int bpf_prog_load(union bpf_attr *attr)
 {
@@ -796,6 +796,9 @@ static int bpf_prog_load(union bpf_attr *attr)
if (CHECK_ATTR(BPF_PROG_LOAD))
return -EINVAL;
 
+   if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT)
+   return -EINVAL;
+
/* copy eBPF program license from user space */
if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
  sizeof(license) - 1) < 0)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e1effce..a406555 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -796,6 +796,7 @@ static bool is_pointer_value(struct bpf_verifier_env *env, 
int regno)
 static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg,
   int off, int size, bool strict)
 {
+   int ip_align;
int reg_off;
 
/* Byte size accesses are always allowed. */
@@ -812,10 +813,14 @@ static int check_pkt_ptr_alignment(const struct 
bpf_reg_state *reg,
reg_off += reg->aux_off;
}
 
-   /* skb->data is NET_IP_ALIGN-ed */
-   if ((NET_IP_ALIGN + reg_off + off) % size != 0) {
+   /* skb->data is NET_IP_ALIGN-ed, but for strict alignment checking
+* we force this to 2 which is universally what architectures use
+* when they don't set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
+*/
+   ip_align = strict ? 2 : NET_IP_ALIGN;
+   if ((ip_align + reg_off + off) % size != 0) {
verbose("misaligned packet access off %d+%d+%d size %d\n",
-   NET_IP_ALIGN, reg_off, off, size);
+   ip_align, reg_off, off, size);
return -EACCES;
}
 
@@ -833,10 +838,12 @@ static int check_val_ptr_alignment(const struct 
bpf_reg_state *reg,
return 0;
 }
 
+static bool strict_alignment;
+
 static int check_ptr_alignment(const struct bpf_reg_state *reg,
   int off, int size)
 {
-   bool strict = false;
+   bool strict = strict_alignment;
 
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
strict = true;
@@ -3574,6 +3581,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr 
*attr)
} else {
log_level = 0;
}
+   if (attr->prog_flags & BPF_F_STRICT_ALIGNMENT)
+   strict_alignment = true;
+   else
+   strict_alignment = false;
 
ret = replace_map_fd_with_map_ptr(env);
if (ret < 0)
@@ -3679,6 +3690,7 @@ int bpf_analyzer(struct bpf_prog *prog, const struct 
bpf_ext_analyzer_ops *ops,
mutex_lock(_verifier_lock);
 
log_level = 0;
+   strict_alignment = false;
 

[PATCH 2/5] bpf: Do per-instruction state dumping in verifier when log_level > 1.

2017-05-10 Thread David Miller

If log_level > 1, do a state dump every instruction and emit it in
a more compact way (without a leading newline).

This will facilitate more sophisticated test cases which inspect the
verifier log for register state.

Signed-off-by: David S. Miller 
---
 kernel/bpf/verifier.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 56230ff..e1effce 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2932,8 +2932,12 @@ static int do_check(struct bpf_verifier_env *env)
goto process_bpf_exit;
}
 
-   if (log_level && do_print_state) {
-   verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx);
+   if (log_level > 1 || (log_level && do_print_state)) {
+   if (log_level > 1)
+   verbose("%d:", insn_idx);
+   else
+   verbose("\nfrom %d to %d:",
+   prev_insn_idx, insn_idx);
print_verifier_state(>cur_state);
do_print_state = false;
}
-- 
2.1.2.532.g19b5d50



[PATCH 1/5] bpf: Track alignment of register values in the verifier.

2017-05-10 Thread David Miller

Currently if we add only constant values to pointers we can fully
validate the alignment, and properly check if we need to reject the
program on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures.

However, once an unknown value is introduced we only allow byte sized
memory accesses which is too restrictive.

Add logic to track the known minimum alignment of register values,
and propagate this state into registers containing pointers.

The most common paradigm that makes use of this new logic is computing
the transport header using the IP header length field.  For example:

struct ethhdr *ep = skb->data;
struct iphdr *iph = (struct iphdr *) (ep + 1);
struct tcphdr *th;
 ...
n = iph->ihl;
th = ((void *)iph + (n * 4));
port = th->dest;

The existing code will reject the load of th->dport because it cannot
validate that the alignment is at least 2 once "n * 4" is added the
the packet pointer.

In the new code, the register holding "n * 4" will have a reg->min_align
value of 4, because any value multiplied by 4 will be at least 4 byte
aligned.  (actually, the eBPF code emitted by the compiler in this case
is most likely to use a shift left by 2, but the end result is identical)

At the critical addition:

th = ((void *)iph + (n * 4));

The register holding 'th' will start with reg->off value of 14.  The
pointer addition will transform that reg into something that looks like:

reg->aux_off = 14
reg->aux_off_align = 4

Next, the verifier will look at the th->dest load, and it will see
a load offset of 2, and first check:

if (reg->aux_off_align % size)

which will pass because aux_off_align is 4.  reg_off will be computed:

reg_off = reg->off;
 ...
reg_off += reg->aux_off;

plus we have off==2, and it will thus check:

if ((NET_IP_ALIGN + reg_off + off) % size != 0)

which evaluates to:

if ((NET_IP_ALIGN + 14 + 2) % size != 0)

On strict alignment architectures, NET_IP_ALIGN is 2, thus:

if ((2 + 14 + 2) % size != 0)

which passes.

These pointer transformations and checks work regardless of whether
the constant offset or the variable with known alignment is added
first to the pointer register.

Signed-off-by: David S. Miller 
---
 include/linux/bpf_verifier.h |   3 ++
 kernel/bpf/verifier.c| 114 +++
 2 files changed, 98 insertions(+), 19 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 5efb4db..7c6a519 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -40,6 +40,9 @@ struct bpf_reg_state {
 */
s64 min_value;
u64 max_value;
+   u32 min_align;
+   u32 aux_off;
+   u32 aux_off_align;
 };
 
 enum bpf_stack_slot_type {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c5b56c9..56230ff 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -241,6 +241,12 @@ static void print_verifier_state(struct bpf_verifier_state 
*state)
if (reg->max_value != BPF_REGISTER_MAX_RANGE)
verbose(",max_value=%llu",
(unsigned long long)reg->max_value);
+   if (reg->min_align)
+   verbose(",min_align=%u", reg->min_align);
+   if (reg->aux_off)
+   verbose(",aux_off=%u", reg->aux_off);
+   if (reg->aux_off_align)
+   verbose(",aux_off_align=%u", reg->aux_off_align);
}
for (i = 0; i < MAX_BPF_STACK; i += BPF_REG_SIZE) {
if (state->stack_slot_type[i] == STACK_SPILL)
@@ -466,6 +472,9 @@ static void init_reg_state(struct bpf_reg_state *regs)
regs[i].imm = 0;
regs[i].min_value = BPF_REGISTER_MIN_RANGE;
regs[i].max_value = BPF_REGISTER_MAX_RANGE;
+   regs[i].min_align = 0;
+   regs[i].aux_off = 0;
+   regs[i].aux_off_align = 0;
}
 
/* frame pointer */
@@ -494,11 +503,17 @@ static void reset_reg_range_values(struct bpf_reg_state 
*regs, u32 regno)
regs[regno].max_value = BPF_REGISTER_MAX_RANGE;
 }
 
+static void reset_reg_align(struct bpf_reg_state *regs, u32 regno)
+{
+   regs[regno].min_align = 0;
+}
+
 static void mark_reg_unknown_value_and_range(struct bpf_reg_state *regs,
 u32 regno)
 {
mark_reg_unknown_value(regs, regno);
reset_reg_range_values(regs, regno);
+   reset_reg_align(regs, regno);
 }
 
 enum reg_arg_type {
@@ -779,17 +794,28 @@ static bool is_pointer_value(struct bpf_verifier_env 
*env, int regno)
 }
 
 static int check_pkt_ptr_alignment(const struct bpf_reg_state *reg,
-  int off, int size)
+  int off, int size, bool strict)
 {
-   if (reg->id && size != 1) {
-   

[PATCH 0/5] bpf: Add alignment tracker to verifier.

2017-05-10 Thread David Miller

This is the whole series, more details in the individual commit
messages.

First we add the alignment tracking logic to the verifier.

Next, we work on building up infrastructure to facilitate regression
testing of this facility.

Finally, we add the "test_align" test case.

Signed-off-by: David S. Miller 


Re: [PATCH] libertas: Avoid reading past end of buffer

2017-05-10 Thread Kees Cook
On Tue, May 9, 2017 at 9:33 PM, Joe Perches  wrote:
> On Tue, 2017-05-09 at 16:23 -0700, Kees Cook wrote:
>> Using memcpy() from a string that is shorter than the length copied means
>> the destination buffer is being filled with arbitrary data from the kernel
>> rodata segment. Instead, use strncpy() which will fill the trailing bytes
>> with zeros. Additionally adjust indentation to keep checkpatch.pl happy.
>>
>> This was found with the future CONFIG_FORTIFY_SOURCE feature.
> []
>> diff --git a/drivers/net/wireless/marvell/libertas/mesh.c 
>> b/drivers/net/wireless/marvell/libertas/mesh.c
> []
>> @@ -1177,9 +1177,9 @@ void lbs_mesh_ethtool_get_strings(struct net_device 
>> *dev,
>>   switch (stringset) {
>>   case ETH_SS_STATS:
>>   for (i = 0; i < MESH_STATS_NUM; i++) {
>> - memcpy(s + i * ETH_GSTRING_LEN,
>> - mesh_stat_strings[i],
>> - ETH_GSTRING_LEN);
>> + strncpy(s + i * ETH_GSTRING_LEN,
>> + mesh_stat_strings[i],
>> + ETH_GSTRING_LEN);
>>   }
>
> The better solution is to declare
> mesh_stat_strings in in the normal way
>
> ---
>  drivers/net/wireless/marvell/libertas/mesh.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/wireless/marvell/libertas/mesh.c 
> b/drivers/net/wireless/marvell/libertas/mesh.c
> index d0c881dd5846..a535e7f48d2d 100644
> --- a/drivers/net/wireless/marvell/libertas/mesh.c
> +++ b/drivers/net/wireless/marvell/libertas/mesh.c
> @@ -1108,15 +1108,15 @@ void lbs_mesh_set_txpd(struct lbs_private *priv,
>   * Ethtool related
>   */
>
> -static const char * const mesh_stat_strings[] = {
> -   "drop_duplicate_bcast",
> -   "drop_ttl_zero",
> -   "drop_no_fwd_route",
> -   "drop_no_buffers",
> -   "fwded_unicast_cnt",
> -   "fwded_bcast_cnt",
> -   "drop_blind_table",
> -   "tx_failed_cnt"
> +static const char mesh_stat_strings[][ETH_GSTRING_LEN] = {
> +   "drop_duplicate_bcast",
> +   "drop_ttl_zero",
> +   "drop_no_fwd_route",
> +   "drop_no_buffers",
> +   "fwded_unicast_cnt",
> +   "fwded_bcast_cnt",
> +   "drop_blind_table",
> +   "tx_failed_cnt",
>  };
>
>  void lbs_mesh_ethtool_get_stats(struct net_device *dev,

Ah yeah! And that simplifies the memcpy too, since it can just do it
in one go. New patch on the way...

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] wcn36xx: Close SMD channel on device removal

2017-05-10 Thread Bjorn Andersson
On Wed 10 May 00:27 PDT 2017, Arend van Spriel wrote:

> On 5/10/2017 1:03 AM, Bjorn Andersson wrote:
> > On Mon 08 May 23:17 PDT 2017, Kalle Valo wrote:
> > 
> > > Bjorn Andersson  writes:
> > > 
> > > > The SMD channel is not the primary WCNSS channel and must explicitly be
> > > > closed as the device is removed, or the channel will already by open on
> > > > a subsequent probe call in e.g. the case of reloading the kernel module.
> > > > 
> > > > This issue was introduced because I simplified the underlying SMD
> > > > implementation while the SMD adaptions of the driver sat on the mailing
> > > > list, but missed to update these patches. The patch does however only
> > > > apply back to the transition to rpmsg, hence the limited Fixes.
> > > > 
> > > > Fixes: 5052de8deff5 ("soc: qcom: smd: Transition client drivers from 
> > > > smd to rpmsg")
> > > > Reported-by: Eyal Ilsar 
> > > > Signed-off-by: Bjorn Andersson 
> > > 
> > > As this is a regression I'll queue this to 4.12.
> > > 
> > 
> > Thanks.
> > 
> > > But if this is an older bug (didn't quite understand your description
> > > though) should there be a separate patch for stable releases?
> > > 
> > 
> > AFAICT this never worked, as it seems I did the rework in SMD while we
> > tried to figure out the dependency issues we had with moving to SMD. So
> > v4.9 through v4.11 has SMD support - with this bug.
> > 
> > How do I proceed, do you want me to write up a fix for stable@? Do I
> > send that out as an ordinary patch?
> 
> If the patch applies cleanly on branches linux-4.9.y through linux-4.11.y in
> the stable repository you can go for '--- Option 1 ---' as described in
> /Documentation/stable_kernel_rules.txt.
> 

It does not, before v4.12 it's a completely different function to call
to close the channel.

But "Option 3" describes the situation, thanks for the reference. I'll
try to find the time to verify the patch on v4.11 and send it to
stable@.

Regards,
Bjorn


[PATCH] net: ethernet: ti: netcp_core: return error while dma channel open issue

2017-05-10 Thread Ivan Khoronzhuk
Fix error path while dma open channel issue. Also, no need to check output
on NULL if it's never returned.

Signed-off-by: Ivan Khoronzhuk 
---
Based on net-next

 drivers/net/ethernet/ti/netcp_core.c | 6 --
 drivers/soc/ti/knav_dma.c| 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_core.c 
b/drivers/net/ethernet/ti/netcp_core.c
index 729a7da..e6222e5 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1353,9 +1353,10 @@ int netcp_txpipe_open(struct netcp_tx_pipe *tx_pipe)
 
tx_pipe->dma_channel = knav_dma_open_channel(dev,
tx_pipe->dma_chan_name, );
-   if (IS_ERR_OR_NULL(tx_pipe->dma_channel)) {
+   if (IS_ERR(tx_pipe->dma_channel)) {
dev_err(dev, "failed opening tx chan(%s)\n",
tx_pipe->dma_chan_name);
+   ret = PTR_ERR(tx_pipe->dma_channel);
goto err;
}
 
@@ -1673,9 +1674,10 @@ static int netcp_setup_navigator_resources(struct 
net_device *ndev)
 
netcp->rx_channel = knav_dma_open_channel(netcp->netcp_device->device,
netcp->dma_chan_name, );
-   if (IS_ERR_OR_NULL(netcp->rx_channel)) {
+   if (IS_ERR(netcp->rx_channel)) {
dev_err(netcp->ndev_dev, "failed opening rx chan(%s\n",
netcp->dma_chan_name);
+   ret = PTR_ERR(netcp->rx_channel);
goto fail;
}
 
diff --git a/drivers/soc/ti/knav_dma.c b/drivers/soc/ti/knav_dma.c
index ecebe2e..026182d 100644
--- a/drivers/soc/ti/knav_dma.c
+++ b/drivers/soc/ti/knav_dma.c
@@ -413,7 +413,7 @@ static int of_channel_match_helper(struct device_node *np, 
const char *name,
  * @name:  slave channel name
  * @config:dma configuration parameters
  *
- * Returns pointer to appropriate DMA channel on success or NULL.
+ * Returns pointer to appropriate DMA channel on success or error.
  */
 void *knav_dma_open_channel(struct device *dev, const char *name,
struct knav_dma_cfg *config)
-- 
2.7.4



[PATCH net 2/4] s390/qeth: unbreak OSM and OSN support

2017-05-10 Thread Julian Wiedmann
commit b4d72c08b358 ("qeth: bridgeport support - basic control")
broke the support for OSM and OSN devices as follows:

As OSM and OSN are L2 only, qeth_core_probe_device() does an early
setup by loading the l2 discipline and calling qeth_l2_probe_device().
In this context, adding the l2-specific bridgeport sysfs attributes
via qeth_l2_create_device_attributes() hits a BUG_ON in fs/sysfs/group.c,
since the basic sysfs infrastructure for the device hasn't been
established yet.

Note that OSN actually has its own unique sysfs attributes
(qeth_osn_devtype), so the additional attributes shouldn't be created
at all.
For OSM, add a new qeth_l2_devtype that contains all the common
and l2-specific sysfs attributes.
When qeth_core_probe_device() does early setup for OSM or OSN, assign
the corresponding devtype so that the ccwgroup probe code creates the
full set of sysfs attributes.
This allows us to skip qeth_l2_create_device_attributes() in case
of an early setup.

Any device that can't do early setup will initially have only the
generic sysfs attributes, and when it's probed later
qeth_l2_probe_device() adds the l2-specific attributes.

If an early-setup device is removed (by calling ccwgroup_ungroup()),
device_unregister() will - using the devtype - delete the
l2-specific attributes before qeth_l2_remove_device() is called.
So make sure to not remove them twice.

What complicates the issue is that qeth_l2_probe_device() and
qeth_l2_remove_device() is also called on a device when its
layer2 attribute changes (ie. its layer mode is switched).
For early-setup devices this wouldn't work properly - we wouldn't
remove the l2-specific attributes when switching to L3.
But switching the layer mode doesn't actually make any sense;
we already decided that the device can only operate in L2!
So just refuse to switch the layer mode on such devices. Note that
OSN doesn't have a layer2 attribute, so we only need to special-case
OSM.

Based on an initial patch by Ursula Braun.

Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control")
Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core.h  |  4 
 drivers/s390/net/qeth_core_main.c | 17 +
 drivers/s390/net/qeth_core_sys.c  | 22 ++
 drivers/s390/net/qeth_l2.h|  2 ++
 drivers/s390/net/qeth_l2_main.c   | 17 +
 drivers/s390/net/qeth_l2_sys.c|  8 
 drivers/s390/net/qeth_l3_main.c   |  1 +
 7 files changed, 51 insertions(+), 20 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index f6aa21176d89..30bc6105aac3 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -701,6 +701,7 @@ enum qeth_discipline_id {
 };
 
 struct qeth_discipline {
+   const struct device_type *devtype;
void (*start_poll)(struct ccw_device *, int, unsigned long);
qdio_handler_t *input_handler;
qdio_handler_t *output_handler;
@@ -875,6 +876,9 @@ extern struct qeth_discipline qeth_l2_discipline;
 extern struct qeth_discipline qeth_l3_discipline;
 extern const struct attribute_group *qeth_generic_attr_groups[];
 extern const struct attribute_group *qeth_osn_attr_groups[];
+extern const struct attribute_group qeth_device_attr_group;
+extern const struct attribute_group qeth_device_blkt_group;
+extern const struct device_type qeth_generic_devtype;
 extern struct workqueue_struct *qeth_wq;
 
 int qeth_card_hw_is_reachable(struct qeth_card *);
diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 5bfd7c15d6a9..fc6d85f2b38d 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5530,10 +5530,12 @@ void qeth_core_free_discipline(struct qeth_card *card)
card->discipline = NULL;
 }
 
-static const struct device_type qeth_generic_devtype = {
+const struct device_type qeth_generic_devtype = {
.name = "qeth_generic",
.groups = qeth_generic_attr_groups,
 };
+EXPORT_SYMBOL_GPL(qeth_generic_devtype);
+
 static const struct device_type qeth_osn_devtype = {
.name = "qeth_osn",
.groups = qeth_osn_attr_groups,
@@ -5659,23 +5661,22 @@ static int qeth_core_probe_device(struct 
ccwgroup_device *gdev)
goto err_card;
}
 
-   if (card->info.type == QETH_CARD_TYPE_OSN)
-   gdev->dev.type = _osn_devtype;
-   else
-   gdev->dev.type = _generic_devtype;
-
switch (card->info.type) {
case QETH_CARD_TYPE_OSN:
case QETH_CARD_TYPE_OSM:
rc = qeth_core_load_discipline(card, QETH_DISCIPLINE_LAYER2);
if (rc)
goto err_card;
+
+   gdev->dev.type = (card->info.type != QETH_CARD_TYPE_OSN)
+   ? card->discipline->devtype
+   : _osn_devtype;
rc = card->discipline->setup(card->gdev);
if (rc)
  

[PATCH net 3/4] s390/qeth: avoid null pointer dereference on OSN

2017-05-10 Thread Julian Wiedmann
Access card->dev only after checking whether's its valid.

Signed-off-by: Julian Wiedmann 
Reviewed-by: Ursula Braun 
---
 drivers/s390/net/qeth_l2_main.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 04666fe231aa..bd2df62a5cdf 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -965,7 +965,6 @@ static int qeth_l2_setup_netdev(struct qeth_card *card)
case QETH_CARD_TYPE_OSN:
card->dev = alloc_netdev(0, "osn%d", NET_NAME_UNKNOWN,
 ether_setup);
-   card->dev->flags |= IFF_NOARP;
break;
default:
card->dev = alloc_etherdev(0);
@@ -980,9 +979,12 @@ static int qeth_l2_setup_netdev(struct qeth_card *card)
card->dev->min_mtu = 64;
card->dev->max_mtu = ETH_MAX_MTU;
card->dev->netdev_ops = _l2_netdev_ops;
-   card->dev->ethtool_ops =
-   (card->info.type != QETH_CARD_TYPE_OSN) ?
-   _l2_ethtool_ops : _l2_osn_ops;
+   if (card->info.type == QETH_CARD_TYPE_OSN) {
+   card->dev->ethtool_ops = _l2_osn_ops;
+   card->dev->flags |= IFF_NOARP;
+   } else {
+   card->dev->ethtool_ops = _l2_ethtool_ops;
+   }
card->dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
if (card->info.type == QETH_CARD_TYPE_OSD && !card->info.guestlan) {
card->dev->hw_features = NETIF_F_SG;
-- 
2.10.2



[PATCH net 0/4] s390/net fixes

2017-05-10 Thread Julian Wiedmann
Hello Dave,

some qeth fixes for -net, the OSM/OSN one being the most crucial.
Please also queue these up for stable.

Thanks,
Julian

Julian Wiedmann (2):
  s390/qeth: unbreak OSM and OSN support
  s390/qeth: avoid null pointer dereference on OSN

Ursula Braun (2):
  s390/qeth: handle sysfs error during initialization
  s390/qeth: add missing hash table initializations

 drivers/s390/net/qeth_core.h  |  4 
 drivers/s390/net/qeth_core_main.c | 21 -
 drivers/s390/net/qeth_core_sys.c  | 24 
 drivers/s390/net/qeth_l2.h|  2 ++
 drivers/s390/net/qeth_l2_main.c   | 26 --
 drivers/s390/net/qeth_l2_sys.c|  8 
 drivers/s390/net/qeth_l3_main.c   |  8 +++-
 7 files changed, 69 insertions(+), 24 deletions(-)

-- 
2.10.2



[PATCH net 1/4] s390/qeth: handle sysfs error during initialization

2017-05-10 Thread Julian Wiedmann
From: Ursula Braun 

When setting up the device from within the layer discipline's
probe routine, creating the layer-specific sysfs attributes can fail.
Report this error back to the caller, and handle it by
releasing the layer discipline.

Signed-off-by: Ursula Braun 
[jwi: updated commit msg, moved an OSN change to a subsequent patch]
Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_core_main.c | 4 +++-
 drivers/s390/net/qeth_core_sys.c  | 2 ++
 drivers/s390/net/qeth_l2_main.c   | 5 -
 drivers/s390/net/qeth_l3_main.c   | 5 -
 4 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 38114a8d56e0..5bfd7c15d6a9 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5731,8 +5731,10 @@ static int qeth_core_set_online(struct ccwgroup_device 
*gdev)
if (rc)
goto err;
rc = card->discipline->setup(card->gdev);
-   if (rc)
+   if (rc) {
+   qeth_core_free_discipline(card);
goto err;
+   }
}
rc = card->discipline->set_online(gdev);
 err:
diff --git a/drivers/s390/net/qeth_core_sys.c b/drivers/s390/net/qeth_core_sys.c
index 75b29fd2fcf4..412ff61891ac 100644
--- a/drivers/s390/net/qeth_core_sys.c
+++ b/drivers/s390/net/qeth_core_sys.c
@@ -426,6 +426,8 @@ static ssize_t qeth_dev_layer2_store(struct device *dev,
goto out;
 
rc = card->discipline->setup(card->gdev);
+   if (rc)
+   qeth_core_free_discipline(card);
 out:
mutex_unlock(>discipline_mutex);
return rc ? rc : count;
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 1b07f382d74c..2d2623182abf 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -883,8 +883,11 @@ static int qeth_l2_stop(struct net_device *dev)
 static int qeth_l2_probe_device(struct ccwgroup_device *gdev)
 {
struct qeth_card *card = dev_get_drvdata(>dev);
+   int rc;
 
-   qeth_l2_create_device_attributes(>dev);
+   rc = qeth_l2_create_device_attributes(>dev);
+   if (rc)
+   return rc;
INIT_LIST_HEAD(>vid_list);
hash_init(card->mac_htable);
card->options.layer2 = 1;
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 6e0354ef4b86..ae70daf33bd5 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3039,8 +3039,11 @@ static int qeth_l3_setup_netdev(struct qeth_card *card)
 static int qeth_l3_probe_device(struct ccwgroup_device *gdev)
 {
struct qeth_card *card = dev_get_drvdata(>dev);
+   int rc;
 
-   qeth_l3_create_device_attributes(>dev);
+   rc = qeth_l3_create_device_attributes(>dev);
+   if (rc)
+   return rc;
card->options.layer2 = 0;
card->info.hwtrap = 0;
return 0;
-- 
2.10.2



[PATCH net 4/4] s390/qeth: add missing hash table initializations

2017-05-10 Thread Julian Wiedmann
From: Ursula Braun 

commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
added new hash tables, but missed to initialize them.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Ursula Braun 
Reviewed-by: Julian Wiedmann 
Signed-off-by: Julian Wiedmann 
---
 drivers/s390/net/qeth_l3_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 6c2146fc831a..d8df1e635163 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3044,6 +3044,8 @@ static int qeth_l3_probe_device(struct ccwgroup_device 
*gdev)
rc = qeth_l3_create_device_attributes(>dev);
if (rc)
return rc;
+   hash_init(card->ip_htable);
+   hash_init(card->ip_mc_htable);
card->options.layer2 = 0;
card->info.hwtrap = 0;
return 0;
-- 
2.10.2



Re: [Patch net] ipv4: restore rt->fi for reference counting

2017-05-10 Thread Cong Wang
On Wed, May 10, 2017 at 12:38 AM, Julian Anastasov  wrote:
>
> Hello,
>
> On Tue, 9 May 2017, Cong Wang wrote:
>
>> > Also setting nexthop_nh->nh_dev to NULL looks quite dangerous
>> >
>> > We have plenty of sites doing :
>> >
>> > if (fi->fib_dev)
>> > x = fi->fib_dev->field
>> >
>> > fib_route_seq_show() is one example.
>> >
>>
>> All of them take RCU read lock, so, as I explained in the code comment,
>> they all should be fine because of synchronize_net() on unregister path.
>> Do you see anything otherwise?
>
> During NETDEV_UNREGISTER packets for dev should not
> be flying but packets for other devs can walk the nexthops
> for multipath routes. It is the rcu_barrier before
> NETDEV_UNREGISTER_FINAL that allows nh_dev to be set to NULL
> during this grace period but there are many places to fix that
> assume nh_dev!=NULL.

Excellent point. Unfortunately NETDEV_UNREGISTER_FINAL is not
yet captured by the fib event notifier.

I think readers are still safe to assume nh_dev!=NULL since we wait
for existing readers, readers coming later can't see it any more. Or
am I missing anything?

>
> But why we leak routes? Is there some place that holds
> routes without listening for NETDEV_UNREGISTER? On fib_flush
> the infos are unlinked from trees, so after a grace period packets
> should not see/hold such infos. If we hold routes somewhere for
> long time, problem can happen also for routes with single nexthop.

My theory is that dst which holds a refcnt to fib_info (of course, after
this patch) is moved in GC after NETDEV_UNREGISTER but still
referenced somewhere, it therefore holds these nh_dev's, which
in turn hold back to the netdevice which is being unregistered, thus
Eric saw these refcnt leak warnings.

I am not sure if sitting in GC for a long time is problematic or not,
but from the code where we transfer dst->dev to loopback_dev,
it seems this is expected otherwise no need to transfer? But I don't
dig the history though.

Thanks.


Re: [PATCH net 2/3] net/mlx4_en: Avoid adding steering rules with invalid ring

2017-05-10 Thread Or Gerlitz
On Tue, May 9, 2017 at 2:45 PM, Tariq Toukan  wrote:
> From: Talat Batheesh 
>
> Inserting steering rules with illegal ring is an invalid operation, block it.

Hi Dave,

I realized today that the patch introduced a regression, Tariq will
see if to revert it as a whole or fix the regression.

Just wanted to drop you a note and to make sure you don't further
carry it to -stable,

xxit happens :(

Or.


Re: bpf pointer alignment validation

2017-05-10 Thread David Miller
From: Daniel Borkmann 
Date: Wed, 10 May 2017 18:21:50 +0200

> On 05/10/2017 05:57 PM, David Miller wrote:
>> From: Daniel Borkmann 
>> Date: Wed, 10 May 2017 17:51:50 +0200
>>
>>> Would probably be good nevertheless to have this as a flag for
>>> program loads, which gets then passed through to the verifier to
>>> explicitly enable strict alignment checks.
>>>
>>> Might certainly aide developing & testing programs on archs with
>>> efficient unaligned access and later actually running them on archs
>>> that don't have it. (And at minimum, it also helps for checking
>>> the test suite against the verifier.)
>>
>> Ok, I can implement this flag.
>>
>> The only question is where to put it?  An unused bit in the program
>> type? :-)
> 
> See for example 7f677633379b ("bpf: introduce BPF_F_ALLOW_OVERRIDE
> flag"). We can add a flags field to the prog loading part of union
> bpf_attr; we would need to make sure to update
> BPF_PROG_LOAD_LAST_FIELD
> to the new member, and to reject unknown flags, of course. Then the
> syscall will handle compat with older binaries just fine by design,
> the main bpf syscall code and CHECK_ATTR() macros will ensure this
> (backward compat, and to a limited degree also forward compat).

Thanks to both of you for guidance on how to do this properly.

Here is what I have right now, once things look good enough I'll split
it all out into a proper patch series:

1) Add alignment tracking.
2) Add "log_level > 1" per-insn state dumping
3) Add BPF_F_STRICT_ALIGNMENT
4) Add bpf_verify_program() to library
5) Add test_align to selftests


Subject: [PATCH] BPF alignment framework

---
 include/linux/bpf_verifier.h |   3 +
 include/uapi/linux/bpf.h |   8 +
 kernel/bpf/syscall.c |   5 +-
 kernel/bpf/verifier.c| 132 --
 tools/include/uapi/linux/bpf.h   |  11 +-
 tools/lib/bpf/bpf.c  |  22 ++
 tools/lib/bpf/bpf.h  |   4 +
 tools/testing/selftests/bpf/Makefile |   4 +-
 tools/testing/selftests/bpf/test_align.c | 417 +++
 9 files changed, 581 insertions(+), 25 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_align.c

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 5efb4db..7c6a519 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -40,6 +40,9 @@ struct bpf_reg_state {
 */
s64 min_value;
u64 max_value;
+   u32 min_align;
+   u32 aux_off;
+   u32 aux_off_align;
 };
 
 enum bpf_stack_slot_type {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 945a1f5..94dfa9d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -132,6 +132,13 @@ enum bpf_attach_type {
  */
 #define BPF_F_ALLOW_OVERRIDE   (1U << 0)
 
+/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
+ * verifier will perform strict alignment checking as if the kernel
+ * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set,
+ * and NET_IP_ALIGN defined to 2.
+ */
+#define BPF_F_STRICT_ALIGNMENT (1U << 0)
+
 #define BPF_PSEUDO_MAP_FD  1
 
 /* flags for BPF_MAP_UPDATE_ELEM command */
@@ -177,6 +184,7 @@ union bpf_attr {
__u32   log_size;   /* size of user buffer */
__aligned_u64   log_buf;/* user supplied buffer */
__u32   kern_version;   /* checked when 
prog_type=kprobe */
+   __u32   prog_flags;
};
 
struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 13642c7..027053a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -784,7 +784,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum 
bpf_prog_type type)
 EXPORT_SYMBOL_GPL(bpf_prog_get_type);
 
 /* last field in 'union bpf_attr' used by this command */
-#defineBPF_PROG_LOAD_LAST_FIELD kern_version
+#defineBPF_PROG_LOAD_LAST_FIELD prog_flags
 
 static int bpf_prog_load(union bpf_attr *attr)
 {
@@ -797,6 +797,9 @@ static int bpf_prog_load(union bpf_attr *attr)
if (CHECK_ATTR(BPF_PROG_LOAD))
return -EINVAL;
 
+   if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT)
+   return -EINVAL;
+
/* copy eBPF program license from user space */
if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
  sizeof(license) - 1) < 0)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c2ff608..87ecb4d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -241,6 +241,12 @@ static void print_verifier_state(struct bpf_verifier_state 
*state)
if (reg->max_value != BPF_REGISTER_MAX_RANGE)
verbose(",max_value=%llu",
(unsigned long 

Re: [Patch net] ipv4: restore rt->fi for reference counting

2017-05-10 Thread Cong Wang
On Tue, May 9, 2017 at 4:51 PM, Eric Dumazet  wrote:
> On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote:
>
>> This statement is only used to ensure we pass the "dead == fi->fib_nhs"
>> check right below the inner loop, it is fine to keep it without break since
>> fi is not changed in the inner loop.
>>
>
> So the dead++ above wont end up with (dead > fi->fib_nhs) ?

Good point, it could happen, we probably need another boolean to
address this.


Re: [Patch net] ipv4: restore rt->fi for reference counting

2017-05-10 Thread Cong Wang
On Tue, May 9, 2017 at 4:50 PM, Eric Dumazet  wrote:
> On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote:
>
>> All of them take RCU read lock, so, as I explained in the code comment,
>> they all should be fine because of synchronize_net() on unregister path.
>> Do you see anything otherwise?
>
> They might take rcu lock, but compiler is still allowed to read
> fi->fib_dev multiple times, and crashes might happen.
>
> You will need to audit all code and fix it, using proper
> rcu_dereference() or similar code ensuring compiler wont do stupid
> things.
>

Point taken. But without my patch, nh_dev is supposed to be protected
by RCU too, it is freed in a rcu callback and dereferenced like:

struct in_device *in_dev = __in_dev_get_rcu(nh->nh_dev);


Re: bpf pointer alignment validation

2017-05-10 Thread Daniel Borkmann

On 05/10/2017 05:57 PM, David Miller wrote:

From: Daniel Borkmann 
Date: Wed, 10 May 2017 17:51:50 +0200


Would probably be good nevertheless to have this as a flag for
program loads, which gets then passed through to the verifier to
explicitly enable strict alignment checks.

Might certainly aide developing & testing programs on archs with
efficient unaligned access and later actually running them on archs
that don't have it. (And at minimum, it also helps for checking
the test suite against the verifier.)


Ok, I can implement this flag.

The only question is where to put it?  An unused bit in the program
type? :-)


See for example 7f677633379b ("bpf: introduce BPF_F_ALLOW_OVERRIDE
flag"). We can add a flags field to the prog loading part of union
bpf_attr; we would need to make sure to update BPF_PROG_LOAD_LAST_FIELD
to the new member, and to reject unknown flags, of course. Then the
syscall will handle compat with older binaries just fine by design,
the main bpf syscall code and CHECK_ATTR() macros will ensure this
(backward compat, and to a limited degree also forward compat).


Re: bpf pointer alignment validation

2017-05-10 Thread Alexei Starovoitov

On 5/10/17 8:57 AM, David Miller wrote:

From: Daniel Borkmann 
Date: Wed, 10 May 2017 17:51:50 +0200


Would probably be good nevertheless to have this as a flag for
program loads, which gets then passed through to the verifier to
explicitly enable strict alignment checks.

Might certainly aide developing & testing programs on archs with
efficient unaligned access and later actually running them on archs
that don't have it. (And at minimum, it also helps for checking
the test suite against the verifier.)


Ok, I can implement this flag.

The only question is where to put it?  An unused bit in the program
type? :-)


just add '__u32 prog_flags' to bpf_attr PROG_LOAD anon union.
There was no need for such flags in the past, hence no flags field
existed. Now it's time.



Re: [net-mellanox] question about potential null pointer dereference

2017-05-10 Thread Gustavo A. R. Silva


Quoting Ido Schimmel :


On Wed, May 10, 2017 at 10:36:59AM -0500, Gustavo A. R. Silva wrote:


Hello everybody,

While looking into Coverity ID 1350941 I ran into the following piece of
code at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:1483:

1483static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp,
1484char *sfn_pl, int rec_index,
1485bool adding)
1486{
1487struct mlxsw_sp_port *mlxsw_sp_port;
1488char mac[ETH_ALEN];
1489u8 local_port;
1490u16 vid, fid;
1491bool do_notification = true;
1492int err;
1493
1494mlxsw_reg_sfn_mac_unpack(sfn_pl, rec_index, mac, ,
_port);
1495mlxsw_sp_port = mlxsw_sp->ports[local_port];
1496if (!mlxsw_sp_port) {
1497dev_err_ratelimited(mlxsw_sp->bus_info->dev, "Incorrect
local port in FDB notification\n");
1498goto just_remove;
1499}
1500
1501if (mlxsw_sp_fid_is_vfid(fid)) {
1502struct mlxsw_sp_port *mlxsw_sp_vport;
1503
1504mlxsw_sp_vport =
mlxsw_sp_port_vport_find_by_fid(mlxsw_sp_port,
1505 fid);
1506if (!mlxsw_sp_vport) {
1507netdev_err(mlxsw_sp_port->dev, "Failed to find a
matching vPort following FDB notification\n");
1508goto just_remove;
1509}
1510vid = 0;
1511/* Override the physical port with the vPort. */
1512mlxsw_sp_port = mlxsw_sp_vport;
1513} else {
1514vid = fid;
1515}
1516
1517do_fdb_op:
1518err = mlxsw_sp_port_fdb_uc_op(mlxsw_sp, local_port, mac, fid,
1519  adding, true);
1520if (err) {
1521if (net_ratelimit())
1522netdev_err(mlxsw_sp_port->dev, "Failed to set
FDB entry\n");
1523return;
1524}
1525
1526if (!do_notification)
1527return;
1528mlxsw_sp_fdb_call_notifiers(mlxsw_sp_port->learning_sync,
1529adding, mac, vid,  
mlxsw_sp_port->dev);

1530return;
1531
1532just_remove:
1533adding = false;
1534do_notification = false;
1535goto do_fdb_op;
1536}


The issue here is that line 1496 implies that mlxsw_sp_port might be NULL.
If this is the case, the execution path jumps to line 1532 and then to line
1517. All this could end up dereferencing a NULL pointer at line 1522.

Is there any chance for mlxsw_sp_port to be NULL at line 1496 and, at the
same time, a NULL pointer dereference occurs at line 1522?

I'm trying to figure out if this is a false positive or something that
actually needs to be fixed.


In theory, yes, it can happen, but it didn't happen yet. I recently
patched that and now it's in Jiri's queue. I guess he'll send it
tomorrow.

https://github.com/jpirko/linux_mlxsw/commit/4160fb9ad6eeacba7736cfdbf7f52248432c2e89



Great, glad to see it is fixed now.


Thanks for looking into this!



Sure thing, thanks for clarifying. :)
--
Gustavo A. R. Silva








Re: bpf pointer alignment validation

2017-05-10 Thread David Miller
From: Daniel Borkmann 
Date: Wed, 10 May 2017 17:51:50 +0200

> Would probably be good nevertheless to have this as a flag for
> program loads, which gets then passed through to the verifier to
> explicitly enable strict alignment checks.
> 
> Might certainly aide developing & testing programs on archs with
> efficient unaligned access and later actually running them on archs
> that don't have it. (And at minimum, it also helps for checking
> the test suite against the verifier.)

Ok, I can implement this flag.

The only question is where to put it?  An unused bit in the program
type? :-)


Re: [net-mellanox] question about potential null pointer dereference

2017-05-10 Thread Ido Schimmel
On Wed, May 10, 2017 at 10:36:59AM -0500, Gustavo A. R. Silva wrote:
> 
> Hello everybody,
> 
> While looking into Coverity ID 1350941 I ran into the following piece of
> code at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:1483:
> 
> 1483static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp,
> 1484char *sfn_pl, int rec_index,
> 1485bool adding)
> 1486{
> 1487struct mlxsw_sp_port *mlxsw_sp_port;
> 1488char mac[ETH_ALEN];
> 1489u8 local_port;
> 1490u16 vid, fid;
> 1491bool do_notification = true;
> 1492int err;
> 1493
> 1494mlxsw_reg_sfn_mac_unpack(sfn_pl, rec_index, mac, ,
> _port);
> 1495mlxsw_sp_port = mlxsw_sp->ports[local_port];
> 1496if (!mlxsw_sp_port) {
> 1497dev_err_ratelimited(mlxsw_sp->bus_info->dev, "Incorrect
> local port in FDB notification\n");
> 1498goto just_remove;
> 1499}
> 1500
> 1501if (mlxsw_sp_fid_is_vfid(fid)) {
> 1502struct mlxsw_sp_port *mlxsw_sp_vport;
> 1503
> 1504mlxsw_sp_vport =
> mlxsw_sp_port_vport_find_by_fid(mlxsw_sp_port,
> 1505 fid);
> 1506if (!mlxsw_sp_vport) {
> 1507netdev_err(mlxsw_sp_port->dev, "Failed to find a
> matching vPort following FDB notification\n");
> 1508goto just_remove;
> 1509}
> 1510vid = 0;
> 1511/* Override the physical port with the vPort. */
> 1512mlxsw_sp_port = mlxsw_sp_vport;
> 1513} else {
> 1514vid = fid;
> 1515}
> 1516
> 1517do_fdb_op:
> 1518err = mlxsw_sp_port_fdb_uc_op(mlxsw_sp, local_port, mac, fid,
> 1519  adding, true);
> 1520if (err) {
> 1521if (net_ratelimit())
> 1522netdev_err(mlxsw_sp_port->dev, "Failed to set
> FDB entry\n");
> 1523return;
> 1524}
> 1525
> 1526if (!do_notification)
> 1527return;
> 1528mlxsw_sp_fdb_call_notifiers(mlxsw_sp_port->learning_sync,
> 1529adding, mac, vid, mlxsw_sp_port->dev);
> 1530return;
> 1531
> 1532just_remove:
> 1533adding = false;
> 1534do_notification = false;
> 1535goto do_fdb_op;
> 1536}
> 
> 
> The issue here is that line 1496 implies that mlxsw_sp_port might be NULL.
> If this is the case, the execution path jumps to line 1532 and then to line
> 1517. All this could end up dereferencing a NULL pointer at line 1522.
> 
> Is there any chance for mlxsw_sp_port to be NULL at line 1496 and, at the
> same time, a NULL pointer dereference occurs at line 1522?
> 
> I'm trying to figure out if this is a false positive or something that
> actually needs to be fixed.

In theory, yes, it can happen, but it didn't happen yet. I recently
patched that and now it's in Jiri's queue. I guess he'll send it
tomorrow.

https://github.com/jpirko/linux_mlxsw/commit/4160fb9ad6eeacba7736cfdbf7f52248432c2e89

Thanks for looking into this!

> 
> I'd really appreciate any comment on this.
> Thank you!
> --
> Gustavo A. R. Silva


Re: bpf pointer alignment validation

2017-05-10 Thread Daniel Borkmann

On 05/10/2017 05:33 PM, David Miller wrote:

From: Alexei Starovoitov 
Date: Tue, 9 May 2017 22:57:37 -0700


On Tue, May 09, 2017 at 02:32:34PM -0400, David Miller wrote:


+static u32 calc_align(u32 imm)
+{
+   u32 align = 1;
+
+   if (!imm)
+   return 1U << 31;
+
+   while (!(imm & 1)) {
+   imm >>= 1;
+   align <<= 1;
+   }
+   return align;
+}


same question as in previous reply.
Why not to use something like:
static u32 calc_align(u32 n)
{
 if (!n)
 return 1U << 31;
 return n - ((n - 1) & n);
}


Ok.

I did a cursory search and we don't have a generic kernel helper for
this kind of calculation.  I was actually quite surprised, as we
have one for everything else :-)


this needs to be tweaked like
if (log_level > 1)
 verbose("%d:", insn_idx);
else
verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx);

otherwise it prints prev_insn_idx which is meaningful
only with processing exit and search pruning.


Agreed.


Nice to see all these comments.
I wonder how we can make them automatic in the verifier.
Like if verifier can somehow hint the user in such human friendly way
about what is happening with the program.
Today that's the #1 problem. Most folks complaining
that verifier error messages are too hard to understand.


We can put whatever text strings we want into that verifier buffer.
It is as flexible as netlink extended acks.


would it make sense to bpf_prog_test_run() it here as well?


We could.


On x86 not much value, but on sparc we can somehow look for traps?
Is there some counter of unaligned traps that we can read and report
as error to user space after prog_test_run ?


Unfortunately, no.


These tests we cannot really run, since they don't do any load/store.
I mean more for some future tests. Or some sort of debug warn
that there were traps while bpf prog was executed, so the user
is alarmed and reports to us, since that would be a bug in verifier
align logic?


One thing I could definitely do is add logic to the unaligned trap
handler to print something special in the logs if we find that we
are executing BPF code.

The basic structure of the log message can be codified in a generic
bpf_xxx() helper, which architectures call with the PC and unaligned
address as arguments.

I was thinking more last night about strict alignment mode for the
verifier, so that bugs can be spotted on all architectures.  But
it stupidly will not work.

The problem is that x86 and powerpc define NET_IP_ALIGN as 0, so all
bets are off.

One thing we could do in "strict alignment" verifier mode is pretend
that NET_IP_ALIGN is 2 in the alignment checks.


Would probably be good nevertheless to have this as a flag for
program loads, which gets then passed through to the verifier to
explicitly enable strict alignment checks.

Might certainly aide developing & testing programs on archs with
efficient unaligned access and later actually running them on archs
that don't have it. (And at minimum, it also helps for checking
the test suite against the verifier.)


[net-mellanox] question about potential null pointer dereference

2017-05-10 Thread Gustavo A. R. Silva


Hello everybody,

While looking into Coverity ID 1350941 I ran into the following piece  
of code at  
drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:1483:


1483static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp,
1484char *sfn_pl, int rec_index,
1485bool adding)
1486{
1487struct mlxsw_sp_port *mlxsw_sp_port;
1488char mac[ETH_ALEN];
1489u8 local_port;
1490u16 vid, fid;
1491bool do_notification = true;
1492int err;
1493
1494mlxsw_reg_sfn_mac_unpack(sfn_pl, rec_index, mac, ,  
_port);

1495mlxsw_sp_port = mlxsw_sp->ports[local_port];
1496if (!mlxsw_sp_port) {
1497dev_err_ratelimited(mlxsw_sp->bus_info->dev,  
"Incorrect local port in FDB notification\n");

1498goto just_remove;
1499}
1500
1501if (mlxsw_sp_fid_is_vfid(fid)) {
1502struct mlxsw_sp_port *mlxsw_sp_vport;
1503
1504mlxsw_sp_vport =  
mlxsw_sp_port_vport_find_by_fid(mlxsw_sp_port,

1505 fid);
1506if (!mlxsw_sp_vport) {
1507netdev_err(mlxsw_sp_port->dev, "Failed to  
find a matching vPort following FDB notification\n");

1508goto just_remove;
1509}
1510vid = 0;
1511/* Override the physical port with the vPort. */
1512mlxsw_sp_port = mlxsw_sp_vport;
1513} else {
1514vid = fid;
1515}
1516
1517do_fdb_op:
1518err = mlxsw_sp_port_fdb_uc_op(mlxsw_sp, local_port, mac, fid,
1519  adding, true);
1520if (err) {
1521if (net_ratelimit())
1522netdev_err(mlxsw_sp_port->dev, "Failed to  
set FDB entry\n");

1523return;
1524}
1525
1526if (!do_notification)
1527return;
1528mlxsw_sp_fdb_call_notifiers(mlxsw_sp_port->learning_sync,
1529adding, mac, vid, mlxsw_sp_port->dev);
1530return;
1531
1532just_remove:
1533adding = false;
1534do_notification = false;
1535goto do_fdb_op;
1536}


The issue here is that line 1496 implies that mlxsw_sp_port might be  
NULL. If this is the case, the execution path jumps to line 1532 and  
then to line 1517. All this could end up dereferencing a NULL pointer  
at line 1522.


Is there any chance for mlxsw_sp_port to be NULL at line 1496 and, at  
the same time, a NULL pointer dereference occurs at line 1522?


I'm trying to figure out if this is a false positive or something that  
actually needs to be fixed.


I'd really appreciate any comment on this.
Thank you!
--
Gustavo A. R. Silva











Re: bpf pointer alignment validation

2017-05-10 Thread David Miller
From: Alexei Starovoitov 
Date: Tue, 9 May 2017 22:57:37 -0700

> On Tue, May 09, 2017 at 02:32:34PM -0400, David Miller wrote:
>>  
>> +static u32 calc_align(u32 imm)
>> +{
>> +u32 align = 1;
>> +
>> +if (!imm)
>> +return 1U << 31;
>> +
>> +while (!(imm & 1)) {
>> +imm >>= 1;
>> +align <<= 1;
>> +}
>> +return align;
>> +}
> 
> same question as in previous reply.
> Why not to use something like:
> static u32 calc_align(u32 n)
> {
> if (!n)
> return 1U << 31;
> return n - ((n - 1) & n);
> }

Ok.

I did a cursory search and we don't have a generic kernel helper for
this kind of calculation.  I was actually quite surprised, as we
have one for everything else :-)

> this needs to be tweaked like
> if (log_level > 1)
> verbose("%d:", insn_idx);
> else
>   verbose("\nfrom %d to %d:", prev_insn_idx, insn_idx);
> 
> otherwise it prints prev_insn_idx which is meaningful
> only with processing exit and search pruning.

Agreed.

> Nice to see all these comments.
> I wonder how we can make them automatic in the verifier.
> Like if verifier can somehow hint the user in such human friendly way
> about what is happening with the program.
> Today that's the #1 problem. Most folks complaining
> that verifier error messages are too hard to understand.

We can put whatever text strings we want into that verifier buffer.
It is as flexible as netlink extended acks.

> would it make sense to bpf_prog_test_run() it here as well?

We could.

> On x86 not much value, but on sparc we can somehow look for traps?
> Is there some counter of unaligned traps that we can read and report
> as error to user space after prog_test_run ?

Unfortunately, no.

> These tests we cannot really run, since they don't do any load/store.
> I mean more for some future tests. Or some sort of debug warn
> that there were traps while bpf prog was executed, so the user
> is alarmed and reports to us, since that would be a bug in verifier
> align logic?

One thing I could definitely do is add logic to the unaligned trap
handler to print something special in the logs if we find that we
are executing BPF code.

The basic structure of the log message can be codified in a generic
bpf_xxx() helper, which architectures call with the PC and unaligned
address as arguments.

I was thinking more last night about strict alignment mode for the
verifier, so that bugs can be spotted on all architectures.  But
it stupidly will not work.

The problem is that x86 and powerpc define NET_IP_ALIGN as 0, so all
bets are off.

One thing we could do in "strict alignment" verifier mode is pretend
that NET_IP_ALIGN is 2 in the alignment checks.


[PATCH] mdio: mux: Correct mdio_mux_init error path issues

2017-05-10 Thread Jon Mason
There is a potential unnecessary refcount decriment on error path of
put_device(>mii_bus->dev), as it is possible to avoid the
of_mdio_find_bus() call if mux_bus is specified by the calling function.

The same put_device() is not called in the error path if the
devm_kzalloc of pb fails.  This caused the variable used in the
put_device() to be changed, as the pb pointer was obviously not set up.

There is an unnecessary of_node_get() on child_bus_node if the
of_mdiobus_register() is successful, as the
for_each_available_child_of_node() automatically increments this.
Thus the refcount on this node will always be +1 more than it should be.

There is no of_node_put() on child_bus_node if the of_mdiobus_register()
call fails.

Finally, it is lacking devm_kfree() of pb in the error path.  While this
might not be technically necessary, it was present in other parts of the
function.  So, I am adding it where necessary to make it uniform.

Signed-off-by: Jon Mason 
Fixes: f20e6657a875 ("mdio: mux: Enhanced MDIO mux framework for integrated 
multiplexers")
Fixes: 0ca2997d1452 ("netdev/of/phy: Add MDIO bus multiplexer support.")
---
 drivers/net/phy/mdio-mux.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/mdio-mux.c b/drivers/net/phy/mdio-mux.c
index 963838d4fac1..6943c5ece44a 100644
--- a/drivers/net/phy/mdio-mux.c
+++ b/drivers/net/phy/mdio-mux.c
@@ -122,10 +122,9 @@ int mdio_mux_init(struct device *dev,
pb = devm_kzalloc(dev, sizeof(*pb), GFP_KERNEL);
if (pb == NULL) {
ret_val = -ENOMEM;
-   goto err_parent_bus;
+   goto err_pb_kz;
}
 
-
pb->switch_data = data;
pb->switch_fn = switch_fn;
pb->current_child = -1;
@@ -154,6 +153,7 @@ int mdio_mux_init(struct device *dev,
cb->mii_bus = mdiobus_alloc();
if (!cb->mii_bus) {
ret_val = -ENOMEM;
+   devm_kfree(dev, cb);
of_node_put(child_bus_node);
break;
}
@@ -169,8 +169,8 @@ int mdio_mux_init(struct device *dev,
if (r) {
mdiobus_free(cb->mii_bus);
devm_kfree(dev, cb);
+   of_node_put(child_bus_node);
} else {
-   of_node_get(child_bus_node);
cb->next = pb->children;
pb->children = cb;
}
@@ -181,9 +181,11 @@ int mdio_mux_init(struct device *dev,
return 0;
}
 
+   devm_kfree(dev, pb);
+err_pb_kz:
/* balance the reference of_mdio_find_bus() took */
-   put_device(>mii_bus->dev);
-
+   if (!mux_bus)
+   put_device(_bus->dev);
 err_parent_bus:
of_node_put(parent_bus_node);
return ret_val;
-- 
2.7.4



Re: [PATCH net-next V4 00/10] vhost_net batch dequeuing

2017-05-10 Thread Michael S. Tsirkin
On Wed, May 10, 2017 at 11:36:12AM +0800, Jason Wang wrote:
> This series tries to implement rx batching for vhost-net. This is done
> by batching the dequeuing from skb_array which was exported by
> underlayer socket and pass the sbk back through msg_control to finish
> userspace copying. This is also the requirement for more batching
> implemention on rx path.
> 
> Tests shows at most 8.8% improvment bon rx pps on top of batch zeroing.
> 
> Please review.
> 
> Thanks
> 
> Changes from V3:
> - add batch zeroing patch to fix the build warnings

FYI that one's going upstream now.

> Changes from V2:
> - rebase to net-next HEAD
> - use unconsume helpers to put skb back on releasing
> - introduce and use vhost_net internal buffer helpers
> - renew performance numbers on top of batch zeroing
> 
> Changes from V1:
> - switch to use for() in __ptr_ring_consume_batched()
> - rename peek_head_len_batched() to fetch_skbs()
> - use skb_array_consume_batched() instead of
>   skb_array_consume_batched_bh() since no consumer run in bh
> - drop the lockless peeking patch since skb_array could be resized, so
>   it's not safe to call lockless one
> 
> Jason Wang (8):
>   skb_array: introduce skb_array_unconsume
>   ptr_ring: introduce batch dequeuing
>   skb_array: introduce batch dequeuing
>   tun: export skb_array
>   tap: export skb_array
>   tun: support receiving skb through msg_control
>   tap: support receiving skb from msg_control
>   vhost_net: try batch dequing from skb array
> 
> Michael S. Tsirkin (2):
>   ptr_ring: batch ring zeroing
>   ptr_ring: add ptr_ring_unconsume
> 
>  drivers/net/tap.c |  25 ++-
>  drivers/net/tun.c |  31 ++--
>  drivers/vhost/net.c   | 117 +++--
>  include/linux/if_tap.h|   5 ++
>  include/linux/if_tun.h|   5 ++
>  include/linux/ptr_ring.h  | 183 
> +++---
>  include/linux/skb_array.h |  31 
>  7 files changed, 370 insertions(+), 27 deletions(-)
> 
> -- 
> 2.7.4


Re: [PATCH net-next V4 10/10] vhost_net: try batch dequing from skb array

2017-05-10 Thread Michael S. Tsirkin
On Wed, May 10, 2017 at 11:36:22AM +0800, Jason Wang wrote:
> We used to dequeue one skb during recvmsg() from skb_array, this could
> be inefficient because of the bad cache utilization and spinlock
> touching for each packet. This patch tries to batch them by calling
> batch dequeuing helpers explicitly on the exported skb array and pass
> the skb back through msg_control for underlayer socket to finish the
> userspace copying.
> 
> Batch dequeuing is also the requirement for more batching improvement
> on rx.
> 
> Tests were done by pktgen on tap with XDP1 in guest on top of batch
> zeroing:
> 
> rx batch | pps
> 
> 2562.41Mpps (+6.16%)
> 1282.48Mpps (+8.80%)
> 64 2.38Mpps (+3.96%) <- Default
> 16 2.31Mpps (+1.76%)
> 4  2.31Mpps (+1.76%)
> 1  2.30Mpps (+1.32%)
> 0  2.27Mpps (+7.48%)
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/net.c | 117 
> +---
>  1 file changed, 111 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 9b51989..fbaecf3 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -28,6 +28,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include 
>  
> @@ -85,6 +87,13 @@ struct vhost_net_ubuf_ref {
>   struct vhost_virtqueue *vq;
>  };
>  
> +#define VHOST_RX_BATCH 64
> +struct vhost_net_buf {
> + struct sk_buff *queue[VHOST_RX_BATCH];
> + int tail;
> + int head;
> +};
> +

Do you strictly need to put this inline? This structure is quite big
already. Do you see a measureabe difference if you make it

struct sk_buff **queue;
int tail;
int head;

?

Will also make it easier to play with the size in the future
should someone want to see how does it work e.g. for different
ring sizes.

>  struct vhost_net_virtqueue {
>   struct vhost_virtqueue vq;
>   size_t vhost_hlen;
> @@ -99,6 +108,8 @@ struct vhost_net_virtqueue {
>   /* Reference counting for outstanding ubufs.
>* Protected by vq mutex. Writers must also take device mutex. */
>   struct vhost_net_ubuf_ref *ubufs;
> + struct skb_array *rx_array;
> + struct vhost_net_buf rxq;
>  };
>  
>  struct vhost_net {
> @@ -117,6 +128,71 @@ struct vhost_net {
>  
>  static unsigned vhost_net_zcopy_mask __read_mostly;
>  
> +static void *vhost_net_buf_get_ptr(struct vhost_net_buf *rxq)
> +{
> + if (rxq->tail != rxq->head)
> + return rxq->queue[rxq->head];
> + else
> + return NULL;
> +}
> +
> +static int vhost_net_buf_get_size(struct vhost_net_buf *rxq)
> +{
> + return rxq->tail - rxq->head;
> +}
> +
> +static int vhost_net_buf_is_empty(struct vhost_net_buf *rxq)
> +{
> + return rxq->tail == rxq->head;
> +}
> +
> +static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
> +{
> + void *ret = vhost_net_buf_get_ptr(rxq);
> + ++rxq->head;
> + return ret;
> +}
> +
> +static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq)
> +{
> + struct vhost_net_buf *rxq = >rxq;
> +
> + rxq->head = 0;
> + rxq->tail = skb_array_consume_batched(nvq->rx_array, rxq->queue,
> +   VHOST_RX_BATCH);
> + return rxq->tail;
> +}
> +
> +static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq)
> +{
> + struct vhost_net_buf *rxq = >rxq;
> +
> + if (nvq->rx_array && !vhost_net_buf_is_empty(rxq)) {
> + skb_array_unconsume(nvq->rx_array, rxq->queue + rxq->head,
> + vhost_net_buf_get_size(rxq));
> + rxq->head = rxq->tail = 0;
> + }
> +}
> +
> +static int vhost_net_buf_peek(struct vhost_net_virtqueue *nvq)
> +{
> + struct vhost_net_buf *rxq = >rxq;
> +
> + if (!vhost_net_buf_is_empty(rxq))
> + goto out;
> +
> + if (!vhost_net_buf_produce(nvq))
> + return 0;
> +
> +out:
> + return __skb_array_len_with_tag(vhost_net_buf_get_ptr(rxq));
> +}
> +
> +static void vhost_net_buf_init(struct vhost_net_buf *rxq)
> +{
> + rxq->head = rxq->tail = 0;
> +}
> +
>  static void vhost_net_enable_zcopy(int vq)
>  {
>   vhost_net_zcopy_mask |= 0x1 << vq;
> @@ -201,6 +277,7 @@ static void vhost_net_vq_reset(struct vhost_net *n)
>   n->vqs[i].ubufs = NULL;
>   n->vqs[i].vhost_hlen = 0;
>   n->vqs[i].sock_hlen = 0;
> + vhost_net_buf_init(>vqs[i].rxq);
>   }
>  
>  }
> @@ -503,15 +580,14 @@ static void handle_tx(struct vhost_net *net)
>   mutex_unlock(>mutex);
>  }
>  
> -static int peek_head_len(struct sock *sk)
> +static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
>  {
> - struct socket *sock = sk->sk_socket;
>   struct sk_buff *head;
>   int len = 0;
>   unsigned long flags;
>  
> - if (sock->ops->peek_len)
> - return sock->ops->peek_len(sock);
> + if 

[PULL] virtio: fixes, cleanups, performance

2017-05-10 Thread Michael S. Tsirkin
The following changes since commit a351e9b9fc24e982ec2f0e76379a49826036da12:

  Linux 4.11 (2017-04-30 19:47:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to c8b0d7290657996a29f318b948d2397d30e70c36:

  s390/virtio: change maintainership (2017-05-09 16:43:25 +0300)


virtio: fixes, cleanups, performance

A bunch of changes to virtio, most affecting virtio net.
ptr_ring batched zeroing - first of batching enhancements
that seems ready.

Signed-off-by: Michael S. Tsirkin 


Christian Borntraeger (1):
  s390/virtio: change maintainership

Colin Ian King (1):
  tools/virtio: fix spelling mistake: "wakeus" -> "wakeups"

Cornelia Huck (1):
  virtio: virtio_driver doc

Dan Carpenter (2):
  ringtest: fix an assert statement
  virtio_net: tidy a couple debug statements

Michael S. Tsirkin (11):
  virtio: wrap find_vqs
  virtio: add context flag to find vqs
  virtio: allow extra context per descriptor
  virtio_net: allow specifying context for rx
  virtio_net: rework mergeable buffer handling
  virtio_net: reduce alignment for buffers
  virtio_net: fix support for small rings
  virtio_net: don't reset twice on XDP on/off
  ptr_ring: batch ring zeroing
  ringtest: support test specific parameters
  ptr_ring: support testing different batching sizes

Sekhar Nori (1):
  tools/virtio: fix build breakage

 MAINTAINERS|   2 +-
 drivers/block/virtio_blk.c |   3 +-
 drivers/char/virtio_console.c  |   6 +-
 drivers/crypto/virtio/virtio_crypto_core.c |   3 +-
 drivers/gpu/drm/virtio/virtgpu_kms.c   |   3 +-
 drivers/misc/mic/vop/vop_main.c|   9 +-
 drivers/net/caif/caif_virtio.c |   3 +-
 drivers/net/virtio_net.c   | 147 -
 drivers/remoteproc/remoteproc_virtio.c |  10 +-
 drivers/rpmsg/virtio_rpmsg_bus.c   |   2 +-
 drivers/s390/virtio/kvm_virtio.c   |   8 +-
 drivers/s390/virtio/virtio_ccw.c   |   7 +-
 drivers/scsi/virtio_scsi.c |   3 +-
 drivers/virtio/virtio_balloon.c|   3 +-
 drivers/virtio/virtio_input.c  |   3 +-
 drivers/virtio/virtio_mmio.c   |   8 +-
 drivers/virtio/virtio_pci_common.c |  17 ++--
 drivers/virtio/virtio_pci_common.h |   4 +-
 drivers/virtio/virtio_pci_legacy.c |   4 +-
 drivers/virtio/virtio_pci_modern.c |  12 ++-
 drivers/virtio/virtio_ring.c   |  77 ---
 include/linux/ptr_ring.h   |  63 +++--
 include/linux/virtio.h |  13 +++
 include/linux/virtio_config.h  |  25 -
 include/linux/virtio_ring.h|   3 +
 net/vmw_vsock/virtio_transport.c   |   6 +-
 tools/virtio/linux/virtio.h|   1 +
 tools/virtio/ringtest/main.c   |  15 ++-
 tools/virtio/ringtest/main.h   |   2 +
 tools/virtio/ringtest/ptr_ring.c   |   3 +
 tools/virtio/virtio_test.c |   4 +-
 tools/virtio/vringh_test.c |   7 +-
 32 files changed, 330 insertions(+), 146 deletions(-)


Re: [PATCH 1/3] ptr_ring: batch ring zeroing

2017-05-10 Thread Michael S. Tsirkin
On Wed, May 10, 2017 at 11:18:13AM +0200, Jesper Dangaard Brouer wrote:
> On Tue, 9 May 2017 16:33:14 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > On Sat, Apr 08, 2017 at 02:14:08PM +0200, Jesper Dangaard Brouer wrote:
> > > On Fri, 7 Apr 2017 08:49:57 +0300
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > A known weakness in ptr_ring design is that it does not handle well the
> > > > situation when ring is almost full: as entries are consumed they are
> > > > immediately used again by the producer, so consumer and producer are
> > > > writing to a shared cache line.
> > > > 
> > > > To fix this, add batching to consume calls: as entries are
> > > > consumed do not write NULL into the ring until we get
> > > > a multiple (in current implementation 2x) of cache lines
> > > > away from the producer. At that point, write them all out.
> > > > 
> > > > We do the write out in the reverse order to keep
> > > > producer from sharing cache with consumer for as long
> > > > as possible.
> > > > 
> > > > Writeout also triggers when ring wraps around - there's
> > > > no special reason to do this but it helps keep the code
> > > > a bit simpler.
> > > > 
> > > > What should we do if getting away from producer by 2 cache lines
> > > > would mean we are keeping the ring moe than half empty?
> > > > Maybe we should reduce the batching in this case,
> > > > current patch simply reduces the batching.
> > > > 
> > > > Notes:
> > > > - it is no longer true that a call to consume guarantees
> > > >   that the following call to produce will succeed.
> > > >   No users seem to assume that.
> > > > - batching can also in theory reduce the signalling rate:
> > > >   users that would previously send interrups to the producer
> > > >   to wake it up after consuming each entry would now only
> > > >   need to do this once in a batch.
> > > >   Doing this would be easy by returning a flag to the caller.
> > > >   No users seem to do signalling on consume yet so this was not
> > > >   implemented yet.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > > 
> > > > Jason, I am curious whether the following gives you some of
> > > > the performance boost that you see with vhost batching
> > > > patches. Is vhost batching on top still helpful?
> > > > 
> > > >  include/linux/ptr_ring.h | 63 
> > > > +---
> > > >  1 file changed, 54 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > > index 6c70444..6b2e0dd 100644
> > > > --- a/include/linux/ptr_ring.h
> > > > +++ b/include/linux/ptr_ring.h
> > > > @@ -34,11 +34,13 @@
> > > >  struct ptr_ring {
> > > > int producer cacheline_aligned_in_smp;
> > > > spinlock_t producer_lock;
> > > > -   int consumer cacheline_aligned_in_smp;
> > > > +   int consumer_head cacheline_aligned_in_smp; /* next valid 
> > > > entry */
> > > > +   int consumer_tail; /* next entry to invalidate */
> > > > spinlock_t consumer_lock;
> > > > /* Shared consumer/producer data */
> > > > /* Read-only by both the producer and the consumer */
> > > > int size cacheline_aligned_in_smp; /* max entries in queue 
> > > > */
> > > > +   int batch; /* number of entries to consume in a batch */
> > > > void **queue;
> > > >  };
> > > >  
> > > > @@ -170,7 +172,7 @@ static inline int ptr_ring_produce_bh(struct 
> > > > ptr_ring *r, void *ptr)
> > > >  static inline void *__ptr_ring_peek(struct ptr_ring *r)
> > > >  {
> > > > if (likely(r->size))
> > > > -   return r->queue[r->consumer];
> > > > +   return r->queue[r->consumer_head];
> > > > return NULL;
> > > >  }
> > > >  
> > > > @@ -231,9 +233,38 @@ static inline bool ptr_ring_empty_bh(struct 
> > > > ptr_ring *r)
> > > >  /* Must only be called after __ptr_ring_peek returned !NULL */
> > > >  static inline void __ptr_ring_discard_one(struct ptr_ring *r)
> > > >  {
> > > > -   r->queue[r->consumer++] = NULL;
> > > > -   if (unlikely(r->consumer >= r->size))
> > > > -   r->consumer = 0;
> > > > +   /* Fundamentally, what we want to do is update consumer
> > > > +* index and zero out the entry so producer can reuse it.
> > > > +* Doing it naively at each consume would be as simple as:
> > > > +*   r->queue[r->consumer++] = NULL;
> > > > +*   if (unlikely(r->consumer >= r->size))
> > > > +*   r->consumer = 0;
> > > > +* but that is suboptimal when the ring is full as producer is 
> > > > writing
> > > > +* out new entries in the same cache line.  Defer these updates 
> > > > until a
> > > > +* batch of entries has been consumed.
> > > > +*/
> > > > +   int head = r->consumer_head++;
> > > > +
> > > > +   /* Once we have processed enough entries 

Re: [PATCH 1/2] net: Added mtu parameter to dev_forward_skb calls

2017-05-10 Thread kbuild test robot
Hi Fredrik,

[auto build test ERROR on net/master]
[also build test ERROR on v4.11 next-20170510]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Fredrik-Markstrom/net-Added-mtu-parameter-to-dev_forward_skb-calls/20170509-231142
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   net//bridge/br_forward.c: In function '__br_forward':
>> net//bridge/br_forward.c:99:46: error: 'struct sk_buff' has no member named 
>> 'dev_mtu'
   if (!is_skb_forwardable(skb->dev, skb, skb->dev_mtu)) {
 ^~

vim +99 net//bridge/br_forward.c

93  }
94  br_hook = NF_BR_FORWARD;
95  skb_forward_csum(skb);
96  net = dev_net(indev);
97  } else {
98  if (unlikely(netpoll_tx_running(to->br->dev))) {
  > 99  if (!is_skb_forwardable(skb->dev, skb, 
skb->dev_mtu)) {
   100  kfree_skb(skb);
   101  } else {
   102  skb_push(skb, ETH_HLEN);

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


RE: bpf pointer alignment validation

2017-05-10 Thread David Laight
From: Alexei Starovoitov
> Sent: 10 May 2017 06:58
> > +static u32 calc_align(u32 imm)
> > +{
> > +   u32 align = 1;
> > +
> > +   if (!imm)
> > +   return 1U << 31;
> > +
> > +   while (!(imm & 1)) {
> > +   imm >>= 1;
> > +   align <<= 1;
> > +   }
> > +   return align;
> > +}
> 
> same question as in previous reply.
> Why not to use something like:
> static u32 calc_align(u32 n)
> {
> if (!n)
> return 1U << 31;
> return n - ((n - 1) & n);
> }

That function needs a comment saying what it returns.
Think I'd write it as:
return n & ~(n & (n - 1));
(even though that might be one more instruction)

David



Re: iproute2 ss outputs duplicate tcp sockets info on kernel 3.10.105

2017-05-10 Thread Phil Sutter
Hi,

Cc'ing Cyrill who wrote the code in question. Maybe he has an idea
what's going wrong here.

Cheers, Phil

On Mon, May 08, 2017 at 06:56:04PM -0700, Li Er wrote:
> i'm using v4.11.0 release of iproute2 and kernel 3.10.105, simply
> running
> 
> $ ss
> Netid  State  Recv-Q Send-Q Local Address:Port 
> Peer Address:Port
> tcpCLOSE-WAIT 4340  10.0.0.1:47931
> 65.49.18.136:https
> tcpCLOSE-WAIT 4320  10.0.0.1:47932
> 65.49.18.136:https
> tcpCLOSE-WAIT 4340  10.0.0.1:47931
> 65.49.18.136:https
> tcpCLOSE-WAIT 4320  10.0.0.1:47932
> 65.49.18.136:https
> 
> as you can see, there's one duplicate entry of each  tcp  socket,
> however,  if  i  explicitly  specify  tcp socket by adding the -t
> switch,
> 
> $ ss -t
> State  Recv-Q Send-Q Local Address:Port Peer 
> Address:Port
> CLOSE-WAIT 4340  10.0.0.1:47931
> 65.49.18.136:https
> CLOSE-WAIT 4320  10.0.0.1:47932
> 65.49.18.136:https
> 
> the duplication is gone.
> 
> this problem also occurs on  git  master,  but  not  on  iproute2
> v4.3.0,  so  i  did  a  git bisect and found out the commit which
> caused this is 9f66764e308e9c645b3fb2d1cfbb7fb207eb5de1,  and  by
> revert this commit on git master, i.e. removing
> 
> err = rtnl_dump_done(rth, h);
> if (err < 0)
> return -1;
> 
> these 3 lines of code of lib/libnetlink.c, the problem is gone.
> 
> since  i'm not familiar with the source code, i doubt this is the
> right way to solve the problem, what's your suggestions? thanks.


Re: [PATCH] ip: mpls: fix printing of mpls labels

2017-05-10 Thread Simon Horman
On Mon, May 08, 2017 at 11:04:13PM -0700, David Ahern wrote:
> If the kernel returns more labels than iproute2 expects, none of
> the labels are printed and (null) is shown instead:
> $ ip -f mpls ro ls
> 101 as to (null) via inet 172.16.2.2 dev virt12
> 201 as to 202/203 via inet6 2001:db8:2::2 dev virt12
> 
> Remove the use of MPLS_MAX_LABELS and rely on buffer length that is
> passed to mpls_ntop. With this change ip can print the label stack
> returned by the kernel up to 255 characters (limit is due to size of
> buf passed in) which amounts to 31 labels with a separator.
> 
> With this change the above is:
> $ ip/ip -f mpls ro ls
> 101 as to 102/103/104/105/106/107/108/109/110 via inet 172.16.2.2 dev 
> virt12
> 
> Signed-off-by: David Ahern 

Reviewed-by: Simon Horman 



Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")

2017-05-10 Thread Anton Ivanov

On 10/05/17 09:56, Jason Wang wrote:



On 2017年05月10日 13:28, Anton Ivanov wrote:

On 10/05/17 03:18, Jason Wang wrote:


On 2017年05月09日 23:11, Stefan Hajnoczi wrote:

On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:

I have figured it out. Two issues.

1) skb->xmit_more is hardly ever set under virtualization because
the qdisc
is usually bypassed because of TCQ_F_CAN_BYPASS. Once
TCQ_F_CAN_BYPASS is
set a virtual NIC driver is not likely see skb->xmit_more (this
answers my
"how does this work at all" question).

2) If that flag is turned off (I patched sched_generic to turn it
off in
pfifo_fast while testing), DQL keeps xmit_more from being set. If
the driver
is not DQL enabled xmit_more is never ever set. If the driver is DQL
enabled
the queue is adjusted to ensure xmit_more stops happening within
10-15 xmit
cycles.

That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
There,
the BIG cost is telling the hypervisor that it needs to "kick" the
packets.
The cost of putting them into the vNIC buffers is negligible. You 
want
xmit_more to happen - it makes between 50% and 300% (depending on 
vNIC
design) difference. If there is no xmit_more the vNIC will 
immediately
"kick" the hypervisor and try to signal that  the packet needs to 
move

straight away (as for example in virtio_net).

How do you measure the performance? TCP or just measure pps?

In this particular case - tcp from guest. I have a couple of other
benchmarks (forwarding, etc).


One more question, is the number for virtio-net or other emulated vNIC?


Other for now - you are cc-ed to keep you in the loop.

Virtio is next on my list - I am revisiting the l2tpv3.c driver in QEMU 
and looking at how to preserve bulking by adding back sendmmsg (as well 
as a list of other features/transports).


We had sendmmsg removed for the final inclusion in QEMU 2.1, it 
presently uses only recvmmsg so for the time being it does not care. 
That will most likely change once it starts using sendmmsg as well.







In addition to that, the perceived line rate is proportional to this
cost,
so I am not sure that the current dql math holds. In fact, I think
it does
not - it is trying to adjust something which influences the
perceived line
rate.

So - how do we turn BOTH bypass and DQL adjustment while under
virtualization and set them to be "always qdisc" + "always xmit_more
allowed"

Virtio-net net does not support BQL. Before commit ea7735d97ba9
("virtio-net: move free_old_xmit_skbs"), it's even impossible to
support that since we don't have tx interrupt for each packet.  I
haven't measured the impact of xmit_more, maybe I was wrong but I
think it may help in some cases since it may improve the batching on
host more or less.

If you do not support BQL, you might as well look the xmit_more part
kick code path. Line 1127.

bool kick = !skb->xmit_more; effectively means kick = true;

It will never be triggered. You will be kicking each packet and per
packet.


Probably not, we have several ways to try to suppress this on the 
virtio layer, host can give hints to disable the kicks through:


- explicitly set a flag
- implicitly by not publishing a new event idx

FYI, I can get 100-200 packets per vm exit when testing 64 byte 
TCP_STREAM using netperf.


I am aware of that. If, however, the host is providing a hint we might 
as well use it.





xmit_more is now set only out of BQL. If BQL is not enabled you
never get it. Now, will the current dql code work correctly if you do
not have a defined line rate and completion interrupts - no idea.
Probably not. IMHO instead of trying to fix it there should be a way for
a device or architecture to turn it off.


In fact BQL is not the only user for xmit_more. Pktgen with burst is 
another. Test does not show obvious difference if I set burst from 0 
to 64 since we already had other ways to avoid kicking host.


That, as well as this not being wired to bulk transport.





To be clear - I ran into this working on my own drivers for UML, you are
cc-ed because you are likely to be one of the most affected.


I'm still not quite sure the issue. Looks like virtio-net is ok since 
BQL is not supported and the impact of xmit_more could be ignored.


Presently - yes. If you have bulk aware transports to wire into that is 
likely to make a difference.




Thanks



A.


Thanks


A.

P.S. Cc-ing virtio maintainer

CCing Michael Tsirkin and Jason Wang, who are the core virtio and
virtio-net maintainers.  (I maintain the vsock driver - it's unrelated
to this discussion.)


A.


On 08/05/17 08:15, Anton Ivanov wrote:

Hi all,

I was revising some of my old work for UML to prepare it for
submission
and I noticed that skb->xmit_more does not seem to be set any more.

I traced the issue as far as net/sched/sched_generic.c

try_bulk_dequeue_skb() is never invoked (the drivers I am working
on are
dql enabled so that is not the problem).

More interestingly, if I put a breakpoint and debug output into

Re: [PATCH net 2/2] xdp: disallow use of native and generic hook at once

2017-05-10 Thread Daniel Borkmann

On 05/10/2017 05:18 AM, Jakub Kicinski wrote:

On Wed, 10 May 2017 03:31:31 +0200, Daniel Borkmann wrote:

While working on the iproute2 generic XDP frontend, I noticed that
as of right now it's possible to have native *and* generic XDP
programs loaded both at the same time for the case when a driver
supports native XDP.


Nice improvement!  A couple of absolute nitpicks below..


The intended model for generic XDP from b5cdae3291f7 ("net: Generic
XDP") is, however, that only one out of the two can be present at
once which is also indicated as such in the XPD netlink dump part.

   ^^^
  XDP


Good point.


@@ -6851,6 +6851,32 @@ int dev_change_proto_down(struct net_device *dev, bool 
proto_down)
  }
  EXPORT_SYMBOL(dev_change_proto_down);

+bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op)


Out of curiosity - the leading underscores refer to caller having to
hold rtnl?  I assume they are not needed in the function below because
it's static?


I think I don't quite follow the last question, but it probably makes
sense to add an ASSERT_RTNL() into dev_xdp_attached() inline helper to
make it clearly visible to callers of this api.


+{
+   struct netdev_xdp xdp;
+
+   memset(, 0, sizeof(xdp));
+   xdp.command = XDP_QUERY_PROG;


Probably personal preference, but seems like designated struct
initializer would do quite nicely here and save the memset :)


I had that initially, but I recalled that gcc < 4.6 does not handle this
style for the initialization of anonymous struct/union properly (e.g.,
we fixed that in iproute2 as well). Andrew Morton still uses gcc 4.4.4
and occasionally sends kernel fixes, so we might end up like this anyway.


diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index dda9f16..99320f0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1251,24 +1251,20 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct 
net_device *dev)
  {
struct nlattr *xdp;
u32 xdp_flags = 0;
-   u8 val = 0;
int err;
+   u8 val;

xdp = nla_nest_start(skb, IFLA_XDP);
if (!xdp)
return -EMSGSIZE;
+
if (rcu_access_pointer(dev->xdp_prog)) {
xdp_flags = XDP_FLAGS_SKB_MODE;
val = 1;
-   } else if (dev->netdev_ops->ndo_xdp) {
-   struct netdev_xdp xdp_op = {};
-
-   xdp_op.command = XDP_QUERY_PROG;
-   err = dev->netdev_ops->ndo_xdp(dev, _op);
-   if (err)
-   goto err_cancel;
-   val = xdp_op.prog_attached;
+   } else {
+   val = dev_xdp_attached(dev);
}


Would it make sense to set xdp_flags to XDP_FLAGS_DRV_MODE here to keep
things symmetrical?  I know you are just preserving existing behaviour
but it may seem slightly arbitrary to a new comer to report one of the
very similarly named flags in the dump but not the other.


I thought about it, it's kind of redundant information since
IFLA_XDP_ATTACHED attribute w/o IFLA_XDP_FLAGS attribute today
says that it's native already. It might look strange if we add
also XDP_FLAGS_DRV_MODE there, since it doesn't give anything
new. I rather see it similar to XDP_FLAGS_UPDATE_IF_NOEXIST flag
that is for updating fd only, but I don't really have a strong
opinion on this though. I could add it to the respin if preferred.


Re: [PATCH 1/3] ptr_ring: batch ring zeroing

2017-05-10 Thread Jesper Dangaard Brouer
On Tue, 9 May 2017 16:33:14 +0300
"Michael S. Tsirkin"  wrote:

> On Sat, Apr 08, 2017 at 02:14:08PM +0200, Jesper Dangaard Brouer wrote:
> > On Fri, 7 Apr 2017 08:49:57 +0300
> > "Michael S. Tsirkin"  wrote:
> >   
> > > A known weakness in ptr_ring design is that it does not handle well the
> > > situation when ring is almost full: as entries are consumed they are
> > > immediately used again by the producer, so consumer and producer are
> > > writing to a shared cache line.
> > > 
> > > To fix this, add batching to consume calls: as entries are
> > > consumed do not write NULL into the ring until we get
> > > a multiple (in current implementation 2x) of cache lines
> > > away from the producer. At that point, write them all out.
> > > 
> > > We do the write out in the reverse order to keep
> > > producer from sharing cache with consumer for as long
> > > as possible.
> > > 
> > > Writeout also triggers when ring wraps around - there's
> > > no special reason to do this but it helps keep the code
> > > a bit simpler.
> > > 
> > > What should we do if getting away from producer by 2 cache lines
> > > would mean we are keeping the ring moe than half empty?
> > > Maybe we should reduce the batching in this case,
> > > current patch simply reduces the batching.
> > > 
> > > Notes:
> > > - it is no longer true that a call to consume guarantees
> > >   that the following call to produce will succeed.
> > >   No users seem to assume that.
> > > - batching can also in theory reduce the signalling rate:
> > >   users that would previously send interrups to the producer
> > >   to wake it up after consuming each entry would now only
> > >   need to do this once in a batch.
> > >   Doing this would be easy by returning a flag to the caller.
> > >   No users seem to do signalling on consume yet so this was not
> > >   implemented yet.
> > > 
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > > 
> > > Jason, I am curious whether the following gives you some of
> > > the performance boost that you see with vhost batching
> > > patches. Is vhost batching on top still helpful?
> > > 
> > >  include/linux/ptr_ring.h | 63 
> > > +---
> > >  1 file changed, 54 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > index 6c70444..6b2e0dd 100644
> > > --- a/include/linux/ptr_ring.h
> > > +++ b/include/linux/ptr_ring.h
> > > @@ -34,11 +34,13 @@
> > >  struct ptr_ring {
> > >   int producer cacheline_aligned_in_smp;
> > >   spinlock_t producer_lock;
> > > - int consumer cacheline_aligned_in_smp;
> > > + int consumer_head cacheline_aligned_in_smp; /* next valid entry */
> > > + int consumer_tail; /* next entry to invalidate */
> > >   spinlock_t consumer_lock;
> > >   /* Shared consumer/producer data */
> > >   /* Read-only by both the producer and the consumer */
> > >   int size cacheline_aligned_in_smp; /* max entries in queue */
> > > + int batch; /* number of entries to consume in a batch */
> > >   void **queue;
> > >  };
> > >  
> > > @@ -170,7 +172,7 @@ static inline int ptr_ring_produce_bh(struct ptr_ring 
> > > *r, void *ptr)
> > >  static inline void *__ptr_ring_peek(struct ptr_ring *r)
> > >  {
> > >   if (likely(r->size))
> > > - return r->queue[r->consumer];
> > > + return r->queue[r->consumer_head];
> > >   return NULL;
> > >  }
> > >  
> > > @@ -231,9 +233,38 @@ static inline bool ptr_ring_empty_bh(struct ptr_ring 
> > > *r)
> > >  /* Must only be called after __ptr_ring_peek returned !NULL */
> > >  static inline void __ptr_ring_discard_one(struct ptr_ring *r)
> > >  {
> > > - r->queue[r->consumer++] = NULL;
> > > - if (unlikely(r->consumer >= r->size))
> > > - r->consumer = 0;
> > > + /* Fundamentally, what we want to do is update consumer
> > > +  * index and zero out the entry so producer can reuse it.
> > > +  * Doing it naively at each consume would be as simple as:
> > > +  *   r->queue[r->consumer++] = NULL;
> > > +  *   if (unlikely(r->consumer >= r->size))
> > > +  *   r->consumer = 0;
> > > +  * but that is suboptimal when the ring is full as producer is writing
> > > +  * out new entries in the same cache line.  Defer these updates until a
> > > +  * batch of entries has been consumed.
> > > +  */
> > > + int head = r->consumer_head++;
> > > +
> > > + /* Once we have processed enough entries invalidate them in
> > > +  * the ring all at once so producer can reuse their space in the ring.
> > > +  * We also do this when we reach end of the ring - not mandatory
> > > +  * but helps keep the implementation simple.
> > > +  */
> > > + if (unlikely(r->consumer_head - r->consumer_tail >= r->batch ||
> > > +  r->consumer_head >= r->size)) {
> > > + /* Zero out entries in the reverse order: this way we touch the
> > > +  * cache line that producer might currently be 

[PATCH v5 11/17] net: qualcomm: make qca_7k_common a separate kernel module

2017-05-10 Thread Stefan Wahren
In order to share common functions between QCA7000 SPI and UART protocol
driver the qca_7k_common needs to be a separate kernel module.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/Kconfig |  8 +++-
 drivers/net/ethernet/qualcomm/Makefile|  5 +++--
 drivers/net/ethernet/qualcomm/qca_7k_common.c | 10 ++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/Kconfig 
b/drivers/net/ethernet/qualcomm/Kconfig
index d7720bf..b4c369d 100644
--- a/drivers/net/ethernet/qualcomm/Kconfig
+++ b/drivers/net/ethernet/qualcomm/Kconfig
@@ -16,7 +16,13 @@ config NET_VENDOR_QUALCOMM
 if NET_VENDOR_QUALCOMM
 
 config QCA7000
-   tristate "Qualcomm Atheros QCA7000 support"
+   tristate
+   help
+ This enables support for the Qualcomm Atheros QCA7000.
+
+config QCA7000_SPI
+   tristate "Qualcomm Atheros QCA7000 SPI support"
+   select QCA7000
depends on SPI_MASTER && OF
---help---
  This SPI protocol driver supports the Qualcomm Atheros QCA7000.
diff --git a/drivers/net/ethernet/qualcomm/Makefile 
b/drivers/net/ethernet/qualcomm/Makefile
index 5e17bf1..65556ca 100644
--- a/drivers/net/ethernet/qualcomm/Makefile
+++ b/drivers/net/ethernet/qualcomm/Makefile
@@ -2,7 +2,8 @@
 # Makefile for the Qualcomm network device drivers.
 #
 
-obj-$(CONFIG_QCA7000) += qcaspi.o
-qcaspi-objs := qca_spi.o qca_7k_common.o qca_7k.o qca_debug.o
+obj-$(CONFIG_QCA7000) += qca_7k_common.o
+obj-$(CONFIG_QCA7000_SPI) += qcaspi.o
+qcaspi-objs := qca_7k.o qca_debug.o qca_spi.o
 
 obj-y += emac/
diff --git a/drivers/net/ethernet/qualcomm/qca_7k_common.c 
b/drivers/net/ethernet/qualcomm/qca_7k_common.c
index 0d3daa9..6b511f0 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k_common.c
+++ b/drivers/net/ethernet/qualcomm/qca_7k_common.c
@@ -21,7 +21,9 @@
  *   by an atheros frame while transmitted over a serial channel;
  */
 
+#include 
 #include 
+#include 
 
 #include "qca_7k_common.h"
 
@@ -46,6 +48,7 @@ qcafrm_create_header(u8 *buf, u16 length)
 
return QCAFRM_HEADER_LEN;
 }
+EXPORT_SYMBOL_GPL(qcafrm_create_header);
 
 u16
 qcafrm_create_footer(u8 *buf)
@@ -57,6 +60,7 @@ qcafrm_create_footer(u8 *buf)
buf[1] = 0x55;
return QCAFRM_FOOTER_LEN;
 }
+EXPORT_SYMBOL_GPL(qcafrm_create_footer);
 
 /*   Gather received bytes and try to extract a full ethernet frame by
  *   following a simple state machine.
@@ -154,3 +158,9 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, 
u16 buf_len, u8 recv_by
 
return ret;
 }
+EXPORT_SYMBOL_GPL(qcafrm_fsm_decode);
+
+MODULE_DESCRIPTION("Qualcomm Atheros QCA7000 common");
+MODULE_AUTHOR("Qualcomm Atheros Communications");
+MODULE_AUTHOR("Stefan Wahren ");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
2.1.4



[PATCH v5 04/17] net: qualcomm: use net_device_ops instead of direct call

2017-05-10 Thread Stefan Wahren
There is no need to export qcaspi_netdev_open and qcaspi_netdev_close
because they are also accessible via the net_device_ops.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/qca_debug.c | 5 +++--
 drivers/net/ethernet/qualcomm/qca_spi.c   | 4 ++--
 drivers/net/ethernet/qualcomm/qca_spi.h   | 3 ---
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_debug.c 
b/drivers/net/ethernet/qualcomm/qca_debug.c
index d145df9..92b6be9 100644
--- a/drivers/net/ethernet/qualcomm/qca_debug.c
+++ b/drivers/net/ethernet/qualcomm/qca_debug.c
@@ -275,6 +275,7 @@ qcaspi_get_ringparam(struct net_device *dev, struct 
ethtool_ringparam *ring)
 static int
 qcaspi_set_ringparam(struct net_device *dev, struct ethtool_ringparam *ring)
 {
+   const struct net_device_ops *ops = dev->netdev_ops;
struct qcaspi *qca = netdev_priv(dev);
 
if ((ring->rx_pending) ||
@@ -283,13 +284,13 @@ qcaspi_set_ringparam(struct net_device *dev, struct 
ethtool_ringparam *ring)
return -EINVAL;
 
if (netif_running(dev))
-   qcaspi_netdev_close(dev);
+   ops->ndo_stop(dev);
 
qca->txr.count = max_t(u32, ring->tx_pending, TX_RING_MIN_LEN);
qca->txr.count = min_t(u16, qca->txr.count, TX_RING_MAX_LEN);
 
if (netif_running(dev))
-   qcaspi_netdev_open(dev);
+   ops->ndo_open(dev);
 
return 0;
 }
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c 
b/drivers/net/ethernet/qualcomm/qca_spi.c
index 8590109..5c79612 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -602,7 +602,7 @@ qcaspi_intr_handler(int irq, void *data)
return IRQ_HANDLED;
 }
 
-int
+static int
 qcaspi_netdev_open(struct net_device *dev)
 {
struct qcaspi *qca = netdev_priv(dev);
@@ -639,7 +639,7 @@ qcaspi_netdev_open(struct net_device *dev)
return 0;
 }
 
-int
+static int
 qcaspi_netdev_close(struct net_device *dev)
 {
struct qcaspi *qca = netdev_priv(dev);
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.h 
b/drivers/net/ethernet/qualcomm/qca_spi.h
index 6e31a0e..064853d 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.h
+++ b/drivers/net/ethernet/qualcomm/qca_spi.h
@@ -108,7 +108,4 @@ struct qcaspi {
u16 burst_len;
 };
 
-int qcaspi_netdev_open(struct net_device *dev);
-int qcaspi_netdev_close(struct net_device *dev);
-
 #endif /* _QCA_SPI_H */
-- 
2.1.4



[PATCH v5 07/17] net: qualcomm: move qcaspi_tx_cmd to qca_spi.c

2017-05-10 Thread Stefan Wahren
The function qcaspi_tx_cmd() is only called from qca_spi.c. So we better
move it there.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/qca_7k.c  | 24 
 drivers/net/ethernet/qualcomm/qca_7k.h  |  1 -
 drivers/net/ethernet/qualcomm/qca_spi.c | 24 
 3 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_7k.c 
b/drivers/net/ethernet/qualcomm/qca_7k.c
index 557d53c..aa90a1d 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k.c
+++ b/drivers/net/ethernet/qualcomm/qca_7k.c
@@ -119,27 +119,3 @@ qcaspi_write_register(struct qcaspi *qca, u16 reg, u16 
value)
 
return ret;
 }
-
-int
-qcaspi_tx_cmd(struct qcaspi *qca, u16 cmd)
-{
-   __be16 tx_data;
-   struct spi_message *msg = >spi_msg1;
-   struct spi_transfer *transfer = >spi_xfer1;
-   int ret;
-
-   tx_data = cpu_to_be16(cmd);
-   transfer->len = sizeof(tx_data);
-   transfer->tx_buf = _data;
-   transfer->rx_buf = NULL;
-
-   ret = spi_sync(qca->spi_dev, msg);
-
-   if (!ret)
-   ret = msg->status;
-
-   if (ret)
-   qcaspi_spi_error(qca);
-
-   return ret;
-}
diff --git a/drivers/net/ethernet/qualcomm/qca_7k.h 
b/drivers/net/ethernet/qualcomm/qca_7k.h
index 4047f0a..27124c2 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k.h
+++ b/drivers/net/ethernet/qualcomm/qca_7k.h
@@ -67,6 +67,5 @@
 void qcaspi_spi_error(struct qcaspi *qca);
 int qcaspi_read_register(struct qcaspi *qca, u16 reg, u16 *result);
 int qcaspi_write_register(struct qcaspi *qca, u16 reg, u16 value);
-int qcaspi_tx_cmd(struct qcaspi *qca, u16 cmd);
 
 #endif /* _QCA_7K_H */
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c 
b/drivers/net/ethernet/qualcomm/qca_spi.c
index a239ac4..c727357 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -192,6 +192,30 @@ qcaspi_read_legacy(struct qcaspi *qca, u8 *dst, u32 len)
 }
 
 static int
+qcaspi_tx_cmd(struct qcaspi *qca, u16 cmd)
+{
+   __be16 tx_data;
+   struct spi_message *msg = >spi_msg1;
+   struct spi_transfer *transfer = >spi_xfer1;
+   int ret;
+
+   tx_data = cpu_to_be16(cmd);
+   transfer->len = sizeof(tx_data);
+   transfer->tx_buf = _data;
+   transfer->rx_buf = NULL;
+
+   ret = spi_sync(qca->spi_dev, msg);
+
+   if (!ret)
+   ret = msg->status;
+
+   if (ret)
+   qcaspi_spi_error(qca);
+
+   return ret;
+}
+
+static int
 qcaspi_tx_frame(struct qcaspi *qca, struct sk_buff *skb)
 {
u32 count;
-- 
2.1.4



[PATCH v5 00/17] net: qualcomm: add QCA7000 UART driver

2017-05-10 Thread Stefan Wahren
The Qualcomm QCA7000 HomePlug GreenPHY supports two interfaces:
UART and SPI. This patch series adds the missing support for UART.

This driver based on the Qualcomm code [1], but contains some changes:
* use random MAC address per default
* use net_device_stats from device
* share frame decoding between SPI and UART driver
* improve error handling
* reimplement tty_wakeup with work queue (based on slcan)
* use new serial device bus instead of ldisc

The patches 1 - 3 are just for clean up and are not related to
the UART support. Patches 4 - 15 prepare the existing QCA7000
code for UART support. Patch 16 is a improvement for serial device
bus. Patch 17 contains the new driver.

Cherry picking of the dt-bindings and serdev patch is okay. The
UART driver functionally (not compile) depends on this unmerged
patch [2].

The code itself has been tested on a Freescale i.MX28 board and
a Raspberry Pi Zero.

Changes in v5:
  * rebase to current linux-next
  * fix alignment issues in rx path
  * fix buffer overrun with big ethernet frames
  * fix init of UART decoding fsm
  * add device UART settings to Kconfig help
  * add current-speed to slave-device binding
  * merge SPI and UART binding document
  * rename qca_common to qca_7k_common
  * drop patch for retrieving UART settings
  * several cleanups

Changes in v4:
  * rebase to current linux-next
  * use parameter -M for git format-patch
  * change order of local variables where possible
  * implement basic serdev support (without hardware flow control)

Changes in v3:
  * rebase to current net-next

Changes in v2:
  * fix build issue by using netif_trans_update() and dev_trans_start()

[1] - https://github.com/IoE/qca7000
[2] - http://marc.info/?l=linux-serial=149338017301309=2

Stefan Wahren (17):
  net: qualcomm: remove unnecessary includes
  net: qca_framing: use u16 for frame offset
  net: qca_7k: Use BIT macro
  net: qualcomm: use net_device_ops instead of direct call
  net: qualcomm: Improve readability of length defines
  net: qca_spi: remove QCASPI_MTU
  net: qualcomm: move qcaspi_tx_cmd to qca_spi.c
  net: qca_spi: Clarify MODULE_DESCRIPTION
  net: qualcomm: rename qca_framing.c to qca_7k_common.c
  net: qualcomm: prepare frame decoding for UART driver
  net: qualcomm: make qca_7k_common a separate kernel module
  dt-bindings: qca7000-spi: Rework binding
  dt-bindings: qca7000: rename binding
  dt-bindings: slave-device: add current-speed property
  dt-bindings: qca7000: append UART interface to binding
  tty: serdev-ttyport: return actual baudrate from ttyport_set_baudrate
  net: qualcomm: add QCA7000 UART driver

 .../devicetree/bindings/net/qca-qca7000-spi.txt|  47 ---
 .../devicetree/bindings/net/qca-qca7000.txt|  88 +
 .../devicetree/bindings/serial/slave-device.txt|   9 +
 drivers/net/ethernet/qualcomm/Kconfig  |  24 +-
 drivers/net/ethernet/qualcomm/Makefile |   7 +-
 drivers/net/ethernet/qualcomm/qca_7k.c |  28 --
 drivers/net/ethernet/qualcomm/qca_7k.h |  15 +-
 .../qualcomm/{qca_framing.c => qca_7k_common.c}|  26 +-
 .../qualcomm/{qca_framing.h => qca_7k_common.h}|  24 +-
 drivers/net/ethernet/qualcomm/qca_debug.c  |   5 +-
 drivers/net/ethernet/qualcomm/qca_spi.c|  47 ++-
 drivers/net/ethernet/qualcomm/qca_spi.h|   5 +-
 drivers/net/ethernet/qualcomm/qca_uart.c   | 423 +
 drivers/tty/serdev/serdev-ttyport.c|   2 +-
 14 files changed, 630 insertions(+), 120 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/net/qca-qca7000-spi.txt
 create mode 100644 Documentation/devicetree/bindings/net/qca-qca7000.txt
 rename drivers/net/ethernet/qualcomm/{qca_framing.c => qca_7k_common.c} (85%)
 rename drivers/net/ethernet/qualcomm/{qca_framing.h => qca_7k_common.h} (86%)
 create mode 100644 drivers/net/ethernet/qualcomm/qca_uart.c

-- 
2.1.4



[PATCH v5 10/17] net: qualcomm: prepare frame decoding for UART driver

2017-05-10 Thread Stefan Wahren
Unfortunately the frame format is not exactly identical between SPI
and UART. In case of SPI there is an additional HW length at the
beginning. So store the initial state to make the decoding state machine
more flexible and easy to extend for UART support.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/qca_7k_common.c | 12 ++--
 drivers/net/ethernet/qualcomm/qca_7k_common.h |  8 ++--
 drivers/net/ethernet/qualcomm/qca_spi.c   |  2 +-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_7k_common.c 
b/drivers/net/ethernet/qualcomm/qca_7k_common.c
index 6d17fbd..0d3daa9 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k_common.c
+++ b/drivers/net/ethernet/qualcomm/qca_7k_common.c
@@ -83,7 +83,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, u16 
buf_len, u8 recv_by
 
if (recv_byte != 0x00) {
/* first two bytes of length must be 0 */
-   handle->state = QCAFRM_HW_LEN0;
+   handle->state = handle->init;
}
break;
case QCAFRM_HW_LEN2:
@@ -97,7 +97,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, u16 
buf_len, u8 recv_by
case QCAFRM_WAIT_AA4:
if (recv_byte != 0xAA) {
ret = QCAFRM_NOHEAD;
-   handle->state = QCAFRM_HW_LEN0;
+   handle->state = handle->init;
} else {
handle->state--;
}
@@ -119,7 +119,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, 
u16 buf_len, u8 recv_by
len = handle->offset;
if (len > buf_len || len < QCAFRM_MIN_LEN) {
ret = QCAFRM_INVLEN;
-   handle->state = QCAFRM_HW_LEN0;
+   handle->state = handle->init;
} else {
handle->state = (enum qcafrm_state)(len + 1);
/* Remaining number of bytes. */
@@ -135,7 +135,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, 
u16 buf_len, u8 recv_by
case QCAFRM_WAIT_551:
if (recv_byte != 0x55) {
ret = QCAFRM_NOTAIL;
-   handle->state = QCAFRM_HW_LEN0;
+   handle->state = handle->init;
} else {
handle->state = QCAFRM_WAIT_552;
}
@@ -143,11 +143,11 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, 
u16 buf_len, u8 recv_by
case QCAFRM_WAIT_552:
if (recv_byte != 0x55) {
ret = QCAFRM_NOTAIL;
-   handle->state = QCAFRM_HW_LEN0;
+   handle->state = handle->init;
} else {
ret = handle->offset;
/* Frame is fully received. */
-   handle->state = QCAFRM_HW_LEN0;
+   handle->state = handle->init;
}
break;
}
diff --git a/drivers/net/ethernet/qualcomm/qca_7k_common.h 
b/drivers/net/ethernet/qualcomm/qca_7k_common.h
index 5df7c65..07bdd6c 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k_common.h
+++ b/drivers/net/ethernet/qualcomm/qca_7k_common.h
@@ -61,6 +61,7 @@
 #define QCAFRM_ERR_BASE -1000
 
 enum qcafrm_state {
+   /* HW length is only available on SPI */
QCAFRM_HW_LEN0 = 0x8000,
QCAFRM_HW_LEN1 = QCAFRM_HW_LEN0 - 1,
QCAFRM_HW_LEN2 = QCAFRM_HW_LEN1 - 1,
@@ -101,6 +102,8 @@ enum qcafrm_state {
 struct qcafrm_handle {
/*  Current decoding state */
enum qcafrm_state state;
+   /* Initial state depends on connection type */
+   enum qcafrm_state init;
 
/* Offset in buffer (borrowed for length too) */
u16 offset;
@@ -113,9 +116,10 @@ u16 qcafrm_create_header(u8 *buf, u16 len);
 
 u16 qcafrm_create_footer(u8 *buf);
 
-static inline void qcafrm_fsm_init(struct qcafrm_handle *handle)
+static inline void qcafrm_fsm_init_spi(struct qcafrm_handle *handle)
 {
-   handle->state = QCAFRM_HW_LEN0;
+   handle->init = QCAFRM_HW_LEN0;
+   handle->state = handle->init;
 }
 
 /*   Gather received bytes and try to extract a full Ethernet frame
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c 
b/drivers/net/ethernet/qualcomm/qca_spi.c
index 2bc3ba4..b7073a9 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -637,7 +637,7 @@ qcaspi_netdev_open(struct net_device *dev)
qca->intr_req = 1;
qca->intr_svc = 0;
qca->sync = QCASPI_SYNC_UNKNOWN;
-   qcafrm_fsm_init(>frm_handle);
+   qcafrm_fsm_init_spi(>frm_handle);
 
qca->spi_thread = kthread_run((void *)qcaspi_spi_thread,
  qca, "%s", dev->name);
-- 
2.1.4



[PATCH v5 08/17] net: qca_spi: Clarify MODULE_DESCRIPTION

2017-05-10 Thread Stefan Wahren
Since this driver is specific to the QCA7000, we should make the module
description more precisely.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/qca_spi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c 
b/drivers/net/ethernet/qualcomm/qca_spi.c
index c727357..deec70f 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -997,7 +997,7 @@ static struct spi_driver qca_spi_driver = {
 };
 module_spi_driver(qca_spi_driver);
 
-MODULE_DESCRIPTION("Qualcomm Atheros SPI Driver");
+MODULE_DESCRIPTION("Qualcomm Atheros QCA7000 SPI Driver");
 MODULE_AUTHOR("Qualcomm Atheros Communications");
 MODULE_AUTHOR("Stefan Wahren ");
 MODULE_LICENSE("Dual BSD/GPL");
-- 
2.1.4



[PATCH v5 13/17] dt-bindings: qca7000: rename binding

2017-05-10 Thread Stefan Wahren
Before we can merge the QCA7000 UART binding the document needs to be
renamed.

Signed-off-by: Stefan Wahren 
---
 .../devicetree/bindings/net/{qca-qca7000-spi.txt => qca-qca7000.txt}  | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename Documentation/devicetree/bindings/net/{qca-qca7000-spi.txt => 
qca-qca7000.txt} (100%)

diff --git a/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt 
b/Documentation/devicetree/bindings/net/qca-qca7000.txt
similarity index 100%
rename from Documentation/devicetree/bindings/net/qca-qca7000-spi.txt
rename to Documentation/devicetree/bindings/net/qca-qca7000.txt
-- 
2.1.4



[PATCH v5 12/17] dt-bindings: qca7000-spi: Rework binding

2017-05-10 Thread Stefan Wahren
In preparation for the QCA7000 UART binding rework the binding document.

Signed-off-by: Stefan Wahren 
---
 .../devicetree/bindings/net/qca-qca7000-spi.txt| 49 +-
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt 
b/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt
index c74989c..a37f656 100644
--- a/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt
+++ b/Documentation/devicetree/bindings/net/qca-qca7000-spi.txt
@@ -1,29 +1,38 @@
-* Qualcomm QCA7000 (Ethernet over SPI protocol)
+* Qualcomm QCA7000
 
-Note: The QCA7000 is useable as a SPI device. In this case it must be defined
-as a child of a SPI master in the device tree.
+The QCA7000 is a serial-to-powerline bridge with a host interface which could
+be configured either as SPI or UART slave. This configuration is done by
+the QCA7000 firmware.
+
+(a) Ethernet over SPI
+
+In order to use the QCA7000 as SPI device it must be defined as a child of a
+SPI master in the device tree.
 
 Required properties:
-- compatible : Should be "qca,qca7000"
-- reg : Should specify the SPI chip select
-- interrupts : The first cell should specify the index of the source interrupt
-  and the second cell should specify the trigger type as rising edge
-- spi-cpha : Must be set
-- spi-cpol: Must be set
+- compatible   : Should be "qca,qca7000"
+- reg  : Should specify the SPI chip select
+- interrupts   : The first cell should specify the index of the source
+ interrupt and the second cell should specify the trigger
+ type as rising edge
+- spi-cpha : Must be set
+- spi-cpol : Must be set
 
 Optional properties:
-- interrupt-parent : Specify the pHandle of the source interrupt
+- interrupt-parent  : Specify the pHandle of the source interrupt
 - spi-max-frequency : Maximum frequency of the SPI bus the chip can operate at.
-  Numbers smaller than 100 or greater than 1600 are invalid. Missing
-  the property will set the SPI frequency to 800 Hertz.
-- local-mac-address: 6 bytes, MAC address
-- qca,legacy-mode : Set the SPI data transfer of the QCA7000 to legacy mode.
-  In this mode the SPI master must toggle the chip select between each data
-  word. In burst mode these gaps aren't necessary, which is faster.
-  This setting depends on how the QCA7000 is setup via GPIO pin strapping.
-  If the property is missing the driver defaults to burst mode.
-
-Example:
+ Numbers smaller than 100 or greater than 1600
+ are invalid. Missing the property will set the SPI
+ frequency to 800 Hertz.
+- local-mac-address : see ./ethernet.txt
+- qca,legacy-mode   : Set the SPI data transfer of the QCA7000 to legacy mode.
+ In this mode the SPI master must toggle the chip select
+ between each data word. In burst mode these gaps aren't
+ necessary, which is faster. This setting depends on how
+ the QCA7000 is setup via GPIO pin strapping. If the
+ property is missing the driver defaults to burst mode.
+
+SPI Example:
 
 /* Freescale i.MX28 SPI master*/
 ssp2: spi@80014000 {
-- 
2.1.4



[PATCH v5 17/17] net: qualcomm: add QCA7000 UART driver

2017-05-10 Thread Stefan Wahren
This patch adds the Ethernet over UART driver for the
Qualcomm QCA7000 HomePlug GreenPHY.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/Kconfig |  16 +
 drivers/net/ethernet/qualcomm/Makefile|   2 +
 drivers/net/ethernet/qualcomm/qca_7k_common.h |   6 +
 drivers/net/ethernet/qualcomm/qca_uart.c  | 423 ++
 4 files changed, 447 insertions(+)
 create mode 100644 drivers/net/ethernet/qualcomm/qca_uart.c

diff --git a/drivers/net/ethernet/qualcomm/Kconfig 
b/drivers/net/ethernet/qualcomm/Kconfig
index b4c369d..877675a 100644
--- a/drivers/net/ethernet/qualcomm/Kconfig
+++ b/drivers/net/ethernet/qualcomm/Kconfig
@@ -30,6 +30,22 @@ config QCA7000_SPI
  To compile this driver as a module, choose M here. The module
  will be called qcaspi.
 
+config QCA7000_UART
+   tristate "Qualcomm Atheros QCA7000 UART support"
+   select QCA7000
+   depends on SERIAL_DEV_BUS && OF
+   ---help---
+ This UART protocol driver supports the Qualcomm Atheros QCA7000.
+
+ Currently the driver assumes these device UART settings:
+   Data bits: 8
+   Parity: None
+   Stop bits: 1
+   Flow control: None
+
+ To compile this driver as a module, choose M here. The module
+ will be called qcauart.
+
 config QCOM_EMAC
tristate "Qualcomm Technologies, Inc. EMAC Gigabit Ethernet support"
depends on HAS_DMA && HAS_IOMEM
diff --git a/drivers/net/ethernet/qualcomm/Makefile 
b/drivers/net/ethernet/qualcomm/Makefile
index 65556ca..92fa7c4 100644
--- a/drivers/net/ethernet/qualcomm/Makefile
+++ b/drivers/net/ethernet/qualcomm/Makefile
@@ -5,5 +5,7 @@
 obj-$(CONFIG_QCA7000) += qca_7k_common.o
 obj-$(CONFIG_QCA7000_SPI) += qcaspi.o
 qcaspi-objs := qca_7k.o qca_debug.o qca_spi.o
+obj-$(CONFIG_QCA7000_UART) += qcauart.o
+qcauart-objs := qca_uart.o
 
 obj-y += emac/
diff --git a/drivers/net/ethernet/qualcomm/qca_7k_common.h 
b/drivers/net/ethernet/qualcomm/qca_7k_common.h
index 07bdd6c..928554f 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k_common.h
+++ b/drivers/net/ethernet/qualcomm/qca_7k_common.h
@@ -122,6 +122,12 @@ static inline void qcafrm_fsm_init_spi(struct 
qcafrm_handle *handle)
handle->state = handle->init;
 }
 
+static inline void qcafrm_fsm_init_uart(struct qcafrm_handle *handle)
+{
+   handle->init = QCAFRM_WAIT_AA1;
+   handle->state = handle->init;
+}
+
 /*   Gather received bytes and try to extract a full Ethernet frame
  *   by following a simple state machine.
  *
diff --git a/drivers/net/ethernet/qualcomm/qca_uart.c 
b/drivers/net/ethernet/qualcomm/qca_uart.c
new file mode 100644
index 000..1f6514c
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/qca_uart.c
@@ -0,0 +1,423 @@
+/*
+ *   Copyright (c) 2011, 2012, Qualcomm Atheros Communications Inc.
+ *   Copyright (c) 2017, I2SE GmbH
+ *
+ *   Permission to use, copy, modify, and/or distribute this software
+ *   for any purpose with or without fee is hereby granted, provided
+ *   that the above copyright notice and this permission notice appear
+ *   in all copies.
+ *
+ *   THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
+ *   WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
+ *   WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
+ *   THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
+ *   CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+ *   LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
+ *   NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+ *   CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+/*   This module implements the Qualcomm Atheros UART protocol for
+ *   kernel-based UART device; it is essentially an Ethernet-to-UART
+ *   serial converter;
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qca_7k_common.h"
+
+#define QCAUART_DRV_VERSION "0.1.0"
+#define QCAUART_DRV_NAME "qcauart"
+#define QCAUART_TX_TIMEOUT (1 * HZ)
+
+struct qcauart {
+   struct net_device *net_dev;
+   spinlock_t lock;/* transmit lock */
+   struct work_struct tx_work; /* Flushes transmit buffer   */
+
+   struct serdev_device *serdev;
+   struct qcafrm_handle frm_handle;
+   struct sk_buff *rx_skb;
+
+   unsigned char *tx_head; /* pointer to next XMIT byte */
+   int tx_left;/* bytes left in XMIT queue  */
+   unsigned char *tx_buffer;
+};
+
+static int
+qca_tty_receive(struct serdev_device *serdev, const unsigned char *data,
+   size_t count)
+{
+   struct qcauart *qca = serdev_device_get_drvdata(serdev);
+   struct net_device *netdev = qca->net_dev;
+   struct net_device_stats *n_stats = 

[PATCH v5 14/17] dt-bindings: slave-device: add current-speed property

2017-05-10 Thread Stefan Wahren
This adds a new DT property to define the current baud rate of the
slave device.

Signed-off-by: Stefan Wahren 
---
 Documentation/devicetree/bindings/serial/slave-device.txt | 9 +
 1 file changed, 9 insertions(+)

diff --git a/Documentation/devicetree/bindings/serial/slave-device.txt 
b/Documentation/devicetree/bindings/serial/slave-device.txt
index f660379..40110e0 100644
--- a/Documentation/devicetree/bindings/serial/slave-device.txt
+++ b/Documentation/devicetree/bindings/serial/slave-device.txt
@@ -21,6 +21,15 @@ Optional Properties:
  can support. For example, a particular board has some signal
  quality issue or the host processor can't support higher
  baud rates.
+- current-speed: The current baud rate the device operates at. This 
should
+ only be present in case a driver has no chance to know
+ the baud rate of the slave device.
+ Examples:
+   * device supports auto-baud
+   * the rate is setup by a bootloader and there is no
+ way to reset the device
+   * device baud rate is configured by its firmware but
+ there is no way to request the actual settings
 
 Example:
 
-- 
2.1.4



[PATCH v5 15/17] dt-bindings: qca7000: append UART interface to binding

2017-05-10 Thread Stefan Wahren
This merges the serdev binding for the QCA7000 UART driver (Ethernet over
UART) into the existing document.

Signed-off-by: Stefan Wahren 
---
 .../devicetree/bindings/net/qca-qca7000.txt| 32 ++
 1 file changed, 32 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/qca-qca7000.txt 
b/Documentation/devicetree/bindings/net/qca-qca7000.txt
index a37f656..08364c3 100644
--- a/Documentation/devicetree/bindings/net/qca-qca7000.txt
+++ b/Documentation/devicetree/bindings/net/qca-qca7000.txt
@@ -54,3 +54,35 @@ ssp2: spi@80014000 {
local-mac-address = [ A0 B0 C0 D0 E0 F0 ];
};
 };
+
+(b) Ethernet over UART
+
+In order to use the QCA7000 as UART slave it must be defined as a child of a
+UART master in the device tree. It is possible to preconfigure the UART
+settings of the QCA7000 firmware, but it's not possible to change them during
+runtime.
+
+Required properties:
+- compatible: Should be "qca,qca7000-uart"
+
+Optional properties:
+- local-mac-address : see ./ethernet.txt
+- current-speed : current baud rate of QCA7000 which defaults to 115200
+ if absent, see also ../serial/slave-device.txt
+
+UART Example:
+
+/* Freescale i.MX28 UART */
+auart0: serial@8006a000 {
+   compatible = "fsl,imx28-auart", "fsl,imx23-auart";
+   reg = <0x8006a000 0x2000>;
+   pinctrl-names = "default";
+   pinctrl-0 = <_2pins_a>;
+   status = "okay";
+
+   qca7000: ethernet {
+   compatible = "qca,qca7000-uart";
+   local-mac-address = [ A0 B0 C0 D0 E0 F0 ];
+   current-speed = <38400>;
+   };
+};
-- 
2.1.4



[PATCH v5 02/17] net: qca_framing: use u16 for frame offset

2017-05-10 Thread Stefan Wahren
It doesn't make sense to use a signed variable for offset here, so
fix it up.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/qca_framing.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_framing.h 
b/drivers/net/ethernet/qualcomm/qca_framing.h
index d5e795d..8b385e6 100644
--- a/drivers/net/ethernet/qualcomm/qca_framing.h
+++ b/drivers/net/ethernet/qualcomm/qca_framing.h
@@ -103,7 +103,7 @@ struct qcafrm_handle {
enum qcafrm_state state;
 
/* Offset in buffer (borrowed for length too) */
-   s16 offset;
+   u16 offset;
 
/* Frame length as kept by this module */
u16 len;
-- 
2.1.4



[PATCH v5 01/17] net: qualcomm: remove unnecessary includes

2017-05-10 Thread Stefan Wahren
Most of the includes in qca_7k.c are unnecessary so we better remove them.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/qca_7k.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_7k.c 
b/drivers/net/ethernet/qualcomm/qca_7k.c
index f0066fb..557d53c 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k.c
+++ b/drivers/net/ethernet/qualcomm/qca_7k.c
@@ -23,11 +23,7 @@
  *   kernel-based SPI device.
  */
 
-#include 
-#include 
-#include 
 #include 
-#include 
 
 #include "qca_7k.h"
 
-- 
2.1.4



[PATCH v5 06/17] net: qca_spi: remove QCASPI_MTU

2017-05-10 Thread Stefan Wahren
There is no need for an additional MTU define.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/qca_spi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c 
b/drivers/net/ethernet/qualcomm/qca_spi.c
index a0dbb92..a239ac4 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -69,7 +69,6 @@ static int qcaspi_pluggable = QCASPI_PLUGGABLE_MIN;
 module_param(qcaspi_pluggable, int, 0);
 MODULE_PARM_DESC(qcaspi_pluggable, "Pluggable SPI connection (yes/no).");
 
-#define QCASPI_MTU QCAFRM_MAX_MTU
 #define QCASPI_TX_TIMEOUT (1 * HZ)
 #define QCASPI_QCA7K_REBOOT_TIME_MS 1000
 
@@ -745,7 +744,7 @@ qcaspi_netdev_init(struct net_device *dev)
 {
struct qcaspi *qca = netdev_priv(dev);
 
-   dev->mtu = QCASPI_MTU;
+   dev->mtu = QCAFRM_MAX_MTU;
dev->type = ARPHRD_ETHER;
qca->clkspeed = qcaspi_clkspeed;
qca->burst_len = qcaspi_burst_len;
-- 
2.1.4



Re: DQL and TCQ_F_CAN_BYPASS destroy performance under virtualizaiton (Was: "Re: net_sched strange in 4.11")

2017-05-10 Thread Jason Wang



On 2017年05月10日 13:28, Anton Ivanov wrote:

On 10/05/17 03:18, Jason Wang wrote:


On 2017年05月09日 23:11, Stefan Hajnoczi wrote:

On Tue, May 09, 2017 at 08:46:46AM +0100, Anton Ivanov wrote:

I have figured it out. Two issues.

1) skb->xmit_more is hardly ever set under virtualization because
the qdisc
is usually bypassed because of TCQ_F_CAN_BYPASS. Once
TCQ_F_CAN_BYPASS is
set a virtual NIC driver is not likely see skb->xmit_more (this
answers my
"how does this work at all" question).

2) If that flag is turned off (I patched sched_generic to turn it
off in
pfifo_fast while testing), DQL keeps xmit_more from being set. If
the driver
is not DQL enabled xmit_more is never ever set. If the driver is DQL
enabled
the queue is adjusted to ensure xmit_more stops happening within
10-15 xmit
cycles.

That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
There,
the BIG cost is telling the hypervisor that it needs to "kick" the
packets.
The cost of putting them into the vNIC buffers is negligible. You want
xmit_more to happen - it makes between 50% and 300% (depending on vNIC
design) difference. If there is no xmit_more the vNIC will immediately
"kick" the hypervisor and try to signal that  the packet needs to move
straight away (as for example in virtio_net).

How do you measure the performance? TCP or just measure pps?

In this particular case - tcp from guest. I have a couple of other
benchmarks (forwarding, etc).


One more question, is the number for virtio-net or other emulated vNIC?




In addition to that, the perceived line rate is proportional to this
cost,
so I am not sure that the current dql math holds. In fact, I think
it does
not - it is trying to adjust something which influences the
perceived line
rate.

So - how do we turn BOTH bypass and DQL adjustment while under
virtualization and set them to be "always qdisc" + "always xmit_more
allowed"

Virtio-net net does not support BQL. Before commit ea7735d97ba9
("virtio-net: move free_old_xmit_skbs"), it's even impossible to
support that since we don't have tx interrupt for each packet.  I
haven't measured the impact of xmit_more, maybe I was wrong but I
think it may help in some cases since it may improve the batching on
host more or less.

If you do not support BQL, you might as well look the xmit_more part
kick code path. Line 1127.

bool kick = !skb->xmit_more; effectively means kick = true;

It will never be triggered. You will be kicking each packet and per
packet.


Probably not, we have several ways to try to suppress this on the virtio 
layer, host can give hints to disable the kicks through:


- explicitly set a flag
- implicitly by not publishing a new event idx

FYI, I can get 100-200 packets per vm exit when testing 64 byte 
TCP_STREAM using netperf.



xmit_more is now set only out of BQL. If BQL is not enabled you
never get it. Now, will the current dql code work correctly if you do
not have a defined line rate and completion interrupts - no idea.
Probably not. IMHO instead of trying to fix it there should be a way for
a device or architecture to turn it off.


In fact BQL is not the only user for xmit_more. Pktgen with burst is 
another. Test does not show obvious difference if I set burst from 0 to 
64 since we already had other ways to avoid kicking host.




To be clear - I ran into this working on my own drivers for UML, you are
cc-ed because you are likely to be one of the most affected.


I'm still not quite sure the issue. Looks like virtio-net is ok since 
BQL is not supported and the impact of xmit_more could be ignored.


Thanks



A.


Thanks


A.

P.S. Cc-ing virtio maintainer

CCing Michael Tsirkin and Jason Wang, who are the core virtio and
virtio-net maintainers.  (I maintain the vsock driver - it's unrelated
to this discussion.)


A.


On 08/05/17 08:15, Anton Ivanov wrote:

Hi all,

I was revising some of my old work for UML to prepare it for
submission
and I noticed that skb->xmit_more does not seem to be set any more.

I traced the issue as far as net/sched/sched_generic.c

try_bulk_dequeue_skb() is never invoked (the drivers I am working
on are
dql enabled so that is not the problem).

More interestingly, if I put a breakpoint and debug output into
dequeue_skb() around line 147 - right before the bulk: tag that skb
there is always NULL. ???

Similarly, debug in pfifo_fast_dequeue shows only NULLs being
dequeued.
Again - ???

First and foremost, I apologize for the silly question, but how can
this
work at all? I see the skbs showing up at the driver level, why are
NULLs being returned at qdisc dequeue and where do the skbs at the
driver level come from?

Second, where should I look to fix it?

A.


--
Anton R. Ivanov

Cambridge Greys Limited, England company No 10273661
http://www.cambridgegreys.com/









[PATCH v5 05/17] net: qualcomm: Improve readability of length defines

2017-05-10 Thread Stefan Wahren
In order to avoid mixing things up, make the MTU and frame length
defines easier to read.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/qca_framing.c |  2 +-
 drivers/net/ethernet/qualcomm/qca_framing.h |  8 
 drivers/net/ethernet/qualcomm/qca_spi.c | 12 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_framing.c 
b/drivers/net/ethernet/qualcomm/qca_framing.c
index faa924c..2341f2b 100644
--- a/drivers/net/ethernet/qualcomm/qca_framing.c
+++ b/drivers/net/ethernet/qualcomm/qca_framing.c
@@ -117,7 +117,7 @@ qcafrm_fsm_decode(struct qcafrm_handle *handle, u8 *buf, 
u16 buf_len, u8 recv_by
break;
case QCAFRM_WAIT_RSVD_BYTE2:
len = handle->offset;
-   if (len > buf_len || len < QCAFRM_ETHMINLEN) {
+   if (len > buf_len || len < QCAFRM_MIN_LEN) {
ret = QCAFRM_INVLEN;
handle->state = QCAFRM_HW_LEN0;
} else {
diff --git a/drivers/net/ethernet/qualcomm/qca_framing.h 
b/drivers/net/ethernet/qualcomm/qca_framing.h
index 8b385e6..5df7c65 100644
--- a/drivers/net/ethernet/qualcomm/qca_framing.h
+++ b/drivers/net/ethernet/qualcomm/qca_framing.h
@@ -44,12 +44,12 @@
 #define QCAFRM_INVFRAME (QCAFRM_ERR_BASE - 4)
 
 /* Min/Max Ethernet MTU: 46/1500 */
-#define QCAFRM_ETHMINMTU (ETH_ZLEN - ETH_HLEN)
-#define QCAFRM_ETHMAXMTU ETH_DATA_LEN
+#define QCAFRM_MIN_MTU (ETH_ZLEN - ETH_HLEN)
+#define QCAFRM_MAX_MTU ETH_DATA_LEN
 
 /* Min/Max frame lengths */
-#define QCAFRM_ETHMINLEN (QCAFRM_ETHMINMTU + ETH_HLEN)
-#define QCAFRM_ETHMAXLEN (QCAFRM_ETHMAXMTU + VLAN_ETH_HLEN)
+#define QCAFRM_MIN_LEN (QCAFRM_MIN_MTU + ETH_HLEN)
+#define QCAFRM_MAX_LEN (QCAFRM_MAX_MTU + VLAN_ETH_HLEN)
 
 /* QCA7K header len */
 #define QCAFRM_HEADER_LEN 8
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c 
b/drivers/net/ethernet/qualcomm/qca_spi.c
index 5c79612..a0dbb92 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -69,7 +69,7 @@ static int qcaspi_pluggable = QCASPI_PLUGGABLE_MIN;
 module_param(qcaspi_pluggable, int, 0);
 MODULE_PARM_DESC(qcaspi_pluggable, "Pluggable SPI connection (yes/no).");
 
-#define QCASPI_MTU QCAFRM_ETHMAXMTU
+#define QCASPI_MTU QCAFRM_MAX_MTU
 #define QCASPI_TX_TIMEOUT (1 * HZ)
 #define QCASPI_QCA7K_REBOOT_TIME_MS 1000
 
@@ -402,7 +402,7 @@ qcaspi_tx_ring_has_space(struct tx_ring *txr)
if (txr->skb[txr->tail])
return 0;
 
-   return (txr->size + QCAFRM_ETHMAXLEN < QCASPI_HW_BUF_LEN) ? 1 : 0;
+   return (txr->size + QCAFRM_MAX_LEN < QCASPI_HW_BUF_LEN) ? 1 : 0;
 }
 
 /*   Flush the tx ring. This function is only safe to
@@ -666,8 +666,8 @@ qcaspi_netdev_xmit(struct sk_buff *skb, struct net_device 
*dev)
struct sk_buff *tskb;
u8 pad_len = 0;
 
-   if (skb->len < QCAFRM_ETHMINLEN)
-   pad_len = QCAFRM_ETHMINLEN - skb->len;
+   if (skb->len < QCAFRM_MIN_LEN)
+   pad_len = QCAFRM_MIN_LEN - skb->len;
 
if (qca->txr.skb[qca->txr.tail]) {
netdev_warn(qca->net_dev, "queue was unexpectedly full!\n");
@@ -804,8 +804,8 @@ qcaspi_netdev_setup(struct net_device *dev)
dev->tx_queue_len = 100;
 
/* MTU range: 46 - 1500 */
-   dev->min_mtu = QCAFRM_ETHMINMTU;
-   dev->max_mtu = QCAFRM_ETHMAXMTU;
+   dev->min_mtu = QCAFRM_MIN_MTU;
+   dev->max_mtu = QCAFRM_MAX_MTU;
 
qca = netdev_priv(dev);
memset(qca, 0, sizeof(struct qcaspi));
-- 
2.1.4



[PATCH v5 09/17] net: qualcomm: rename qca_framing.c to qca_7k_common.c

2017-05-10 Thread Stefan Wahren
As preparation for the upcoming UART driver we need a module
which contains common functions for both interfaces. The module
qca_framing is a good candidate but renaming to qca_7k_common would
make it clear.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/Makefile   | 2 +-
 drivers/net/ethernet/qualcomm/{qca_framing.c => qca_7k_common.c} | 2 +-
 drivers/net/ethernet/qualcomm/{qca_framing.h => qca_7k_common.h} | 0
 drivers/net/ethernet/qualcomm/qca_spi.c  | 2 +-
 drivers/net/ethernet/qualcomm/qca_spi.h  | 2 +-
 5 files changed, 4 insertions(+), 4 deletions(-)
 rename drivers/net/ethernet/qualcomm/{qca_framing.c => qca_7k_common.c} (99%)
 rename drivers/net/ethernet/qualcomm/{qca_framing.h => qca_7k_common.h} (100%)

diff --git a/drivers/net/ethernet/qualcomm/Makefile 
b/drivers/net/ethernet/qualcomm/Makefile
index aacb0a5..5e17bf1 100644
--- a/drivers/net/ethernet/qualcomm/Makefile
+++ b/drivers/net/ethernet/qualcomm/Makefile
@@ -3,6 +3,6 @@
 #
 
 obj-$(CONFIG_QCA7000) += qcaspi.o
-qcaspi-objs := qca_spi.o qca_framing.o qca_7k.o qca_debug.o
+qcaspi-objs := qca_spi.o qca_7k_common.o qca_7k.o qca_debug.o
 
 obj-y += emac/
diff --git a/drivers/net/ethernet/qualcomm/qca_framing.c 
b/drivers/net/ethernet/qualcomm/qca_7k_common.c
similarity index 99%
rename from drivers/net/ethernet/qualcomm/qca_framing.c
rename to drivers/net/ethernet/qualcomm/qca_7k_common.c
index 2341f2b..6d17fbd 100644
--- a/drivers/net/ethernet/qualcomm/qca_framing.c
+++ b/drivers/net/ethernet/qualcomm/qca_7k_common.c
@@ -23,7 +23,7 @@
 
 #include 
 
-#include "qca_framing.h"
+#include "qca_7k_common.h"
 
 u16
 qcafrm_create_header(u8 *buf, u16 length)
diff --git a/drivers/net/ethernet/qualcomm/qca_framing.h 
b/drivers/net/ethernet/qualcomm/qca_7k_common.h
similarity index 100%
rename from drivers/net/ethernet/qualcomm/qca_framing.h
rename to drivers/net/ethernet/qualcomm/qca_7k_common.h
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c 
b/drivers/net/ethernet/qualcomm/qca_spi.c
index deec70f..2bc3ba4 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -43,8 +43,8 @@
 #include 
 
 #include "qca_7k.h"
+#include "qca_7k_common.h"
 #include "qca_debug.h"
-#include "qca_framing.h"
 #include "qca_spi.h"
 
 #define MAX_DMA_BURST_LEN 5000
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.h 
b/drivers/net/ethernet/qualcomm/qca_spi.h
index 064853d..fc4beb1 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.h
+++ b/drivers/net/ethernet/qualcomm/qca_spi.h
@@ -32,7 +32,7 @@
 #include 
 #include 
 
-#include "qca_framing.h"
+#include "qca_7k_common.h"
 
 #define QCASPI_DRV_VERSION "0.2.7-i"
 #define QCASPI_DRV_NAME"qcaspi"
-- 
2.1.4



[PATCH v5 03/17] net: qca_7k: Use BIT macro

2017-05-10 Thread Stefan Wahren
Use the BIT macro for the CONFIG and INT register values.

Signed-off-by: Stefan Wahren 
---
 drivers/net/ethernet/qualcomm/qca_7k.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_7k.h 
b/drivers/net/ethernet/qualcomm/qca_7k.h
index 1cad851..4047f0a 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k.h
+++ b/drivers/net/ethernet/qualcomm/qca_7k.h
@@ -54,15 +54,15 @@
 #define SPI_REG_ACTION_CTRL 0x1B00
 
 /*   SPI_CONFIG register definition; */
-#define QCASPI_SLAVE_RESET_BIT (1 << 6)
+#define QCASPI_SLAVE_RESET_BIT  BIT(6)
 
 /*   INTR_CAUSE/ENABLE register definition.  */
-#define SPI_INT_WRBUF_BELOW_WM (1 << 10)
-#define SPI_INT_CPU_ON (1 << 6)
-#define SPI_INT_ADDR_ERR   (1 << 3)
-#define SPI_INT_WRBUF_ERR  (1 << 2)
-#define SPI_INT_RDBUF_ERR  (1 << 1)
-#define SPI_INT_PKT_AVLBL  (1 << 0)
+#define SPI_INT_WRBUF_BELOW_WM  BIT(10)
+#define SPI_INT_CPU_ON  BIT(6)
+#define SPI_INT_ADDR_ERRBIT(3)
+#define SPI_INT_WRBUF_ERR   BIT(2)
+#define SPI_INT_RDBUF_ERR   BIT(1)
+#define SPI_INT_PKT_AVLBL   BIT(0)
 
 void qcaspi_spi_error(struct qcaspi *qca);
 int qcaspi_read_register(struct qcaspi *qca, u16 reg, u16 *result);
-- 
2.1.4



[PATCH v5 16/17] tty: serdev-ttyport: return actual baudrate from ttyport_set_baudrate

2017-05-10 Thread Stefan Wahren
Instead of returning the requested baudrate, we better return the
actual one because it isn't always the same.

Signed-off-by: Stefan Wahren 
Acked-by: Rob Herring 
---
 drivers/tty/serdev/serdev-ttyport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/serdev/serdev-ttyport.c 
b/drivers/tty/serdev/serdev-ttyport.c
index 487c88f..2cfdf34 100644
--- a/drivers/tty/serdev/serdev-ttyport.c
+++ b/drivers/tty/serdev/serdev-ttyport.c
@@ -151,7 +151,7 @@ static unsigned int ttyport_set_baudrate(struct 
serdev_controller *ctrl, unsigne
 
/* tty_set_termios() return not checked as it is always 0 */
tty_set_termios(tty, );
-   return speed;
+   return ktermios.c_ospeed;
 }
 
 static void ttyport_set_flow_control(struct serdev_controller *ctrl, bool 
enable)
-- 
2.1.4



RE: [PATCH 1/4] hamradio: Combine two seq_printf() calls into one in yam_seq_show()

2017-05-10 Thread David Laight
From: SF Markus Elfring
> Sent: 09 May 2017 15:22
> A bit of data was put into a sequence by two separate function calls.
> Print the same data by a single function call instead.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/net/hamradio/yam.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c
> index b6891ada1d7b..542f1e511df1 100644
> --- a/drivers/net/hamradio/yam.c
> +++ b/drivers/net/hamradio/yam.c
> @@ -830,8 +830,7 @@ static int yam_seq_show(struct seq_file *seq, void *v)
>   seq_printf(seq, "  RxFrames %lu\n", dev->stats.rx_packets);
>   seq_printf(seq, "  TxInt%u\n", yp->nb_mdint);
>   seq_printf(seq, "  RxInt%u\n", yp->nb_rxint);
> - seq_printf(seq, "  RxOver   %lu\n", dev->stats.rx_fifo_errors);
> - seq_printf(seq, "\n");
> + seq_printf(seq, "  RxOver   %lu\n\n", dev->stats.rx_fifo_errors);
>   return 0;

The code was consistently (and probably deliberately) using one seq_printf()
call for each line so that the source looks 'a bit like' the output.

These changes are all stupid.

David



  1   2   >