from:"David B. Robins"

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread David B. Robins


On 2016-05-04 03:58, Dean Jenkins wrote:

On 04/05/16 01:28, David B. Robins wrote:




Here is the code snippet from the patch with my annotations between #
#, I will try to explain my intentions. Feel free to point out any
flaws:

if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the 
Ethernet

frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an
Ethernet frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header
word will be in a later URB buffer #


It may well be. I don't have the setup with me now, but I can try 
tomorrow to reproduce an environment where I can add some more 
detailed logging.


Since the URB length has to be >= than the remaining data plus a u32, 
the devices that John Stultz and I are using (AX88772B in my case) may 
be adding some additional data/padding after an Ethernet frame, 
expecting it to be discarded, and running into this check and its 
consequences. This may mean the device is badly behaved, if it is 
specified not to send anything extra; in any case, a well-intentioned 
error correction has gone badly, but I better understand the intent 
now. I am curious to know how often the device you are using benefits 
from this block of code.


The issue is that the driver should be robust to cope with missing
URBs. Whilst testing with D-Link DUB-E100 C1 AX88772 USB to Ethernet
adaptor in our ARM embedded system which runs in hostile environments,
it was noticed that URBs could be lost (probably due to a bug
elsewhere or low memory issue). Without this patch, a missing URB
causes bad Ethernet frames to be passed up to the IP stack because
rx->remaining spans multiple URBs.

In the good case of an Ethernet frame spanning 2 URBs, the 1st URB is
processed and copies the 1st part of the Ethernet frame into the
netdev buffer, for the 2nd URB the remaining part of the Ethernet
frame is copied into the same netdev buffer to complete the Ethernet
frame. The netdev buffer is then sent up to the IP stack.

In the case of a missing URB, a bad Ethernet frame is created as 
follows:

The 1st URB is processed and copies the 1st part of the Ethernet frame
into the netdev buffer, the 2nd URB is lost (somehow),  the 3rd URB is
processed and blindly copies what it thinks is the remaining part of
the Ethernet frame in the same netdev buffer which corrupts the
Ethernet frame. The netdev buffer is then sent up to the IP stack. The
3rd URB and subsequent URBs are processed but synchronisation has been
lost so can misread data as a 32-bit header word. It is likely that
some good Ethernet frames get discarded whilst trying to
resynchronise.

A recovery strategy for regaining lock with the 32-bit header word is
necessary otherwise the driver will have difficulty in recovering from
a lost URB.

In the "olden days", the 32-bit header word was always at the start of
the URB buffer so previous URBs did not influence the current URB. So
no recovery strategy was needed at that time. But now we have to
remember what happened in the previous URB and a lost URB can cause a
discontinuity in the data stream because the data is not always
aligned to the start of the URB buffer.

I agree that your environment may never suffer from lost URBs so
removal of the patch would work OK.

I will try to find some time to setup a test environment.


I got the chance to add some more logging (and add back the header 
synchronization code) to a system which is receiving video data on an 
8-port POE device using AX88772Bs - it replaces the "Data header 
synchronization was lost" log and logs a few more local items ("u32" = 
rx->header):


[3105478.073908] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3105478.073918] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xf05b7c5f, offset 4
[3105478.074022] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 
2048
[3105478.074032] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xffaae2a5, offset 4
[3105478.074146] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 
1518
[3105478.074280] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 
2048
[3105478.074291] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0xe1478400, offset 4

[3105478.074399] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 916
[3105478.074409] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header 
Length 0x9fa84eb5, offset 4
[3105478.074657] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 
1518


Grepping for just the SYNC_LOSS messages shows a similar pattern to that 
seen elsewhere in the thread:


[3104761.056033] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS 
remain 988 len 2048 offset 992 u32 0xecf95000
[3104761.058035] asix 1-1.

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread David B. Robins


On 2016-05-03 17:16, Dean Jenkins wrote:

On 03/05/16 15:42, David B. Robins wrote:


I don't think the first one is giving you problems (except as 
triggered by the second) but I had concerns about the second myself 
(and emailed the author off-list, but received no reply), and we did 
not take that commit for our own product.



Sorry, I might have missed your original E-mail.

Specifically, the second change, 3f30... (original patch: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) 
appears to do the exact opposite of what it claims, i.e., instead of 
"resync if this looks like a header", it does "resync if this does NOT 
look like a (packet) header", where "looks like a header" means "bits 
0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can 
happen by coincidence for 1/2048 32-bit values starting a continuation 
URB (easy to hit dealing with large volumes of video data as we were). 
It appears to expect the header for every URB whereas the rest of the 
code at least expects it only once per network packet (look at 
following code that only reads it for remaining == 0).


David, I think that your interpretation is incorrect. Please see below.

Here is the code snippet from the patch with my annotations between #
#, I will try to explain my intentions. Feel free to point out any
flaws:

if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
# Only runs when rx->remaining !=0 and the end of the Ethernet
frame + next 32-bit header word is within the URB buffer. #
# Therefore, this code does not run when the end of an
Ethernet frame has been reached in the previous URB #
# or when the end of the Ethernet frame + next 32-bit header
word will be in a later URB buffer #


It may well be. I don't have the setup with me now, but I can try 
tomorrow to reproduce an environment where I can add some more detailed 
logging.


Since the URB length has to be >= than the remaining data plus a u32, 
the devices that John Stultz and I are using (AX88772B in my case) may 
be adding some additional data/padding after an Ethernet frame, 
expecting it to be discarded, and running into this check and its 
consequences. This may mean the device is badly behaved, if it is 
specified not to send anything extra; in any case, a well-intentioned 
error correction has gone badly, but I better understand the intent now. 
I am curious to know how often the device you are using benefits from 
this block of code.



Regards,
Dean


David

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread David B. Robins


On 2016-05-03 00:55, John Stultz wrote:

In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4

And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).


I don't think the first one is giving you problems (except as triggered 
by the second) but I had concerns about the second myself (and emailed 
the author off-list, but received no reply), and we did not take that 
commit for our own product.


Specifically, the second change, 3f30... (original patch: 
https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) 
appears to do the exact opposite of what it claims, i.e., instead of 
"resync if this looks like a header", it does "resync if this does NOT 
look like a (packet) header", where "looks like a header" means "bits 
0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can 
happen by coincidence for 1/2048 32-bit values starting a continuation 
URB (easy to hit dealing with large volumes of video data as we were). 
It appears to expect the header for every URB whereas the rest of the 
code at least expects it only once per network packet (look at following 
code that only reads it for remaining == 0).


So that change made no sense to me, but I don't have significant kernel 
dev experience. Effectively it will drop/truncate every (2047/2048) 
split (longer than an URB) packet, and report an error for the second 
URB and then again for treating said second URB as a first URB for a 
packet. I would expect your problems will go away just removing the 
second change. You could also change the != to == in "if (size != ...)" 
but then you'd still have 1/2048 (depending on data patterns) false 
positives.



This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john


David

Re: [PATCH] net: usb: asix: Fix crash on skb alloc failure

2015-10-05 Thread David B. Robins

On 2015-10-05 06:31, David Miller wrote:

From: "David B. Robins" <li...@davidrobins.net>
Date: Wed, 30 Sep 2015 16:20:04 -0400

If asix_rx_fixup_internal() fails to allocate rx->ax_skb, it will 
return

but not clear rx->size. rx points to driver private data. A later call
assumes that nonzero size means ax_skb was allocated and passes a null
ax_skb to skb_put. Changed allocation failure return to clear size 
first.

Found testing board with AX88772B devices.

Signed-off-by: David B. Robins <li...@davidrobins.net>

Applied, thanks.

While I am happy for the patch I submitted to be applied, and it is 
consistent with existing error handling and does fix the specific issue, 
I believe a later series of 5 patches by Dean Jenkins that more 
comprehensively address the issue (grouped under subject "Improve ASIX 
RX memory allocation error handling") should be preferred.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] net: usb: asix: Fix crash on skb alloc failure

2015-09-30 Thread David B. Robins

If asix_rx_fixup_internal() fails to allocate rx->ax_skb, it will return
but not clear rx->size. rx points to driver private data. A later call
assumes that nonzero size means ax_skb was allocated and passes a null
ax_skb to skb_put. Changed allocation failure return to clear size first.

Found testing board with AX88772B devices.

Signed-off-by: David B. Robins <li...@davidrobins.net>
---
 drivers/net/usb/asix_common.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/asix_common.c b/drivers/net/usb/asix_common.c
index 75d6f26..079069a 100644
--- a/drivers/net/usb/asix_common.c
+++ b/drivers/net/usb/asix_common.c
@@ -91,8 +91,10 @@ int asix_rx_fixup_internal(struct usbnet *dev, struct 
sk_buff *skb,
}
rx->ax_skb = netdev_alloc_skb_ip_align(dev->net,
   rx->size);
-   if (!rx->ax_skb)
+   if (!rx->ax_skb) {
+   rx->size = 0;
return 0;
+   }
}
 
if (rx->size > dev->net->mtu + ETH_HLEN + VLAN_HLEN) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

Re: [PATCH] net: usb: asix: Fix crash on skb alloc failure

[PATCH] net: usb: asix: Fix crash on skb alloc failure

5 matches

Site Navigation

Mail list logo

Footer information