Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On Wed, May 11, 2016 at 3:00 PM, Dean Jenkins wrote: > > Your observations are consistent with missing URBs from the USB host > controller. > > Here is a summary of what I think is happening in your case: > > Good case: > URB #1: 1514 octets of 1514 Ethernet frame (A) > URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet > frame (C) > URB #3: 988 octets of 1514 Ethernet frame (C) > URB #4: 1514 octets of 1514 Ethernet frame (D) > > Therefore, Ethernet frame (C) is spanning URBs #2 and #3. > > Bad case, URB #3 is lost: > URB #1: 1514 octets of 1514 Ethernet frame (A) > URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet > frame (C) > Remaining is 988 > URB #4: 1514 octets of 1514 Ethernet frame (D) > > But when URB #4 is analysed the 32-bit Header word is not found after 988 > octets in the URB buffer so "sync lost". > The end of Ethernet frame (C) is missing so drop the Ethernet frame. > Now look at the start of the URB #4 buffer and find a 32-bit header word so > Ethernet frame (D) can be consumed. > > So I think the commit is acting as intended and you are suffering from lost > URBs. No. I went digging on this for a bit longer, and it looks like its just that you're calculating the offset wrong in your check. I was wondering why without your patch we wouldn't see "Bad Header Length" messages, since if the remaining was 988 and the skb->len was 2048 as seen in my logs, without your patch we should copy the 988 bytes out clear remaining and then continue processing the rest of the skb, which calculates the header and checks the size. If we really lost the URB, we should throw an error at that point, since really we'd be midway through the following frame. But we just don't see that with your patch removed. Looking more closely, in the main loop, we do: (where offset is zero, or set to "offset += (copy_length + 1) & 0xfffe" in the previous loop) rx->header = get_unaligned_le32(skb->data + offset); offset += sizeof(u32); But your check calculates: offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32); rx->header = get_unaligned_le32(skb->data + offset); Adding some debug logic to check those offset calculation used to find rx->header, the one in your code is always too large by sizeof(u32). So removing the extra addition in your offset calculation seems to solve this for me. I'll send out a patch here shortly. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
Hi John, I have purchased a "uGreen" USB Ethernet Adaptor which was reported as showing the issue: lsusb shows: ID 0b95:772b ASIX Electronics Corp. AX88772B dmesg shows: [119591.413298] usb 2-1: new high-speed USB device number 12 using ci_hdrc [119591.576970] usb 2-1: New USB device found, idVendor=0b95, idProduct=772b [119591.576994] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [119591.577010] usb 2-1: Product: AX88772C [119591.577025] usb 2-1: Manufacturer: ASIX Elec. Corp. Strangely the product string says "AX88772C" and lsusb shows "AX88772B" I used our ARM (32-bit 2 core) board running our highly customised 3.14 kernel and ran a ping test that slowly increments the ping payload size so forcing the Ethernet frames to slowly extend in length and eventually forcing IPv4 fragmentation to occur due to the MTU limit of 1500. In my test the ICMP ping payload lengths ranged from 1 to 5000. During the test run I saw (only 3 errors): [27455.113010] asix 2-1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 23 [27455.113037] asix 2-1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x77767574, offset 4 [27456.113269] asix 2-1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 27 [27456.113329] asix 2-1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x77767574, offset 4 [27457.113271] asix 2-1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 30 [27457.113328] asix 2-1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x77767574, offset 4 This meets my expectation of "sync lost" followed immediately by "Bad Header Length". A close look at the timestamps shows gaps of around 20us to 50us which suggests the code is processing the same URB eg. "sync lost" and "Bad Header Length" are written from the same instance of asix_rx_fixup_internal(). My example suggests that the previous URB went missing so data was lost causing a discontinuity in the data stream. This was the intended purpose of the commit to prevent bad Ethernet frames being sent up the IP stack when an URB went missing. A bad Ethernet frame would otherwise be created by having the start of an Ethernet frame appended with data from the current URB causing a corrupted Ethernet frame to be generated and sent up the IP stack. Also the failure seems to be independent of the ping payload length but longer test periods of specific payload lengths would be needed to allow the 32 bit header word to move around relative to the start of the URB buffer. In my example, the "Bad Header Length 0x77767574" is reading the ping payload data of 0x74, 0x75, 0x76, 077 which is located at the start of the URB buffer. The remaining values are low at 23 to 30 which suggests the end of the Ethernet frame was in the missing URB. The ICMP ping data of 0x74, 0x75, 0x76, 077 is from the next Ethernet frame meaning the end of the current Ethernet frame is missing and the next frame has a missing start of Ethernet frame. Note that due to IPv4 fragmentation "consecutive" Ethernet frames will contain payloads of 1500 (MTU size) octets typically followed by a short Ethernet frame. The payloads are fragmented IP packets. So I've been trying to add some print messages here to better understand whats going on. Again, I'm a bit new to this code, so forgive me for my lack of understanding things. Since the remaining value seems to be key, I tried to look around and figure out where it was being set. It seems like its only set in this function, is that right? So this made me guess something might be happening in a previous iteration that was causing this to trigger. I added some debug prints to every time we set the remaining value, or modify it, as well as to print the value if we enter the fixup function with a non-zero remaining value. When we set the remaining value, usually its to 1514, when the skblen is 1518. 1514 is Ethernet header length + payload at MTU size of 1500. skblen of 1518 is 32-bit header word + maximum Ethernet frame length (for your Network). However, right before we catch the problem, I see this: I am guessing where your debug is located in the code so I may have ms-interpreted your information. [ 84.844337] JDB set remaining to 1514 (skblen: 1518) This suggests 1 maximum length Ethernet frame of 1514 octets in the URB buffer. [ 84.844379] JDB set remaining to 1514 (skblen: 1518) [ 84.844429] JDB set remaining to 1514 (skblen: 1518) [ 84.844458] JDB set remaining to 1514 (skblen: 1518) [ 84.844483] JDB set remaining to 1514 (skblen: 1518) [ 84.844507] JDB set remaining to 1514 (skblen: 1518) [ 84.844559] JDB set remaining to 1514 (skblen: 2048) This URB probably has 2 Ethernet frames; 1 complete frame plus the start of the next Ethernet frame. I think 2048 could be the maximum URB transfer length for the USB bulk transfer. 2048 seems to be a low value so should be investigated. [ 84.844583] JDB set rem
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On Tue, May 3, 2016 at 2:16 PM, Dean Jenkins wrote: > > [ 239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header > synchronisation was lost, remaining 988 > > This error message consistently shows the remaining value to be 988, at > least for the 3 examples provided by John. This does not suggest a random > failure unless there are other examples of a non 988 remaining value error > message. 988 is well within a Ethernet frame length so seems to be valid. > > I think a good step would be to add some debug to print the rx->remaining > value at entry to asix_rx_fixup_internal(). This would generate a lot of > debug but a pattern of the values might emerge. So I've been trying to add some print messages here to better understand whats going on. Again, I'm a bit new to this code, so forgive me for my lack of understanding things. Since the remaining value seems to be key, I tried to look around and figure out where it was being set. It seems like its only set in this function, is that right? So this made me guess something might be happening in a previous iteration that was causing this to trigger. I added some debug prints to every time we set the remaining value, or modify it, as well as to print the value if we enter the fixup function with a non-zero remaining value. When we set the remaining value, usually its to 1514, when the skblen is 1518. However, right before we catch the problem, I see this: [ 84.844337] JDB set remaining to 1514 (skblen: 1518) [ 84.844379] JDB set remaining to 1514 (skblen: 1518) [ 84.844429] JDB set remaining to 1514 (skblen: 1518) [ 84.844458] JDB set remaining to 1514 (skblen: 1518) [ 84.844483] JDB set remaining to 1514 (skblen: 1518) [ 84.844507] JDB set remaining to 1514 (skblen: 1518) [ 84.844559] JDB set remaining to 1514 (skblen: 2048) [ 84.844583] JDB set remaining to 1514 (skblen: 2048) [ 84.844595] JDB: 1514 > 2048 - 1522 [ 84.844606] JDB: dropping remaining by 526 [ 84.844624] asix_rx_fixup_internal() remaining: 988, skb->len: 2048 [ 84.844672] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 84.844945] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0xdd5f8f9b, offset 4 [ 84.845217] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x3ce1ad3c, offset 4 [ 84.845451] JDB set remaining to 1514 (skblen: 1518) [ 84.845485] JDB set remaining to 1514 (skblen: 1518) [ 84.845511] JDB set remaining to 1514 (skblen: 1518) [ 84.851003] JDB set remaining to 1514 (skblen: 1518) So when the issue happens, it seems to be due to an larger then usual skb (2048). The first time through the wile loop we set the remaining to 1514, but offset is fairly small, so we set copy_length to 1514, and clear remaining. The offset is bumped by a little more then the copy length and we loop again. Then the second time through we set remaining to 1514, but since offset is bigger now, the if (rx->remaining > skb->len - offset) case is true.. This is where it feels a little strange.. We calculate the copy_length as the difference between the offset and the skb->len (so how much is left in the skb, which is 526), then decrement remaining by that amount. Not really sure what remaining (now 988) is supposed to represent here. We copy the 526 bytes, and then exit the loop. Now the next time we are called, we enter and we have a remaining value still of 988, which triggers the header synchronization error path. Now, I'm not sure if the remainder handling logic is sane, or if the skb->len being 2048 is problematic, or what. The skb->lens can vary in sizes, usually 1518 during high throughput, but I've seen 1588, 1640, and other larger numbers that don't trigger the same problem. ie: [ 106.946473] JDB set remaining to 1514 (skblen: 1518) [ 106.946525] JDB set remaining to 1514 (skblen: 1640) [ 106.946546] JDB set remaining to 118 (skblen: 1640) [ 106.946586] JDB set remaining to 1514 (skblen: 1518) So yea.. maybe that will help clue things in a bit? I'm still a bit lost. :) thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 2016-05-04 03:58, Dean Jenkins wrote: On 04/05/16 01:28, David B. Robins wrote: Here is the code snippet from the patch with my annotations between # #, I will try to explain my intentions. Feel free to point out any flaws: if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) { # Only runs when rx->remaining !=0 and the end of the Ethernet frame + next 32-bit header word is within the URB buffer. # # Therefore, this code does not run when the end of an Ethernet frame has been reached in the previous URB # # or when the end of the Ethernet frame + next 32-bit header word will be in a later URB buffer # It may well be. I don't have the setup with me now, but I can try tomorrow to reproduce an environment where I can add some more detailed logging. Since the URB length has to be >= than the remaining data plus a u32, the devices that John Stultz and I are using (AX88772B in my case) may be adding some additional data/padding after an Ethernet frame, expecting it to be discarded, and running into this check and its consequences. This may mean the device is badly behaved, if it is specified not to send anything extra; in any case, a well-intentioned error correction has gone badly, but I better understand the intent now. I am curious to know how often the device you are using benefits from this block of code. The issue is that the driver should be robust to cope with missing URBs. Whilst testing with D-Link DUB-E100 C1 AX88772 USB to Ethernet adaptor in our ARM embedded system which runs in hostile environments, it was noticed that URBs could be lost (probably due to a bug elsewhere or low memory issue). Without this patch, a missing URB causes bad Ethernet frames to be passed up to the IP stack because rx->remaining spans multiple URBs. In the good case of an Ethernet frame spanning 2 URBs, the 1st URB is processed and copies the 1st part of the Ethernet frame into the netdev buffer, for the 2nd URB the remaining part of the Ethernet frame is copied into the same netdev buffer to complete the Ethernet frame. The netdev buffer is then sent up to the IP stack. In the case of a missing URB, a bad Ethernet frame is created as follows: The 1st URB is processed and copies the 1st part of the Ethernet frame into the netdev buffer, the 2nd URB is lost (somehow), the 3rd URB is processed and blindly copies what it thinks is the remaining part of the Ethernet frame in the same netdev buffer which corrupts the Ethernet frame. The netdev buffer is then sent up to the IP stack. The 3rd URB and subsequent URBs are processed but synchronisation has been lost so can misread data as a 32-bit header word. It is likely that some good Ethernet frames get discarded whilst trying to resynchronise. A recovery strategy for regaining lock with the 32-bit header word is necessary otherwise the driver will have difficulty in recovering from a lost URB. In the "olden days", the 32-bit header word was always at the start of the URB buffer so previous URBs did not influence the current URB. So no recovery strategy was needed at that time. But now we have to remember what happened in the previous URB and a lost URB can cause a discontinuity in the data stream because the data is not always aligned to the start of the URB buffer. I agree that your environment may never suffer from lost URBs so removal of the patch would work OK. I will try to find some time to setup a test environment. I got the chance to add some more logging (and add back the header synchronization code) to a system which is receiving video data on an 8-port POE device using AX88772Bs - it replaces the "Data header synchronization was lost" log and logs a few more local items ("u32" = rx->header): [3105478.073908] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() SYNC_LOSS remain 988 len 2048 offset 992 u32 0xecf95000 [3105478.073918] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header Length 0xf05b7c5f, offset 4 [3105478.074022] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 2048 [3105478.074032] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header Length 0xffaae2a5, offset 4 [3105478.074146] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 1518 [3105478.074280] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 2048 [3105478.074291] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header Length 0xe1478400, offset 4 [3105478.074399] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() skb->len = 916 [3105478.074409] asix 1-1.4.2.2:1.0 poe7: asix_rx_fixup() Bad Header Length 0x9fa84eb5, offset 4 [3105478.074657] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() skb->len = 1518 Grepping for just the SYNC_LOSS messages shows a similar pattern to that seen elsewhere in the thread: [3104761.056033] asix 1-1.4.1.4:1.0 poe6: asix_rx_fixup() SYNC_LOSS remain 988 len 2048 offset 992 u32 0xecf95000 [3104761.058035] asix 1-1.4.1.3:1.0 poe2: asix_rx_fixup() SYNC_LOSS remain 988 len 2048 offset 992 u32 0xecf950
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On Tue, May 3, 2016 at 2:16 PM, Dean Jenkins wrote: > A good test would be to run "ping -c 1 -s $packet_length $ip_address" inside > a script which has a loop with an increasing payload length $packet_length > with a small delay between ping calls. This will show whether particular > packet sizes trigger the failures. > > Then try with "ping -f -c 200 -s $packet_length $ip_address" to load up the > USB link. I've tried both of these on my x86_64 system. I can send single pings up to 65507 without triggering the issue (after which I get errors sending on the host side as I think I cross a 64k boundary with headers, not the asix errors). Then when I try ping -f -c 200 -s 65507 $ip_address, I don't see any failures. I did it for a count of 2000 as well without any issues. I'll be adding more debug prints in soon. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On Fri, May 6, 2016 at 8:00 AM, Dean Jenkins wrote: > My conclusion is that your USB to Ethernet Adaptor is not running at high > speed (480Mbps) mode which is causing a partial loss (corruption) of > Ethernet frames across the USB link. A USB Protocol Analyser or software > tool usbmon could be used to confirm this scenario. > > Therefore please retest with a working high-speed USB hub or remove the > full-speed USB hub from the test environment and directly connect the USB to > Ethernet Adaptor to the root hub of the USB port. Then repeat the tests to > see whether anything improved. > > In other words, you need to eliminate the dmesg messages saying "not running > at top speed; connect to a high speed hub". The aarch64 system has a quirk that at the moment limits it to the slower full-speed mode, which also exacerbates the issue (basically taking a fairly slow 1.1.Mb/s network connection without your patch, to an almost unusable 30Kb/s with it). But that isn't the case on the x86_64 system, which is seeing a very similar problem (though the performance effect isn't nearly as bad, as the error rate in time seems relatively similar on both, and I think my scp transmissions are cpu bound on this atom board :). thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On Thu, May 5, 2016 at 1:11 AM, Dean Jenkins wrote: > On 05/05/16 00:45, John Stultz wrote: >> >> Just as a sample point, I have managed to reproduce exactly this issue >> on an x86_64 system by simply scp'ing a large file. > > Please tell us the x86_64 kernel version number that you used and which > Linux Distribution it was ? This allows other people a chance to reproduce > your observations. Sorry for being a little slow here, had some other issues I had to chase. On my x86_64 system, its Ubuntu 14.04.4, with a 4.6.0-rc2 kernel. >> [ 417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header >> synchronisation was lost, remaining 988 > > It is interesting that the reported "remaining" value is 988. Is 988 always > shown ? I mean that do you see any other "remaining" values for the "Data > Header synchronisation was lost" error message ? Yep. Its always the same 988 remaining, on either architecture. >> [ 417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0xef830347, offset 4 > > The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms > which is a large gap in terms of USB 2.0 high speed communications. This gap > is expected to be in the 100us range for consecutive URBs. So 4ms is > strange. > > The expectation is that the "Data Header synchronisation was lost" error > message resets the 32-bit header word synchronisation to the start of the > next URB buffer. The "Bad Header Length, offset 4" is the expected outcome > for the next URB because it is unlikely the 32-bit header word is at the > start of URB buffer due to Ethernet frames spanning across URBs. >> >> [ 417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0x31e2b348, offset 4 > > Timestamps show the gap to be 4ms which is strange for USB 2.0 high speed, > are you sure high speed mode is being used ? >> Yep, on my x86_64 system, it seems to be. [3.101115] usb 1-5: new high-speed USB device number 2 using ehci-pci [3.232309] usb 1-5: New USB device found, idVendor=0b95, idProduct=772b [3.232327] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [3.232339] usb 1-5: Product: AX88772B [3.232350] usb 1-5: Manufacturer: ASIX Elec. Corp. [3.232360] usb 1-5: SerialNumber: 188298 [4.032206] asix 1-5:1.0 eth1: register 'asix' at usb-:00:04.1-5, ASIX AX88772B USB 2.0 Ethernet, 00:50:b6:18:82:98 > Please can you supply the output of ifconfig for the USB to Ethernet > adaptor, your example above shows eth1 as the device. > > Please show the output of ifconfig eth1 before and after the issue is seen. > This will show us whether the kernel logs any network errors and how many > bytes have been transferred. Before: $ ifconfig eth1 eth1 Link encap:Ethernet HWaddr 00:50:b6:18:82:98 inet addr:192.168.0.12 Bcast:192.168.0.255 Mask:255.255.255.0 inet6 addr: 2601:1c2:1002:83f0:250:b6ff:fe18:8298/64 Scope:Global inet6 addr: fe80::250:b6ff:fe18:8298/64 Scope:Link inet6 addr: 2601:1c2:1002:83f0:b0f0:71a0:6c7e:346b/64 Scope:Global UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:372 errors:0 dropped:0 overruns:0 frame:0 TX packets:385 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:38523 (38.5 KB) TX bytes:48801 (48.8 KB) After: $ ifconfig eth1 eth1 Link encap:Ethernet HWaddr 00:50:b6:18:82:98 inet addr:192.168.0.12 Bcast:192.168.0.255 Mask:255.255.255.0 inet6 addr: 2601:1c2:1002:83f0:250:b6ff:fe18:8298/64 Scope:Global inet6 addr: fe80::250:b6ff:fe18:8298/64 Scope:Link inet6 addr: 2601:1c2:1002:83f0:b0f0:71a0:6c7e:346b/64 Scope:Global UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:151005 errors:169 dropped:0 overruns:0 frame:0 TX packets:61351 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:225874384 (225.8 MB) TX bytes:4431098 (4.4 MB) > After the issue is seen, please can you show us the output of "dmesg | grep > asix" so that we can see status messages from the ASIX driver that the USB > to Ethernet adaptor is using. In particular we need to check that USB high > speed operation (480Mbps) is being used and not full speed operation > (12Mbps). [2.766525] usbcore: registered new interface driver asix [4.031443] asix 1-5:1.0 eth1: register 'asix' at usb-:00:04.1-5, ASIX AX88772B USB 2.0 Ethernet, 00:50:b6:18:82:98 [ 31.578983] asix 1-5:1.0 eth1: link down [ 33.244743] asix 1-5:1.0 eth1: link up, 100Mbps, full-duplex, lpa 0xCDE1 [ 171.959244] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 171.959530] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x1651c2bf, offset 4 [ 171.959768] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0xfcf61092, offset 4 [ 171.960001] asix 1-5:1.0 eth1: asix_rx_fixup() Ba
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 06/05/16 16:27, Andrew Lunn wrote: In other words, the full-speed hub is restricting the USB to Ethernet Adaptor to a 12Mbps (half-duplex) bandwidth to support Ethernet 100Mbps (full-duplex) traffic. That is not going to work very well because Ethernet frames (perhaps partial Ethernet frames) need to be discarded within the USB link. If that really is true, the design is broken. I would expect the adaptor to reliably transfer whole frames over USB, and drop whole frames from its receive queue when the USB is congested. TCP is also going to see the USB bottleneck as just like any bottleneck in the network and back off. So TCP streams should not cause major congestion on the USB link. The host's USB host controller polls the USB to Ethernet adaptor for more data. The USB to Ethernet adaptor cannot predict when the next poll request comes. The AX88772B can span Ethernet frames across multiple poll requests. This means it is possible get a partial Ethernet frame received in the USB host controller on one poll and it is assumed that the next poll (sometime in the near future) will get the remaining part of the Ethernet frame. However, the USB to Ethernet adaptor does not contain an infinitely sized RX Ethernet buffer for the incoming Ethernet frames. I believe the USB to Ethernet adaptor is just a pipe and does not directly implement flow control for Ethernet frames so the RX buffer is going to overflow causing loss of whole Ethernet frames. I suspect the IP stack in the host computer implements flow control for Ethernet frames. Because the AX88772B can span Ethernet frames across multiple poll requests there is a risk that the designers of the device could of implemented a solution to discard the remaining part of the Ethernet frame before the next poll arrives due to the RX buffer overflowing. I don't know the algorithm used in the AX88772B but there will be loss of data due to the mismatch in bandwidths. I agree that dropping whole Ethernet frames would be preferable to dropping partial Ethernet frames which would corrupt the data stream. My suspicion is that the URB buffers are containing discontinues in the data stream because of lost data due to insufficient bandwidth on the USB link. Going over a 12Mbps USB link should be no different to hitting an old Ethernet hub which can only do 10/Half. Not exactly, because USB is a transport link which is agnostic to the type of data that is flowing. It is up to the layers above USB to manage the data content. In other words, the USB speed needs to be higher than the Ethernet speed to avoid mismatches in bandwidth. Therefore please retest with a working high-speed USB hub or remove the full-speed USB hub from the test environment and directly connect the USB to Ethernet Adaptor to the root hub of the USB port. Then repeat the tests to see whether anything improved. In other words, you need to eliminate the dmesg messages saying "not running at top speed; connect to a high speed hub". I would also suggest testing with the Ethernet at 10/half. You should be able to use Ethtool to set that up. Your USB and Ethernet bandwidth become more equal. If you still see errors, it suggests a protocol implementation error somewhere. I agree with the suggestion but I hope USB high speed (480Mbps) operation was the intended environment rather than the useless USB full speed (12Mbps) operation. Let's hope that not using the USB hub improves things. Regards, Dean Andrew -- Dean Jenkins Embedded Software Engineer Linux Transportation Solutions Mentor Embedded Software Division Mentor Graphics (UK) Ltd. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
> In other words, the full-speed hub is restricting the USB to > Ethernet Adaptor to a 12Mbps (half-duplex) bandwidth to support > Ethernet 100Mbps (full-duplex) traffic. That is not going to work > very well because Ethernet frames (perhaps partial Ethernet frames) > need to be discarded within the USB link. If that really is true, the design is broken. I would expect the adaptor to reliably transfer whole frames over USB, and drop whole frames from its receive queue when the USB is congested. TCP is also going to see the USB bottleneck as just like any bottleneck in the network and back off. So TCP streams should not cause major congestion on the USB link. Going over a 12Mbps USB link should be no different to hitting an old Ethernet hub which can only do 10/Half. > Therefore please retest with a working high-speed USB hub or remove > the full-speed USB hub from the test environment and directly > connect the USB to Ethernet Adaptor to the root hub of the USB port. > Then repeat the tests to see whether anything improved. > > In other words, you need to eliminate the dmesg messages saying "not > running at top speed; connect to a high speed hub". I would also suggest testing with the Ethernet at 10/half. You should be able to use Ethtool to set that up. Your USB and Ethernet bandwidth become more equal. If you still see errors, it suggests a protocol implementation error somewhere. Andrew -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 05/05/16 13:19, Guodong Xu wrote: Hi, Dean I am not sure why do you insist 'not full speed'. Actually, the tests I run on ARM-64bit is at USB full speed mode. I pasted my log here: http://paste.ubuntu.com/16236442/ , which includes the information you requested above, ifconfig, dmesg. The interval between two consecutive errors varies from 10 to 40ms. Your log from http://paste.ubuntu.com/16236442/ shows high speed for device 3 is not being used: [3.586968] usb 1-1: new full-speed USB device number 2 using dwc2 [3.792091] usb 1-1: not running at top speed; connect to a high speed hub [3.800477] hub 1-1:1.0: USB hub found [3.803658] hub 1-1:1.0: 3 ports detected [4.086636] usb 1-1.2: new full-speed USB device number 3 using dwc2 [4.202209] usb 1-1.2: not running at top speed; connect to a high speed hub [8.851236] asix 1-1.2:1.0 eth0: register 'asix' at usb-f72c.usb-1.2, ASIX AX88772B USB 2.0 Ethernet, 00:0e:c6:fa:bf:fd Hopefully, you know USB 2.0 high speed (480Mbps) is faster than full speed (12Mbps) mode. Therefore, your USB to Ethernet Adaptor is not running in its optimal "normal" high speed operation and there is a USB hub in the way that is not running at USB high speed mode. This is an abnormal configuration and potentially explains some of your failure observations. Running at full-speed (12Mbps) mode would explain why the timestamps has gaps of ms rather than us gaps (for 480Mbps). In other words, the full-speed hub is restricting the USB to Ethernet Adaptor to a 12Mbps (half-duplex) bandwidth to support Ethernet 100Mbps (full-duplex) traffic. That is not going to work very well because Ethernet frames (perhaps partial Ethernet frames) need to be discarded within the USB link. Your ifconfig output from http://paste.ubuntu.com/16236442/ shows 249 errors eth0 Link encap:Ethernet HWaddr 00:0e:c6:fa:bf:fd inet addr:192.168.1.11 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:865 errors:249 dropped:0 overruns:0 frame:0 TX packets:880 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:1228273 (1.1 MiB) TX bytes:68955 (67.3 KiB) Before the test RX packets:28 errors:0 dropped:0 overruns:0 frame:0 After the test RX packets:865 errors:249 dropped:0 overruns:0 frame:0 Good test packets = 865 - 28 = 837 Detected bad Ethernet frames = 249 Bad to good ratio is 249:837 = 1:3.36 so 1 detected bad Ethernet frame per 3.36 good Ethernet frames Your ifconfig output from http://paste.ubuntu.com/16236764/ shows 1282 errors eth0 Link encap:Ethernet HWaddr 00:0e:c6:fa:bf:fd inet addr:192.168.1.11 Bcast:192.168.1.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:55 errors:1282 dropped:0 overruns:0 frame:0 TX packets:64 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:14287 (13.9 KiB) TX bytes:7639 (7.4 KiB) Before the test RX packets:19 errors:0 dropped:0 overruns:0 frame:0 After the test RX packets:55 errors:1282 dropped:0 overruns:0 frame:0 Good test packets = 55 - 19 = 36 Detected bad Ethernet frames = 1282 Bad to good ratio is 1282:36 = 1:0.28 so 1 detected bad Ethernet frame per 0.028 good Ethernet frames This suggests a very high error rate. It is interesting that the reported "remaining" value is 988. Is 988 always shown ? I mean that do you see any other "remaining" values for the "Data Header synchronisation was lost" error message ? Yes and No. When doing iperf test in TCP mode, always 988. I have never seen other "remaining" value. But, 1. I tried "ping -f -s 1400 [my.arm.64bit.board.ip]", but this cannot trigger the error. 2. Tried iperf in UDP mode, I saw "Data Header synchronisation was lost" remaining value is 984 (again, seemingly always in several tries). Log is pasted here. http://paste.ubuntu.com/16236764/ In http://paste.ubuntu.com/16236764/ you see very many [ 41.938370] asix 1-1.2:1.0 eth0: asix_rx_fixup() Bad Header Length 0x11400040, offset 4 but only a few [ 42.214607] asix 1-1.2:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 984 This suggests that the "Bad Header Length" and "Data Header synchronisation was lost" error messages are not related to consecutive URBs. The expectation is that a "Data Header synchronisation was lost" error message is immediately followed by a "Bad Header Length" message with a timestamp much less than 1ms (for high speed USB). This is because an Ethernet frame that spans URBs needs low latency so should be sent quickly in consecutive URBs. The Bad Header Length error messages with offset 4 indicates that 32-bit header word was not found in the expected location at the start of the URB buffer. [ 41.938370] asix 1-1.2:1.0 eth0: asix_rx_fixup() Bad Header Le
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 5 May 2016 at 16:11, Dean Jenkins wrote: > On 05/05/16 00:45, John Stultz wrote: >> >> On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins >> wrote: >>> >>> On 03/05/16 11:04, Guodong Xu wrote: did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey is an ARM 64-bit system. I suggest we should be careful on that. I saw similar issues when transferring to a 64-bit system in other net drivers. >>> >>> We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the >>> commits need to be reviewed with 64-bit OS in mind. Do you have any suggestion on this regard? >>> >>> Try testing on a Linux PC x86 32-bit OS which has has a kernel containing >>> my >>> ASIX commits. This will help to confirm whether the failure is related to >>> 32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should >>> fail >>> otherwise it points to something specific in your ARM 64-bit platform. >> >> Just as a sample point, I have managed to reproduce exactly this issue >> on an x86_64 system by simply scp'ing a large file. > > Please tell us the x86_64 kernel version number that you used and which > Linux Distribution it was ? This allows other people a chance to reproduce > your observations. > >> >> [ 417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header >> synchronisation was lost, remaining 988 > > It is interesting that the reported "remaining" value is 988. Is 988 always > shown ? I mean that do you see any other "remaining" values for the "Data > Header synchronisation was lost" error message ? > >> [ 417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0xef830347, offset 4 > > The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms > which is a large gap in terms of USB 2.0 high speed communications. This gap > is expected to be in the 100us range for consecutive URBs. So 4ms is > strange. > > The expectation is that the "Data Header synchronisation was lost" error > message resets the 32-bit header word synchronisation to the start of the > next URB buffer. The "Bad Header Length, offset 4" is the expected outcome > for the next URB because it is unlikely the 32-bit header word is at the > start of URB buffer due to Ethernet frames spanning across URBs. >> >> [ 417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0x31e2b348, offset 4 > > Timestamps show the gap to be 4ms which is strange for USB 2.0 high speed, > are you sure high speed mode is being used ? >> >> [ 417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header >> synchronisation was lost, remaining 988 >> [ 417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0x8af91035, offset 4 >> [ 417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0x8521fa03, offset 4 >> [ 418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header >> synchronisation was lost, remaining 988 >> [ 418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0x33cd9c7c, offset 4 >> [ 418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0x3d850896, offset 4 >> [ 418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0x86443357, offset 4 >> [ 418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length >> 0xee6c81d1, offset 4 >> >> I don't have any 32bit x86 installs around so I'm not sure I can easly >> test there, but its clear its not arm64 specific. > > I agree the issue is not specific to your ARM 64 bit platform. > > Please can you supply the output of ifconfig for the USB to Ethernet > adaptor, your example above shows eth1 as the device. > > Please show the output of ifconfig eth1 before and after the issue is seen. > This will show us whether the kernel logs any network errors and how many > bytes have been transferred. > > After the issue is seen, please can you show us the output of "dmesg | grep > asix" so that we can see status messages from the ASIX driver that the USB > to Ethernet adaptor is using. In particular we need to check that USB high > speed operation (480Mbps) is being used and not full speed operation > (12Mbps). Hi, Dean I am not sure why do you insist 'not full speed'. Actually, the tests I run on ARM-64bit is at USB full speed mode. I pasted my log here: http://paste.ubuntu.com/16236442/ , which includes the information you requested above, ifconfig, dmesg. The interval between two consecutive errors varies from 10 to 40ms. > It is interesting that the reported "remaining" value is 988. Is 988 always > shown ? I mean that do you see any other "remaining" values for the "Data > Header synchronisation was lost" error message ? Yes and No. When doing iperf test in TCP mode, always 988. I have never seen other "remaining" value. But, 1. I tried "ping -f -s 1400 [my.arm.64bit.board.ip]", but this cannot trigger the error. 2. Tried iperf in UDP mode, I saw "Data Header synchronisation was lost" remaining value is 984 (again, seemingly always in several tries). Log is
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 05/05/16 00:45, John Stultz wrote: On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins wrote: On 03/05/16 11:04, Guodong Xu wrote: did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey is an ARM 64-bit system. I suggest we should be careful on that. I saw similar issues when transferring to a 64-bit system in other net drivers. We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the commits need to be reviewed with 64-bit OS in mind. Do you have any suggestion on this regard? Try testing on a Linux PC x86 32-bit OS which has has a kernel containing my ASIX commits. This will help to confirm whether the failure is related to 32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should fail otherwise it points to something specific in your ARM 64-bit platform. Just as a sample point, I have managed to reproduce exactly this issue on an x86_64 system by simply scp'ing a large file. Please tell us the x86_64 kernel version number that you used and which Linux Distribution it was ? This allows other people a chance to reproduce your observations. [ 417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 It is interesting that the reported "remaining" value is 988. Is 988 always shown ? I mean that do you see any other "remaining" values for the "Data Header synchronisation was lost" error message ? [ 417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0xef830347, offset 4 The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms which is a large gap in terms of USB 2.0 high speed communications. This gap is expected to be in the 100us range for consecutive URBs. So 4ms is strange. The expectation is that the "Data Header synchronisation was lost" error message resets the 32-bit header word synchronisation to the start of the next URB buffer. The "Bad Header Length, offset 4" is the expected outcome for the next URB because it is unlikely the 32-bit header word is at the start of URB buffer due to Ethernet frames spanning across URBs. [ 417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x31e2b348, offset 4 Timestamps show the gap to be 4ms which is strange for USB 2.0 high speed, are you sure high speed mode is being used ? [ 417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x8af91035, offset 4 [ 417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x8521fa03, offset 4 [ 418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x33cd9c7c, offset 4 [ 418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x3d850896, offset 4 [ 418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x86443357, offset 4 [ 418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0xee6c81d1, offset 4 I don't have any 32bit x86 installs around so I'm not sure I can easly test there, but its clear its not arm64 specific. I agree the issue is not specific to your ARM 64 bit platform. Please can you supply the output of ifconfig for the USB to Ethernet adaptor, your example above shows eth1 as the device. Please show the output of ifconfig eth1 before and after the issue is seen. This will show us whether the kernel logs any network errors and how many bytes have been transferred. After the issue is seen, please can you show us the output of "dmesg | grep asix" so that we can see status messages from the ASIX driver that the USB to Ethernet adaptor is using. In particular we need to check that USB high speed operation (480Mbps) is being used and not full speed operation (12Mbps). Thanks, Regards, Dean thanks -john -- Dean Jenkins Embedded Software Engineer Linux Transportation Solutions Mentor Embedded Software Division Mentor Graphics (UK) Ltd. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins wrote: > On 03/05/16 11:04, Guodong Xu wrote: >> >> did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey >> is an ARM 64-bit system. I suggest we should be careful on that. I saw >> similar issues when transferring to a 64-bit system in other net >> drivers. > > We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the > commits need to be reviewed with 64-bit OS in mind. >> >> >> Do you have any suggestion on this regard? > > Try testing on a Linux PC x86 32-bit OS which has has a kernel containing my > ASIX commits. This will help to confirm whether the failure is related to > 32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should fail > otherwise it points to something specific in your ARM 64-bit platform. Just as a sample point, I have managed to reproduce exactly this issue on an x86_64 system by simply scp'ing a large file. [ 417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0xef830347, offset 4 [ 417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x31e2b348, offset 4 [ 417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x8af91035, offset 4 [ 417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x8521fa03, offset 4 [ 418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x33cd9c7c, offset 4 [ 418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x3d850896, offset 4 [ 418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0x86443357, offset 4 [ 418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length 0xee6c81d1, offset 4 I don't have any 32bit x86 installs around so I'm not sure I can easly test there, but its clear its not arm64 specific. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 04/05/16 01:28, David B. Robins wrote: Here is the code snippet from the patch with my annotations between # #, I will try to explain my intentions. Feel free to point out any flaws: if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) { # Only runs when rx->remaining !=0 and the end of the Ethernet frame + next 32-bit header word is within the URB buffer. # # Therefore, this code does not run when the end of an Ethernet frame has been reached in the previous URB # # or when the end of the Ethernet frame + next 32-bit header word will be in a later URB buffer # It may well be. I don't have the setup with me now, but I can try tomorrow to reproduce an environment where I can add some more detailed logging. Since the URB length has to be >= than the remaining data plus a u32, the devices that John Stultz and I are using (AX88772B in my case) may be adding some additional data/padding after an Ethernet frame, expecting it to be discarded, and running into this check and its consequences. This may mean the device is badly behaved, if it is specified not to send anything extra; in any case, a well-intentioned error correction has gone badly, but I better understand the intent now. I am curious to know how often the device you are using benefits from this block of code. The issue is that the driver should be robust to cope with missing URBs. Whilst testing with D-Link DUB-E100 C1 AX88772 USB to Ethernet adaptor in our ARM embedded system which runs in hostile environments, it was noticed that URBs could be lost (probably due to a bug elsewhere or low memory issue). Without this patch, a missing URB causes bad Ethernet frames to be passed up to the IP stack because rx->remaining spans multiple URBs. In the good case of an Ethernet frame spanning 2 URBs, the 1st URB is processed and copies the 1st part of the Ethernet frame into the netdev buffer, for the 2nd URB the remaining part of the Ethernet frame is copied into the same netdev buffer to complete the Ethernet frame. The netdev buffer is then sent up to the IP stack. In the case of a missing URB, a bad Ethernet frame is created as follows: The 1st URB is processed and copies the 1st part of the Ethernet frame into the netdev buffer, the 2nd URB is lost (somehow), the 3rd URB is processed and blindly copies what it thinks is the remaining part of the Ethernet frame in the same netdev buffer which corrupts the Ethernet frame. The netdev buffer is then sent up to the IP stack. The 3rd URB and subsequent URBs are processed but synchronisation has been lost so can misread data as a 32-bit header word. It is likely that some good Ethernet frames get discarded whilst trying to resynchronise. A recovery strategy for regaining lock with the 32-bit header word is necessary otherwise the driver will have difficulty in recovering from a lost URB. In the "olden days", the 32-bit header word was always at the start of the URB buffer so previous URBs did not influence the current URB. So no recovery strategy was needed at that time. But now we have to remember what happened in the previous URB and a lost URB can cause a discontinuity in the data stream because the data is not always aligned to the start of the URB buffer. I agree that your environment may never suffer from lost URBs so removal of the patch would work OK. I will try to find some time to setup a test environment. Regards, Dean -- Dean Jenkins Embedded Software Engineer Linux Transportation Solutions Mentor Embedded Software Division Mentor Graphics (UK) Ltd. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 2016-05-03 17:16, Dean Jenkins wrote: On 03/05/16 15:42, David B. Robins wrote: I don't think the first one is giving you problems (except as triggered by the second) but I had concerns about the second myself (and emailed the author off-list, but received no reply), and we did not take that commit for our own product. Sorry, I might have missed your original E-mail. Specifically, the second change, 3f30... (original patch: https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) appears to do the exact opposite of what it claims, i.e., instead of "resync if this looks like a header", it does "resync if this does NOT look like a (packet) header", where "looks like a header" means "bits 0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can happen by coincidence for 1/2048 32-bit values starting a continuation URB (easy to hit dealing with large volumes of video data as we were). It appears to expect the header for every URB whereas the rest of the code at least expects it only once per network packet (look at following code that only reads it for remaining == 0). David, I think that your interpretation is incorrect. Please see below. Here is the code snippet from the patch with my annotations between # #, I will try to explain my intentions. Feel free to point out any flaws: if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) { # Only runs when rx->remaining !=0 and the end of the Ethernet frame + next 32-bit header word is within the URB buffer. # # Therefore, this code does not run when the end of an Ethernet frame has been reached in the previous URB # # or when the end of the Ethernet frame + next 32-bit header word will be in a later URB buffer # It may well be. I don't have the setup with me now, but I can try tomorrow to reproduce an environment where I can add some more detailed logging. Since the URB length has to be >= than the remaining data plus a u32, the devices that John Stultz and I are using (AX88772B in my case) may be adding some additional data/padding after an Ethernet frame, expecting it to be discarded, and running into this check and its consequences. This may mean the device is badly behaved, if it is specified not to send anything extra; in any case, a well-intentioned error correction has gone badly, but I better understand the intent now. I am curious to know how often the device you are using benefits from this block of code. Regards, Dean David -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 03/05/16 15:42, David B. Robins wrote: I don't think the first one is giving you problems (except as triggered by the second) but I had concerns about the second myself (and emailed the author off-list, but received no reply), and we did not take that commit for our own product. Sorry, I might have missed your original E-mail. Specifically, the second change, 3f30... (original patch: https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) appears to do the exact opposite of what it claims, i.e., instead of "resync if this looks like a header", it does "resync if this does NOT look like a (packet) header", where "looks like a header" means "bits 0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can happen by coincidence for 1/2048 32-bit values starting a continuation URB (easy to hit dealing with large volumes of video data as we were). It appears to expect the header for every URB whereas the rest of the code at least expects it only once per network packet (look at following code that only reads it for remaining == 0). David, I think that your interpretation is incorrect. Please see below. Here is the code snippet from the patch with my annotations between # #, I will try to explain my intentions. Feel free to point out any flaws: if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) { # Only runs when rx->remaining !=0 and the end of the Ethernet frame + next 32-bit header word is within the URB buffer. # # Therefore, this code does not run when the end of an Ethernet frame has been reached in the previous URB # # or when the end of the Ethernet frame + next 32-bit header word will be in a later URB buffer # # offset is an index to the expected next 32-bit header word after the end of the Ethernet frame # offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32); # rx->header contains the expected 32-bit header value corrected for Endianness and alignment # rx->header = get_unaligned_le32(skb->data + offset); offset = 0; # check the data integrity of the size value from the header word # size = (u16)(rx->header & 0x7ff); # if the size value fails the integrity check then we are not looking at a valid header word so # # synchronisation has been lost # if (size != ((~rx->header >> 16) & 0x7ff)) { netdev_err(dev->net, "asix_rx_fixup() Data Header synchronisation was lost, remaining %d\n", rx->remaining); if (rx->ax_skb) { kfree_skb(rx->ax_skb); rx->ax_skb = NULL; /* Discard the incomplete netdev Ethernet frame * and assume the Data header is at the start of * the current URB socket buffer. */ } rx->remaining = 0; } } So that change made no sense to me, but I don't have significant kernel dev experience. Effectively it will drop/truncate every (2047/2048) split (longer than an URB) packet, and report an error for the second URB and then again for treating said second URB as a first URB for a packet. I would expect your problems will go away just removing the second change. You could also change the != to == in "if (size != ...)" but then you'd still have 1/2048 (depending on data patterns) false positives. The code only runs when the Ethernet frame spans across URBs and is checking that the next 32-bit header word is present and valid. Upon loss of synchronisation, the strategy is to assume that the 32-bit header is at the start of the URB buffer. Obviously, that might not be true every time but it is the most likely location especially when Ethernet frames are not spanning URBs at that point at time. Looking at the error messages: [ 239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x54ebb5ec, offset 4 The offset 4 means that the 32-bit header word was invalid at the start of the URB buffer. This could be a consequence of data synchronisation being lost however, we would expect the timestamps between the error messages of "synchronisation was lost" and "Bad Header Length" to very close as they would be consecutive URBs. The evidence is showing 10ms gaps which does not suggest consecutive URBs. In other words, an Ethernet frame should not be spanned over a time gap of 10ms as that would be very inefficient. If that were true then there would be USB communications problem with the USB to Ethernet adaptor which I hope is not true. [ 239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 This error message consistently shows the remaining value to be 988, at least for the 3 examples provided by John. This does not suggest a random failure unless there are other examples of a non 988 remaining value error message. 988 is well within a Ethernet frame
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On Tue, May 3, 2016 at 7:42 AM, David B. Robins wrote: > On 2016-05-03 00:55, John Stultz wrote: >> >> Looking through the commits since the v4.1 kernel where we didn't see >> this, I narrowed the regression down, and reverting the following two >> commits seems to avoid the problem: >> >> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB >> if no RX netdev buffer >> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating >> bad Ethernet frames >> > > I don't think the first one is giving you problems (except as triggered by > the second) but I had concerns about the second myself (and emailed the > author off-list, but received no reply), and we did not take that commit for > our own product. Yes, the first/later commit is being reverted as it modifies code that was also changed by the second/earlier one. So the 3f30 patch doesn't revert cleanly by itself, but I have tested by just removing the (now modified by 6a57) chunk of code it adds does seem to avoid the problem as well. Though I wasn't able to review things closely enough to be confident that didn't introduce any subtle bugs in the remaining logic in the 6a57 patch. > Specifically, the second change, 3f30... (original patch: > https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) > appears to do the exact opposite of what it claims, i.e., instead of "resync > if this looks like a header", it does "resync if this does NOT look like a > (packet) header", where "looks like a header" means "bits 0-10 (size) are > equal to the bitwise-NOT of bits 16-26", and (2) can happen by coincidence > for 1/2048 32-bit values starting a continuation URB (easy to hit dealing > with large volumes of video data as we were). It appears to expect the > header for every URB whereas the rest of the code at least expects it only > once per network packet (look at following code that only reads it for > remaining == 0). > > So that change made no sense to me, but I don't have significant kernel dev > experience. Effectively it will drop/truncate every (2047/2048) split > (longer than an URB) packet, and report an error for the second URB and then > again for treating said second URB as a first URB for a packet. I would > expect your problems will go away just removing the second change. You could > also change the != to == in "if (size != ...)" but then you'd still have > 1/2048 (depending on data patterns) false positives. I'll have to look into this more. I'm not super familiar with usb or networking, so I'm not sure I can judge the better approach. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 2016-05-03 00:55, John Stultz wrote: In testing with HiKey, we found that since commit 3f30b158eba5c60 (asix: On RX avoid creating bad Ethernet frames), we're seeing lots of noise during network transfers: [ 239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x54ebb5ec, offset 4 [ 239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xcdffe7a2, offset 4 [ 239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x1d36f59d, offset 4 [ 239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xaef3c1e9, offset 4 [ 239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x2881912, offset 4 [ 239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x5638f7e2, offset 4 And network throughput ends up being pretty bursty and slow with a overall throughput of at best ~30kB/s. Looking through the commits since the v4.1 kernel where we didn't see this, I narrowed the regression down, and reverting the following two commits seems to avoid the problem: 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB if no RX netdev buffer 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating bad Ethernet frames With these reverted, we don't see all the error messages, and we see better ~1.1MB/s throughput (I've got a mouse plugged in, so I think the usb host is only running at "full-speed" mode here). I don't think the first one is giving you problems (except as triggered by the second) but I had concerns about the second myself (and emailed the author off-list, but received no reply), and we did not take that commit for our own product. Specifically, the second change, 3f30... (original patch: https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1) appears to do the exact opposite of what it claims, i.e., instead of "resync if this looks like a header", it does "resync if this does NOT look like a (packet) header", where "looks like a header" means "bits 0-10 (size) are equal to the bitwise-NOT of bits 16-26", and (2) can happen by coincidence for 1/2048 32-bit values starting a continuation URB (easy to hit dealing with large volumes of video data as we were). It appears to expect the header for every URB whereas the rest of the code at least expects it only once per network packet (look at following code that only reads it for remaining == 0). So that change made no sense to me, but I don't have significant kernel dev experience. Effectively it will drop/truncate every (2047/2048) split (longer than an URB) packet, and report an error for the second URB and then again for treating said second URB as a first URB for a packet. I would expect your problems will go away just removing the second change. You could also change the != to == in "if (size != ...)" but then you'd still have 1/2048 (depending on data patterns) false positives. This worries me some, as the patches seem to describe trying to fix the issue they seem to cause, so I suspect a revert isn't the correct solution, but am not sure why we're having such trouble and the patch authors did not. I'd be happy to do further testing of patches if folks have any ideas. Originally Reported-by: Yongqin Liu thanks -john David -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 03/05/16 11:04, Guodong Xu wrote: On 3 May 2016 at 17:23, Dean Jenkins wrote: On 03/05/16 05:55, John Stultz wrote: In testing with HiKey, we found that since commit 3f30b158eba5c60 (asix: On RX avoid creating bad Ethernet frames), we're seeing lots of noise during network transfers: [ 239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x54ebb5ec, offset 4 [ 239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xcdffe7a2, offset 4 [ 239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x1d36f59d, offset 4 [ 239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xaef3c1e9, offset 4 [ 239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x2881912, offset 4 [ 239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x5638f7e2, offset 4 And network throughput ends up being pretty bursty and slow with a overall throughput of at best ~30kB/s. Looking through the commits since the v4.1 kernel where we didn't see this, I narrowed the regression down, and reverting the following two commits seems to avoid the problem: 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB if no RX netdev buffer 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating bad Ethernet frames With these reverted, we don't see all the error messages, and we see better ~1.1MB/s throughput (I've got a mouse plugged in, so I think the usb host is only running at "full-speed" mode here). This worries me some, as the patches seem to describe trying to fix the issue they seem to cause, so I suspect a revert isn't the correct solution, but am not sure why we're having such trouble and the patch authors did not. I'd be happy to do further testing of patches if folks have any ideas. Originally Reported-by: Yongqin Liu thanks -john Hi John, Some ASIX chipsets span the Ethernet frame over consecutive URBs which requires successful transfer of 2 URBs. This means states of a previous URB influences the processing of the next URB including a dropped URB (causes a discontinuity in the data stream). In other words synchronisation of the in-band 32-bit header word needs to be tracked between URBs. Some ASIX chipsets allow the in-band 32-bit header word to be no longer fixed to the start of the URB buffer so it moves to any position within the URB buffer. I understand your point of suggesting it is a "regression" for your device but the driver was broken for DUB-E100 C1 (small black USB device). So you cannot revert the commits as this would break DUB-E100 C1 (small black USB device). 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB if no RX netdev buffer This commit is necessary because it avoids a crash when netdev buffer failed to be allocated for the 1st URB and the 2nd URB containing a spanned Ethernet frame is processed. The crash happens because the 2nd URB assumed that the netdev buffer had been allocated. 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating bad Ethernet frames This commit is necessary to avoid sending bad Ethernet frames into the IP stack during loss of synchronisation and to dropping good Ethernet frames. This commit improves the synchronisation recovery mechanism of the in-band 32-bit header word. The ASIX USB to Ethernet devices these commits were tested on where DUB-E100 C1 (small black USB device). Embedded ARM based systems were used where memory resources can run out. I don't have the chance to look into detail yet. But just a caution, did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey is an ARM 64-bit system. I suggest we should be careful on that. I saw similar issues when transferring to a 64-bit system in other net drivers. We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the commits need to be reviewed with 64-bit OS in mind. Do you have any suggestion on this regard? Try testing on a Linux PC x86 32-bit OS which has has a kernel containing my ASIX commits. This will help to confirm whether the failure is related to 32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should fail otherwise it points to something specific in your ARM 64-bit platform. It could be that for your USB to Ethernet device that the wrong configuration settings have been used. In other words the ASIX driver is flexible to support various variants of the ASIX chipsets. For example, does your device support Ethernet frames spanning multiple URBs (multiple USB transfers) ? Would you please suggest how to find out this information? How can I change my device's configuration settings to support spanning mu
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 3 May 2016 at 17:23, Dean Jenkins wrote: > On 03/05/16 05:55, John Stultz wrote: >> >> In testing with HiKey, we found that since commit 3f30b158eba5c60 >> (asix: On RX avoid creating bad Ethernet frames), we're seeing lots of >> noise during network transfers: >> >> [ 239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header >> synchronisation was lost, remaining 988 >> [ 239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length >> 0x54ebb5ec, offset 4 >> [ 239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length >> 0xcdffe7a2, offset 4 >> [ 239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header >> synchronisation was lost, remaining 988 >> [ 239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length >> 0x1d36f59d, offset 4 >> [ 239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length >> 0xaef3c1e9, offset 4 >> [ 239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header >> synchronisation was lost, remaining 988 >> [ 239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length >> 0x2881912, offset 4 >> [ 239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length >> 0x5638f7e2, offset 4 >> >> >> And network throughput ends up being pretty bursty and slow with a >> overall throughput of at best ~30kB/s. >> >> Looking through the commits since the v4.1 kernel where we didn't see >> this, I narrowed the regression down, and reverting the following two >> commits seems to avoid the problem: >> >> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB >> if no RX netdev buffer >> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating >> bad Ethernet frames >> >> With these reverted, we don't see all the error messages, and we see >> better ~1.1MB/s throughput (I've got a mouse plugged in, so I think >> the usb host is only running at "full-speed" mode here). >> >> This worries me some, as the patches seem to describe trying to fix >> the issue they seem to cause, so I suspect a revert isn't the correct >> solution, but am not sure why we're having such trouble and the patch >> authors did not. I'd be happy to do further testing of patches if >> folks have any ideas. >> >> Originally Reported-by: Yongqin Liu >> >> thanks >> -john > > Hi John, > > Some ASIX chipsets span the Ethernet frame over consecutive URBs which > requires successful transfer of 2 URBs. > > This means states of a previous URB influences the processing of the next > URB including a dropped URB (causes a discontinuity in the data stream). In > other words synchronisation of the in-band 32-bit header word needs to be > tracked between URBs. Some ASIX chipsets allow the in-band 32-bit header > word to be no longer fixed to the start of the URB buffer so it moves to any > position within the URB buffer. > > I understand your point of suggesting it is a "regression" for your device > but the driver was broken for DUB-E100 C1 (small black USB device). So you > cannot revert the commits as this would break DUB-E100 C1 (small black USB > device). > >> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB >> if no RX netdev buffer > > This commit is necessary because it avoids a crash when netdev buffer failed > to be allocated for the 1st URB and the 2nd URB containing a spanned > Ethernet frame is processed. The crash happens because the 2nd URB assumed > that the netdev buffer had been allocated. > >> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating >> bad Ethernet frames > > This commit is necessary to avoid sending bad Ethernet frames into the IP > stack during loss of synchronisation and to dropping good Ethernet frames. > This commit improves the synchronisation recovery mechanism of the in-band > 32-bit header word. > > The ASIX USB to Ethernet devices these commits were tested on where DUB-E100 > C1 (small black USB device). Embedded ARM based systems were used where > memory resources can run out. I don't have the chance to look into detail yet. But just a caution, did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey is an ARM 64-bit system. I suggest we should be careful on that. I saw similar issues when transferring to a 64-bit system in other net drivers. Do you have any suggestion on this regard? > > It could be that for your USB to Ethernet device that the wrong > configuration settings have been used. In other words the ASIX driver is > flexible to support various variants of the ASIX chipsets. For example, does > your device support Ethernet frames spanning multiple URBs (multiple USB > transfers) ? Would you please suggest how to find out this information? How can I change my device's configuration settings to support spanning multiple URBs? > > So I doubt my commits are "broken" because we don't see your failures (not > tested your device). It is more likely that your ASIX device needs to be > properly identified and configured to be compatible with the ASIX driver. At >
Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
On 03/05/16 05:55, John Stultz wrote: In testing with HiKey, we found that since commit 3f30b158eba5c60 (asix: On RX avoid creating bad Ethernet frames), we're seeing lots of noise during network transfers: [ 239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x54ebb5ec, offset 4 [ 239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xcdffe7a2, offset 4 [ 239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x1d36f59d, offset 4 [ 239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xaef3c1e9, offset 4 [ 239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x2881912, offset 4 [ 239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x5638f7e2, offset 4 And network throughput ends up being pretty bursty and slow with a overall throughput of at best ~30kB/s. Looking through the commits since the v4.1 kernel where we didn't see this, I narrowed the regression down, and reverting the following two commits seems to avoid the problem: 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB if no RX netdev buffer 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating bad Ethernet frames With these reverted, we don't see all the error messages, and we see better ~1.1MB/s throughput (I've got a mouse plugged in, so I think the usb host is only running at "full-speed" mode here). This worries me some, as the patches seem to describe trying to fix the issue they seem to cause, so I suspect a revert isn't the correct solution, but am not sure why we're having such trouble and the patch authors did not. I'd be happy to do further testing of patches if folks have any ideas. Originally Reported-by: Yongqin Liu thanks -john Hi John, Some ASIX chipsets span the Ethernet frame over consecutive URBs which requires successful transfer of 2 URBs. This means states of a previous URB influences the processing of the next URB including a dropped URB (causes a discontinuity in the data stream). In other words synchronisation of the in-band 32-bit header word needs to be tracked between URBs. Some ASIX chipsets allow the in-band 32-bit header word to be no longer fixed to the start of the URB buffer so it moves to any position within the URB buffer. I understand your point of suggesting it is a "regression" for your device but the driver was broken for DUB-E100 C1 (small black USB device). So you cannot revert the commits as this would break DUB-E100 C1 (small black USB device). 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB if no RX netdev buffer This commit is necessary because it avoids a crash when netdev buffer failed to be allocated for the 1st URB and the 2nd URB containing a spanned Ethernet frame is processed. The crash happens because the 2nd URB assumed that the netdev buffer had been allocated. 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating bad Ethernet frames This commit is necessary to avoid sending bad Ethernet frames into the IP stack during loss of synchronisation and to dropping good Ethernet frames. This commit improves the synchronisation recovery mechanism of the in-band 32-bit header word. The ASIX USB to Ethernet devices these commits were tested on where DUB-E100 C1 (small black USB device). Embedded ARM based systems were used where memory resources can run out. It could be that for your USB to Ethernet device that the wrong configuration settings have been used. In other words the ASIX driver is flexible to support various variants of the ASIX chipsets. For example, does your device support Ethernet frames spanning multiple URBs (multiple USB transfers) ? So I doubt my commits are "broken" because we don't see your failures (not tested your device). It is more likely that your ASIX device needs to be properly identified and configured to be compatible with the ASIX driver. At least, I suggest that is the best place to start your investigation. Of course, your ASIX chipset might have a different behaviour for how the in-band 32-bit header word operates so perhaps special treatment is needed for your chipset ? Please send to the mailing list the output of lsusb for your device so that people can know the USB product ID and vendor ID for your device. This is allows people to assist with the investigation. Do you have any links to websites that sell your device ? Are you using UDP or TCP connections ? Regards, Dean -- Dean Jenkins Embedded Software Engineer Linux Transportation Solutions Mentor Embedded Software Division Mentor Graphics (UK) Ltd. -- To unsubscribe from this
[REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions
In testing with HiKey, we found that since commit 3f30b158eba5c60 (asix: On RX avoid creating bad Ethernet frames), we're seeing lots of noise during network transfers: [ 239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x54ebb5ec, offset 4 [ 239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xcdffe7a2, offset 4 [ 239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x1d36f59d, offset 4 [ 239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xaef3c1e9, offset 4 [ 239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x2881912, offset 4 [ 239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x5638f7e2, offset 4 And network throughput ends up being pretty bursty and slow with a overall throughput of at best ~30kB/s. Looking through the commits since the v4.1 kernel where we didn't see this, I narrowed the regression down, and reverting the following two commits seems to avoid the problem: 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB if no RX netdev buffer 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating bad Ethernet frames With these reverted, we don't see all the error messages, and we see better ~1.1MB/s throughput (I've got a mouse plugged in, so I think the usb host is only running at "full-speed" mode here). This worries me some, as the patches seem to describe trying to fix the issue they seem to cause, so I suspect a revert isn't the correct solution, but am not sure why we're having such trouble and the patch authors did not. I'd be happy to do further testing of patches if folks have any ideas. Originally Reported-by: Yongqin Liu thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html