On 3/3/20 12:25 AM, Lukasz Majewski wrote: > Hi Marek, Hi,
[...] >>>> Every read operation starts at the maximum block size. When the USB >>>> pendrive is not able to correctly serve this data read request, the >>>> dynamic reduction of IO size is performed. Up to six tries (with >>>> smaller IO block each time) are attempted. >>>> >>>> A related problem is that some drives are slow to come up. Linux >>>> handles this by issuing a spinup command and allowing more time for >>>> the drive to respond. The same idea is applied in this fix. >>>> >>>> On TPC70 (i.MX6Q) once per ~10 times (without this patch): >>>> >>>> Bus usb@2184200: USB EHCI 1.00 >>>> scanning bus usb@2184200 for devices... 2 USB Device(s) found >>>> scanning usb for storage devices... 1 Storage Device(s) >>>> found EHCI timed out on TD - token=0x1f8c80 >>> >>> This is how the error gets evident. The detailed explanation is in >>> link [1]. >> >> Is there one specific post in that forum, or do I need to read through >> the whole multi-page thread ? > > There is a thread pointed out by the below link which explains the > issue thoroughly: > https://forum.doozan.com/read.php?3,35295,35295#msg-35295 > > (The first 2 posts from 'rayvt' explains the problem). So basically a drive spin-up problem and the problem with stick counter overflow. btw QH/qTD are not microcode instructions, they are just plain DMA descriptors. And the USB controller in Marvell Kirkwood/Orion SoCs is on-SoC, it's not a discrete chip. >> I would expect that if you run -- I assume 'usb reset' (the command is >> missing above) -- then the bus gets power-cycled, hence the USB device >> also gets power-cycled. > > Unfortunately, the 'usb reset' is not fixing the issue. The controller > is unresponsive and the only way for recovering is a power cycle. My guess would be it might be related to the async schedule, so that's where I would start debugging. [...] >>>> + qhtoken = hc32_to_cpu(qh->qh_overlay.qt_token); >>>> if (!(QT_TOKEN_GET_STATUS(qhtoken) & >>>> QT_TOKEN_STATUS_ACTIVE)) { debug("TOKEN=%#x\n", qhtoken); >>>> + if (qhtoken & QT_TOKEN_STATUS_XACTERR) { >>> ^^^^^^^^^^^^^^^^^^^^^^^^ - this flag >>> before this patch was also not >>> checked. >>>> + if (--trynum >= 0) { >>>> + /* >>>> + * It is necessary to do this, >>>> otherwise the >>>> + * disk is clagged. >>>> + */ >>>> + debug("reset the TD and redo, >>>> because of XACTERR\n"); >>>> + qhtoken &= >>>> ~QT_TOKEN_STATUS_HALTED; >>>> + qhtoken |= QT_TOKEN_STATUS_ACTIVE >>>> | >>>> + QT_TOKEN_CERR(2); >>>> + vtd->qt_token = >>>> cpu_to_hc32(qhtoken); >>>> + qh->qh_overlay.qt_token = >>>> cpu_to_hc32(qhtoken); >>>> + goto retry_xacterr; >>>> + } >>>> + dev->status = USB_ST_XACTERR; >>>> + dev->act_len = length - >>>> + QT_TOKEN_GET_TOTALBYTES(qhtoken); >>>> >>> >>> This may solve the issue - as it resets the controller in the case >>> of transfer error. >> >> Do you need to reset the controller ? Or is there some graceful way >> out, i.e. a way to recover from the error ? > > The original patch was developed when the async transmission was start > and stopped for each transfer (before your optimization), hence the code > for disabling it. > > I need to check if it would be enough to just clear the > QT_TOKEN_STATUS_HALTED and set QT_TOKEN_STATUS_ACTIVE again. OK