Hi,

it seems I made some progress on this issue. Warning: loooong text
ahead...

I'm now in possession of a powered 7-port USB 3 hub which I use for
testing devices. This hub happens to have a power button, allowing me
to easily cut USB power and bring it back up again.

It just so happens that I can reproduce the initialization problems
100% of the time by using the hub's power button instead of
hot-plugging the device.

On Wed, 2015-11-25 at 17:49 +0100, Daniel Elstner wrote:
> I still suspect it's partly due to flaky FX2 firmware on the device,
> and partly due to my driver code straddling the boundaries more 
> tightly than the original software.

I'm almost sure now that it is *not* the sigrok driver code, but flaky
FX2 firmware.

What seems to happen is that the USB device does not reset properly in
some circumstances. When the device resets properly, in such a way that
it works correctly afterwards, the dmesg output on power-on/hotplug
looks like this:

[235815.350314] usb 1-2.1.4: new high-speed USB device number 65 using xhci_hcd
[235817.141662] usb 1-2.1.4: new high-speed USB device number 66 using xhci_hcd
[235817.242819] usb 1-2.1.4: New USB device found, idVendor=2961, idProduct=6689
[235817.242822] usb 1-2.1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[235817.242824] usb 1-2.1.4: Product: LWLA-1034 v1.0
[235817.242825] usb 1-2.1.4: Manufacturer: www.SysClk.com

Note the double enumeration at the beginning. If the device does not
reset properly, the dmesg output looks like this instead:

[235762.633529] usb 1-2.1.4: new high-speed USB device number 64 using xhci_hcd
[235762.734699] usb 1-2.1.4: New USB device found, idVendor=2961, idProduct=6689
[235762.734702] usb 1-2.1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[235762.734704] usb 1-2.1.4: Product: LWLA-1034 v1.0
[235762.734705] usb 1-2.1.4: Manufacturer: www.SysClk.com

Note that the double enumeration is missing this time. Occasionally, I
can even get it stuck in a state where it identifies as:

    ID 04b4:8613 Cypress Semiconductor Corp. CY7C68013 EZ-USB FX2 USB 2.0 Development Kit

This hints that even though the FX2 firmware is loaded from an EEPROM,
the device still needs to trigger reenumeration as part of its
initialization process, so it's not completely transparent.

Now, if the device turns up with the correct VID/PID of the SysClk
LWLA1034, one would think that the FX2 firmware did load correctly. So
why does it not work?

Here's the thing: It mostly does, except for one small detail. Here's a
summary of my findings:

1) Bulk transfers of any size to endpoint 4 (FPGA bitstream) and
endpoint 2 (commands to the FPGA logic) work fine. Not only does the
transfer succeed, the associated action is in fact performed as well.

2) Bulk transfers from endpoint 6, which is used for receiving replies
to commands, are *cut off after 64 bytes*, even though the endpoint
descriptor says the endpoint size is 512 bytes. However, it appears
that more data is waiting and can be read in subsequent transfers
(which are also cut short); and not reading the waiting data throws off
the device state logic.

3) libusb_reset_device(), libusb_clear_halt(), playing around with the
configuration descriptor etc. *do not have any effect*. Yep, that's
right, not even an explicit reset is able to bring the device into a
proper state.

So, just for kicks, I tried this: When a short transfer comes in but
more data is expected, I re-submit the IN transfer with adjusted buffer
pointer and length, so as to fetch the remaining data that is waiting.
This *almost* works! It seems to work completely for status requests,
but unfortunately it fails for capture memory reads. With memory reads,
this technique results in multiple 64-byte transfers which return more
data, but the final few bytes are missing. So it seems that in the case
of memory reads, the limited transfer size throws off the internal logic
in a way that is not recoverable.
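
For illustration, the re-submission loop described above might look
roughly like the sketch below. It is not the actual driver code: the
bulk_in_fn callback is a hypothetical stand-in for a wrapper around
libusb_bulk_transfer() on the reply endpoint, and fake_dev only mimics
an endpoint that hands out at most 64 bytes per transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transfer callback: reads up to `len` bytes into `buf`,
 * stores the byte count in `*xfer_len`, and returns 0 on success. A real
 * driver would wrap libusb_bulk_transfer() on the reply endpoint here. */
typedef int (*bulk_in_fn)(void *dev, unsigned char *buf, int len,
                          int *xfer_len);

/* Re-issue the IN transfer with an advanced buffer pointer and a reduced
 * length until `expected` bytes have arrived or the device sends nothing.
 * Returns the number of bytes received, or -1 on a transfer error. */
static int read_reply_resubmit(bulk_in_fn bulk_in, void *dev,
                               unsigned char *buf, int expected)
{
    int total = 0;

    assert(buf != NULL && expected >= 0);

    while (total < expected) {
        int xfer_len = 0;

        if (bulk_in(dev, buf + total, expected - total, &xfer_len) != 0)
            return -1;
        if (xfer_len == 0)
            break; /* Nothing more is waiting on the endpoint. */
        total += xfer_len;
    }
    return total;
}

/* Fake device for illustration only: holds `pending` bytes and returns
 * at most 64 of them per transfer, like the misbehaving endpoint. */
struct fake_dev {
    int pending;
    int transfers;
};

static int fake_bulk_in(void *dev, unsigned char *buf, int len,
                        int *xfer_len)
{
    struct fake_dev *fake = dev;
    int n = fake->pending < 64 ? fake->pending : 64;

    if (n > len)
        n = len;
    memset(buf, 0xA5, (size_t)n);
    fake->pending -= n;
    fake->transfers++;
    *xfer_len = n;
    return 0;
}
```

A caller compares the return value with the expected length; a shorter
result corresponds to the "final few bytes are missing" case seen with
memory reads.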

In a desperate last-ditch attempt, I then reduced the chunk size of
memory read requests to the smallest possible amount, so that the
response to each request would fit within 64 bytes. *That* made it
work! However, it also slows down the capture memory read quite
noticeably. So I don't think this is an appropriate "solution".
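
As a rough sketch of why this is slow, splitting a read so that every
reply fits within 64 bytes could look like the code below. The
mem_read_fn callback, the byte-based offsets and the MAX_REPLY_BYTES
name are assumptions made for illustration; the device's real request
format and addressing units are not modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REPLY_BYTES 64 /* Observed cut-off on the reply endpoint. */

/* Hypothetical request callback: asks the device for `len` bytes at byte
 * offset `offset` and stores them in `dst`. Returns 0 on success. */
typedef int (*mem_read_fn)(void *dev, size_t offset, unsigned char *dst,
                           size_t len);

/* Read `total` bytes using requests whose replies never exceed
 * MAX_REPLY_BYTES. The number of requests issued is stored in
 * `*requests`. Returns 0 on success, -1 if any request fails. */
static int read_memory_chunked(mem_read_fn mem_read, void *dev,
                               unsigned char *dst, size_t total,
                               size_t *requests)
{
    size_t offset = 0;

    assert(dst != NULL && requests != NULL);
    *requests = 0;

    while (offset < total) {
        size_t len = total - offset;

        if (len > MAX_REPLY_BYTES)
            len = MAX_REPLY_BYTES;
        if (mem_read(dev, offset, dst + offset, len) != 0)
            return -1;
        offset += len;
        (*requests)++;
    }
    return 0;
}

/* Fake capture memory for illustration: `dev` points at a byte array. */
static int fake_mem_read(void *dev, size_t offset, unsigned char *dst,
                         size_t len)
{
    const unsigned char *mem = dev;
    size_t i;

    for (i = 0; i < len; i++)
        dst[i] = mem[offset + i];
    return 0;
}
```

Each request costs a full command/reply round trip, so the request
count grows linearly with the amount of capture memory to read.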

Phew.

A side issue is that leaving data waiting on the reply endpoint for any
reason will make the device completely unusable afterwards. I'm not
entirely sure, but I believe the incorrect read of the test ID people
have been seeing may be due to that.

Since this issue is independent of the weird 64-byte limit problem, I
have added a routine in the sigrok driver that drains any pending data
from the endpoint during device initialization. The code can be found
here:

https://github.com/danielkitta/libsigrok/tree/lwla-fixes
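
The idea behind draining can be sketched as follows. This is a
simplified illustration and not the code from the branch above: the
bulk_in_fn callback is hypothetical and is assumed to report "no data
within the timeout" as a successful transfer of zero bytes, whereas a
real libusb wrapper would have to map the timeout error to that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transfer callback: reads up to `len` bytes into `buf`,
 * stores the byte count in `*xfer_len`, and returns 0 on success.
 * Assumption: an empty endpoint yields success with *xfer_len == 0. */
typedef int (*bulk_in_fn)(void *dev, unsigned char *buf, int len,
                          int *xfer_len);

/* Read and discard stale data until a transfer comes back empty or
 * `max_transfers` attempts have been made. Returns the number of bytes
 * discarded, or -1 on a transfer error. */
static int drain_reply_endpoint(bulk_in_fn bulk_in, void *dev,
                                int max_transfers)
{
    unsigned char scratch[512];
    int discarded = 0;
    int i;

    assert(max_transfers > 0);

    for (i = 0; i < max_transfers; i++) {
        int xfer_len = 0;

        if (bulk_in(dev, scratch, (int)sizeof(scratch), &xfer_len) != 0)
            return -1;
        if (xfer_len == 0)
            break; /* Endpoint is empty. */
        discarded += xfer_len;
    }
    return discarded;
}

/* Fake device for illustration: `pending` stale bytes, delivered in
 * pieces of at most 64 bytes. */
struct stale_dev {
    int pending;
};

static int stale_bulk_in(void *dev, unsigned char *buf, int len,
                         int *xfer_len)
{
    struct stale_dev *stale = dev;
    int n = stale->pending < 64 ? stale->pending : 64;
    int i;

    if (n > len)
        n = len;
    for (i = 0; i < n; i++)
        buf[i] = 0;
    stale->pending -= n;
    *xfer_len = n;
    return 0;
}
```

The cap on the number of attempts keeps initialization from hanging if
the device never stops sending.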

I've also created a branch for the short transfer hacks, if anyone is
interested in playing around with that:

https://github.com/danielkitta/libsigrok/tree/lwla-hackery

> I just hope it's not due to the peculiarities of the OS' USB stack
> (either causing or more likely hiding the issue). The original 
> software is Windows-only.

I now fear that this is indeed the case. Windows must be doing
something that hides the problem, or this issue would have turned up
for many users.

Cheers,
--Daniel

