Re: usbserial / ftdi_sio (+ others) bug?

2014-11-17 Thread Johan Hovold
[ +CC: linux-usb]

Hi Doug,

On Mon, Nov 17, 2014 at 12:23:05AM -0500, Douglas Gilbert wrote:
 Hi,
 I just ran into the lots ** of spurious zeros on read problem
 with a XBee reader adapter (for USB) from Sparkfun. It is really
 annoying. Looked around and your name popped up on a linux-usb
 thread whose name is in the subject line.
 
 I have two different adapters:
   - FT232RL based that has the spurious zeros problem
   - FT231X based that works fine
 
 Probably related to the spurious zeros appearing in my test code
 is this in syslog:
   ftdi_sio ttyUSB0: usb_serial_generic_read_bulk_callback - nonzero urb status: -71

Do you see this during normal operation? Or when disconnecting an open port?

In the former case it could indicate a hardware problem.

 I'm using lk 3.17.3 . Has there been any resolution to this problem?

The issue in the thread you refer to is a hardware one that is causing
overruns, which in turn get reported as NULL-bytes. [ The driver bug
that is also discussed is only about whether it is possible to disable
this error reporting. ]
 
 Doug Gilbert
 
 ** my read()s are for 200 bytes and when this problem occurs, it
 reads 200 bytes of zeros. And there is no LED illumination on the
 XBee adapter that would usually indicate inbound data. This is at
 9600 baud, so 200 chars would take at least 0.2 seconds.

Have you checked the interrupt counters (TIOCMGICOUNT ioctl) to see if
there have been overruns when this occurs?
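
A minimal user-space sketch of that check, in Python 3 on Linux. The
request number 0x545D for TIOCMGICOUNT and the layout of struct
serial_icounter_struct (eleven int counters followed by int reserved[9])
are assumptions taken from the Linux UAPI headers and should be checked
against your kernel; the helper names are hypothetical.

```python
import fcntl
import struct

# Assumed ioctl request number for TIOCMGICOUNT on Linux.
TIOCMGICOUNT = 0x545D

# Assumed field order of struct serial_icounter_struct.
_FIELDS = ("cts", "dsr", "rng", "dcd", "rx", "tx",
           "frame", "overrun", "parity", "brk", "buf_overrun")

# Eleven counters plus nine reserved ints, all native-endian C ints.
_LAYOUT = struct.Struct("20i")


def decode_icount(raw):
    """Map the raw ioctl buffer onto named counters (reserved ints dropped)."""
    values = _LAYOUT.unpack(raw)
    return dict(zip(_FIELDS, values))


def read_icount(fd):
    """Fetch the interrupt counters for an already open tty descriptor."""
    raw = fcntl.ioctl(fd, TIOCMGICOUNT, bytes(_LAYOUT.size))
    return decode_icount(raw)
```

Calling read_icount() on a descriptor for the port before and after a
burst of zeros and comparing the "overrun" values should show whether
the driver counted overruns in between.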

You should also see it, albeit in a more raw form, in the status bytes
from the device if you enable debugging in usbserial.

Thanks,
Johan
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usbserial / ftdi_sio (+ others) bug?

2014-11-05 Thread Johan Hovold
On Tue, Nov 04, 2014 at 08:29:13PM +0200, Janne Huttunen wrote:
 On Tue, 4 Nov 2014 09:14:49 +0100
 Johan Hovold jo...@kernel.org wrote:
   2. The chip responds with a single correct character followed by a few
      hundred or so replies containing only the overrun status (no
      data) which are then converted to a bunch of binary zeroes by the
      ldisc because of the bug I mentioned earlier. After that the chip
      starts responding with proper data again and works until closed.
  
  Note that the only bug is that the application cannot disable the
  overrun reporting, but why would you want that?
 
 The merits of doing so may be debatable, but if using the quotes
 around bug is supposed to indicate that it isn't one, I have
 to respectfully disagree. I know it is not the most important
 thing in the world and without the hardware fault I probably
 would not have seen it at all, but I would still call it a bug.

And so have I. It is a bug, but it's not what is causing your problems
here. In fact, I would argue that you do not even want to disable
overrun reporting. That was my point.

  What's on the other side of the FTDI chip?
 
 Some kind of an optical receiver circuit (the link is optically
 isolated). On the other side of that is then the device that sends
 periodical data packets (a couple of times per second 17 bytes
 each) to the computer. The computer doesn't send anything i.e.
 the tx functionality of the chip is not used at all.

What baudrates? Have you verified the RS232 signals?

  It still sounds like your hardware is broken, but at least you
  seem to have found a work-around.
 
 Like I said, the hw is the real culprit here, there's no doubt
 about it. But I also doubt that it's just the individual chip
 in my device that has this issue. The device is practically
 brand new and while that is no guarantee that there won't be any
 faults, I find it much more likely that what I am seeing here is
 a quirk of the implementation and there are lots of these chips
 with the same issue out there.

Yours is the first device I have heard of that behaves this way.

 The real questions that remain are then; 1. is the chip real or
 counterfeit and how am I supposed to know it,

No idea.

I have three FT232R plugged in as we speak and they have the same
descriptors as yours (bcdDevice etc). Haven't had any issues with them.

 2. how much the driver can or even should try to accommodate the
 quirks of the hw, and

Without knowing for sure that this is an issue with a class of devices,
there's not much we can do.

 3. does the answer to #2 depend on the answer to #1.

Yes.

  Perhaps you can report it to the logging-device (?) manufacturer
  or FTDI.
 
 Sure, if I can find someone that cares, which is doubtful.

If the chip is sold as part of the logging device, I would hope the
manufacturer would.

Johan


Re: usbserial / ftdi_sio (+ others) bug?

2014-11-04 Thread Johan Hovold
On Mon, Nov 03, 2014 at 11:46:09PM +0200, Janne Huttunen wrote:
 On Wed, 29 Oct 2014 09:51:28 +0100
 Johan Hovold jo...@kernel.org wrote:
  Having the driver not reporting overrun (and other) errors will
  obviously not fix the underlying issue with your device, which is
  generating all these errors in the first place.
 
 Ok, I did take a closer look at this (mostly with usbmon) and it seems
 to be caused by the hardware. When the application does open the device
 and the driver submits the first bulk reads, there are basically three
 possibilities for what happens next:
 
 1. The chip responds with correct data and everything works fine from
    there until the device is closed.
 2. The chip responds with a single correct character followed by a few
    hundred or so replies containing only the overrun status (no data)
    which are then converted to a bunch of binary zeroes by the ldisc
    because of the bug I mentioned earlier. After that the chip starts
    responding with proper data again and works until closed.

Note that the only bug is that the application cannot disable the
overrun reporting, but why would you want that? The data stream is
already corrupt (missing data) so the extra NULL-byte doesn't do much
harm, but does provide a hint about what went wrong.

 3. The chip hangs forever without ever responding with anything on the
    bulk endpoint.
 
 As a rough estimate I'd say that something like at least one out of
 ten opens currently exhibits either behavior 2 or 3. Also it doesn't
 seem to have anything to do with any real buffering inside the chip
 i.e. if I close a working connection and immediately open it again,
 it may hang the chip.

What's on the other side of the FTDI chip?

 After some poking around, it seems that the chip really doesn't like
 the latency timer value of 1 when it is reset. After it gets the data
 going it doesn't seem to mind it i.e. I have not seen the chip
 hang or report superfluous overruns during normal operation even with
 latency timer value of 1. With timer value 2 I did get something like
 300 opens before hitting the issue and with value 3 I have not seen
 the device misbehave (yet) in like a thousand or so opens. I do think
 that more testing is still needed before saying anything definite,
 but a larger timer value at least seems to mitigate the issue significantly.

That's interesting, and does indeed point to the FTDI chip.

 BTW, in case nobody else is ever experiencing this issue, please note
 that I cannot guarantee in any way that the FT232RL in my device is
 actually authentic. If it is counterfeit, it is a different one than
 the one that was having the issue with the Windows driver lately. My
 device doesn't seem to have that bug, but that is no guarantee that
 it is the real deal. And obviously, real or not, it *does* have some
 bug that causes it to now misbehave during open().
 
 So, tentatively, it seems that in order to get rid of the issue with at
 least this FT232 variant (whatever it may happen to be), either the
 minimum latency timer value should be increased or possibly
 alternatively the chip could be reset with higher value and the actual
 value set later when the chip has started properly. Although I don't
 yet know for sure which latency value would work 100% of the time or
 if the alternative idea would actually work at all (I just thought
 about trying something like that).

You know that you can change the latency-timer setting from user space?

	setserial /dev/ttyUSBx ^low_latency
	echo 16 > /sys/class/tty/ttyUSBx/device/latency_timer
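
The sysfs write can also be scripted, for example to step through
latency values while testing. A rough Python sketch follows; the helper
names and the sysfs_root parameter are hypothetical, and the 1 to 255
bound is an assumed sanity check rather than something stated in this
thread (only the values 1, 2, 3 and 16 appear here). Writing the
attribute normally requires root.

```python
from pathlib import Path


def latency_timer_path(port, sysfs_root="/sys/class/tty"):
    """Build the path of the latency_timer attribute for a port such as 'ttyUSB0'."""
    return Path(sysfs_root) / port / "device" / "latency_timer"


def get_latency_timer(port, sysfs_root="/sys/class/tty"):
    """Read the current latency timer value in milliseconds."""
    return int(latency_timer_path(port, sysfs_root).read_text().strip())


def set_latency_timer(port, ms, sysfs_root="/sys/class/tty"):
    """Write a new latency timer value; the 1..255 range is an assumption."""
    if not 1 <= ms <= 255:
        raise ValueError("latency timer out of assumed range 1..255: %r" % ms)
    path = latency_timer_path(port, sysfs_root)
    path.write_text(f"{ms}\n")
    return path
```

For example, set_latency_timer("ttyUSB0", 16) has the same effect as the
echo command above for that port.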

It still sounds like your hardware is broken, but at least you seem to
have found a work-around. Perhaps you can report it to the
logging-device (?) manufacturer or FTDI.

What is the lsusb -v output for your device, by the way?

Thanks,
Johan


Re: usbserial / ftdi_sio (+ others) bug?

2014-11-04 Thread Janne Huttunen
On Tue, 4 Nov 2014 09:14:49 +0100
Johan Hovold jo...@kernel.org wrote:
  2. The chip responds with a single correct character followed by a few
     hundred or so replies containing only the overrun status (no
     data) which are then converted to a bunch of binary zeroes by the
     ldisc because of the bug I mentioned earlier. After that the chip
     starts responding with proper data again and works until closed.
 
 Note that the only bug is that the application cannot disable the
 overrun reporting, but why would you want that?

The merits of doing so may be debatable, but if using the quotes
around bug is supposed to indicate that it isn't one, I have
to respectfully disagree. I know it is not the most important
thing in the world and without the hardware fault I probably
would not have seen it at all, but I would still call it a bug.

 What's on the other side of the FTDI chip?

Some kind of an optical receiver circuit (the link is optically
isolated). On the other side of that is then the device that sends
periodical data packets (a couple of times per second 17 bytes
each) to the computer. The computer doesn't send anything i.e.
the tx functionality of the chip is not used at all.

 It still sounds like your hardware is broken, but at least you
 seem to have found a work-around.

Like I said, the hw is the real culprit here, there's no doubt
about it. But I also doubt that it's just the individual chip
in my device that has this issue. The device is practically
brand new and while that is no guarantee that there won't be any
faults, I find it much more likely that what I am seeing here is
a quirk of the implementation and there are lots of these chips
with the same issue out there.

The real questions that remain are then; 1. is the chip real or
counterfeit and how am I supposed to know it, 2. how much the
driver can or even should try to accommodate the quirks of
the hw, and 3. does the answer to #2 depend on the answer to #1.

 Perhaps you can report it to the logging-device (?) manufacturer
 or FTDI.

Sure, if I can find someone that cares, which is doubtful.

 What is the lsusb -v output for your device, by the way?

Bus 002 Device 006: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         8
  idVendor           0x0403 Future Technology Devices International, Ltd
  idProduct          0x6001 FT232 USB-Serial (UART) IC
  bcdDevice            6.00
  iManufacturer           1 FTDI
  iProduct                2 FT232R USB UART
  iSerial                 3 A400EJPK
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           32
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xa0
      (Bus Powered)
      Remote Wakeup
    MaxPower               90mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              2 FT232R USB UART
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
Device Status: 0x
  (Bus Powered)



Re: usbserial / ftdi_sio (+ others) bug?

2014-11-03 Thread Janne Huttunen
On Wed, 29 Oct 2014 09:51:28 +0100
Johan Hovold jo...@kernel.org wrote:
 Having the driver not reporting overrun (and other) errors will
 obviously not fix the underlying issue with your device, which is
 generating all these errors in the first place.

Ok, I did take a closer look at this (mostly with usbmon) and it seems
to be caused by the hardware. When the application does open the device
and the driver submits the first bulk reads, there are basically three
possibilities for what happens next:

1. The chip responds with correct data and everything works fine from
   there until the device is closed.
2. The chip responds with a single correct character followed by a few
   hundred or so replies containing only the overrun status (no data)
   which are then converted to a bunch of binary zeroes by the ldisc
   because of the bug I mentioned earlier. After that the chip starts
   responding with proper data again and works until closed.
3. The chip hangs forever without ever responding with anything on the
   bulk endpoint.
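
Outcome 2 can be modelled roughly as follows. This is a simplified
Python model, not the driver code. It assumes that every bulk-in packet
starts with two status bytes (modem status, then line status), that bit
1 of the line status marks an overrun, that the driver queues one zero
character flagged TTY_OVERRUN for each such packet, and that the ldisc
in real-raw mode copies queued characters without looking at the flag.
The constant values and function names are assumptions for illustration.

```python
# Assumed overrun bit in the FTDI line-status byte (second byte of a packet).
FTDI_RS_OE = 1 << 1

# Assumed tty flag values for normal characters and overruns.
TTY_NORMAL = 0
TTY_OVERRUN = 4


def process_packet(packet):
    """Model the driver side: return the (char, flag) pairs queued for one packet."""
    if len(packet) < 2:
        return []  # malformed: no status bytes
    queued = []
    if packet[1] & FTDI_RS_OE:
        # The overrun is not tied to a received character, so a zero is queued.
        queued.append((0, TTY_OVERRUN))
    # Anything after the two status bytes is payload.
    queued.extend((byte, TTY_NORMAL) for byte in packet[2:])
    return queued


def ldisc_real_raw(pairs):
    """Model a real-raw ldisc: deliver every queued character, ignoring flags."""
    return bytes(char for char, _flag in pairs)


def deliver(packets):
    """Run a sequence of packets through both stages and return what read() sees."""
    return b"".join(ldisc_real_raw(process_packet(p)) for p in packets)
```

With one data packet followed by a run of status-only packets that have
the overrun bit set, deliver() returns the data byte followed by one
zero per status-only packet, which matches the stream of binary zeros
seen by the application.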

As a rough estimate I'd say that something like at least one out of
ten opens currently exhibits either behavior 2 or 3. Also it doesn't
seem to have anything to do with any real buffering inside the chip
i.e. if I close a working connection and immediately open it again,
it may hang the chip.

After some poking around, it seems that the chip really doesn't like
the latency timer value of 1 when it is reset. After it gets the data
going it doesn't seem to mind it i.e. I have not seen the chip
hang or report superfluous overruns during normal operation even with
latency timer value of 1. With timer value 2 I did get something like
300 opens before hitting the issue and with value 3 I have not seen
the device misbehave (yet) in like a thousand or so opens. I do think
that more testing is still needed before saying anything definite,
but a larger timer value at least seems to mitigate the issue significantly.

BTW, in case nobody else is ever experiencing this issue, please note
that I cannot guarantee in any way that the FT232RL in my device is
actually authentic. If it is counterfeit, it is a different one than
the one that was having the issue with the Windows driver lately. My
device doesn't seem to have that bug, but that is no guarantee that
it is the real deal. And obviously, real or not, it *does* have some
bug that causes it to now misbehave during open().

So, tentatively, it seems that in order to get rid of the issue with at
least this FT232 variant (whatever it may happen to be), either the
minimum latency timer value should be increased or possibly
alternatively the chip could be reset with higher value and the actual
value set later when the chip has started properly. Although I don't
yet know for sure which latency value would work 100% of the time or
if the alternative idea would actually work at all (I just thought
about trying something like that).



usbserial / ftdi_sio (+ others) bug?

2014-10-29 Thread Janne Huttunen
I own a device that implements a data logging interface using the
FT232 USB-serial chip. Very often it happens that connecting the
associated software with the device requires multiple attempts.
There seem to be two kinds of issues: either the program reports
that it did not receive any data or it reports reading lots of
data, but it was all invalid. I haven't yet looked at the former,
but I did spend some time investigating the latter.

Simple strace of the program startup showed that when connecting
fails, the program gets a lot (hundreds) of binary zeros while
reading the device. I used usbmon to capture the traffic between
the host and the device and the zeros are not strictly speaking
coming from the device. However when this problem happens the
device seems to report quite a lot of overruns for a while, which
was a clue. After a somewhat successful attempt to understand
the operation of the tty code in Linux, I have a theory.

The usbserial driver sets the TTY_DRIVER_REAL_RAW flag. Based on
the comment in tty_driver.h this implies that the driver is not
supposed to report any statuses (including overruns) to ldisc
if they are ignored by the application (like they are in this
case). It's just that AFAICS the ftdi_sio subdriver (and many
others) doesn't seem to quite honor this, but seems to report any
status unconditionally. Also AFAICS this then means that every
overrun will get converted into single binary zero delivered to
the application(?). If so, this probably isn't what is supposed
to happen and would explain the flood of extraneous zeros the
application was seeing when connecting failed.

I haven't yet had the time to test this theory, but at least it
seems plausible to me. Any thoughts, anybody?


Re: usbserial / ftdi_sio (+ others) bug?

2014-10-29 Thread Johan Hovold
[ +CC: Peter, linux-serial ]

On Wed, Oct 29, 2014 at 10:07:26AM +0200, Janne Huttunen wrote:
 I own a device that implements a data logging interface using the
 FT232 USB-serial chip. Very often it happens that connecting the
 associated software with the device requires multiple attempts.
 There seem to be two kinds of issues: either the program reports
 that it did not receive any data or it reports reading lots of
 data, but it was all invalid. I haven't yet looked at the former,
 but I did spend some time investigating the latter.
 
 Simple strace of the program startup showed that when connecting
 fails, the program gets a lot (hundreds) of binary zeros while
 reading the device. I used usbmon to capture the traffic between
 the host and the device and the zeros are not strictly speaking
 coming from the device. However when this problem happens the
 device seems to report quite a lot of overruns for a while, which
 was a clue. After a somewhat successful attempt to understand
 the operation of the tty code in Linux, I have a theory.
 
 The usbserial driver sets the TTY_DRIVER_REAL_RAW flag. Based on
 the comment in tty_driver.h this implies that the driver is not
 supposed to report any statuses (including overruns) to ldisc
 if they are ignored by the application (like they are in this
 case). It's just that AFAICS the ftdi_sio subdriver (and many
 others) doesn't seem to quite honor this, but seems to report any
 status unconditionally. Also AFAICS this then means that every
 overrun will get converted into single binary zero delivered to
 the application(?). If so, this probably isn't what is supposed
 to happen and would explain the flood of extraneous zeros the
 application was seeing when connecting failed.
 
 I haven't yet had the time to test this theory, but at least it
 seems plausible to me. Any thoughts, anybody?

You are correct. The usb-serial drivers, and at least some serial
drivers, fail to implement TTY_DRIVER_REAL_RAW correctly in that they
do not honour ((IGNBRK || (!BRKINT && !PARMRK)) && (IGNPAR || !INPCK)).
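
As an illustration, the condition ((IGNBRK || (!BRKINT && !PARMRK)) &&
(IGNPAR || !INPCK)) can be evaluated from user space on a port's input
flags. The Python sketch below uses the flag constants from the standard
termios module; the function name is hypothetical, and it only
evaluates the expression, it does not say anything about what a given
driver actually does.

```python
import termios


def may_skip_error_reporting(iflag):
    """Return True when the termios input flags say that break and parity
    conditions are ignored by the application, i.e. when
    (IGNBRK || (!BRKINT && !PARMRK)) && (IGNPAR || !INPCK) holds."""
    breaks_ignored = bool(iflag & termios.IGNBRK) or not (
        iflag & termios.BRKINT or iflag & termios.PARMRK
    )
    parity_ignored = bool(iflag & termios.IGNPAR) or not (iflag & termios.INPCK)
    return breaks_ignored and parity_ignored
```

For an open port, the input flags are the first element of the list
returned by termios.tcgetattr(fd), so
may_skip_error_reporting(termios.tcgetattr(fd)[0]) tells whether the
application's settings fall under the condition.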

I'll take a look at the usb-serial drivers.

Having the driver not reporting overrun (and other) errors will
obviously not fix the underlying issue with your device, which is
generating all these errors in the first place.

Thanks,
Johan


Re: usbserial / ftdi_sio (+ others) bug?

2014-10-29 Thread Janne Huttunen
On Wed, Oct 29, 2014 at 10:51 AM, Johan Hovold jo...@kernel.org wrote:
 Having the driver not reporting overrun (and other) errors will
 obviously not fix the underlying issue with your device, which is
 generating all these errors in the first place.

Yes, although that might be related to the other fault I have been
seeing where the program reports receiving no data whatsoever. I'll
have to take a look at that too when I have the time.


Re: usbserial / ftdi_sio (+ others) bug?

2014-10-29 Thread Peter Hurley
On 10/29/2014 04:51 AM, Johan Hovold wrote:
 [ +CC: Peter, linux-serial ]
 
 On Wed, Oct 29, 2014 at 10:07:26AM +0200, Janne Huttunen wrote:
 I own a device that implements a data logging interface using the
 FT232 USB-serial chip. Very often it happens that connecting the
 associated software with the device requires multiple attempts.
 There seem to be two kinds of issues: either the program reports
 that it did not receive any data or it reports reading lots of
 data, but it was all invalid. I haven't yet looked at the former,
 but I did spend some time investigating the latter.

 Simple strace of the program startup showed that when connecting
 fails, the program gets a lot (hundreds) of binary zeros while
 reading the device.

So you're only getting status and not data.

 I used usbmon to capture the traffic between
 the host and the device and the zeros are not strictly speaking
 coming from the device. However when this problem happens the
 device seems to report quite a lot of overruns for a while, which
 was a clue. After a somewhat successful attempt to understand
 the operation of the tty code in Linux, I have a theory.

 The usbserial driver sets the TTY_DRIVER_REAL_RAW flag. Based on
 the comment in tty_driver.h this implies that the driver is not
 supposed to report any statuses (including overruns) to ldisc
 if they are ignored by the application (like they are in this
 case). It's just that AFAICS the ftdi_sio subdriver (and many
 others) doesn't seem to quite honor this, but seems to report any
 status unconditionally. Also AFAICS this then means that every
 overrun will get converted into single binary zero delivered to
 the application(?). If so, this probably isn't what is supposed
 to happen and would explain the flood of extraneous zeros the
 application was seeing when connecting failed.

 I haven't yet had the time to test this theory, but at least it
 seems plausible to me. Any thoughts, anybody?
 
 You are correct. The usb-serial drivers, and at least some serial
 drivers, fail to implement TTY_DRIVER_REAL_RAW correctly in that they
 do not honour ((IGNBRK || (!BRKINT && !PARMRK)) && (IGNPAR || !INPCK)).

These settings are a constant source of bugs in serial drivers.
We really need to abstract the way these settings are processed;
even the 8250 driver is getting this wrong.

Regards,
Peter Hurley