Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-11-08 Thread Richard Ash
On Wed, 5 Nov 2014 15:46:25 +
Daniel Silverstone dsilv...@digital-scurf.org wrote:
 Sadly I can't look at the exact firmware of the device, but I can
 tell you it was based on ST Electronics' virtual COM port example in
 their *old* STM32 development kits.

From some experience with this at work, I have to say that that sample
code had (circa 2010) some serious issues, and we ended up doing a lot
of work to make it meet requirements on a project. I'm not that
surprised it needs a quirk ...

Richard
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-11-06 Thread Nix
On 5 Nov 2014, Johan Hovold told this:

 On Wed, Nov 05, 2014 at 03:14:49PM +, Nix wrote:
What if you
   physically disconnect and reconnect the device instead, or simply
  
  That fixes it, in fact it's the only way to fix it once it's broken by
  this bug.
 
  I didn't mean whether it would get the device working again, but rather
  whether you could kill it this way.
 
 Never seen it fail after a physical disconnection.

 Ok.

 And it has to include an enumeration, since just opening and closing it
 (restarting the daemon) repeatedly seems to work?

Well, it's more accurate to say that restarting the daemon doesn't make
it fail, but won't make it start working if it's already gone nuts
either. It normally copes fine with the daemon stopping and starting,
yes (and the daemon copes fine with keys being connected and
disconnected).

  Yeah, it seems your device firmware has crashed. It stops responding to
  control requests.
 
 Not surprising: I was fairly sure we were provoking a key firmware crash
 or something like that. This is a device with no support for flow
 control at all, after all, so I'm not terribly surprised that trying to
 do flow control confuses it.

 Only weird thing is that it has been coping with those calls for a long
 time. Only the slightly changed timings seems to have triggered this
 issue.

Yeah. I get the strong impression from Daniel that the USB side of this
was done by taking something that worked on the kernel of the day,
adding the key-specific stuff to it, and making sure that it still
worked. i.e. this was not a from-spec reimplementation, it was using an
existing (old) stack. If that stack makes iffy assumptions, so will the
entropy key.

 Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
 of the firmware I have is baked into the sealed key. :)

As his email noted, no he can't :) but you do now have a link to the
thing it was based on.

 Ah, ok. I guess we need a new quirk then. There's already a quirk in the
 driver to suppress error from when trying to set the control lines.

 But that doesn't help this device, which happily accepts the requests
 and then dies at random times.

Yeah. Strange. (I thought it was it's 'right after', but you seem to
have disproved that. It's only 'sometime later'.)

 Could you try two more things (too make sure line control is really the
 culprit):

 1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
 that fix the stability issue?

Definitely not. I got a hang after the first reboot out of an afflicted
kernel, when using a HUPCLless ekeyd. Weird. (I guess they're lowered on
reboot anyway?)

 2. Can you verify that the patch below works? Did I use the correct
 VID/PID? Could you provide lsusb -v output (for the capability flags)
 as well?

The VID/PID are right.

The patch seems to work. I tested against the usb testing tree directly,
since that was easier than adjusting it to apply against 3.16. Sixteen
reboots, no failures, looks to be fixed.

lsusb output:

Bus 002 Device 003: ID 20df:0001 Simtec Electronics Entropy Key [UDEKEY01]
Device Descriptor:
  bLength18
  bDescriptorType 1
  bcdUSB   2.00
  bDeviceClass2 Communications
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize064
  idVendor   0x20df Simtec Electronics
  idProduct  0x0001 Entropy Key [UDEKEY01]
  bcdDevice2.00
  iManufacturer   1 Simtec Electronics
  iProduct2 Entropy Key
  iSerial 3 M/9tBjBLNzE2RSFD
  bNumConfigurations  1
  Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength   67
bNumInterfaces  2
bConfigurationValue 1
iConfiguration  0
bmAttributes 0xc0
  Self Powered
MaxPower   76mA
Interface Descriptor:
  bLength 9
  bDescriptorType 4
  bInterfaceNumber0
  bAlternateSetting   0
  bNumEndpoints   1
  bInterfaceClass 2 Communications
  bInterfaceSubClass  2 Abstract (modem)
  bInterfaceProtocol  1 AT-commands (v.25ter)
  iInterface  0
  CDC Header:
bcdCDC   1.10
  CDC Call Management:
bmCapabilities   0x00
bDataInterface  1
  CDC ACM:
bmCapabilities   0x00
  CDC Union:
bMasterInterface0
bSlaveInterface 1
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82  EP 2 IN
bmAttributes3
  Transfer TypeInterrupt
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0008  1x 8 bytes
bInterval 255

Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-11-06 Thread Johan Hovold
On Thu, Nov 06, 2014 at 01:49:13PM +, Nix wrote:
 On 5 Nov 2014, Johan Hovold told this:
  On Wed, Nov 05, 2014 at 03:14:49PM +, Nix wrote:

  Could you try two more things (too make sure line control is really the
  culprit):
 
  1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
  that fix the stability issue?
 
 Definitely not. I got a hang after the first reboot out of an afflicted
 kernel, when using a HUPCLless ekeyd. Weird. (I guess they're lowered on
 reboot anyway?)

It's actually only the timings related to the control-lines being raised
on open that has changed, so this would seem consistent with that.

  2. Can you verify that the patch below works? Did I use the correct
  VID/PID? Could you provide lsusb -v output (for the capability flags)
  as well?
 
 The VID/PID are right.
 
 The patch seems to work. I tested against the usb testing tree directly,
 since that was easier than adjusting it to apply against 3.16. Sixteen
 reboots, no failures, looks to be fixed.

Great, thanks for testing.

  Note that the patch is against the usb-linus branch of the usb tree,
 
 OK. (I presume this is gregkh's tree, not yours?)

Yes, Greg's tree, could have been more clear about that.

  which also contains another fix which could affect this device
  (set_termios will now be called an extra time on every open). You also
  need this one, which have not yet been applied:
 
  http://marc.info/?l=linux-usbm=141520959813505w=2
 
 It had been applied by the time I tested this :)

Greg is fast. :)

Thanks for tracking this down. That bisect cannot have been fun given
the low failure rate (sometimes only one in ten reboots?).

Johan
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-11-06 Thread Nix
On 6 Nov 2014, Johan Hovold said:

 On Thu, Nov 06, 2014 at 01:49:13PM +, Nix wrote:
 On 5 Nov 2014, Johan Hovold told this:
  On Wed, Nov 05, 2014 at 03:14:49PM +, Nix wrote:

  Could you try two more things (too make sure line control is really the
  culprit):
 
  1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
  that fix the stability issue?
 
 Definitely not. I got a hang after the first reboot out of an afflicted
 kernel, when using a HUPCLless ekeyd. Weird. (I guess they're lowered on
 reboot anyway?)

 It's actually only the timings related to the control-lines being raised
 on open that has changed, so this would seem consistent with that.

Urgh. No wonder it was intermittent.

 Thanks for tracking this down. That bisect cannot have been fun given
 the low failure rate (sometimes only one in ten reboots?).

It often failed quite fast, but yes, the negative case was hard to
prove: I had to rewind twice. I counted reboots because I'm a flaming
aspie pedant. 173 reboots that took... thank goodness it replicated on
the machine of mine that's fastest to reboot!

-- 
NULL  (void)
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-11-05 Thread Johan Hovold
On Fri, Oct 31, 2014 at 04:44:46PM +, Nix wrote:
 Sorry for the delay: illness and work-related release time flurries.
 
 On 24 Oct 2014, Johan Hovold told this:
 
  [ +CC: linux-usb ]
 
  On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
  On 22 Oct 2014, Johan Hovold outgrape:
  
   On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
   On 14 Oct 2014, Johan Hovold verbalised:
   
On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
I have checked: this code is being executed against a symlink that
points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
succeeding on the kernel I'm running now, but of course that's 3.16.5
with this commit reverted...)
   
You could verify that by enabling debugging in the cdc-acm driver and
making sure that the corresponding control messages are indeed sent on
close.
   
   I have a debugging dump at
   http://www.esperi.org.uk/~nix/temporary/cdc-acm.log; it's fairly
  
   What kernel were you using here? The log seems to suggest that it was
   generated with the commit in question reverted.
  
  Look now :) (the previous log is still there in cdc-acm-reverted.log.)
 
  Unfortunately, it seems the logs are incomplete. There are lots of
  entries missing (e.g. acm_tty_install when opening, but also some
  acm_submit_read_urb), some of which were there in your first log.
 
 OK. What we have now in
 http://www.esperi.org.uk/~nix/temporary/cdc-acm.log is a log from the
 pristine upstream 3.16.6 kernel with cdc-acm debugging turned on and the
 acm_tty_write - count stuff in acm_tty_write() disabled: I've increased
 the dmesg buffer size so the top isn't being cut off any more. It took a
 lot of boots for it to fail this time: about a dozen. The log contains
 the boot that failed and the one before.
 
 (To my uneducated eye, the initial traffic to/from the key looks *very*
 different in the second boot. Something is clearly wrong by this point,
 but that's not much of a surprise!)

The log appears incomplete again, this time it seems the last part is
completely missing (the device is never closed for example). The device
still seems to be responding.

  This contains two boots -- one on which the USB key worked, and the next
  (with an identical kernel) with which the key was broken. (I'm not sure
  whether this problem happens at startup or shutdown time, so it seemed
  best to provide both.)
 
  That's a good idea.
 
  Is it only after reboot you've seen the device fail?
 
 So far.
 
   What if you
  physically disconnect and reconnect the device instead, or simply
 
 That fixes it, in fact it's the only way to fix it once it's broken by
 this bug.

I didn't mean whether it would get the device working again, but rather
whether you could kill it this way.

  repeatedly open and close it?
 
 Hm. Good idea.

Same here, but the below test was also informative.
 
 ... no, that doesn't help. Additional log from that shows a lot of what
 looks like error returns:
 
 Oct 31 16:38:03 fold kern debug: : [  168.135213] cdc_acm 2-1:1.0: 
 acm_tty_close
 Oct 31 16:38:08 fold kern debug: : [  173.130531] cdc_acm 2-1:1.0: 
 acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110

Yeah, it seems your device firmware has crashed. It stops responding to
control requests.

 Oct 31 16:38:08 fold kern debug: : [  173.130557] cdc_acm 2-1:1.0: 
 acm_port_shutdown
 Oct 31 16:38:08 fold kern debug: : [  173.131475] cdc_acm 2-1:1.0: 
 acm_ctrl_irq - urb shutting down with status: -2
 Oct 31 16:38:08 fold kern debug: : [  173.132474] cdc_acm 2-1:1.1: 
 acm_write_bulk - len 0/1, status -2
 Oct 31 16:38:08 fold kern debug: : [  173.132519] cdc_acm 2-1:1.1: acm_softint
 Oct 31 16:38:08 fold kern debug: : [  173.133473] cdc_acm 2-1:1.1: 
 acm_write_bulk - len 0/1, status -2
 Oct 31 16:38:08 fold kern debug: : [  173.133510] cdc_acm 2-1:1.1: acm_softint
 Oct 31 16:38:08 fold kern debug: : [  173.134471] cdc_acm 2-1:1.1: 
 acm_write_bulk - len 0/1, status -2
 Oct 31 16:38:08 fold kern debug: : [  173.134507] cdc_acm 2-1:1.1: acm_softint
 Oct 31 16:38:08 fold kern debug: : [  173.135471] cdc_acm 2-1:1.1: 
 acm_write_bulk - len 0/1, status -2
 Oct 31 16:38:08 fold kern debug: : [  173.135509] cdc_acm 2-1:1.1: acm_softint
 Oct 31 16:38:08 fold kern debug: : [  173.136472] cdc_acm 2-1:1.1: 
 acm_write_bulk - len 0/1, status -2
 Oct 31 16:38:08 fold kern debug: : [  173.136507] cdc_acm 2-1:1.1: acm_softint
 Oct 31 16:38:08 fold kern debug: : [  173.137471] cdc_acm 2-1:1.1: 
 acm_write_bulk - len 0/1, status -2
 Oct 31 16:38:08 fold kern debug: : [  173.137517] cdc_acm 2-1:1.1: acm_softint
 Oct 31 16:38:08 fold kern debug: : [  173.138471] cdc_acm 2-1:1.1: 
 acm_write_bulk - len 0/1, status -2
 Oct 31 16:38:08 fold kern debug: : [  173.138520] cdc_acm 2-1:1.1: acm_softint
 Oct 31 16:38:08 fold kern debug: : [  173.139469] cdc_acm 2-1:1.1: 
 acm_write_bulk - len 0/1, status -2
 Oct 31 16:38:08 fold kern debug: 

Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-11-05 Thread Nix
On 5 Nov 2014, Johan Hovold stated:

 On Fri, Oct 31, 2014 at 04:44:46PM +, Nix wrote:
 Sorry for the delay: illness and work-related release time flurries.
 
 On 24 Oct 2014, Johan Hovold told this:
 
  [ +CC: linux-usb ]
 
  On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
  On 22 Oct 2014, Johan Hovold outgrape:
  
   On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
   On 14 Oct 2014, Johan Hovold verbalised:
   
On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
I have checked: this code is being executed against a symlink that
points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, 
it's
succeeding on the kernel I'm running now, but of course that's 
3.16.5
with this commit reverted...)
   
You could verify that by enabling debugging in the cdc-acm driver and
making sure that the corresponding control messages are indeed sent 
on
close.
   
   I have a debugging dump at
   http://www.esperi.org.uk/~nix/temporary/cdc-acm.log; it's fairly
  
   What kernel were you using here? The log seems to suggest that it was
   generated with the commit in question reverted.
  
  Look now :) (the previous log is still there in cdc-acm-reverted.log.)
 
  Unfortunately, it seems the logs are incomplete. There are lots of
  entries missing (e.g. acm_tty_install when opening, but also some
  acm_submit_read_urb), some of which were there in your first log.
 
 OK. What we have now in
 http://www.esperi.org.uk/~nix/temporary/cdc-acm.log is a log from the
 pristine upstream 3.16.6 kernel with cdc-acm debugging turned on and the
 acm_tty_write - count stuff in acm_tty_write() disabled: I've increased
 the dmesg buffer size so the top isn't being cut off any more. It took a
 lot of boots for it to fail this time: about a dozen. The log contains
 the boot that failed and the one before.
 
 (To my uneducated eye, the initial traffic to/from the key looks *very*
 different in the second boot. Something is clearly wrong by this point,
 but that's not much of a surprise!)

 The log appears incomplete again, this time it seems the last part is
 completely missing (the device is never closed for example). The device
 still seems to be responding.

Nope -- by the time I clipped the log, the device was definitely
nonresponsive.

I've appended the remaining log, just in case. This is the same as the
snapshot I added to the email itself last time: a close-and-open as I
tried restarting the daemon, and another close as part of system
shutdown.

   What if you
  physically disconnect and reconnect the device instead, or simply
 
 That fixes it, in fact it's the only way to fix it once it's broken by
 this bug.

 I didn't mean whether it would get the device working again, but rather
 whether you could kill it this way.

Never seen it fail after a physical disconnection.

 ... no, that doesn't help. Additional log from that shows a lot of what
 looks like error returns:
 
 Oct 31 16:38:03 fold kern debug: : [  168.135213] cdc_acm 2-1:1.0: 
 acm_tty_close
 Oct 31 16:38:08 fold kern debug: : [  173.130531] cdc_acm 2-1:1.0: 
 acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110

 Yeah, it seems your device firmware has crashed. It stops responding to
 control requests.

Not surprising: I was fairly sure we were provoking a key firmware crash
or something like that. This is a device with no support for flow
control at all, after all, so I'm not terribly surprised that trying to
do flow control confuses it.

 The above is all normal, but

 Oct 31 16:38:08 fold kern debug: : [  173.161489] cdc_acm 2-1:1.0: 
 acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62

 here's another timeout. It's dead.

Again, not surprising.

 Did you get anywhere with trying to look at the device firmware?

Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
of the firmware I have is baked into the sealed key. :)

-- 
NULL  (void)
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-11-05 Thread Daniel Silverstone
On Wed, Nov 05, 2014 at 15:14:49 +, Nix wrote:
  Did you get anywhere with trying to look at the device firmware?
 Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
 of the firmware I have is baked into the sealed key. :)

Sadly I can't look at the exact firmware of the device, but I can tell you it
was based on ST Electronics' virtual COM port example in their *old* STM32
development kits.

Fortunately I do have a record of that example code here:
https://git.gitano.org.uk/stm32-virtcomport.git/

I'm not a USB guru though.

D.

-- 
Daniel Silverstone http://www.digital-scurf.org/
PGP mail accepted and encouraged.Key Id: 3CCE BABE 206C 3B69
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-11-05 Thread Johan Hovold
On Wed, Nov 05, 2014 at 03:14:49PM +, Nix wrote:
 On 5 Nov 2014, Johan Hovold stated:
 
  On Fri, Oct 31, 2014 at 04:44:46PM +, Nix wrote:
  Sorry for the delay: illness and work-related release time flurries.
  
  On 24 Oct 2014, Johan Hovold told this:
  

  The log appears incomplete again, this time it seems the last part is
  completely missing (the device is never closed for example). The device
  still seems to be responding.
 
 Nope -- by the time I clipped the log, the device was definitely
 nonresponsive.

 I've appended the remaining log, just in case. This is the same as the
 snapshot I added to the email itself last time: a close-and-open as I
 tried restarting the daemon, and another close as part of system
 shutdown.

Ok, yeah, there's the crash.

What if you
   physically disconnect and reconnect the device instead, or simply
  
  That fixes it, in fact it's the only way to fix it once it's broken by
  this bug.
 
  I didn't mean whether it would get the device working again, but rather
  whether you could kill it this way.
 
 Never seen it fail after a physical disconnection.

Ok.

And it has to include an enumeration, since just opening and closing it
(restarting the daemon) repeatedly seems to work?

  ... no, that doesn't help. Additional log from that shows a lot of what
  looks like error returns:
  
  Oct 31 16:38:03 fold kern debug: : [  168.135213] cdc_acm 2-1:1.0: 
  acm_tty_close
  Oct 31 16:38:08 fold kern debug: : [  173.130531] cdc_acm 2-1:1.0: 
  acm_ctrl_msg - rq 0x22, val 0x0, len 0x0, result -110
 
  Yeah, it seems your device firmware has crashed. It stops responding to
  control requests.
 
 Not surprising: I was fairly sure we were provoking a key firmware crash
 or something like that. This is a device with no support for flow
 control at all, after all, so I'm not terribly surprised that trying to
 do flow control confuses it.

Only weird thing is that it has been coping with those calls for a long
time. Only the slightly changed timings seems to have triggered this
issue.

  The above is all normal, but
 
  Oct 31 16:38:08 fold kern debug: : [  173.161489] cdc_acm 2-1:1.0: 
  acm_ctrl_msg - rq 0x22, val 0x3, len 0x0, result -62
 
  here's another timeout. It's dead.
 
 Again, not surprising.
 
  Did you get anywhere with trying to look at the device firmware?
 
 Look at it? Only Daniel Silverstone (Cc:ed) can do that. The only copy
 of the firmware I have is baked into the sealed key. :)

Ah, ok. I guess we need a new quirk then. There's already a quirk in the
driver to suppress error from when trying to set the control lines.

But that doesn't help this device, which happily accepts the requests
and then dies at random times.

Could you try two more things (too make sure line control is really the
culprit):

1. If you clear HUPCL in ekeyd so that the lines are never lowered, does
that fix the stability issue?

2. Can you verify that the patch below works? Did I use the correct
VID/PID? Could you provide lsusb -v output (for the capability flags)
as well?

Note that the patch is against the usb-linus branch of the usb tree,
which also contains another fix which could affect this device
(set_termios will now be called an extra time on every open). You also
need this one, which have not yet been applied:

http://marc.info/?l=linux-usbm=141520959813505w=2

Thanks,
Johan


From 76abc8a7eda9ea1978ee3527c773210c28332317 Mon Sep 17 00:00:00 2001
From: Johan Hovold jo...@kernel.org
Date: Wed, 5 Nov 2014 18:51:54 +0100
Subject: [PATCH] USB: cdc-acm: add quirk for control-line state requests

Add new quirk for devices that cannot handle control-line state
requests.

Note that we currently send these requests to all devices, regardless of
whether they claim to support it, but that errors are only logged if
support is claimed.

Since commit 0943d8ead30e (USB: cdc-acm: use tty-port dtr_rts), which
only changed the timings for these requests slightly, this has been
reported to cause occasional firmware crashes on Simtec Electronics
Entropy Key devices after re-enumeration. Enable the quirk for this
device.

Reported-by: Nix n...@esperi.org.uk
Cc: stable sta...@vger.kernel.org # v3.16
Signed-off-by: Johan Hovold jo...@kernel.org
---
 drivers/usb/class/cdc-acm.c | 14 --
 drivers/usb/class/cdc-acm.h |  2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 9d6495424b06..077d58ac3dcb 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -148,8 +148,15 @@ static int acm_ctrl_msg(struct acm *acm, int request, int 
value,
 /* devices aren't required to support these requests.
  * the cdc acm descriptor tells whether they do...
  */
-#define acm_set_control(acm, control) \
-   acm_ctrl_msg(acm, USB_CDC_REQ_SET_CONTROL_LINE_STATE, control, NULL, 0)
+static inline int acm_set_control(struct acm 

Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-10-31 Thread Nix
Sorry for the delay: illness and work-related release time flurries.

On 24 Oct 2014, Johan Hovold told this:

 [ +CC: linux-usb ]

 On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
 On 22 Oct 2014, Johan Hovold outgrape:
 
  On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
  On 14 Oct 2014, Johan Hovold verbalised:
  
   On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
   I have checked: this code is being executed against a symlink that
   points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
   succeeding on the kernel I'm running now, but of course that's 3.16.5
   with this commit reverted...)
  
   You could verify that by enabling debugging in the cdc-acm driver and
   making sure that the corresponding control messages are indeed sent on
   close.
  
  I have a debugging dump at
  http://www.esperi.org.uk/~nix/temporary/cdc-acm.log; it's fairly
 
  What kernel were you using here? The log seems to suggest that it was
  generated with the commit in question reverted.
 
 Look now :) (the previous log is still there in cdc-acm-reverted.log.)

 Unfortunately, it seems the logs are incomplete. There are lots of
 entries missing (e.g. acm_tty_install when opening, but also some
 acm_submit_read_urb), some of which were there in your first log.

OK. What we have now in
http://www.esperi.org.uk/~nix/temporary/cdc-acm.log is a log from the
pristine upstream 3.16.6 kernel with cdc-acm debugging turned on and the
acm_tty_write - count stuff in acm_tty_write() disabled: I've increased
the dmesg buffer size so the top isn't being cut off any more. It took a
lot of boots for it to fail this time: about a dozen. The log contains
the boot that failed and the one before.

(To my uneducated eye, the initial traffic to/from the key looks *very*
different in the second boot. Something is clearly wrong by this point,
but that's not much of a surprise!)

 This contains two boots -- one on which the USB key worked, and the next
 (with an identical kernel) with which the key was broken. (I'm not sure
 whether this problem happens at startup or shutdown time, so it seemed
 best to provide both.)

 That's a good idea.

 Is it only after reboot you've seen the device fail?

So far.

  What if you
 physically disconnect and reconnect the device instead, or simply

That fixes it, in fact it's the only way to fix it once it's broken by
this bug.

 repeatedly open and close it?

Hm. Good idea.

... no, that doesn't help. Additional log from that shows a lot of what
looks like error returns:

Oct 31 16:38:03 fold kern debug: : [  168.135213] cdc_acm 2-1:1.0: acm_tty_close
Oct 31 16:38:08 fold kern debug: : [  173.130531] cdc_acm 2-1:1.0: acm_ctrl_msg 
- rq 0x22, val 0x0, len 0x0, result -110
Oct 31 16:38:08 fold kern debug: : [  173.130557] cdc_acm 2-1:1.0: 
acm_port_shutdown
Oct 31 16:38:08 fold kern debug: : [  173.131475] cdc_acm 2-1:1.0: acm_ctrl_irq 
- urb shutting down with status: -2
Oct 31 16:38:08 fold kern debug: : [  173.132474] cdc_acm 2-1:1.1: 
acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.132519] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.133473] cdc_acm 2-1:1.1: 
acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.133510] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.134471] cdc_acm 2-1:1.1: 
acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.134507] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.135471] cdc_acm 2-1:1.1: 
acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.135509] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.136472] cdc_acm 2-1:1.1: 
acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.136507] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.137471] cdc_acm 2-1:1.1: 
acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.137517] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.138471] cdc_acm 2-1:1.1: 
acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.138520] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.139469] cdc_acm 2-1:1.1: 
acm_write_bulk - len 0/1, status -2
Oct 31 16:38:08 fold kern debug: : [  173.139515] cdc_acm 2-1:1.1: acm_softint
Oct 31 16:38:08 fold kern debug: : [  173.140470] cdc_acm 2-1:1.1: 
acm_read_bulk_callback - urb 0, len 0
Oct 31 16:38:08 fold kern debug: : [  173.140491] cdc_acm 2-1:1.1: 
acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.141469] cdc_acm 2-1:1.1: 
acm_read_bulk_callback - urb 1, len 0
Oct 31 16:38:08 fold kern debug: : [  173.141489] cdc_acm 2-1:1.1: 
acm_read_bulk_callback - non-zero urb status: -2
Oct 31 16:38:08 fold kern debug: : [  173.142469] cdc_acm 2-1:1.1: 

Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-10-24 Thread Johan Hovold
[ +CC: linux-usb ]

On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
 On 22 Oct 2014, Johan Hovold outgrape:
 
  On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
  On 14 Oct 2014, Johan Hovold verbalised:
  
   On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
   I have checked: this code is being executed against a symlink that
   points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
   succeeding on the kernel I'm running now, but of course that's 3.16.5
   with this commit reverted...)
  
   You could verify that by enabling debugging in the cdc-acm driver and
   making sure that the corresponding control messages are indeed sent on
   close.
  
  I have a debugging dump at
  http://www.esperi.org.uk/~nix/temporary/cdc-acm.log; it's fairly
 
  What kernel were you using here? The log seems to suggest that it was
  generated with the commit in question reverted.
 
 Look now :) (the previous log is still there in cdc-acm-reverted.log.)

Unfortunately, it seems the logs are incomplete. There are lots of
entries missing (e.g. acm_tty_install when opening, but also some
acm_submit_read_urb), some of which were there in your first log.

Also you can disable the acm_tty_write - count output, which isn't
really useful here.

 This contains two boots -- one on which the USB key worked, and the next
 (with an identical kernel) with which the key was broken. (I'm not sure
 whether this problem happens at startup or shutdown time, so it seemed
 best to provide both.)

That's a good idea.

Is it only after reboot you've seen the device fail? What if you
physically disconnect and reconnect the device instead, or simply
repeatedly open and close it?

Johan
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.16.1 BISECTED REGRESSION]: Simtec Entropy Key (cdc-acm) broken in 3.16

2014-10-24 Thread Nix
On 24 Oct 2014, Johan Hovold told this:

 [ +CC: linux-usb ]
 On Wed, Oct 22, 2014 at 04:36:59PM +0100, Nix wrote:
 On 22 Oct 2014, Johan Hovold outgrape:
 
  On Wed, Oct 22, 2014 at 10:31:17AM +0100, Nix wrote:
  On 14 Oct 2014, Johan Hovold verbalised:
  
   On Sun, Oct 12, 2014 at 10:36:30PM +0100, Nix wrote:
   I have checked: this code is being executed against a symlink that
   points to /dev/ttyACM0, and the tcsetattr() succeeds. (At least, it's
   succeeding on the kernel I'm running now, but of course that's 3.16.5
   with this commit reverted...)
  
   You could verify that by enabling debugging in the cdc-acm driver and
   making sure that the corresponding control messages are indeed sent on
   close.
  
  I have a debugging dump at
  http://www.esperi.org.uk/~nix/temporary/cdc-acm.log; it's fairly
 
  What kernel were you using here? The log seems to suggest that it was
  generated with the commit in question reverted.
 
 Look now :) (the previous log is still there in cdc-acm-reverted.log.)

 Unfortunately, it seems the logs are incomplete. There are lots of
 entries missing (e.g. acm_tty_install when opening, but also some
 acm_submit_read_urb), some of which were there in your first log.

Curious. It's a straight cut-and-paste from the syslog. Maybe the kmsg
buffer overflowed, but I start the ekey daemon *after* I start syslogd,
so that seems unlikely.

I'll have another look.

 Is it only after reboot you've seen the device fail?

Yes (though sometimes after reboot of an unaffected kernel into an
affected one! i.e. sometimes the first boot into an affected kernel is
broken).

  What if you
 physically disconnect and reconnect the device instead,

That makes it work.

 or simply
 repeatedly open and close it?

IIRC -- but I'll have to check this tomorrow when I look at the log
problem, so don't take it as gospel -- that doesn't affect it: I can
stop and restart the daemon repeatedly and, if it wasn't working before,
it's not working afterwards: if it was working before, it'll be working
afterwards.

-- 
NULL  (void)
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html