RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-09-10 Thread Kasberger Andreas
So I am back with more tests on this problem.
Intel itself told us it is a problem in the driver for the XHCI host 
controller. I will put here what I got from Intel.

Let me explain why I came back to you with this problem.
We have already tried the same thing with an echo device from Ellisys to see 
if the problem came from our board. 
With this very simple device (it just echoes) we got the same problem: the 
XHCI host controller does not respond anymore.

So it has nothing to do with our board; it is a bug existing in the current 
Linux kernel.

Here is some short mail traffic between Intel and our company. I removed the 
unhelpful parts.

First mail from us:

We have a problem with a CDC device that simulates a serial interface. Problem 
is that the host stops CDC BULK IN transfers after a while on those USB2.0 
ports that are connected to the chip set's internal rate matching hub, others 
seem to work fine.
We tried it with our own host hardware (using PCH 82QM87 [Lynx Point] chip set) 
as well as with your CRB Emerald Lake 2. We also sent our hardware to a 
consulting company (http://thesycon.de/eng/home.shtml) for analysis. They tried 
our CDC device and their own CDC echo device, both with same result. Tests 
have been done with unchanged Kubuntu 3.13.0-24-generic and our own in-house 
Linux (kernel 3.6 with RT patches), without any differences. It also makes no 
difference if the device is connected directly or via external USB2.0 hub, same 
behavior.
We realized that it has to do with file open/close on Linux: if we open 
/dev/ttyACM0 once, communication works fine for hours, but if we re-open 
/dev/ttyACM0 for each message, the CDC BULK IN transfer stops within seconds 
(CDC BULK OUT still works). The problem can be reproduced simply with cat 
and echo in a loop, as well as with a self-written tool that just opens 
/dev/ttyACM0, writes something, expects an answer and closes the file again in 
a loop.
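
The re-open loop described above can be sketched roughly as follows. This is
a minimal sketch and not the tool that was actually used: the function name
reopen_loop, the payload and the timeout are assumptions for illustration, no
termios setup is done, and it assumes the device answers every write.

```python
import os
import select


def reopen_loop(path="/dev/ttyACM0", payload=b"ping\n", iterations=1000,
                timeout_s=2.0):
    """Open, write, wait for an answer, close; repeat.

    Returns the index of the first iteration that got no answer,
    or None if every iteration was answered.
    """
    for i in range(iterations):
        # Re-open the device node for every single message.
        fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
        try:
            os.write(fd, payload)
            # Wait until data arrives (BULK IN) or the timeout expires.
            readable, _, _ = select.select([fd], [], [], timeout_s)
            if not readable or not os.read(fd, 4096):
                return i
        finally:
            os.close(fd)
    return None
```

A return value other than None would correspond to the moment the BULK IN
direction goes quiet; opening the file once outside the loop would be the
comparison case that runs for hours.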
Please find attached a log file made with USB logger from Ellisys (software is 
available for free at Ellisys homepage if necessary: 
http://www.ellisys.com/products/usbex200/download.php) that contains whole CDC 
device transfer from being plugged in until CDC BULK IN transfer has been 
stopped. Furthermore, find attached Thesycon’s device descriptor file. The two 
CDC devices differ in that our CDC device has only one interface whereas 
Thesycon’s has two.
So far we are not sure whether the problem lies inside the host controller 
hardware or in one of the Linux device drivers. Have you ever heard about this 
problem? Do you have any suggestions, bug fixes or workarounds?
... to prevent confusion, please note: the problem can only be reproduced on 
those USB2.0 ports that are NOT HANDLED by the chip set's rate matching hub.

If we disable USB3.0 within the BIOS (or remove USB3.0 Linux drivers) all ports 
are handled via rate matching hub and therefore all ports seem to work, in that 
case the error cannot be reproduced anymore at any port.

Responses from Intel:
Sorry for the delay. I was not able to get in contact with Sarah Sharp like you 
told me; she seems to be out on vacation. Anyway, I see the other issue was 
rejected in the Linux channel because they don't deal with peripheral drivers, 
just graphics drivers. I think we can veer towards the Linux community at this 
point. As we've discussed earlier, since the issue does not happen under the 
Windows environment, this is more of a Linux driver issue than a hardware one.


We can conclude the same if we take into consideration that you see the same 
issue with the Intel CRB. Most of the time, Linux issues related to drivers 
have already been addressed and solved in the community; that's why we strongly 
encourage you to ping them about it.


I've found a USB Linux drivers website that offers device driver support and it 
lists a bunch of USB devices as well as CDC class drivers. Perhaps this could 
help you out.


http://www.linux-usb.org/devices.html
Best Regards,


 From: andreaskasber...@hotmail.com
 To: st...@rowland.harvard.edu
 CC: sarah.a.sh...@linux.intel.com; pe...@stuge.se; linux-usb@vger.kernel.org; 
 mathias.ny...@intel.com
 Subject: RE: PROBLEM: XHCI Host Controller on Intel Panther Point with 
 CDC/ACM dead after massive NAK
 Date: Thu, 13 Feb 2014 16:15:13 +

 Does my log mean it has nothing to do with the kernel itself?

 Maybe you're experiencing a problem with link power management. Some
 changes were just merged into Greg KH's development tree (the usb-linus
 branch), and they should appear in the next 3.14-rc release. You could
 try either one of those. Or you could try building a kernel without
 CONFIG_PM_RUNTIME, which will disable link power management.


 I will give the latest kernel a try in a few days. I have already done the 
 test with power management disabled, but maybe something gets better with 
 the new kernel anyway.


 Andreas

RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-09-10 Thread Kasberger Andreas
Yes, I tried Sarah's suggestion and disabled power management, but got the 
same results. 
The time until the XHCI stops responding varies: sometimes it is only 1 
second, sometimes several minutes.


 Subject: Re: PROBLEM: XHCI Host Controller on Intel Panther Point with 
 CDC/ACM dead after massive NAK
 From: oneu...@suse.de
 To: andreaskasber...@hotmail.com
 CC: st...@rowland.harvard.edu; sarah.a.sh...@linux.intel.com; pe...@stuge.se; 
 linux-usb@vger.kernel.org; mathias.ny...@intel.com
 Date: Wed, 10 Sep 2014 11:36:35 +0200

 On Wed, 2014-09-10 at 07:04 +, Kasberger Andreas wrote:
 So I am back with more tests on this problem.
 Intel itself told us it is a problem in the driver for the XHCI host 
 controller. I will put here what I got from Intel.

 Let me explain why I came back to you with this problem.
 We have already tried the same thing with an echo device from Ellisys to 
 see if the problem came from our board.
 With this very simple device (it just echoes) we got the same problem: the 
 XHCI host controller does not respond anymore.

 So it has nothing to do with our board; it is a bug existing in the 
 current Linux kernel.

 Here is some short mail traffic between Intel and our company. I removed the 
 unhelpful parts.

 Your trace hasn't made it through the list. Anyway, have you tried
 following Sarah's suggestion and tried without LPM?

 Regards
 Oliver


 --
 To unsubscribe from this list: send the line unsubscribe linux-usb in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at http://vger.kernel.org/majordomo-info.html


[no subject]

2014-02-14 Thread Kasberger Andreas
unsubscribe linux-usb


RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-13 Thread Kasberger Andreas
 Does my log mean it has nothing to do with the kernel itself?

 Maybe you're experiencing a problem with link power management. Some
 changes were just merged into Greg KH's development tree (the usb-linus
 branch), and they should appear in the next 3.14-rc release. You could
 try either one of those. Or you could try building a kernel without
 CONFIG_PM_RUNTIME, which will disable link power management.


I will give the latest kernel a try in a few days. I have already done the test 
with power management disabled, but maybe something gets better with the new 
kernel anyway.
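
Whether a given kernel was built with CONFIG_PM_RUNTIME can be checked from
its config file. The following is a minimal sketch, assuming the config text
is available (for example in /boot/config-<release>, which many distributions
ship, or in /proc/config.gz); the helper name config_option is made up for
illustration.

```python
def config_option(config_text, name):
    """Return the value of a kernel config option ('y', 'm', ...).

    Returns None when the option is absent or marked "is not set".
    """
    for line in config_text.splitlines():
        line = line.strip()
        # Enabled options look like NAME=value; disabled ones are comments.
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None
```

For example, config_option(text, "CONFIG_PM_RUNTIME") returning None would
indicate a kernel built without runtime power management.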


Andreas


RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-10 Thread Kasberger Andreas
I saw that the device endpoints ep82/ep83 have a wMaxPacketSize of 0040. 

As far as I understand, packet 7092 in Wireshark with a URB data length of 128 
should not be possible? What happens with such packet sizes? Or is Wireshark 
just joking with me?


RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-10 Thread Kasberger Andreas
 I saw that the device endpoints ep82/ep83 have a wMaxPacketSize of 0040.

 As far as I understand, packet 7092 in Wireshark with a URB data
 length of 128 should not be possible? What happens with such packet
 sizes? Or is Wireshark just joking with me?

 Wireshark adds 64 bytes of overhead to each packet it captures.


Yes Alan, overall Wireshark shows 192 bytes for that packet. 
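
The numbers in this exchange fit together as plain arithmetic. The sketch
below is only an illustration: the function names are made up, the 64-byte
capture overhead is taken from Alan's reply rather than measured, and it
relies on the general USB rule that a URB larger than wMaxPacketSize is split
into several bus packets by the host controller.

```python
CAPTURE_HEADER_BYTES = 64  # per-packet capture overhead, as stated by Alan


def bus_packets(urb_length, max_packet_size):
    """Number of bus packets needed to carry one URB's data."""
    # Ceiling division; an empty transfer still takes one (zero-length) packet.
    return max(1, -(-urb_length // max_packet_size))


def captured_length(urb_length):
    """Total length Wireshark shows for a captured URB."""
    return CAPTURE_HEADER_BYTES + urb_length


MAX_PACKET = 0x0040                      # wMaxPacketSize 0040 is hex: 64 bytes
print(bus_packets(128, MAX_PACKET))      # 2
print(captured_length(128))              # 192
```

So a URB data length of 128 on an endpoint with a 64-byte maximum packet size
is two full packets, and 192 is the same URB plus the capture header.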


RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-05 Thread Kasberger Andreas
Hello Peter,

one short remark

 Application-specific or vendor-specific are often frowned upon in
 other contexts but if the protocol is documented publicly then it
 is a great way to take advantage of all that USB offers, and it is
 explicitly supported by the specification. Use bDeviceClass or
 bInterfaceClass 0xff.

Certainly, we try to use only the CDC/ACM standard and not any custom 
communication.

Best regards
   Andreas


RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK

2014-02-04 Thread Kasberger Andreas
Hello Peter,

many many thanks for your long and detailed answer. 

 On the protocol design:

 First, using CDC-ACM means sacrificing all structured communication
 offered by the USB packet bus and settling for such primitive use of
 USB is not a decision that should be made lightly. Almost all
 applications can benefit quite significantly both in end-user
 usability and in ease of implementation from an application-specific
 protocol which takes advantage of what USB offers.


Yes, you are absolutely right. Not the best idea. This protocol is used for 
firmware updates; in normal operation the device is a simple keyboard. Sending 
bulk messages is the great advantage of CDC/ACM.


What is still puzzling me is the fact that the host controller stops all 
communication.
That means that, electrically, there is really no communication (bulk_out) 
from the HC to the device anymore. It seems that the host controller has shut 
down the communication port to one particular device. Unbinding and re-binding 
the host controller solves the problem.
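
The unbind/bind step can be done through sysfs. The following is a minimal
sketch under stated assumptions: the controller is driven by xhci_hcd on the
PCI bus, the PCI address (0000:00:14.0 in the usage note) is only an example
and has to be looked up on the actual machine, e.g. with lspci, and the
function names are made up for illustration.

```python
import os

SYSFS_DRIVER_DIR = "/sys/bus/pci/drivers/xhci_hcd"


def rebind_commands(pci_address, driver_dir=SYSFS_DRIVER_DIR):
    """Return the (file, text) pairs to write, in order: unbind, then bind."""
    return [
        (os.path.join(driver_dir, "unbind"), pci_address),
        (os.path.join(driver_dir, "bind"), pci_address),
    ]


def rebind(pci_address, driver_dir=SYSFS_DRIVER_DIR):
    """Unbind and re-bind the host controller (needs root privileges)."""
    for path, text in rebind_commands(pci_address, driver_dir):
        with open(path, "w") as f:
            f.write(text)
```

Calling rebind("0000:00:14.0") as root would detach and re-attach the driver,
which disconnects every device on that controller for a moment.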

But anyway, I will do my best to find out the root cause of the 
miscommunication between both sides.


 You mention device-side buffering and that the device at some point
 can't accept anything more from the host. With USB this means that
 you must ensure that the host will know when it must not send more.


I thought sending NAK as the response to each packet is the correct way to tell 
the host "not now, but maybe later; please try again". Once the internal device 
queue is no longer completely full, communication continues in the normal way. 
But after some time the HC completely stops all communication. 
In real life this means a huge firmware update takes a long time, so it can 
happen that the internal device queue is full. But a broken firmware update is 
a bad thing.


 The USB way to do this, were you using an application-specific
 protocol instead of serial port simulation, would be to stall the
 endpoint. Unfortunately CDC-ACM doesn't allow doing that.

OK. I will think about whether another way is possible.


 So you have to include some kind of in-band signalling for this. :\

 This is just one reason why ACM is a poor choice for when you need
 structured communication.
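
One possible shape of such in-band signalling is a stop-and-wait scheme where
the device answers every chunk with a status byte. This is a hypothetical
sketch only: the ACK and BUSY byte values, the chunk size and the function
name are all assumptions for illustration, not part of CDC-ACM or of the
firmware discussed here.

```python
import time

ACK = b"\x06"   # hypothetical: chunk accepted, send the next one
BUSY = b"\x15"  # hypothetical: device queue full, repeat this chunk later


def send_with_flow_control(write, read_reply, data, chunk_size=64,
                           max_retries=100, retry_delay_s=0.01):
    """Send data in chunks, waiting for an in-band reply after each one.

    write(chunk) sends bytes; read_reply() returns one status byte.
    Returns the number of chunks that had to be repeated.
    """
    repeats = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        for _ in range(max_retries):
            write(chunk)
            reply = read_reply()
            if reply == ACK:
                break
            if reply != BUSY:
                raise IOError("unexpected reply %r" % (reply,))
            # Device asked us to back off; retry the same chunk.
            repeats += 1
            time.sleep(retry_delay_s)
        else:
            raise TimeoutError("device stayed busy")
    return repeats
```

With a scheme like this the host never has more than one unacknowledged chunk
outstanding, so the device queue cannot overflow silently during a long
firmware update.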


Anyway many many thanks for your precious time and detailed answers.

My conclusions and todo:

1. Think about the design
2. Keep trying to find the main reason why the host controller shuts down the 
   connection

Arrrghhh. I just saw that USB 2.0 also has some problems: the host controller 
resets after some hours but does not get back into a working state.

I hope in future I can make more sensible contributions to the list

Best Regards
   Andreas