RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK
So I am back with more tests on this problem. Intel itself told us it is a problem in the driver for the XHCI host controller. Below is some of what I got from Intel.

Let me explain why I came back to you with this problem. We have already tried the same thing with an echo device from Ellisys to see if the problem came from our board. With this very simple device (just echoing) we got the same problem: the XHCI host controller does not respond anymore. So it has nothing to do with our board, and it is a bug existing in the current Linux kernel.

Here is some short mail traffic between Intel and our company. I removed the unhelpful parts.

First mail from us:

We have a problem with a CDC device that simulates a serial interface. The problem is that the host stops CDC BULK IN transfers after a while on those USB2.0 ports that are connected to the chip set's internal rate matching hub; others seem to work fine. We tried it with our own host hardware (using the PCH 82QM87 [Lynx Point] chip set) as well as with your CRB Emerald Lake 2. We also sent our hardware to a consulting company (http://thesycon.de/eng/home.shtml) for analysis. They tried our CDC device and their own CDC echo device, both with the same result.

Tests have been done with an unchanged Kubuntu 3.13.0-24-generic and our own in-house Linux (kernel 3.6 with RT patches), without any differences. It also makes no difference whether the device is connected directly or via an external USB2.0 hub; the behavior is the same.

We realized that it has to do with file open/close on Linux: if we open /dev/ttyACM0 once, communication works fine for hours, but if we re-open /dev/ttyACM0 for each message, the CDC BULK IN transfer stops within seconds (CDC BULK OUT still works). The problem can be reproduced simply with cat and echo in a loop, as well as with a self-written tool that just opens /dev/ttyACM0, writes something, expects an answer and closes the file again, in a loop.
Please find attached a log file made with the USB logger from Ellisys (the software is available for free at the Ellisys homepage if necessary: http://www.ellisys.com/products/usbex200/download.php) that contains the whole CDC device transfer, from being plugged in until the CDC BULK IN transfer stopped. Furthermore, find attached Thesycon’s device descriptor file. The two CDC devices differ in that our CDC device has only one interface whereas Thesycon’s has two.

So far we are not sure whether the problem lies inside the host controller hardware or in one of the Linux device drivers. Have you ever heard about this problem? Any suggestions, bug fixes or workarounds?

... to prevent confusion please note: The problem can only be reproduced on those USB2.0 ports that are NOT HANDLED by the chip set's rate matching hub. If we disable USB3.0 within the BIOS (or remove the USB3.0 Linux drivers), all ports are handled via the rate matching hub and therefore all ports seem to work; in that case the error cannot be reproduced on any port.

Responses from Intel:

Sorry for the delay. I was not able to get in contact with Sarah Sharp like you told me; she seems to be out on vacation. Anyway, I see the other issue was rejected in the Linux channel because they don't deal with peripheral drivers, just graphics drivers.

I think we can veer towards the Linux community at this point. As we've discussed earlier, since the issue does not happen under the Windows environment, this is more of a Linux driver issue than a hardware one. We can conclude the same if we take into consideration that you see the same issue with the Intel CRB.

Most of the time, Linux issues related to drivers have already been addressed and solved in the community; that's why we strongly encourage you to ping them about it. I've found a USB Linux drivers website that offers device driver support, and it lists a bunch of USB devices as well as CDC class drivers. Perhaps this could help you out.
http://www.linux-usb.org/devices.html

Best Regards,

From: andreaskasber...@hotmail.com
To: st...@rowland.harvard.edu
CC: sarah.a.sh...@linux.intel.com; pe...@stuge.se; linux-usb@vger.kernel.org; mathias.ny...@intel.com
Subject: RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK
Date: Thu, 13 Feb 2014 16:15:13 +

> > Does my log mean it has nothing to do with the kernel itself?
>
> Maybe you're experiencing a problem with link power management. Some changes were just merged into Greg KH's development tree (the usb-linus branch), and they should appear in the next 3.14-rc release. You could try either one of those. Or you could try building a kernel without CONFIG_PM_RUNTIME, which will disable link power management.

I will give the latest kernel a try in a few days. I have already done the test with power management disabled, but maybe something improves with the new kernel anyway.

Andreas
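The reproduction described in the first mail to Intel (open /dev/ttyACM0, write something, expect an answer, close, repeat) could look roughly like the sketch below. This is not the tool used in the thread: the function name `reopen_loop`, its parameters and the one-second timeout are assumptions made for illustration, and a real test against a tty would probably also need to put the port into raw mode via `termios` first.

```python
import os
import select


def reopen_loop(path, payload, iterations, timeout=1.0, read_size=64):
    """Open `path`, write `payload`, wait for a reply, close; repeat.

    Returns the number of iterations that received a non-empty reply.
    The loop stops at the first iteration where nothing arrives within
    `timeout` seconds, which is how a stalled BULK IN pipe would show up.
    """
    answered = 0
    for _ in range(iterations):
        # Re-opening the node for every message is the trigger described
        # in the report; opening it once reportedly works for hours.
        fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
        try:
            os.write(fd, payload)
            readable = select.select([fd], [], [], timeout)[0]
            if not readable:
                break  # no answer in time: treat as "IN transfers stopped"
            if os.read(fd, read_size):
                answered += 1
        finally:
            os.close(fd)
    return answered
```

A call such as `reopen_loop("/dev/ttyACM0", b"ping\n", 10000)` returning less than 10000 would then indicate the iteration at which replies stopped.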
RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK
Yes, I tried Sarah's suggestion and disabled power management, but with the same results. The time it takes until the XHCI stops responding varies: sometimes only 1 second, sometimes several minutes.

Subject: Re: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK
From: oneu...@suse.de
To: andreaskasber...@hotmail.com
CC: st...@rowland.harvard.edu; sarah.a.sh...@linux.intel.com; pe...@stuge.se; linux-usb@vger.kernel.org; mathias.ny...@intel.com
Date: Wed, 10 Sep 2014 11:36:35 +0200

On Wed, 2014-09-10 at 07:04 +, Kasberger Andreas wrote:
> So I am back with more tests on this problem. Intel itself told us it is a problem in the driver for the XHCI host controller. Below is some of what I got from Intel.
>
> Let me explain why I came back to you with this problem. We have already tried the same thing with an echo device from Ellisys to see if the problem came from our board. With this very simple device (just echoing) we got the same problem: the XHCI host controller does not respond anymore. So it has nothing to do with our board, and it is a bug existing in the current Linux kernel.
>
> Here is some short mail traffic between Intel and our company. I removed the unhelpful parts.

Your trace hasn't made it through the list. Anyway, have you tried following Sarah's suggestion and tried without LPM?

Regards
Oliver

--
To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK
> > Does my log mean it has nothing to do with the kernel itself?
>
> Maybe you're experiencing a problem with link power management. Some changes were just merged into Greg KH's development tree (the usb-linus branch), and they should appear in the next 3.14-rc release. You could try either one of those. Or you could try building a kernel without CONFIG_PM_RUNTIME, which will disable link power management.

I will give the latest kernel a try in a few days. I have already done the test with power management disabled, but maybe something improves with the new kernel anyway.

Andreas
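As a quicker experiment than rebuilding a kernel without CONFIG_PM_RUNTIME, runtime power management can usually be switched off per device through the `power/control` attribute in sysfs. The sketch below only illustrates that idea: the helper name `set_runtime_pm` and the device directory in the usage note are assumptions, and this attribute governs runtime PM (autosuspend), which may not cover every form of link power management discussed in the thread.

```python
from pathlib import Path


def set_runtime_pm(device_dir, enabled):
    """Write the runtime-PM policy for one device and return what was stored.

    `device_dir` is the device's sysfs directory (hypothetical example:
    /sys/bus/usb/devices/3-1). "auto" lets the kernel suspend the device
    when idle; "on" keeps it powered at all times.
    """
    control = Path(device_dir) / "power" / "control"
    control.write_text("auto" if enabled else "on")
    return control.read_text().strip()
```

Run as root, `set_runtime_pm("/sys/bus/usb/devices/3-1", False)` would be expected to return `on` if the kernel accepted the setting.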
RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK
I saw that the device endpoints ep82/ep83 have a wMaxPacketSize of 0040. As far as I understand, packet 7092 in Wireshark, with a URB data length of 128, should not be possible? What happens with such packet sizes? Or is Wireshark just fooling me?
RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK
> > I saw that the device endpoints ep82/ep83 have a wMaxPacketSize of 0040. As far as I understand, packet 7092 in Wireshark, with a URB data length of 128, should not be possible? What happens with such packet sizes? Or is Wireshark just fooling me?
>
> Wireshark adds 64 bytes of overhead to each packet it captures.

Yes Alan, overall Wireshark shows 192 bytes for that packet.
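The numbers in this exchange can be checked with a little arithmetic. The sketch below assumes that the capture records whole URBs rather than single bus packets, so a URB larger than wMaxPacketSize is simply carried in several packets on the wire; the function name `split_capture` is made up for illustration.

```python
import math

CAPTURE_HEADER_BYTES = 64  # per-capture overhead quoted in the thread


def split_capture(frame_len, max_packet_size):
    """Return (URB data length, number of bus packets needed to carry it)."""
    urb_len = frame_len - CAPTURE_HEADER_BYTES
    packets = math.ceil(urb_len / max_packet_size)
    return urb_len, packets


# 192 bytes shown by Wireshark, wMaxPacketSize 0x0040 (64 bytes)
print(split_capture(192, 0x0040))  # (128, 2)
```

Under that assumption, a 128-byte URB on an endpoint with a 64-byte wMaxPacketSize corresponds to two full-size packets rather than one oversized packet.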
RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK
Hello Peter,

one short remark:

> Application-specific or vendor-specific protocols are often frowned upon in other contexts, but if the protocol is documented publicly then it is a great way to take advantage of all that USB offers, and it is explicitly supported by the specification. Use bDeviceClass or bInterfaceClass 0xff.

Certainly, we try to use only the CDC/ACM standard and no custom communication.

Best regards
Andreas
RE: PROBLEM: XHCI Host Controller on Intel Panther Point with CDC/ACM dead after massive NAK
Hello Peter,

many, many thanks for your long and detailed answer.

> On the protocol design: First, using CDC-ACM means sacrificing all structured communication offered by the USB packet bus, and settling for such primitive use of USB is not a decision that should be made lightly. Almost all applications can benefit quite significantly, both in end-user usability and in ease of implementation, from an application-specific protocol which takes advantage of what USB offers.

Yes, you are absolutely right. Not the best idea. This protocol is used for firmware updates; in normal life the device is a simple keyboard. And sending out bulk messages is the great advantage of CDC/ACM.

What is still puzzling me is the fact that the host controller stops all communication. That means there is really no electrical communication (bulk_out) from the HC to the device anymore. It seems that the host controller has shut down the communication port to one particular device. Unbinding and binding the host controller solves the problem. But anyway, I will do my best to find the root cause of the miscommunication between both sides.

> You mention device-side buffering and that the device at some point can't accept anything more from the host. With USB this means that you must ensure that the host will know when it must not send more.

I thought sending NAK in response to each packet is the correct way to tell the host "not now, but maybe later, please try again". Once the internal device queue is no longer completely full, communication continues in the normal way. But after some time the HC completely stops all communication. In real life this means that a huge firmware update takes a long time, so it can happen that the internal device queue is full. But a broken firmware update is a bad thing.

> The USB way to do this, were you using an application-specific protocol instead of serial port simulation, would be to stall the endpoint. Unfortunately CDC-ACM doesn't allow doing that.

Ok. I will think about whether another way is possible.

> So you have to include some kind of in-band signalling for this. :\ This is just one reason why ACM is a poor choice for when you need structured communication.

Anyway, many, many thanks for your precious time and detailed answers.

My conclusions and todo:
1. Think about the design.
2. Still try to find out the main reason why the host controller shuts down the connection.

Arrrghhh. I just saw that USB 2.0 also has some problems: the host controller resets after some hours but does not get back into a working state.

I hope that in future I can make more sensible contributions to the list.

Best Regards
Andreas
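The workaround mentioned in this mail, unbinding and re-binding the host controller, is normally done by writing the controller's PCI address to the driver's `unbind` and `bind` files in sysfs. The following is a minimal sketch of that step, not a fix for the underlying bug; the helper name `rebind` and the PCI address in the usage note are assumptions.

```python
from pathlib import Path


def rebind(driver_dir, device_id):
    """Detach `device_id` from a driver and attach it again.

    `driver_dir` is the driver's sysfs directory, which for the xHCI
    driver is expected to be /sys/bus/pci/drivers/xhci_hcd.
    """
    driver = Path(driver_dir)
    # Writing the device ID to "unbind" detaches the controller and
    # disconnects all devices below it ...
    (driver / "unbind").write_text(device_id)
    # ... and writing it to "bind" re-probes it from a clean state.
    (driver / "bind").write_text(device_id)
```

Run as root, this might be invoked as `rebind("/sys/bus/pci/drivers/xhci_hcd", "0000:00:14.0")`, where the address is a placeholder that has to be looked up with `lspci` on the affected machine.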