Re: Testing for hardware bug in EHCI controllers

2013-03-26 Thread Noone Nowhere
Hello again Alan, we read that the patch broke. Damn, our fear was
justified. If we are right, it will break again because it affects
various devices using different buses, each one with its limitations
and defects. That's why we wrote to Sarah (and got no reply yet) that
vendors should fix their h/w bugs in-house or give open source
developers full documentation (like NDA datasheets, BIOS spec, BIOS
spec updates) to cope with their buggy hardware!

 Probably it is cured.  But something is still wrong, even though it may
 be unrelated.

Since something is broken and the last program tested exits at A50M,
we have some questions for today prior to testing A50M, moschip and
ICH4M:

1)Do we need to apply a third patch
(http://marc.info/?l=linux-usb&m=136379040918329&w=2) [its upper
part, to be specific]? If yes, please make
a cumulative patch for 3.8 series before we mess things up and post it
under this subject. In the same post, add the final test program with
program version number printing as well as a "do NOT unplug the usb"
and a "leave the program to go up to 1000" message. Also point out that
the device must not be inside the unusual_devs file as you told us. This
will make it easier for others to follow and help you.

2)If A50M still fails, do you have time to analyze our usbmon output?

3)Has any type of transfer corruption been observed from this bug?
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Testing for hardware bug in EHCI controllers

2013-03-26 Thread Alan Stern
On Tue, 26 Mar 2013, Noone Nowhere wrote:

 Hello again Alan, we read that the patch broke.

What patch?  The one that works around the Intel/AMD hardware problem?  
Yes, it had a mistake, which has now been fixed.  (Although the fix has 
not yet been released in a 3.8.stable kernel.)

  Damn, our fear was
 justified. If we are right, it will break again because it affects
 various devices using different buses, each one with its limitations
 and defects.

That reasoning doesn't make sense.  Practically every line of code in 
ehci-hcd.c and its associated files affects various devices using 
different buses.  If your fear was justified, the driver would never 
work at all.

  That's why we wrote to Sarah (and got no reply yet) that

Your message to Sarah was little more than a rant about issues that are 
out of her control and took place long before she started working at 
Intel.  Surely you didn't expect her to have a constructive reply?

 vendors should fix their h/w bugs in-house or give open source
 developers full documentation (like NDA datasheets, BIOS spec, BIOS
 spec updates) to cope with their buggy hardware!

Well, it's questionable whether this really should be called a bug.  
Clearly it is undesirable behavior, but strictly speaking it is not in
violation of the EHCI spec.

  Probably it is cured.  But something is still wrong, even though it may
  be unrelated.
 
 Since something is broken and the last program tested exits at A50M,
 we have some questions for today prior to testing A50M, moschip and
 ICH4M:
 
 1)Do we need to apply a third patch
 (http://marc.info/?l=linux-usb&m=136379040918329&w=2) [its upper
 part, to be specific]?

Are you considering the hardware workaround and the bug-detection patch
as the first two?  Then yes, a third patch is needed, but not the one
you mentioned.  The third patch is here:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=d714aaf649460cbfd5e82e75520baa856b4fa0a0

Oddly, I can't find the patch submission anywhere in the email
archives.  Maybe I forgot to CC: linux-usb when sending it in.

 If yes, please make
 a cumulative patch for 3.8 series before we mess things up and post it
 under this subject.

It is below.

 In the same post, add the final test program with
 program version number printing as well as a "do NOT unplug the usb"
 and a "leave the program to go up to 1000" message. Also point out that
 the device must not be inside the unusual_devs file as you told us. This
 will make it easier for others to follow and help you.

You can easily make these changes and post the result yourself.

 2)If A50M still fails, do you have time to analyze our usbmon output?

Yes.

 3)Has any type of transfer corruption been observed from this bug?

Not as far as I know.

Alan Stern



Index: 3.8/drivers/usb/host/ehci-q.c
===================================================================
--- 3.8.orig/drivers/usb/host/ehci-q.c
+++ 3.8/drivers/usb/host/ehci-q.c
@@ -547,7 +547,7 @@ qh_completions (struct ehci_hcd *ehci, s
 	if (stopped != 0 || hw->hw_qtd_next == EHCI_LIST_END(ehci)) {
 		switch (state) {
 		case QH_STATE_IDLE:
-			qh_refresh(ehci, qh);
+//			qh_refresh(ehci, qh);
 			break;
 		case QH_STATE_LINKED:
 			/* We won't refresh a QH that's linked (after the HC
@@ -1170,7 +1170,7 @@ static void single_unlink_async(struct e
 	struct ehci_qh		*prev;
 
 	/* Add to the end of the list of QHs waiting for the next IAAD */
-	qh->qh_state = QH_STATE_UNLINK;
+	qh->qh_state = QH_STATE_UNLINK_WAIT;
 	if (ehci->async_unlink)
 		ehci->async_unlink_last->unlink_next = qh;
 	else
@@ -1213,9 +1213,19 @@ static void start_iaa_cycle(struct ehci_
 
 	/* Do only the first waiting QH (nVidia bug?) */
 	qh = ehci->async_unlink;
-	ehci->async_iaa = qh;
-	ehci->async_unlink = qh->unlink_next;
-	qh->unlink_next = NULL;
+
+	/*
+	 * Intel (?) bug: The HC can write back the overlay region
+	 * even after the IAA interrupt occurs.  In self-defense,
+	 * always go through two IAA cycles for each QH.
+	 */
+	if (qh->qh_state == QH_STATE_UNLINK_WAIT) {
+		qh->qh_state = QH_STATE_UNLINK;
+	} else {
+		ehci->async_iaa = qh;
+		ehci->async_unlink = qh->unlink_next;
+		qh->unlink_next = NULL;
+	}
 
 	/* Make sure the unlinks are all visible to the hardware */
 	wmb();
@@ -1232,6 +1242,7 @@ static void start_iaa_cycle(struct ehci_
 static void end_unlink_async(struct ehci_hcd *ehci)
 {
 	struct ehci_qh		*qh;
+	__hc32			tok1, tok2;
 
 	if (ehci->has_synopsys_hc_bug)
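
The two-cycle rule in the start_iaa_cycle() hunk can be tried outside the kernel. What follows is only a user-space sketch, not driver code: every model_* name and the simplified state enum are hypothetical, and the doorbell and the IAA interrupt are reduced to plain function calls and a counter. It shows nothing more than the state progression: a QH waiting to be unlinked is promoted on one IAA cycle and taken off the waiting list on the next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's QH states. */
enum model_state { MODEL_LINKED, MODEL_UNLINK_WAIT, MODEL_UNLINK, MODEL_IDLE };

struct model_qh {
	enum model_state state;
	struct model_qh *unlink_next;
};

struct model_hc {
	struct model_qh *async_unlink; /* QHs waiting to be unlinked */
	struct model_qh *async_iaa;    /* QH the current cycle will finish */
	unsigned iaa_cycles;           /* IAA cycles started so far */
};

/* Queue a QH for unlinking; it starts in the extra "wait" state. */
static void model_unlink(struct model_hc *hc, struct model_qh *qh)
{
	struct model_qh **p = &hc->async_unlink;

	assert(qh != NULL);
	qh->state = MODEL_UNLINK_WAIT;
	qh->unlink_next = NULL;
	while (*p)
		p = &(*p)->unlink_next;
	*p = qh;
}

/* Start one IAA cycle for the first waiting QH. */
static void model_start_iaa(struct model_hc *hc)
{
	struct model_qh *qh = hc->async_unlink;

	if (!qh)
		return;
	if (qh->state == MODEL_UNLINK_WAIT) {
		/* First cycle: only promote, leave the QH on the list. */
		qh->state = MODEL_UNLINK;
	} else {
		/* Second cycle: this one really takes the QH off. */
		hc->async_iaa = qh;
		hc->async_unlink = qh->unlink_next;
		qh->unlink_next = NULL;
	}
	hc->iaa_cycles++;
}

/* Model of the IAA interrupt: finish the QH selected by this cycle. */
static void model_end_iaa(struct model_hc *hc)
{
	if (hc->async_iaa) {
		hc->async_iaa->state = MODEL_IDLE;
		hc->async_iaa = NULL;
	}
}

/* Run cycles until nothing is waiting; return how many were needed. */
static unsigned model_drain(struct model_hc *hc)
{
	while (hc->async_unlink) {
		model_start_iaa(hc);
		model_end_iaa(hc);
	}
	return hc->iaa_cycles;
}
```

In this model, unlinking two QHs takes four cycles instead of two, which is where any slowdown from the workaround would come from.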

Re: Testing for hardware bug in EHCI controllers

2013-03-19 Thread Alan Stern
On Tue, 19 Mar 2013, Noone Nowhere wrote:

  That depends on where the bug is.  If it is in the stick then testing
  a different brand of flash drive would help.
 
 Does it need an unusual dev entry to limit max sectors? It is the
 first USB stick that arrived here back in 2003(?) and uses SLC flash!!
 We do not think we can find SLC USB stick any more. We will test
 affected hosts with a much newer USB stick.

I don't know what your old USB stick needs.  If it does have an 
unusual_devs entry to limit max_sectors then the test program probably 
won't work with it.

  The test program does a bunch of error checking but practically no
  error recovery.  If almost any tiny little thing goes wrong, the
  program exits.  You might think that nothing else should go wrong --
  but the fact is that errors do happen.
 
 Is the controller supposed to do such error recovery or the OS?

The OS.

  As far
 as "should" is concerned you are completely right, as A50M still breaks
 with the 2 patches applied on 3.8.3 even though we use a 16 times
 larger USB stick. Breakage though occurs in the program output only:
 
 [  215.244713] usb-storage 4-3:1.0: disconnect by usbfs
 [  229.774169] ehci-pci :00:12.2: shutdown urb 88005111f9c0
 ep1in-bulk 600x
 
 100
 200
 300
 400
 500
 URB timed out; bug may be present
 Wrong URB completed
 
 , is the bug cured or not?

Probably it is cured.  But something is still wrong, even though it may 
be unrelated.

  We are puzzled with this laptop, why does
 the program detect the bug only on this one?

Remember, the program doesn't _detect_ the bug.  The kernel patch that
goes along with the program does the detection.  The program only tries
to _trigger_ the bug.

  It surely has the bug but
 it breaks in its own way or breaks because of additional errata (Linux
 already has a USB workaround for isochronous transfers when ASPM is
 used on this chip). If any sort of usbmon dump is needed, provide
 instructions and we can test it at once.

Instructions for usbmon are in the kernel source file 
Documentation/usb/usbmon.txt.

  As said we have docs and we
 shall read its errata. If nothing useful is found, we can try to poke
 at the controller settings. Since we mentioned ASPM, before you ask
 it is disabled for all devices by ACPI FADT table.
 
  That's what I would like to do.  But nobody knows what
  vendor/device/revision values should be in the blacklist.
 
 We should not hurry; let's give some time for more reports. Tonight we
 will test an ICH-4M and we can buy ATI IXP mainboards for testing
 within this week if a table can be made. Also it is possible to buy an
 ALi controller. We also have a HUMAX DVB-S2 with an external USB port
 and we would love to poke at it provided that we can save all the
 settings from telnet. It must be using the bcma-hcd driver and we are
 looking forward to testing it.

Okay.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-18 Thread Alan Stern
On Mon, 18 Mar 2013, Noone Nowhere wrote:

  I can't fix the block count issue.  That's not a bug in the program;
  it's a bug somewhere else.
 
 Who can fix it? Should we test another stick?

That depends on where the bug is.  If it is in the stick then testing 
a different brand of flash drive would help.

 A workaround for the bug has already been submitted to the various
 stable kernel series.  The workaround may decrease performance slightly
 for things like large data transfers to/from a mass-storage device.  I
 haven't tried to measure the effect, and it will be different on
 different platforms anyway.
 
 Will UAS(when operational) on 2.0 be affected?

It could well be.

  Evidently hardware from a large number of vendors is affected by this
  bug.  You have to wonder if they all got the intellectual property
  from a common source.
 
 Or better who copies/steals from whom?

:-)

  It will apply to a 3.8 kernel.  After applying that patch, you should
  find that the EHCI bug doesn't show up.
 
 Will retest all boards with a kernel having the patch.
 
 Testing only the A50M, in dmesg we get:
 
 [  139.795681] ehci-pci :00:12.2: IAA with IAAD still set? , 150 times
 [  141.814383] ehci-pci :00:12.2: shutdown urb 88004cdf5d80
 ep1in-bulk , 600 times

The "IAA with IAAD" messages are probably the result of a harmless
oversight in the workaround patch.  I believe that the patch below,
applied on top of the workaround, will eliminate them.  Alternatively,
you can simply run the tests with a kernel that has CONFIG_USB_DEBUG
disabled.

 what is alarming is that the test program still fails and prints:
 
 100
 200
 300
 URB timed out; bug may be present
 Wrong URB completed
 
 , with the patch applied the program shouldn't have exited on its own, right?

"Should" is a slippery concept.  The hardware bug we're concerned about 
should have been fixed in Intel's original hardware design.

The test program does a bunch of error checking but practically no
error recovery.  If almost any tiny little thing goes wrong, the
program exits.  You might think that nothing else should go wrong --
but the fact is that errors do happen.

If you would like to spend time and energy to figure out the cause of 
this particular error, go right ahead.  I'll help as much as I can.

 Finally the patch seems somehow wrong because it makes generic
 assumptions. We shouldn't slow down the USB controller of a PPC-based
 PS3 or XBOX360 for a PCI (add-in cards) and x86 (chipsets) specific
 issue unless otherwise proven.

That is a very strong statement, and I do not totally agree with it.

For one thing, we do not know that this hardware issue is specific to
x86 chipsets or PCI add-on cards.  In fact, until people began sending
in the results of their testing, I had no way to know which EHCI
controllers were affected.  The only safe course was to assume that
they all are.

  It is not right to waste an ARM tablet
 with an AHB USB controller. Are you sure it will not break anything
 else? Even on x86 you slow down half of our machines because the rest
 are broken. If we were asked, we would blacklist controllers based on
 pairs of PCI device ID and revision.

That's what I would like to do.  But nobody knows what 
vendor/device/revision values should be in the blacklist.

  That is mandatory because
 blacklisting only vendor will work for intel but not for VIA as the
 same PCI ID is used on many different chipsets and rev has to be
 additionally used. When a known pair is met a debug only message can
 be printed and when an unknown pair is met a normal print asking user
 to send the results to this list.

No.  Because of other changes to the driver, failure to apply the
workaround when it is needed may lead to I/O errors or other problems.  
We don't want that to happen.
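
To make the trade-off concrete, here is a rough sketch of what a table keyed on vendor/device/revision could look like. It is not proposed driver code. The IDs are simply the ones from the lspci output posted earlier in this thread; the struct, the REV_ANY wildcard and the function names are made up for illustration. In line with the concern above, the table does not switch the workaround off for anything: it only separates controllers already reported as affected from ones nobody has reported on yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REV_ANY 0xffffu /* hypothetical wildcard: any revision matches */

enum ehci_bug_status {
	EHCI_BUG_UNKNOWN,        /* nobody has reported on this controller */
	EHCI_BUG_KNOWN_AFFECTED  /* controller appears in the posted reports */
};

struct ehci_bug_id {
	uint16_t vendor;
	uint16_t device;
	uint16_t revision; /* 8-bit PCI revision ID, or REV_ANY */
};

/* IDs taken from the lspci output posted in this thread. */
static const struct ehci_bug_id reported_affected[] = {
	{ 0x8086, 0x24cd, 0x01 },    /* Intel ICH4/ICH4-M */
	{ 0x8086, 0x24dd, 0x02 },    /* Intel ICH5/ICH5R */
	{ 0x8086, 0x27cc, 0x01 },    /* Intel ICH7 */
	{ 0x1106, 0x3104, 0x90 },    /* VIA 8237S */
	{ 0x1002, 0x4396, REV_ANY }, /* AMD A50M; no revision in the report */
};

static enum ehci_bug_status ehci_classify(uint16_t vendor, uint16_t device,
					  uint8_t revision)
{
	size_t i;

	for (i = 0; i < sizeof(reported_affected) / sizeof(reported_affected[0]); i++) {
		const struct ehci_bug_id *id = &reported_affected[i];

		/* Table sanity: a real revision fits in 8 bits. */
		assert(id->revision == REV_ANY || id->revision <= 0xff);
		if (id->vendor == vendor && id->device == device &&
		    (id->revision == REV_ANY || id->revision == revision))
			return EHCI_BUG_KNOWN_AFFECTED;
	}
	return EHCI_BUG_UNKNOWN;
}

/*
 * The workaround stays unconditional; the table would only decide
 * whether to ask the user to send test results to the list.
 */
static bool ehci_should_request_report(uint16_t vendor, uint16_t device,
				       uint8_t revision)
{
	return ehci_classify(vendor, device, revision) == EHCI_BUG_UNKNOWN;
}
```

Matching on the revision as well as the device ID covers the VIA case raised above, where one PCI ID is shared by several chipsets.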

Alan Stern



Index: 3.8/drivers/usb/host/ehci-hcd.c
===================================================================
--- 3.8.orig/drivers/usb/host/ehci-hcd.c
+++ 3.8/drivers/usb/host/ehci-hcd.c
@@ -748,7 +748,7 @@ static irqreturn_t ehci_irq (struct usb_
 		/* guard against (alleged) silicon errata */
 		if (cmd & CMD_IAAD)
 			ehci_dbg(ehci, "IAA with IAAD still set?\n");
-		if (ehci->async_iaa)
+		if (ehci->iaa_in_progress)
 			COUNT(ehci->stats.iaa);
 		end_unlink_async(ehci);
 	}
Index: 3.8/drivers/usb/host/ehci-q.c
===================================================================
--- 3.8.orig/drivers/usb/host/ehci-q.c
+++ 3.8/drivers/usb/host/ehci-q.c
@@ -1194,8 +1194,9 @@ static void start_iaa_cycle(struct ehci_
 	 * Do nothing if an IAA cycle is already running or
 	 * if one will be started shortly.
 	 */
-	if (ehci->async_iaa || ehci->async_unlinking)
+	if (ehci->iaa_in_progress || ehci->async_unlinking)
 		return;
+	ehci->iaa_in_progress = true;
 
 	/* If the controller isn't running, we don't 

Re: Testing for hardware bug in EHCI controllers

2013-03-15 Thread Noone Nowhere
Hello Alan again, we will clear things up. Sorry for the late reply,
dark forces delayed us.


 Intel probably doesn't care, because they don't make the affected
 chipsets any more.  I don't know what AMD is currently making.  Do you?

Yes we do. One of our ICH5 boards uses ICH5 with SL7YC SL spec, meaning
it is an embedded chipset with extended availability so they must care
as it is paired with 865 and 875 series which are listed in
http://ark.intel.com/#DesktopProducts-DesktopChipsets and
http://ark.intel.com/products/27674/Intel-82801EB-IO-Controller for
ICH5. For AMD read below.

 I don't know.  In my experience, 1000 iterations have been enough for
 the bug to show up.  But I have tested on only two computers.

We have run it up to 2000 for peace of mind each time with a specific
USB stick on 12 different EHCI USB hosts. On VT8235M it took ages to
go to 1000. A human readable timestamp could be used.

 What is A50M?

It is an AMD southbridge codenamed Hudson-M, currently in production,
and interestingly it is affected by the bug even though SB700 and SB950
work fine. We have full docs for this chipset.

To sum up, printing a start-up message to let it run up to 1000 is
necessary, as well as a warning to stop it using control+Z in order to
avoid dmesg spamming. Timestamps are another improvement and fixing the
block count issue is the last one as on 2 controllers (moschip and
SB850) the program exited immediately after initialization. First we
must deal with those 2 controllers and check all hosts with the new
version again. Since you prefer the logs with the current program, no
problem. You will receive a message after this one with all the logs
to keep you happy. Despite that, note that once the program is
able to run on all hosts, we shall retest and resend the results. Are
there any performance or reliability issues from this bug except the
original DVB card detection problem? Will the workaround affect
performance?


Re: Testing for hardware bug in EHCI controllers

2013-03-15 Thread Noone Nowhere
Hello, here are the logs from the failing EHCI hosts (chipset-SL
Spec/revision@mainboard-system):


ICH4-SL66K@FIC VC37
00:1d.7 USB Controller [0c03]: Intel Corporation 82801DB/DBM
(ICH4/ICH4-M) USB2 EHCI Controller [8086:24cd] (rev 01) (prog-if 20
[EHCI])
Subsystem: FIRST INTERNATIONAL Computer Inc Device [1509:9016]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium TAbort-
TAbort- MAbort- SERR- PERR- INTx-
Latency: 0
Interrupt: pin D routed to IRQ 23
Region 0: Memory at e208 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA 
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: ehci_pci
00: 86 80 cd 24 06 00 90 02 01 20 03 0c 00 00 00 00
10: 00 00 08 e2 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 09 15 16 90
30: 00 00 00 00 50 00 00 00 00 00 00 00 05 04 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 c2 c9 00 00 00 00 0a 00 80 20 00 00 00 00
60: 20 20 7f 00 00 00 00 00 01 00 00 00 00 00 08 c0
70: 00 00 c7 0f 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3f 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 78 bf 1f 00 88 83 00 00 60 0f 00 00 06 00 00 00

[  179.101048] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 9d00
[  179.112792] ehci-pci :00:1d.7: EHCI hardware bug detected:
02008d80 80008d00
[  179.113668] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
[  179.114419] ehci-pci :00:1d.7: EHCI hardware bug detected:
02008d80 80008d00
[  179.122167] ehci-pci :00:1d.7: EHCI hardware bug detected:
02008d80 80008d00
[  179.124043] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
[  179.128416] ehci-pci :00:1d.7: EHCI hardware bug detected:
02008d80 80008d00
[  179.132920] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 9d00
[  179.137275] ehci-pci :00:1d.7: EHCI hardware bug detected:
02008d80 80008d00
[  179.138166] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
[  179.139544] ehci-pci :00:1d.7: EHCI hardware bug detected:
02008d80 80008d00
[  179.141417] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
[  179.142285] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
[  179.152790] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
[  179.154411] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
[  179.161535] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 9d00
[  179.162789] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
[  179.165671] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 9d00
[  179.171158] ehci-pci :00:1d.7: EHCI hardware bug detected:
02008d80 80008d00
[  179.177040] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
[  179.181788] ehci-pci :00:1d.7: EHCI hardware bug detected:
82008d80 8d00
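
For anyone reading these lines: if the two hex words are QH overlay token values sampled at two moments (an assumption on our side, the detection patch is not quoted in this message), they can be split into fields with the qTD token layout from the EHCI specification. The sketch below does only that. The struct and function names are hypothetical; the bit positions are the ones the specification gives for the qTD token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 32-bit EHCI qTD / QH-overlay token. */
struct qtd_token {
	bool     toggle;     /* bit 31: data toggle */
	unsigned bytes_left; /* bits 30:16: total bytes still to transfer */
	bool     ioc;        /* bit 15: interrupt on complete */
	unsigned c_page;     /* bits 14:12: current page */
	unsigned cerr;       /* bits 11:10: error counter */
	unsigned pid;        /* bits 9:8: 0 = OUT, 1 = IN, 2 = SETUP */
	bool     active;     /* bit 7: transaction still owned by the HC */
	bool     halted;     /* bit 6: queue halted on error */
};

static struct qtd_token decode_token(uint32_t tok)
{
	struct qtd_token t;

	t.toggle     = (tok >> 31) & 0x1;
	t.bytes_left = (tok >> 16) & 0x7fff;
	t.ioc        = (tok >> 15) & 0x1;
	t.c_page     = (tok >> 12) & 0x7;
	t.cerr       = (tok >> 10) & 0x3;
	t.pid        = (tok >> 8) & 0x3;
	t.active     = (tok >> 7) & 0x1;
	t.halted     = (tok >> 6) & 0x1;
	assert(t.pid != 3); /* PID code 3 is reserved */
	return t;
}

/* True when the token was active on the first read and done on the second. */
static bool completed_between(uint32_t before, uint32_t after)
{
	return decode_token(before).active && !decode_token(after).active;
}
```

Read this way, the pair 02008d80 / 80008d00 would be an IN transfer with 512 bytes outstanding and the Active bit set on the first read, and the same transfer finished with the data toggle flipped on the second read.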

ICH5-SL7YC@ASRockP4i65G
00:1d.7 USB Controller [0c03]: Intel Corporation 82801EB/ER
(ICH5/ICH5R) USB2 EHCI Controller [8086:24dd] (rev 02) (prog-if 20
[EHCI])
Subsystem: ASRock Incorporation Device [1849:24d0]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium TAbort-
TAbort- MAbort- SERR- PERR- INTx-
Latency: 0
Interrupt: pin D routed to IRQ 23
Region 0: Memory at ff27fc00 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA 
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Debug port: BAR=1 offset=00a0
Kernel driver in use: ehci-pci
00: 86 80 dd 24 06 01 90 02 02 20 03 0c 00 00 00 00
10: 00 fc 27 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 49 18 d0 24
30: 00 00 00 00 50 00 00 00 00 00 00 00 0b 04 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 58 c2 c9 00 00 00 00 0a 00 a0 20 00 00 00 00
60: 20 20 ff 01 00 00 00 00 01 00 00 00 00 00 08 c0
70: 00 00 d5 3f 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 

Re: Testing for hardware bug in EHCI controllers

2013-03-15 Thread Alan Stern
On Fri, 15 Mar 2013, Noone Nowhere wrote:

 Hello Alan again, we will clear things up. Sorry for the late reply,
 dark forces delayed us.
 
 
  Intel probably doesn't care, because they don't make the affected
  chipsets any more.  I don't know what AMD is currently making.  Do you?
 
 Yes we do. One of our ICH5 boards uses ICH5 with SL7YC SL spec, meaning
 it is an embedded chipset with extended availability so they must care
 as it is paired with 865 and 875 series which are listed in
 http://ark.intel.com/#DesktopProducts-DesktopChipsets and
 http://ark.intel.com/products/27674/Intel-82801EB-IO-Controller for
 ICH5. For AMD read below.

Sarah, do you know how I can report this to the right person at Intel?

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-15 Thread Alan Stern
On Fri, 15 Mar 2013, Noone Nowhere wrote:

 USB stick on 12 different EHCI USB hosts. On VT8235M it took ages to
 go to 1000. A human readable timestamp could be used.

Evidently the VT8235M is very slow for some of the operations used by 
the test program.

 To sum up, printing a start-up message to let it run up to 1000 is
 necessary

It is not necessary.  I mentioned this in the original email where the 
program was first distributed.

 as well as a warning to stop it using control+Z in order to
 avoid dmesg spamming.

Control-Z will not kill the program; the best way to stop it is to 
unplug the test device.  Doing this does not cause any dmesg spamming 
(unless you have CONFIG_USB_DEBUG enabled, in which case the extra 
messages are appropriate).

 Timestamps are another improvement

Why does anybody care about timestamps?

 and fixing the
 block count issue is the last one as on 2 controllers (moschip and
 SB850) the program exited immediately after initialization.

I can't fix the block count issue.  That's not a bug in the program; 
it's a bug somewhere else.

 First we
 must deal with those 2 controllers and check all hosts with the new
 version again. Since you prefer the logs with the current program, no
 problem. You will receive a message after this one with all the logs
 to keep you happy. Despite that, note that once the program is
 able to run on all hosts, we shall retest and resend the results. Are
 there any performance or reliability issues from this bug except the
 original DVB card detection problem? Will the workaround affect
 performance?

This EHCI bug is not related to DVB card detection.  I don't know what
original problem you're referring to.

A workaround for the bug has already been submitted to the various
stable kernel series.  The workaround may decrease performance slightly
for things like large data transfers to/from a mass-storage device.  I
haven't tried to measure the effect, and it will be different on
different platforms anyway.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-15 Thread Alan Stern
On Fri, 15 Mar 2013, Noone Nowhere wrote:

 Hello, here are the logs from the failing EHCI hosts (chipset-SL
 Spec/revision@mainboard-system):
 
 
 ICH4-SL66K@FIC VC37
 00:1d.7 USB Controller [0c03]: Intel Corporation 82801DB/DBM
 (ICH4/ICH4-M) USB2 EHCI Controller [8086:24cd] (rev 01) (prog-if 20
 [EHCI])
   Subsystem: FIRST INTERNATIONAL Computer Inc Device [1509:9016]

 ICH5-SL7YC@ASRockP4i65G
 00:1d.7 USB Controller [0c03]: Intel Corporation 82801EB/ER
 (ICH5/ICH5R) USB2 EHCI Controller [8086:24dd] (rev 02) (prog-if 20
 [EHCI])
   Subsystem: ASRock Incorporation Device [1849:24d0]

 ICH5R-SL724@ASUS P4P800 SE
 00:1d.7 USB Controller [0c03]: Intel Corporation 82801EB/ER
 (ICH5/ICH5R) USB2 EHCI Controller [8086:24dd] (rev 02) (prog-if 20
 [EHCI])
   Subsystem: ASUSTeK Computer Inc. P4P800/P5P800 series motherboard 
 [1043:80a6]

 ICH7-SL8FX@ASRock P4i945GC
 00:1d.7 USB Controller [0c03]: Intel Corporation N10/ICH 7 Family USB2
 EHCI Controller [8086:27cc] (rev 01) (prog-if 20 [EHCI])
   Subsystem: ASRock Incorporation Device [1849:27cc]

 8237S@ASRock P4VM900
 00:10.4 USB Controller [0c03]: VIA Technologies, Inc. USB 2.0
 [1106:3104] (rev 90) (prog-if 20 [EHCI])
   Subsystem: ASRock Incorporation K7VT6 motherboard [1849:3104]

 A50M FCH-A13@COMPAQ CQ57
 This one has 2 EHCI controllers so we attach both(in fact it has 3 but
 1 is disabled).
 00:12.2 USB controller [0c03]: Advanced Micro Devices [AMD] nee ATI
 SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396] (prog-if 20 [EHCI])
   Subsystem: Hewlett-Packard Company Device [103c:3577]

 00:16.2 USB controller [0c03]: Advanced Micro Devices [AMD] nee ATI
 SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396] (prog-if 20 [EHCI])
   Subsystem: Hewlett-Packard Company Device [103c:3577]

 This was the only host giving error message at the program:
 URB timed out; bug may be present
 Wrong URB completed

Evidently hardware from a large number of vendors is affected by this 
bug.  You have to wonder if they all got the intellectual property 
from a common source.

 We thought to use a NEC hub on our broken laptop in order to get the
 bug fixed by the hub but the results did not make us happy(stupid idea
 for stupid hardware). Additionally we have tested the program on two
 more different hubs. We haven't tested our 3.0 hub. Do you think there
 is any chance of hitting a broken hub while running on a good host,
 which would make the error come to the surface? Please mention if it
 has to be run on hubs. Here are the results:

Hubs will not make any difference.  There's no reason to use a hub.

 , hope you figure out what you need. You can now see the full picture;
 as said in the first message, the subject must change because it is not
 Intel specific. Intel though was consistent: all their initial
 southbridges were broken, while other vendors broke their good EHCI
 controllers. Good luck with fixing, we can test any patch prior to
 merging. Unfortunately one SiS 478 board died after POST and we
 couldn't test it so R.I.P until service!

Thank you very much for all the testing.  We may find that the
workaround isn't needed on certain chipsets, but we can't know which 
ones without testing them.

The workaround patch is here:

http://marc.info/?l=linux-usb&m=136215301302639&w=2

It will apply to a 3.8 kernel.  After applying that patch, you should
find that the EHCI bug doesn't show up.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-15 Thread Sarah Sharp
On Fri, Mar 15, 2013 at 11:08:12AM -0400, Alan Stern wrote:
 On Fri, 15 Mar 2013, Noone Nowhere wrote:
 
  Hello Alan again, we will clear things up. Sorry for the late reply,
  dark forces delayed us.
  
  
   Intel probably doesn't care, because they don't make the affected
   chipsets any more.  I don't know what AMD is currently making.  Do you?
  
  Yes we do. One of our ICH5 boards uses ICH5 with SL7YC SL spec, meaning
  it is an embedded chipset with extended availability so they must care
  as it is paired with 865 and 875 series which are listed in
  http://ark.intel.com/#DesktopProducts-DesktopChipsets and
  http://ark.intel.com/products/27674/Intel-82801EB-IO-Controller for
  ICH5. For AMD read below.
 
 Sarah, do you know how I can report this to the right person at Intel?

I'll bubble up the results of this thread to our hardware team.  To the
original person who reported this (Noone?), can you privately send me
which company you work for?  Getting the name of an affected customer
will help expedite a fix for this.

Sarah Sharp


Re: Testing for hardware bug in EHCI controllers

2013-03-12 Thread Alan Stern
On Mon, 11 Mar 2013, Noone Nowhere wrote:

 Hello Alan this is VoRTeX from the linux kernel ata wiki, greetings
 from the occupied Greece...Nick uses UNIX since 2003 (FreeBSD back
 then...5.2.1) and after all these years
 is an expert in Linux and in HDDs. In 2008 a repeated silent data
 corruption occurred at a specific USB2SATA enclosure that we still
 have. Three copies produced three different
 sums of large data image. It could be the libata, the kernel USB code,
 it could be the SB700 USB (from the quite troublesome ASUS M3A78 PRO)
 or WTF...
 
 We are monitoring the libata and USB list for some years and you are
 doing an amazing job and with respectable continuity and endurance,
 keep up the good work. You are granted access to ALL of our hardware
 but our time is limited.This means a lot of old x86 boards and all
 kinds of cards.
 
 Initial testing shows that the underlying bug is found in many
 chipsets. Once we are done with testing please notify Intel and AMD
 officially so that they update their southbridge specification updates
 and errata respectively...

Intel probably doesn't care, because they don't make the affected 
chipsets any more.  I don't know what AMD is currently making.  Do you?

 Unfortunately the program needs improvements. To begin with a single
 USB 2.0 128 MB! stick was used on all tests. VT8235M does not have the
 problem but errors were encountered prior to detecting the stick. After
 some removals and insertions it worked but going from 100 to 1000 took
 considerably more time than on all other hosts. It might be good to add
 version and time printing at 100, 200 and so on. Program execution on
 non-broken hosts is unclear: for how many iterations should it run?

I don't know.  In my experience, 1000 iterations have been enough for 
the bug to show up.  But I have tested on only two computers.

 Intel ICH5 chipsets affected by the bug, start making a buzzing noise
 when the program runs! Also assuming that 3.8.2 has as1617 applied, we
 probably fell into another hardware bug while testing a moschip PCIe
 EHCI controller and a SB850, as the program aborts saying:
 Block count is too large
 Block count is too small: 513
 , meaning that something got shifted from one block to another?, go figure!
 
 Lastly on A50M dmesg prints "EHCI hardware bug detected:..." and the program:

What is A50M?

 URB timed out; bug may be present
 Wrong URB completed
 so why on intel hosts only dmesg proves bug presence and not the
 program? Did it need more time or something else is there?

The program does not detect the bug; it merely creates conditions where
the bug is likely to show up.  The kernel driver detects the bug, when
it occurs.
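
Because detection lives in the kernel driver, checking a test run means
scanning the kernel log for the patch's message. A minimal sketch in C,
assuming the message format seen in the logs in this thread (the text
"EHCI hardware bug detected:" followed by two hexadecimal values on the
same line). The helper name parse_ehci_bug_line is hypothetical and is
not part of ehci-test or the kernel patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of ehci-test or the kernel patch.
 * Looks for the debugging patch's message in one kernel log line and
 * extracts the two hexadecimal values printed after it.
 * Returns 1 if the line is a complete bug report, 0 otherwise. */
int parse_ehci_bug_line(const char *line, unsigned long *a, unsigned long *b)
{
    static const char marker[] = "EHCI hardware bug detected:";
    const char *p;

    assert(line != NULL && a != NULL && b != NULL);

    p = strstr(line, marker);
    if (p == NULL)
        return 0;

    /* Both values follow the marker, separated by whitespace. */
    return sscanf(p + sizeof(marker) - 1, "%lx %lx", a, b) == 2;
}
```

Fed each line of dmesg output in turn, every return value of 1 counts
one detection. Lines that a mail client has wrapped, so that the second
value sits on the next line, would need to be rejoined first.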

 Should you need more info, ask for it on the mailing list and you shall
 receive it. If you want to improve and correct the program (as it
 cannot run everywhere as is), we suggest you move it under a new
 subject on the USB list so that we are all running the latest and
 greatest. If you do not agree with the improvements then we will send
 you details on the failed southbridges including SL specs or revisions
 but only on systems that were able to run your program. We have also
 tested two of our three USB 3.0 controllers.

What improvements do you suggest?  This wasn't clear from what you 
wrote above.  Just the increments by 100 instead of by 1000?

Yes, please do send information on the failed controllers.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Testing for hardware bug in EHCI controllers

2013-03-08 Thread Roger Quadros
On 03/04/2013 05:36 AM, Ming Lei wrote:
 On Tue, Feb 26, 2013 at 4:54 AM, Alan Stern st...@rowland.harvard.edu wrote:
 

 I'd be interested to hear the results of testing on a variety of
 controllers.  (This computer also has an NEC EHCI controller, and that
 one does not have the bug.)  Do the EHCI controllers on current Intel
 chipsets pass the test?  What about other vendors?
 
 No 'EHCI hardware bug detected' errors are observed on the EHCI
 controller of the OMAP4 PandaBoard.


On OMAP4 panda I was observing the same URB timeout message as Clemens.
After applying Clemens' fix [1] to the test code I can't see the problems
on OMAP4 Panda anymore. So OMAP EHCI is not affected.

[1] http://marc.info/?l=linux-usb&m=136226443502631&w=2

Log below with Transcend Jetflash

root@omaphost:~# ehci-test /dev/bus/usb/001/004
[   34.412078] usb-storage 1-1.3:1.0: disconnect by usbfs
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
1600

cheers,
-roger



Re: Testing for hardware bug in EHCI controllers

2013-03-08 Thread Alan Stern
On Fri, 8 Mar 2013, Roger Quadros wrote:

 On 03/04/2013 05:36 AM, Ming Lei wrote:
  On Tue, Feb 26, 2013 at 4:54 AM, Alan Stern st...@rowland.harvard.edu 
  wrote:
  
 
  I'd be interested to hear the results of testing on a variety of
  controllers.  (This computer also has an NEC EHCI controller, and that
  one does not have the bug.)  Do the EHCI controllers on current Intel
  chipsets pass the test?  What about other vendors?
  
  No 'EHCI hardware bug detected' errors are observed on the EHCI
  controller of the OMAP4 PandaBoard.
 
 
 On OMAP4 panda I was observing the same URB timeout message as Clemens.
 After applying Clemens' fix [1] to the test code I can't see the problems
 on OMAP4 Panda anymore. So OMAP EHCI is not affected.

The patch by Clemens fixes an error that gets detected by the device,
not by the host.  That's why the original, unpatched program worked
with some flash drives but not others -- not all drives are careful
about checking for errors.

 [1] http://marc.info/?l=linux-usb&m=136226443502631&w=2
 
 Log below with Transcend Jetflash
 
 root@omaphost:~# ehci-test /dev/bus/usb/001/004
 [   34.412078] usb-storage 1-1.3:1.0: disconnect by usbfs
 100
 200
 300
 400
 500
 600
 700
 800
 900
 1000
 1100
 1200
 1300
 1400
 1500
 1600
 
 cheers,
 -roger

Thanks for testing.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-06 Thread Bo Shen

Hi Matthieu,

On 3/5/2013 16:52, Matthieu CASTET wrote:


Do you know which vendor made the EHCI IP in your Atmel SoC?


I am sorry, I don't know the vendor of the ehci IP.

Best Regards,
Bo Shen


Re: Testing for hardware bug in EHCI controllers

2013-03-05 Thread Matthieu CASTET
Bo Shen wrote:
 Hi Alan,
 
 On 3/4/2013 23:16, Alan Stern wrote:
 On Mon, 4 Mar 2013, Bo Shen wrote:

 Hi Alan,

 On 02/26/2013 04:54 AM, Alan Stern wrote:
 Sarah (and anyone else who's interested):

 A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
 controllers.  You pointed out that these are rather old components, not
 being used in current systems, which is quite true.
 I tested this on an Atmel at91sam9x5ek board with Linux 3.8 and got
 similar output. Could you give me more detailed information about
 the bug? (Sorry, I missed the original hardware bug e-mail.)
 The problem is explained in more detail here:

  http://marc.info/?l=linux-usb&m=135492071812265&w=2

 Note that the test program itself requires a small fix, which was
 posted here:

  http://marc.info/?l=linux-usb&m=136226443502631&w=2

 
 Thanks.
 
 If you don't mind, I'd like to see the kernel log from your test run.
 
 The dmesg log is as follows:
 
 root@at91sam9x5ek:~# ./ehci-test /dev/bus/usb/001/003
 atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 8d00
 atmel-ehci 70.ehci: EHCI hardware bug detected: 02008d80 80008d00
 atmel-ehci 70.ehci: EHCI hardware bug detected: 02008d80 80008d00
 atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 8d00
 atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 8d00
 atmel-ehci 70.ehci: EHCI hardware bug detected: 02008d80 80008d00
 atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 8d00
 atmel-ehci 70.ehci: EHCI hardware bug detected: 02008d80 80008d00
 atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 9d00



Do you know which vendor made the EHCI IP in your Atmel SoC?


Matthieu


Re: Testing for hardware bug in EHCI controllers

2013-03-05 Thread Alan Stern
On Tue, 5 Mar 2013, Lan Tianyu wrote:

 OK, I applied it and retested. No problem.
 
 The following log only appears when I stop ehci-test or unplug the
 device during the test.
 [  140.871303] ehci-pci :00:1a.0: shutdown urb 88014545f0c0 ep1in-bulk
 [  140.878231] ehci-pci :00:1a.0: shutdown urb 88014545fcc0 ep1in-bulk
 [  140.885158] ehci-pci :00:1a.0: shutdown urb 88014545fb40 ep1in-bulk
 [  140.892088] ehci-pci :00:1a.0: shutdown urb 88014545f9c0 ep1in-bulk

Those are normal debugging messages.  It looks like Intel fixed the 
EHCI bug in the Sandybridge incarnation.  Thanks for testing.

Alan Stern




Re: Testing for hardware bug in EHCI controllers

2013-03-04 Thread Lan Tianyu
2013/2/26 Alan Stern st...@rowland.harvard.edu:
 Sarah (and anyone else who's interested):

 A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
 controllers.  You pointed out that these are rather old components, not
 being used in current systems, which is quite true.

 Now I have figured out a simple way for anyone to test for this bug in
 any EHCI controller, without the need for a g-zero gadget.  It's a
 two-part procedure:

 Apply the patch below (which is written for vanilla 3.8) and
 load the resulting driver.  The patch adds an explicit test
 to ehci-hcd for detecting the bug.

 Then plug in an ordinary USB flash drive and run the attached
 program (as root), giving it the device path for the flash
 drive as the single command-line argument.  For example:

 sudo ./ehci-test /dev/bus/usb/002/003

 The program won't do anything bad to the flash drive; it just reads the
 first 256 KB of data over and over again, now and then unlinking an URB
 to try and trigger the bug.  If the program works right, it will print
 out a loop counter every hundred iterations.  If it runs for 1000
 iterations with no error messages in the kernel log, you may consider
 that the controller has passed the test.  This should take under a
 minute, depending on the hardware speed.

 The program won't stop by itself unless something goes wrong.  You can
 kill it with ^C or more simply by unplugging the flash drive.  (If you
 want to be safe, make sure there are no mounted filesystems on the
 drive before running the test program.)

 If the hardware bug is detected, the kernel patch will print error
 messages to the system log.  For example, when I run the test on the
 Intel controller in this computer, I get:

 [  150.019441] usb-storage 3-8:1.0: disconnect by usbfs
 [  150.271190] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  150.591089] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  151.538560] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  151.857569] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  152.018886] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  152.179810] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  153.211804] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  153.374497] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  153.770443] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  154.247861] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 
 8d00
 [  154.566912] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 
 8d00
 [  155.359101] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  155.838132] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  156.791107] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  157.267620] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  159.252057] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  159.886048] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  160.206625] ehci-pci :00:1d.7: EHCI hardware bug detected: 02008d80 
 80008d00
 ...

 You get the idea.  The values in the two columns on the right are
 always supposed to be equal; when they aren't it indicates that the
 controller has done a DMA write at a time when ehci-hcd isn't expecting
 one to happen.

 I'd be interested to hear the results of testing on a variety of
 controllers.  (This computer also has an NEC EHCI controller, and that
 one does not have the bug.)  Do the EHCI controllers on current Intel
 chipsets pass the test?  What about other vendors?

 Thanks to all who try it out and report their results.
Tested on a Sandy Bridge platform with v3.8.
The first time, I got the following output. After that, it was
hard to get any output.

sudo ./ehci-test /dev/bus/usb/001/003
[  140.855342] usb-storage 1-1.2:1.0: disconnect by usbfs
Invalid URB stat[  140.863000] ehci-pci :00:1a.0: shutdown urb
88014545f300 ep1in-bulk
[  140.871303] ehci-pci :00:1a.0: shutdown urb 88014545f0c0 ep1in-bulk
[  140.878231] ehci-pci :00:1a.0: shutdown urb 88014545fcc0 ep1in-bulk
[  140.885158] ehci-pci :00:1a.0: shutdown urb 88014545fb40 ep1in-bulk
[  140.892088] ehci-pci :00:1a.0: shutdown urb 88014545f9c0 ep1in-bulk
[  140.899015] ehci-pci :00:1a.0: shutdown urb 88014545f780 ep1in-bulk
[  140.905941] ehci-pci :00:1a.0: shutdown urb 88014545f240 ep1in-bulk
[  140.912870] ehci-pci :00:1a.0: shutdown urb 88014545f900 ep1in-bulk
[  140.919799] ehci-pci :00:1a.0: shutdown urb 88014545fc00 ep1in-bulk
[  140.926725] ehci-pci :00:1a.0: shutdown urb 

Re: Testing for hardware bug in EHCI controllers

2013-03-04 Thread Alan Stern
On Mon, 4 Mar 2013, Ming Lei wrote:

 On Tue, Feb 26, 2013 at 4:54 AM, Alan Stern st...@rowland.harvard.edu wrote:
 
 
  I'd be interested to hear the results of testing on a variety of
  controllers.  (This computer also has an NEC EHCI controller, and that
  one does not have the bug.)  Do the EHCI controllers on current Intel
  chipsets pass the test?  What about other vendors?
 
 No 'EHCI hardware bug detected' errors are observed on the EHCI
 controller of the OMAP4 PandaBoard.

Good; thank you for testing.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-04 Thread Alan Stern
On Mon, 4 Mar 2013, Bo Shen wrote:

 Hi Alan,
 
 On 02/26/2013 04:54 AM, Alan Stern wrote:
  Sarah (and anyone else who's interested):
 
  A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
  controllers.  You pointed out that these are rather old components, not
  being used in current systems, which is quite true.
 
 I tested this on an Atmel at91sam9x5ek board with Linux 3.8 and got
 similar output. Could you give me more detailed information about
 the bug? (Sorry, I missed the original hardware bug e-mail.)

The problem is explained in more detail here:

http://marc.info/?l=linux-usb&m=135492071812265&w=2

Note that the test program itself requires a small fix, which was 
posted here:

http://marc.info/?l=linux-usb&m=136226443502631&w=2

If you don't mind, I'd like to see the kernel log from your test run.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-04 Thread Alan Stern
On Mon, 4 Mar 2013, Lan Tianyu wrote:

 Tested on a Sandy Bridge platform with v3.8.
 The first time, I got the following output. After that, it was
 hard to get any output.

You have to unplug the flash drive after running the test each time. 
By the way, be sure to apply Clemens Ladisch's fix to the test program:

http://marc.info/?l=linux-usb&m=136226443502631&w=2

 sudo ./ehci-test /dev/bus/usb/001/003
 [  140.855342] usb-storage 1-1.2:1.0: disconnect by usbfs
 Invalid URB stat[  140.863000] ehci-pci :00:1a.0: shutdown urb
 88014545f300 ep1in-bulk
 [  140.871303] ehci-pci :00:1a.0: shutdown urb 88014545f0c0 ep1in-bulk
 [  140.878231] ehci-pci :00:1a.0: shutdown urb 88014545fcc0 ep1in-bulk
 [  140.885158] ehci-pci :00:1a.0: shutdown urb 88014545fb40 ep1in-bulk
 [  140.892088] ehci-pci :00:1a.0: shutdown urb 88014545f9c0 ep1in-bulk
 [  140.899015] ehci-pci :00:1a.0: shutdown urb 88014545f780 ep1in-bulk
 [  140.905941] ehci-pci :00:1a.0: shutdown urb 88014545f240 ep1in-bulk
 [  140.912870] ehci-pci :00:1a.0: shutdown urb 88014545f900 ep1in-bulk
 [  140.919799] ehci-pci :00:1a.0: shutdown urb 88014545fc00 ep1in-bulk
 [  140.926725] ehci-pci :00:1a.0: shutdown urb 88014545f540 ep1in-bulk
 [  140.933655] ehci-pci :00:1a.0: shutdown urb 88014545f3c0 ep1in-bulk
 [  140.940583] ehci-pci :00:1a.0: shutdown urb 88014545fd80 ep1in-bulk
 [  140.947512] ehci-pci :00:1a.0: shutdown urb 88014545f600 ep1in-bulk
 [  140.954440] ehci-pci :00:1a.0: shutdown urb 88014545f180 ep1in-bulk
 [  140.961368] ehci-pci :00:1a.0: shutdown urb 88014545f000 ep1in-bulk
 [  140.968297] ehci-pci :00:1a.0: shutdown urb 88014545fa80 ep1in-bulk
 [  140.975223] ehci-pci :00:1a.0: shutdown urb 88014545f840 ep1in-bulk
 us -32, act len [  140.982151] ehci-pci :00:1a.0: shutdown urb
 88014545fe40 ep1in-bulk
 [  140.990459] ehci-pci :00:1a.0: shutdown urb 88014545ff00 ep1in-bulk
 [  140.997388] ehci-pci :00:1a.0: shutdown urb 880145f08000 ep1in-bulk
 [  141.004316] ehci-pci :00:1a.0: shutdown urb 880145f080c0 ep1in-bulk
 [  141.011245] ehci-pci :00:1a.0: shutdown urb 880145f08180 ep1in-bulk

The fix to the test program ought to help with this.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-04 Thread Frank Schäfer
On 25.02.2013 21:54, Alan Stern wrote:
 Sarah (and anyone else who's interested):

 A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
 controllers.  You pointed out that these are rather old components, not 
 being used in current systems, which is quite true.

 Now I have figured out a simple way for anyone to test for this bug in
 any EHCI controller, without the need for a g-zero gadget.  It's a
 two-part procedure:

   Apply the patch below (which is written for vanilla 3.8) and
   load the resulting driver.  The patch adds an explicit test
   to ehci-hcd for detecting the bug.

   Then plug in an ordinary USB flash drive and run the attached
   program (as root), giving it the device path for the flash
   drive as the single command-line argument.  For example:

   sudo ./ehci-test /dev/bus/usb/002/003

 The program won't do anything bad to the flash drive; it just reads the
 first 256 KB of data over and over again, now and then unlinking an URB
 to try and trigger the bug.  If the program works right, it will print
 out a loop counter every hundred iterations.  If it runs for 1000
 iterations with no error messages in the kernel log, you may consider
 that the controller has passed the test.  This should take under a
 minute, depending on the hardware speed.

 The program won't stop by itself unless something goes wrong.  You can
 kill it with ^C or more simply by unplugging the flash drive.  (If you
 want to be safe, make sure there are no mounted filesystems on the
 drive before running the test program.)

 If the hardware bug is detected, the kernel patch will print error
 messages to the system log.  For example, when I run the test on the
 Intel controller in this computer, I get:

 [  150.019441] usb-storage 3-8:1.0: disconnect by usbfs
 [  150.271190] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  150.591089] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  151.538560] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  151.857569] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  152.018886] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  152.179810] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  153.211804] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  153.374497] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  153.770443] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  154.247861] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 
 8d00
 [  154.566912] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 
 8d00
 [  155.359101] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  155.838132] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  156.791107] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  157.267620] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 
 80008d00
 [  159.252057] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  159.886048] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 
 8d00
 [  160.206625] ehci-pci :00:1d.7: EHCI hardware bug detected: 02008d80 
 80008d00
 ...

 You get the idea.  The values in the two columns on the right are 
 always supposed to be equal; when they aren't it indicates that the 
 controller has done a DMA write at a time when ehci-hcd isn't expecting 
 one to happen.

 I'd be interested to hear the results of testing on a variety of 
 controllers.  (This computer also has an NEC EHCI controller, and that 
 one does not have the bug.)  Do the EHCI controllers on current Intel 
 chipsets pass the test?  What about other vendors?

 Thanks to all who try it out and report their results.

 Alan Stern

Here is the result of your test procedure (fix applied, running kernel
3.9-rc1) for the following device:

00:02.1 USB controller [0c03]: NVIDIA Corporation MCP61 USB 2.0
Controller [10de:03f2] (rev a2) (prog-if 20 [EHCI])
Subsystem: ASUSTeK Computer Inc. Device [1043:8234]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast TAbort-
TAbort- MAbort- SERR- PERR- INTx-
Latency: 0 (750ns min, 250ns max)
Interrupt: pin B routed to IRQ 22
Region 0: Memory at fe02e000 (32-bit, non-prefetchable) [size=256]
Capabilities: [44] Debug port: BAR=1 offset=0098
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: ehci-pci


= dmesg output:
[  207.965961] ehci-pci 

Re: Testing for hardware bug in EHCI controllers

2013-03-04 Thread Alan Stern
On Mon, 4 Mar 2013, Frank Schäfer wrote:

 Here is the result of your test procedure (fix applied, running kernel
 3.9-rc1) for the following device:
 
 00:02.1 USB controller [0c03]: NVIDIA Corporation MCP61 USB 2.0
 Controller [10de:03f2] (rev a2) (prog-if 20 [EHCI])
 Subsystem: ASUSTeK Computer Inc. Device [1043:8234]

 = dmesg output:
 [  207.965961] ehci-pci :00:02.1: EHCI hardware bug detected:
 80008d00 8d00
 [  208.020904] ehci-pci :00:02.1: EHCI hardware bug detected:
 82008d80 8d00
 [  208.198698] ehci-pci :00:02.1: EHCI hardware bug detected:
 80008d00 9d00
 [  208.201699] ehci-pci :00:02.1: EHCI hardware bug detected:
 82008d80 9d00

So NVIDIA hardware has the same bug as Intel.  That's good to know; I 
was starting to think no other vendors would be affected.  Thanks.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-04 Thread Bo Shen

Hi Alan,

On 3/4/2013 23:16, Alan Stern wrote:

On Mon, 4 Mar 2013, Bo Shen wrote:


Hi Alan,

On 02/26/2013 04:54 AM, Alan Stern wrote:

Sarah (and anyone else who's interested):

A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
controllers.  You pointed out that these are rather old components, not
being used in current systems, which is quite true.


I tested this on an Atmel at91sam9x5ek board with Linux 3.8 and got
similar output. Could you give me more detailed information about
the bug? (Sorry, I missed the original hardware bug e-mail.)


The problem is explained in more detail here:

http://marc.info/?l=linux-usb&m=135492071812265&w=2

Note that the test program itself requires a small fix, which was
posted here:

http://marc.info/?l=linux-usb&m=136226443502631&w=2



Thanks.


If you don't mind, I'd like to see the kernel log from your test run.


The dmesg log is as follows:

root@at91sam9x5ek:~# ./ehci-test /dev/bus/usb/001/003
atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 8d00
atmel-ehci 70.ehci: EHCI hardware bug detected: 02008d80 80008d00
atmel-ehci 70.ehci: EHCI hardware bug detected: 02008d80 80008d00
atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 8d00
atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 8d00
atmel-ehci 70.ehci: EHCI hardware bug detected: 02008d80 80008d00
atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 8d00
atmel-ehci 70.ehci: EHCI hardware bug detected: 02008d80 80008d00
atmel-ehci 70.ehci: EHCI hardware bug detected: 82008d80 9d00

Best Regards,
Bo Shen


Re: Testing for hardware bug in EHCI controllers

2013-03-03 Thread Alan Stern
On Sat, 2 Mar 2013, Clemens Ladisch wrote:

  At any rate, if you're interested in finding out what the problem might
  be, the best place to start would be with a usbmon trace.
 
 ehci-test:
 8801e96c06c0 2335244662 S Bo:1:018:1 -115 31 = 55534243 6500 0400 
 0a28  0200  00
 usb-storage:
 88021037c480 3386301775 S Bo:1:021:1 -115 31 = 55534243 5400 0010 
 8a28 ef4f 3e08  00
 
 With the patch below, it runs just fine.
 (And doesn't find EHCI bugs.)

Once again, usbmon proves its worth.  Thanks for the fix; it is 
obviously correct.

 --- ehci-test.c
 +++ ehci-test.c
 @@ -192,7 +192,7 @@ int send_READ10(void)
   (NUM_BLOCKS << 1) & 0xff,
   (NUM_BLOCKS >> 7) & 0xff,
   (NUM_BLOCKS >> 15) & 0xff,
 - 0, 0, 10,   /* Flags, LUN, Length of CDB */
 + 0x80, 0, 10,/* Flags, LUN, Length of CDB */
   0x28, 0,/* CDB: READ(10), LUN 0 */
   0, 0, 0, 0, /* LBA = 0  */
   0,  /* Reserved */
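
The changed byte is the bmCBWFlags field of the Command Block Wrapper,
whose top bit gives the direction of the data stage. The following
sketch assumes the standard USB mass-storage Bulk-Only Transport layout
(a 31-byte CBW) and 512-byte blocks; build_read10_cbw and the macro
names are hypothetical and are not taken from ehci-test.c:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CBW_LEN          31    /* a Command Block Wrapper is 31 bytes */
#define CBW_FLAG_DATA_IN 0x80  /* bmCBWFlags bit 7 set: device-to-host */

/* Hypothetical helper: fill cbw (CBW_LEN bytes) with a READ(10) for
 * num_blocks 512-byte blocks starting at LBA 0 on LUN 0. */
void build_read10_cbw(uint8_t *cbw, uint32_t tag, uint16_t num_blocks)
{
    uint32_t len = (uint32_t)num_blocks * 512u;  /* bytes in the data stage */

    assert(cbw != NULL);
    memset(cbw, 0, CBW_LEN);

    memcpy(cbw, "USBC", 4);                  /* dCBWSignature (55 53 42 43) */
    cbw[4]  = (uint8_t)(tag & 0xff);         /* dCBWTag, little-endian */
    cbw[5]  = (uint8_t)((tag >> 8) & 0xff);
    cbw[6]  = (uint8_t)((tag >> 16) & 0xff);
    cbw[7]  = (uint8_t)((tag >> 24) & 0xff);
    cbw[8]  = (uint8_t)(len & 0xff);         /* dCBWDataTransferLength, LE */
    cbw[9]  = (uint8_t)((len >> 8) & 0xff);
    cbw[10] = (uint8_t)((len >> 16) & 0xff);
    cbw[11] = (uint8_t)((len >> 24) & 0xff);
    cbw[12] = CBW_FLAG_DATA_IN;              /* the byte the fix changes */
    cbw[13] = 0;                             /* bCBWLUN */
    cbw[14] = 10;                            /* bCBWCBLength */
    cbw[15] = 0x28;                          /* CDB opcode: READ(10) */
    /* cbw[16..21]: CDB flags, LBA = 0, reserved; already zeroed */
    cbw[22] = (uint8_t)(num_blocks >> 8);    /* block count, big-endian */
    cbw[23] = (uint8_t)(num_blocks & 0xff);
    /* cbw[24]: CDB control byte; the rest is padding, already zeroed */
}
```

With 512 blocks the transfer length comes out as 0x00040000 bytes
(256 KB), and the flags byte at offset 12 is 0x80. A device that checks
the direction bit can reject a READ whose CBW claims a host-to-device
data stage, which would fit the behaviour described above.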

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-03 Thread Ming Lei
On Tue, Feb 26, 2013 at 4:54 AM, Alan Stern st...@rowland.harvard.edu wrote:


 I'd be interested to hear the results of testing on a variety of
 controllers.  (This computer also has an NEC EHCI controller, and that
 one does not have the bug.)  Do the EHCI controllers on current Intel
 chipsets pass the test?  What about other vendors?

No 'EHCI hardware bug detected' errors are observed on the EHCI
controller of the OMAP4 PandaBoard.

Thanks,
-- 
Ming Lei


Re: Testing for hardware bug in EHCI controllers

2013-03-03 Thread Bo Shen

Hi Alan,

On 02/26/2013 04:54 AM, Alan Stern wrote:

Sarah (and anyone else who's interested):

A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
controllers.  You pointed out that these are rather old components, not
being used in current systems, which is quite true.


I tested this on an Atmel at91sam9x5ek board with Linux 3.8 and got
similar output. Could you give me more detailed information about
the bug? (Sorry, I missed the original hardware bug e-mail.)


Thanks.

Best Regards,
Bo Shen


Re: Testing for hardware bug in EHCI controllers

2013-03-02 Thread Alan Stern
On Sat, 2 Mar 2013, Clemens Ladisch wrote:

 Alan Stern wrote:
  Then plug in an ordinary USB flash drive and run the attached
  program (as root), giving it the device path for the flash
  drive as the single command-line argument.  For example:
 
  sudo ./ehci-test /dev/bus/usb/002/003
 
  The program won't do anything bad to the flash drive;
 
 All my flash drives beg to differ.
 
 Transcend JetFlash:
 
 ./ehci-test /dev/bus/usb/001/007
 URB timed out; bug may be present
 URB timed out; bug may be present
 URB timed out; bug may be present
 URB timed out; bug may be present
 ^C
 ./ehci-test /dev/bus/usb/001/007
 Unable to send TEST UNIT READY: Broken pipe
 ./ehci-test /dev/bus/usb/001/007
 Unable to send TEST UNIT READY: Broken pipe

This is normal.  You didn't unplug the flash drive between tests.  The 
test program leaves the drive in an intermediate state; it has to be 
reset before it will work again.  Unplugging it is the easiest way to 
reset it.

 Nexus S:
 
 ./ehci-test /dev/bus/usb/002/006
 TEST UNIT READY status 1

That's odd.  Perhaps the Nexus S doesn't initialize the flash drive
when it is plugged in.  Normally this initialization happens when
usb-storage binds to the flash drive, but maybe the Nexus S doesn't
have usb-storage.

 ./ehci-test /dev/bus/usb/002/006
 URB timed out; bug may be present
 URB timed out; bug may be present
 URB timed out; bug may be present
 URB timed out; bug may be present
 ^C
 ./ehci-test /dev/bus/usb/002/006
 TEST UNIT READY status 2
 ./ehci-test /dev/bus/usb/002/006
 URB timed out; bug may be present
 URB timed out; bug may be present
 URB timed out; bug may be present
 ^C

Once again you didn't unplug the flash drive between tests.

 some Genesys card reader:

On what system?  The Nexus S?

 ./ehci-test /dev/bus/usb/001/010
 TEST UNIT READY status 1
 ./ehci-test /dev/bus/usb/001/010
 TEST UNIT READY status 1
 
 (and, of course, nothing in the log)

I stand by my original statement.  If you unplug the flash drive after 
a test and then plug it back in, you will find it back in its original 
condition.  No harm done and no data erased.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-02 Thread Clemens Ladisch
Alan Stern wrote:
 On Sat, 2 Mar 2013, Clemens Ladisch wrote:
 Alan Stern wrote:
 Then plug in an ordinary USB flash drive and run the attached
 program (as root), giving it the device path for the flash
 drive as the single command-line argument.  For example:

 sudo ./ehci-test /dev/bus/usb/002/003

 The program won't do anything bad to the flash drive;

 All my flash drives beg to differ.

 Transcend JetFlash:

 ./ehci-test /dev/bus/usb/001/007
 URB timed out; bug may be present
 URB timed out; bug may be present
 URB timed out; bug may be present
 URB timed out; bug may be present
 ^C
 ./ehci-test /dev/bus/usb/001/007
 Unable to send TEST UNIT READY: Broken pipe

 This is normal.  You didn't unplug the flash drive between tests.

 Nexus S:

 ./ehci-test /dev/bus/usb/002/006
 TEST UNIT READY status 1

 That's odd.  Perhaps the Nexus S doesn't initialize the flash drive
 when it is plugged in.

In this case, the Nexus S *is* the flash drive.

 some Genesys card reader:

 On what system?

All three devices on the same system with an AMD SB710 controller.

 (and, of course, nothing in the log)

 I stand by my original statement.

I didn't want to imply any damage.  But the timed out messages come
regularly every two seconds, which appears to indicate that the test
program doesn't manage to do what it wants to do.

Are these messages normal?


Regards,
Clemens


Re: Testing for hardware bug in EHCI controllers

2013-03-02 Thread Bjørn Mork
Alan Stern st...@rowland.harvard.edu writes:

 A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
 controllers.  You pointed out that these are rather old components, not 
 being used in current systems, which is quite true.

Just tested on my (almost as old) laptop with Intel ICH9 EHCI
controllers (GM45 chipset):

 bjorn@nemi:/tmp$ lspci -nns 00:1d.7
 00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 
EHCI Controller #1 [8086:293a] (rev 03)

and it was a success:

[59091.240771] driver: '4-1:1.0': driver_bound: bound to device 'usbfs'
[59091.339847] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.470486] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.535738] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.559874] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.567118] ehci-pci :00:1d.7: EHCI hardware bug detected: 02008d80 80008d00
[59091.588250] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.590246] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.597376] ehci-pci :00:1d.7: EHCI hardware bug detected: 02008d80 80008d00
[59091.604376] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.617502] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.619749] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.632002] ehci-pci :00:1d.7: EHCI hardware bug detected: 02008d80 80008d00
[59091.649004] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[59091.659503] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00

So I guess this bug is common, at least in older Intel chipsets.


Bjørn


Re: Testing for hardware bug in EHCI controllers

2013-03-02 Thread Alan Stern
On Sat, 2 Mar 2013, Clemens Ladisch wrote:

  Transcend JetFlash:
 
  ./ehci-test /dev/bus/usb/001/007
  URB timed out; bug may be present
  URB timed out; bug may be present
  URB timed out; bug may be present
  URB timed out; bug may be present

...

 All three devices on the same system with an AMD SB710 controller.
 
  (and, of course, nothing in the log)
 
  I stand by my original statement.
 
 I didn't want to imply any damage.  But the timed out messages come
 regularly every two seconds, which appears to indicate that the test
 program doesn't manage to do what it wants to do.

Possibly.  The program uses a 2-second timeout, which explains the 
regular timing of the messages.
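
For readers wondering why the period is so exact: a loop that waits a
fixed time for each URB and reports when the wait expires will print one
message per timeout interval.  The sketch below is not taken from
ehci-test; it only illustrates that pattern with poll().  The helper
name wait_ready and its parameters are made up for this note.

```c
#include <poll.h>

/*
 * Wait until `fd` reports any of `events`, or until `timeout_ms` elapses.
 * Returns 1 if the descriptor became ready, 0 on timeout, -1 on error.
 * (Hypothetical helper, not part of ehci-test.)
 */
int wait_ready(int fd, short events, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;	/* poll() itself failed; errno is set */
	if (rc == 0)
		return 0;	/* nothing happened within the timeout */

	/* Ready for a requested event, or only POLLERR/POLLHUP/POLLNVAL */
	return (pfd.revents & events) ? 1 : -1;
}
```

Called in a loop with timeout_ms = 2000, a return value of 0 on every
pass would produce one "timed out" line every two seconds.  As far as I
know, usbfs reports reapable URBs as POLLOUT on the device file, but
treat that detail as an assumption here.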

The timeout means that _something_ is wrong.  It could be the bug I was
testing for, but more likely it is something else.

 Are these messages normal?

Well, let's say that they are normal when something goes wrong.  :-)

With a well-behaved host and a well-behaved flash drive, nothing should
go wrong.  But it's not all that surprising for a problem to arise.  I
wrote the test program for one specific purpose, so it doesn't have
lots of diagnostic or recovery code.  Maybe you're hitting a different 
bug, one that affects AMD SB* controllers.

At any rate, if you're interested in finding out what the problem might
be, the best place to start would be with a usbmon trace.
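
In case it helps with reading such a trace: each line of the usbmon text
interface starts with a URB tag, a timestamp in microseconds, an event
type, and an address word of the form type+direction:bus:device:endpoint,
followed by the status and the length.  Below is a small parser sketch
for those leading fields.  It assumes a bulk or interrupt line (control
submissions carry a setup packet in place of the status and are not
handled), and the struct and function names are invented here.

```c
#include <stdio.h>

/* Leading fields of one usbmon text line (hypothetical struct). */
struct usbmon_event {
	char tag[24];		 /* URB tag, kept as text */
	unsigned long timestamp; /* microseconds */
	char event;		 /* 'S' submit, 'C' callback, 'E' submit error */
	char xfer;		 /* 'C' control, 'Z' iso, 'I' interrupt, 'B' bulk */
	char dir;		 /* 'i' in, 'o' out */
	unsigned bus, dev, ep;
	int status;		 /* negative errno, e.g. -115 = -EINPROGRESS */
	unsigned length;	 /* data length in bytes */
};

/* Returns 0 if all ten leading fields were parsed, -1 otherwise. */
int parse_usbmon_line(const char *line, struct usbmon_event *ev)
{
	int n = sscanf(line, "%23s %lu %c %c%c:%u:%u:%u %d %u",
		       ev->tag, &ev->timestamp, &ev->event,
		       &ev->xfer, &ev->dir, &ev->bus, &ev->dev, &ev->ep,
		       &ev->status, &ev->length);

	return n == 10 ? 0 : -1;
}
```

A line beginning "... S Bo:1:018:1 -115 31 =" then reads as: submission
of a bulk OUT URB to bus 1, device 18, endpoint 1, status -EINPROGRESS,
31 bytes of data.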

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-02 Thread Alan Stern
On Sat, 2 Mar 2013, Bjørn Mork wrote:

 Alan Stern st...@rowland.harvard.edu writes:
 
  A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
  controllers.  You pointed out that these are rather old components, not 
  being used in current systems, which is quite true.
 
 Just tested on my (almost as old) laptop with Intel ICH9 EHCI
 controllers (GM45 chipset):
 
  bjorn@nemi:/tmp$ lspci -nns 00:1d.7
  00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 [8086:293a] (rev 03)
 
 and it was a success:
 
 [59091.240771] driver: '4-1:1.0': driver_bound: bound to device 'usbfs'
 [59091.339847] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
 [59091.470486] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
...

 So I guess this bug is common, at least in older Intel chipsets.

Yes.  Not too surprising, I guess.  Sarah mentioned that the EHCI cores 
were redesigned for the next generation of chipsets after ICH*.  Maybe 
the bug was fixed then.

Alan Stern



Re: Testing for hardware bug in EHCI controllers

2013-03-02 Thread Clemens Ladisch
Alan Stern wrote:
 On Sat, 2 Mar 2013, Clemens Ladisch wrote:
 But the timed out messages come regularly every two seconds, which
 appears to indicate that the test program doesn't manage to do what it
 wants to do.

 Possibly.  The program uses a 2-second timeout, which explains the
 regular timing of the messages.

 The timeout means that _something_ is wrong.  It could be the bug I was
 testing for, but more likely it is something else.

 Are these messages normal?

 Well, let's say that they are normal when something goes wrong.  :-)

 With a well-behaved host and a well-behaved flash drive, nothing should
 go wrong.  But it's not all that surprising for a problem to arise.  I
 wrote the test program for one specific purpose, so it doesn't have
 lots of diagnostic or recovery code.  Maybe you're hitting a different
 bug, one that affects AMD SB* controllers.

 At any rate, if you're interested in finding out what the problem might
 be, the best place to start would be with a usbmon trace.

ehci-test:
8801e96c06c0 2335244662 S Bo:1:018:1 -115 31 = 55534243 6500 0400 0a28  0200  00
usb-storage:
88021037c480 3386301775 S Bo:1:021:1 -115 31 = 55534243 5400 0010 8a28 ef4f 3e08  00

With the patch below, it runs just fine.
(And doesn't find EHCI bugs.)


Regards,
Clemens


--- ehci-test.c
+++ ehci-test.c
@@ -192,7 +192,7 @@ int send_READ10(void)
 		(NUM_BLOCKS << 1) & 0xff,
 		(NUM_BLOCKS >> 7) & 0xff,
 		(NUM_BLOCKS >> 15) & 0xff,
-		0, 0, 10,	/* Flags, LUN, Length of CDB */
+		0x80, 0, 10,	/* Flags, LUN, Length of CDB */
 		0x28, 0,	/* CDB: READ(10), LUN 0 */
 		0, 0, 0, 0,	/* LBA = 0  */
 		0,		/* Reserved */
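
To spell out what the one-byte change does: in the Bulk-Only Transport
CBW, byte 12 (bmCBWFlags) carries the data direction in bit 7, and a
READ(10) needs 0x80 (device to host).  The following is a standalone
sketch of the whole 31-byte wrapper, not the code from ehci-test.c;
build_read10_cbw and its parameters are names made up for this note,
and the field offsets follow the Bulk-Only Transport layout.

```c
#include <stdint.h>
#include <string.h>

#define CBW_SIZE	31
#define CBW_FLAG_IN	0x80	/* bmCBWFlags bit 7: data flows device-to-host */

/*
 * Fill `cbw` with a Command Block Wrapper carrying a READ(10) for
 * `num_blocks` blocks of `block_size` bytes starting at `lba`.
 * (Hypothetical helper for illustration.)
 */
void build_read10_cbw(uint8_t cbw[CBW_SIZE], uint32_t tag, uint32_t lba,
		      uint16_t num_blocks, uint32_t block_size)
{
	uint32_t len = (uint32_t)num_blocks * block_size;
	int i;

	memset(cbw, 0, CBW_SIZE);
	memcpy(cbw, "USBC", 4);			/* dCBWSignature */
	for (i = 0; i < 4; i++) {
		cbw[4 + i] = (tag >> (8 * i)) & 0xff;	/* dCBWTag, little-endian */
		cbw[8 + i] = (len >> (8 * i)) & 0xff;	/* dCBWDataTransferLength */
	}
	cbw[12] = CBW_FLAG_IN;			/* Flags: this is the fixed byte */
	cbw[13] = 0;				/* LUN */
	cbw[14] = 10;				/* Length of CDB */

	cbw[15] = 0x28;				/* CDB: READ(10) */
	cbw[16] = 0;
	cbw[17] = (lba >> 24) & 0xff;		/* LBA, big-endian */
	cbw[18] = (lba >> 16) & 0xff;
	cbw[19] = (lba >> 8) & 0xff;
	cbw[20] = lba & 0xff;
	cbw[21] = 0;				/* Reserved */
	cbw[22] = (num_blocks >> 8) & 0xff;	/* Transfer length in blocks */
	cbw[23] = num_blocks & 0xff;
	cbw[24] = 0;				/* Control */
}
```

With tag 0x65, LBA 0 and 512 blocks of 512 bytes, bytes 8..11 come out
as 00 00 04 00 (256 KB) and byte 12 as 0x80.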


Testing for hardware bug in EHCI controllers

2013-02-25 Thread Alan Stern
Sarah (and anyone else who's interested):

A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
controllers.  You pointed out that these are rather old components, not 
being used in current systems, which is quite true.

Now I have figured out a simple way for anyone to test for this bug in
any EHCI controller, without the need for a g-zero gadget.  It's a
two-part procedure:

    Apply the patch below (which is written for vanilla 3.8) and
    load the resulting driver.  The patch adds an explicit test
    to ehci-hcd for detecting the bug.

    Then plug in an ordinary USB flash drive and run the attached
    program (as root), giving it the device path for the flash
    drive as the single command-line argument.  For example:

        sudo ./ehci-test /dev/bus/usb/002/003

The program won't do anything bad to the flash drive; it just reads the
first 256 KB of data over and over again, now and then unlinking an URB
to try and trigger the bug.  If the program works right, it will print
out a loop counter every hundred iterations.  If it runs for 1000
iterations with no error messages in the kernel log, you may consider
that the controller has passed the test.  This should take under a
minute, depending on the hardware speed.

The program won't stop by itself unless something goes wrong.  You can
kill it with ^C or more simply by unplugging the flash drive.  (If you
want to be safe, make sure there are no mounted filesystems on the
drive before running the test program.)

If the hardware bug is detected, the kernel patch will print error
messages to the system log.  For example, when I run the test on the
Intel controller in this computer, I get:

[  150.019441] usb-storage 3-8:1.0: disconnect by usbfs
[  150.271190] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  150.591089] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  151.538560] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  151.857569] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  152.018886] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  152.179810] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 8d00
[  153.211804] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  153.374497] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  153.770443] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 8d00
[  154.247861] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[  154.566912] ehci-pci :00:1d.7: EHCI hardware bug detected: 82008d80 8d00
[  155.359101] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  155.838132] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  156.791107] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 8d00
[  157.267620] ehci-pci :00:1d.7: EHCI hardware bug detected: 8d00 80008d00
[  159.252057] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 8d00
[  159.886048] ehci-pci :00:1d.7: EHCI hardware bug detected: 80008d00 8d00
[  160.206625] ehci-pci :00:1d.7: EHCI hardware bug detected: 02008d80 80008d00
...

You get the idea.  The values in the two columns on the right are 
always supposed to be equal; when they aren't it indicates that the 
controller has done a DMA write at a time when ehci-hcd isn't expecting 
one to happen.
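
For anyone trying to read the two columns: assuming they are snapshots
of the QH overlay token word (the variable names tok1 and tok2 in the
patch suggest so, but that is a guess on the reader's side, not
something stated above), the bits can be picked apart using the qTD
token layout from the EHCI specification.  A sketch, with struct and
function names chosen only for illustration:

```c
#include <stdint.h>

/* Fields of an EHCI qTD token word (hypothetical struct name). */
struct qtd_token {
	unsigned toggle;	/* bit 31: data toggle */
	unsigned total_bytes;	/* bits 30:16: bytes still to transfer */
	unsigned ioc;		/* bit 15: interrupt on complete */
	unsigned c_page;	/* bits 14:12: current page */
	unsigned cerr;		/* bits 11:10: error counter */
	unsigned pid;		/* bits 9:8: 0 = OUT, 1 = IN, 2 = SETUP */
	unsigned status;	/* bits 7:0: bit 7 = Active, bit 6 = Halted */
};

/* Split a 32-bit token (already in CPU byte order) into its fields. */
struct qtd_token decode_token(uint32_t t)
{
	struct qtd_token d;

	d.toggle      = (t >> 31) & 0x1;
	d.total_bytes = (t >> 16) & 0x7fff;
	d.ioc         = (t >> 15) & 0x1;
	d.c_page      = (t >> 12) & 0x7;
	d.cerr        = (t >> 10) & 0x3;
	d.pid         = (t >> 8) & 0x3;
	d.status      = t & 0xff;
	return d;
}
```

Under that assumption, 0x82008d80 decodes to toggle 1, 512 bytes
outstanding, PID IN and the Active bit set, while 0x80008d00 has zero
bytes left and Active clear.  XORing the two values (0x02000080) shows
exactly which bits changed between the two reads.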

I'd be interested to hear the results of testing on a variety of 
controllers.  (This computer also has an NEC EHCI controller, and that 
one does not have the bug.)  Do the EHCI controllers on current Intel 
chipsets pass the test?  What about other vendors?

Thanks to all who try it out and report their results.

Alan Stern




Index: usb-3.8/drivers/usb/host/ehci-q.c
===
--- usb-3.8.orig/drivers/usb/host/ehci-q.c
+++ usb-3.8/drivers/usb/host/ehci-q.c
@@ -547,7 +547,7 @@ qh_completions (struct ehci_hcd *ehci, s
 	if (stopped != 0 || hw->hw_qtd_next == EHCI_LIST_END(ehci)) {
 		switch (state) {
 		case QH_STATE_IDLE:
-			qh_refresh(ehci, qh);
+//			qh_refresh(ehci, qh);
 			break;
 		case QH_STATE_LINKED:
 			/* We won't refresh a QH that's linked (after the HC
@@ -1232,6 +1232,7 @@ static void start_iaa_cycle(struct ehci_
 static void end_unlink_async(struct ehci_hcd *ehci)
 {
 	struct ehci_qh		*qh;
+	__hc32			tok1, tok2;
 
 	if (ehci->has_synopsys_hc_bug)
 		ehci_writel(ehci, (u32) ehci->async->qh_dma,
@@ -1242,6 +1243,7 @@ static void end_unlink_async(struct ehci
 	ehci->async_unlinking = true;
 	while (ehci->async_iaa) {
 		qh = ehci->async_iaa;
+		tok1 = 

Re: Testing for hardware bug in EHCI controllers

2013-02-25 Thread Sarah Sharp
On Mon, Feb 25, 2013 at 03:54:22PM -0500, Alan Stern wrote:
 Sarah (and anyone else who's interested):
 
 A while ago I wrote about a hardware bug in my Intel ICH5 and ICH8 EHCI
 controllers.  You pointed out that these are rather old components, not 
 being used in current systems, which is quite true.
 
 Now I have figured out a simple way for anyone to test for this bug in
 any EHCI controller, without the need for a g-zero gadget.  It's a
 two-part procedure:

Thanks Alan!  I'll try this on the Ivy Bridge and Haswell EHCI hosts.

Sarah Sharp