Possible EHCI bugs

2003-06-15 Bill Paul

I was recently contacted by an individual at Transmeta who was trying
to use FreeBSD current with a board containing an EHCI USB controller
and encountered some problems with it. His original intent was to use
FreeBSD's USB 2.0 support and the if_axe driver to help debug a problem
with said hardware combination with another OS which shall remain
nameless. Along the way, he discovered the following:

- The USB_ATTACH() routine in if_axe.c would lead to a panic because
  uaa->iface was NULL. I consider this a bit peculiar because a) with
  my own test setup (my laptop, with a UHCI controller), uaa->iface is
  always populated, and b) uaa->iface is set to something during
  the USB_PROBE() routine (it must be, otherwise the probe would fail).
  I worked around this by grabbing the interface handle using
  usbd_device2interface_handle(), but having the EHCI driver behave
  inconsistently with respect to the UHCI and OHCI drivers seems a
  bit counterintuitive.

- The system panics under load. In this case, the load was induced
  by running bonnie on an NFS filesystem mount over the axe0 interface.
  Below is the console output with stack trace:

stray irq 7
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #3: Fri Jun 13 00:40:59 PDT 2003
 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/EHCI
Preloaded elf kernel /boot/kernel/kernel at 0xc0722000.
Preloaded elf module /boot/kernel/acpi.ko at 0xc0722294.
Timecounter i8254  frequency 1193182 Hz
Timecounter TSC  frequency 800032593 Hz
CPU: Transmeta Proprietary/Confidential-NDA Required (800.03-MHz 686-class CPU)
   Origin = GenuineTMx86  Id = 0xf24
real memory  = 233766912 (222 MB)
avail memory = 219463680 (209 MB)
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: PTLTDRSDT   on motherboard
pcibios: BIOS version 2.10
Using $PIR table, 12 entries at 0xc00fdf00
 ACPI-1287: *** Error: Method execution failed [\_SB_.PCI0.AC__._STA] (Node 0xc227d4c0), AE_AML_REGION_LIMIT
 ACPI-0175: *** Error: Method execution failed [\_SB_.PCI0.AC__._STA] (Node 0xc227d4c0), AE_AML_REGION_LIMIT
 ACPI-1287: *** Error: Method execution failed [\_SB_.PCI0.BATT._STA] (Node 0xc227d400), AE_AML_REGION_LIMIT
 ACPI-0175: *** Error: Method execution failed [\_SB_.PCI0.BATT._STA] (Node 0xc227d400), AE_AML_REGION_LIMIT
acpi0: power button is handled as a fixed feature programming model.
acpi0: sleep button is handled as a fixed feature programming model.
Timecounter ACPI-safe  frequency 3579545 Hz
 ACPI-1287: *** Error: Method execution failed [\_SB_.PCI0.AC__._STA] (Node 0xc227d4c0), AE_AML_REGION_LIMIT
 ACPI-0175: *** Error: Method execution failed [\_SB_.PCI0.AC__._STA] (Node 0xc227d4c0), AE_AML_REGION_LIMIT
 ACPI-1287: *** Error: Method execution failed [\_SB_.PCI0.BATT._STA] (Node 0xc227d400), AE_AML_REGION_LIMIT
 ACPI-0175: *** Error: Method execution failed [\_SB_.PCI0.BATT._STA] (Node 0xc227d400), AE_AML_REGION_LIMIT
acpi_timer0: 32-bit timer at 3.579545MHz port 0x8008-0x800b on acpi0
acpi_cpu0: CPU on acpi0
acpi_tz0: thermal zone on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib0: slot 4 INTA is routed to irq 11
pcib0: slot 14 INTB is routed to irq 10
pcib0: slot 15 INTA is routed to irq 11
pcib0: slot 15 INTB is routed to irq 10
pcib0: slot 15 INTD is routed to irq 7
pcib1: ACPI PCI-PCI bridge at device 1.0 on pci0
pci1: ACPI PCI bus on pcib1
pcib1: slot 0 INTA is routed to irq 11
pci1: display, VGA at device 0.0 (no driver attached)
pcib2: ACPI PCI-PCI bridge at device 2.0 on pci0
pci2: ACPI PCI bus on pcib2
isab0: PCI-ISA bridge at device 3.0 on pci0
isa0: ISA bus on isab0
pci0: bridge, PCI-unknown at device 3.1 (no driver attached)
pci0: multimedia, audio at device 4.0 (no driver attached)
atapci0: AcerLabs Aladdin UDMA100 controller port 0x80a0-0x80af,0x374-0x377,0x170-0x17f,0x3f4-0x3f7,0x1f0-0x1ff at device 14.0 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: network, ethernet at device 14.1 (no driver attached)
ohci0: AcerLabs M5237 (Aladdin-V) USB controller mem 0xe-0xe0fff irq 11 at device 15.0 on pci0
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: AcerLabs M5237 (Aladdin-V) USB controller on ohci0
usb0: USB revision 1.0
uhub0: AcerLabs OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhub0: device problem, disabling port 2
ohci1: AcerLabs M5237 (Aladdin-V) USB controller mem 0xe8002000-0xe8002fff irq 10 at device 15.1 on pci0
usb1: OHCI version 1.0, legacy support
usb1: AcerLabs M5237 (Aladdin-V) USB controller on ohci1
usb1: USB revision 1.0
uhub1: AcerLabs OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
ehci0: EHCI 
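The workaround from the first point above can be pictured as a defensive lookup in the attach routine: use uaa->iface when it is set, otherwise fetch the interface from the device handle. This is a minimal, self-contained sketch, not the actual if_axe.c change. All types are hypothetical mocks, mock_device2interface() only stands in for usbd_device2interface_handle(), and the interface index 0 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mocks; the real USB attach argument and handle types differ. */
struct mock_iface { int number; };
struct mock_device { struct mock_iface ifaces[2]; int nifaces; };
struct mock_attach_args {
    struct mock_device *device;
    struct mock_iface  *iface;   /* may be NULL, as seen on the EHCI system */
};

/* Stand-in for usbd_device2interface_handle(): returns 0 on success. */
static int mock_device2interface(struct mock_device *dev, int ifaceno,
    struct mock_iface **out)
{
    if (dev == NULL || ifaceno < 0 || ifaceno >= dev->nifaces)
        return 1;
    *out = &dev->ifaces[ifaceno];
    return 0;
}

#define MOCK_IFACE_IDX 0  /* assumed: the adapter's interface is number 0 */

/* Prefer the interface handed over at attach time; otherwise look it up. */
struct mock_iface *attach_get_iface(struct mock_attach_args *uaa)
{
    struct mock_iface *ifp = uaa->iface;

    if (ifp == NULL &&
        mock_device2interface(uaa->device, MOCK_IFACE_IDX, &ifp) != 0)
        return NULL;  /* no device or no such interface */
    return ifp;
}
```

Such a fallback hides the symptom. It does not explain why uaa->iface arrived NULL in the first place.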

Re: Possible EHCI bugs

2003-06-15 Bernd Walter
On Sun, Jun 15, 2003 at 03:59:48PM -0700, Bill Paul wrote:
 
 I was recently contacted by an individual at Transmeta who was trying
 to use FreeBSD current with a board containing an EHCI USB controller
 and encountered some problems with it. His original intent was to use
 FreeBSD's USB 2.0 support and the if_axe driver to help debug a problem
 with said hardware combination with another OS which shall remain
 nameless. Along the way, he discovered the following:
 
 - The USB_ATTACH() routine in if_axe.c would lead to a panic because
   uaa->iface was NULL. I consider this a bit peculiar because a) with
   my own test setup (my laptop, with a UHCI controller), uaa->iface is
   always populated, and b) uaa->iface is set to something during
   the USB_PROBE() routine (it must be, otherwise the probe would fail).
   I worked around this by grabbing the interface handle using
   usbd_device2interface_handle(), but having the EHCI driver behave
   inconsistently with respect to the UHCI and OHCI drivers seems a
   bit counterintuitive.
 
 - The system panics under load. In this case, the load was induced
   by running bonnie on an NFS filesystem mount over the axe0 interface.

A kernel built with USB_DEBUG, with sysctl hw.usb.ehci.debug=1 set during
attach and shortly before the panic, could help.
I hope it doesn't produce too much output during the bonnie run.
The first case could be a timing issue: some devices behave differently
at high speed than at full speed.

 Fri Jun 13 01:01:06 PDT 2003
 axe0: read PHY failed
 axe0: read PHY failed
 axe0: read PHY failed
 axe0: read PHY failed

Were these spread out over time, or shortly before the panic?

 Memory modified after free 0xc22e8310(12)
 panic: Most recently used by USB

 This seems to indicate something in the USB code is re-using a
 free()ed memory buffer. Unfortunately, I don't have this particular
 hardware available to me, and I don't know how much debugging support
 the individual at Transmeta will be able to offer. (He has his own
 problems.) Hopefully this will at least help spur some investigation.

Unfortunately, there are many places where such a problem could happen.
Without reviewing the complete EHCI code, it's difficult to tell
what went wrong.
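For context on the panic message: as far as we understand, a kernel built with allocator debugging (INVARIANTS) fills freed memory with a junk pattern and checks that pattern when the memory is handed out again. The report therefore shows only that something wrote to the buffer after it was freed. It does not show who wrote. Below is a minimal sketch of that poison-and-check technique. The 0xdeadc0de value and the function names are assumptions for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JUNK_PATTERN 0xdeadc0deu  /* assumed poison value */

/* On free: overwrite the whole buffer with the poison pattern. */
void poison_on_free(uint32_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        buf[i] = JUNK_PATTERN;
}

/* On the next allocation: return the index of the first word that is no
 * longer poisoned (something wrote to freed memory), or -1 if intact. */
long check_on_alloc(const uint32_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        if (buf[i] != JUNK_PATTERN)
            return (long)i;
    return -1;
}
```

Only stale writes are caught. Stale reads go unnoticed, and the check runs only when the memory is reused, so the panic can occur long after, and far away from, the faulty write.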

-- 
B.Walter   BWCT   http://www.bwct.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Possible EHCI bugs

2003-06-15 Bernd Walter
On Sun, Jun 15, 2003 at 03:59:48PM -0700, Bill Paul wrote:
 
 I was recently contacted by an individual at Transmeta who was trying
 to use FreeBSD current with a board containing an EHCI USB controller
 and encountered some problems with it. His original intent was to use
 FreeBSD's USB 2.0 support and the if_axe driver to help debug a problem
 with said hardware combination with another OS which shall remain
 nameless. Along the way, he discovered the following:
 
 - The USB_ATTACH() routine in if_axe.c would lead to a panic because
   uaa->iface was NULL. I consider this a bit peculiar because a) with
   my own test setup (my laptop, with a UHCI controller), uaa->iface is
   always populated, and b) uaa->iface is set to something during
   the USB_PROBE() routine (it must be, otherwise the probe would fail).
   I worked around this by grabbing the interface handle using
   usbd_device2interface_handle(), but having the EHCI driver behave
   inconsistently with respect to the UHCI and OHCI drivers seems a
   bit counterintuitive.

Following up on my first quick reply, now that I've thought about it a bit more:

The uaa->iface parameter is set up by common code, with no
controller-specific differences, so this could happen with any
controller.
We also have a hardware problem with the OHCI controller.
Since it is a companion controller sharing the physical port with
EHCI, this could cause problems.

 ohci0: AcerLabs M5237 (Aladdin-V) USB controller mem 0xe-0xe0fff irq 11 at device 15.0 on pci0
 usb0: OHCI version 1.0, legacy support
 usb0: SMM does not respond, resetting

The workaround code kicks in here. I can't say whether it succeeds, but
I've seen other negative reports about the OHCI part of Acer chips.
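The "SMM does not respond, resetting" line comes from the OHCI ownership handoff. While firmware (SMM) owns the controller, HcControl.InterruptRouting (IR) is set. The OS sets HcCommandStatus.OwnershipChangeRequest (OCR) and waits for the firmware to clear IR. If the firmware never does, the driver resets the controller, as the log line says. The following is a simplified simulation of that sequence with a hypothetical register struct, not the FreeBSD ohci driver itself. The two bit values follow the OHCI 1.0a register layout.

```c
#include <assert.h>
#include <stdint.h>

#define OHCI_IR  0x00000100u  /* HcControl.InterruptRouting: SMM owns the HC */
#define OHCI_OCR 0x00000008u  /* HcCommandStatus.OwnershipChangeRequest */

/* Hypothetical register model; smm_responds simulates whether the firmware
 * honours an ownership change request. */
struct ohci_model {
    uint32_t control;         /* HcControl */
    uint32_t command_status;  /* HcCommandStatus */
    int      smm_responds;
};

/* Simulated read of HcControl: cooperative firmware reacts to a pending
 * OCR by clearing IR. */
static uint32_t ohci_read_control(struct ohci_model *hc)
{
    if ((hc->command_status & OHCI_OCR) && hc->smm_responds) {
        hc->control &= ~OHCI_IR;
        hc->command_status &= ~OHCI_OCR;
    }
    return hc->control;
}

/* Take the controller from SMM. Returns 0 on a clean handoff (or when SMM
 * was not active), 1 when the firmware never answered and the controller
 * had to be reset. */
int ohci_take_ownership(struct ohci_model *hc, int max_polls)
{
    if ((ohci_read_control(hc) & OHCI_IR) == 0)
        return 0;                      /* SMM not active */
    hc->command_status |= OHCI_OCR;    /* ask the firmware to hand over */
    for (int i = 0; i < max_polls; i++) {
        if ((ohci_read_control(hc) & OHCI_IR) == 0)
            return 0;                  /* firmware released the HC */
        /* a real driver would sleep about 1 ms between polls */
    }
    hc->control = 0;  /* forced takeover: HCFS = UsbReset (00b), IR cleared */
    return 1;
}
```

A forced reset gets the OS going, but it leaves open whether the firmware still believes it owns the controller, which fits the concern about the Acer OHCI part.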

 usb0: AcerLabs M5237 (Aladdin-V) USB controller on ohci0
 usb0: USB revision 1.0
 uhub0: AcerLabs OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub0: 2 ports with 2 removable, self powered
 uhub0: device problem, disabling port 2

The connected device failed!
I assume it's the axe device, which gets probed here by the OHCI
companion because the EHCI controller is not active yet.
USB_DEBUG should tell us more about the cause.
Note: we are strictly USB1.x at this time.
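This explains why everything is USB 1.x at this point. Until the EHCI driver sets CONFIGFLAG, all root ports are routed to the companion controllers. Once EHCI is configured, it hands an individual port to its companion by setting the Port Owner bit in that port's PORTSC whenever the attached device is low- or full-speed. Below is a simplified model of those two rules. The register bits follow the EHCI 1.0 specification. The function names are hypothetical, and the code only computes values: a real driver must also respect PORTSC's write-1-to-clear bits.

```c
#include <assert.h>
#include <stdint.h>

#define EHCI_CONF_CF     0x00000001u  /* CONFIGFLAG: 0 = all ports to companions */
#define EHCI_PS_PE       0x00000004u  /* PORTSC: port enabled */
#define EHCI_PS_LS_MASK  0x00000c00u  /* PORTSC: line status, bits 11:10 */
#define EHCI_PS_LS_K     0x00000400u  /* line status 01b: K-state (low-speed) */
#define EHCI_PS_PO       0x00002000u  /* PORTSC: port owner, 1 = companion */

enum port_owner { OWNER_EHCI, OWNER_COMPANION };

/* Which controller drives a root port, given CONFIGFLAG and its PORTSC. */
enum port_owner ehci_port_owner(uint32_t configflag, uint32_t portsc)
{
    if ((configflag & EHCI_CONF_CF) == 0)
        return OWNER_COMPANION;  /* EHCI not configured yet: USB 1.x only */
    return (portsc & EHCI_PS_PO) ? OWNER_COMPANION : OWNER_EHCI;
}

/* PORTSC value to aim for after a connect. Before reset, a K-state means a
 * low-speed device: release it at once. After reset, a port that did not
 * become enabled holds a full-speed device: release it too. High-speed
 * devices stay with EHCI. */
uint32_t ehci_route_port(uint32_t portsc, int after_reset)
{
    if (!after_reset && (portsc & EHCI_PS_LS_MASK) == EHCI_PS_LS_K)
        return portsc | EHCI_PS_PO;
    if (after_reset && (portsc & EHCI_PS_PE) == 0)
        return portsc | EHCI_PS_PO;
    return portsc;
}
```

Because every low- and full-speed device on these ports ends up on an OHCI companion, a misbehaving companion affects the same physical ports that EHCI serves.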

 ohci1: AcerLabs M5237 (Aladdin-V) USB controller mem 0xe8002000-0xe8002fff irq 10 at device 15.1 on pci0
 usb1: OHCI version 1.0, legacy support
 usb1: AcerLabs M5237 (Aladdin-V) USB controller on ohci1
 usb1: USB revision 1.0
 uhub1: AcerLabs OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub1: 2 ports with 2 removable, self powered

No problems reported with the second OHCI controller.

 ehci0: EHCI (generic) USB 2.0 controller mem 0xe8003400-0xe80034ff irq 7 at device 15.3 on pci0
 ehci_pci_attach: companion usb0
 ehci_pci_attach: companion usb1
 usb2: EHCI version 1.0
 usb2: companion controllers, 2 ports each: usb0 usb1
 usb2: EHCI (generic) USB 2.0 controller on ehci0
 usb2: USB revision 2.0
 uhub2: AcerLabs EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub2: 6 ports with 6 removable, self powered

6 ports, but the companion controllers account for only 4!
Those numbers should match.
I'm missing PCI device 15.2!
Maybe that should be a third OHCI controller with the remaining 2 ports.
I'm not saying that this is the reason for the reported problems, but
I can tell you for sure that EHCI depends on working companion
controllers.
With this many problems, I could easily imagine hardware bugs in
the EHCI controller as well.
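The port arithmetic can be checked against the EHCI HCSPARAMS register. It reports the number of root ports (N_PORTS, bits 3:0), the number of ports per companion controller (N_PCC, bits 11:8), and the number of companion controllers (N_CC, bits 15:12). Here is a small sketch. The field layout follows the EHCI 1.0 specification. The helper names are hypothetical, and the example register value 0x3206 is assumed, since the board's actual HCSPARAMS was not posted.

```c
#include <assert.h>
#include <stdint.h>

#define EHCI_HCS_N_PORTS(x) ((unsigned)(x) & 0xfu)          /* bits 3:0 */
#define EHCI_HCS_N_PCC(x)   (((unsigned)(x) >> 8) & 0xfu)   /* bits 11:8 */
#define EHCI_HCS_N_CC(x)    (((unsigned)(x) >> 12) & 0xfu)  /* bits 15:12 */

/* How many companion controllers are needed to cover every root port. */
unsigned ehci_companions_needed(uint32_t hcsparams)
{
    unsigned nports = EHCI_HCS_N_PORTS(hcsparams);
    unsigned npcc = EHCI_HCS_N_PCC(hcsparams);

    if (npcc == 0)
        return 0;                        /* no companions described */
    return (nports + npcc - 1) / npcc;   /* round up */
}

/* Root ports left without a USB 1.x companion, given how many attached. */
unsigned ehci_uncovered_ports(uint32_t hcsparams, unsigned companions_found)
{
    unsigned nports = EHCI_HCS_N_PORTS(hcsparams);
    unsigned covered = companions_found * EHCI_HCS_N_PCC(hcsparams);

    return covered >= nports ? 0 : nports - covered;
}
```

With the assumed value 0x3206 (6 ports, 2 per companion, 3 companions), ehci_companions_needed() returns 3. With only usb0 and usb1 attached, ehci_uncovered_ports(0x3206, 2) returns 2. That matches two ports with no companion if device 15.2 is missing.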

We should get the hardware working first, before we start fixing
what may only be symptoms in the upper layers.

 This seems to indicate something in the USB code is re-using a
 free()ed memory buffer. Unfortunately, I don't have this particular
 hardware available to me, and I don't know how much debugging support
 the individual at Transmeta will be able to offer. (He has his own
 problems.) Hopefully this will at least help spur some investigation.

It would be good if there were a chance to retry this test with an
NEC-based controller.
Currently I'm not sure it's really a software bug.

-- 
B.Walter   BWCT   http://www.bwct.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]



Re: Possible EHCI bugs

2003-06-15 Terry Lambert
Bernd Walter wrote:
 Note: we are strictly USB1.x at this time.

There was a recent PCI attach patch that I thought fixed this?

I know it hasn't been integrated yet (for God knows why), but
it seemed to fix the problems.

-- Terry


Re: Possible EHCI bugs

2003-06-15 John-Mark Gurney
Terry Lambert wrote this message on Sun, Jun 15, 2003 at 19:40 -0700:
 Bernd Walter wrote:
  Note: we are strictly USB1.x at this time.
 
 There was a recent PCI attach patch that I thought fixed this?

Are you talking about the patch to check for multifunction devices?

 I know it hasn't been integrated yet (for God knows why), but
 it seemed to fix the problems.

If this is the patch you are talking about, I asked for x86 testing,
but I haven't heard any responses that it fixes (or even runs) on
x86 systems.

But also, I need to improve the sparc arch bit some too.  Jake has
some suggestions to make it a bit cleaner.
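For background only (this is not the patch under discussion, whose contents are not shown in the thread): a conventional PCI bus walk probes functions 1 through 7 of a slot only when function 0's header-type register (config offset 0x0e) has bit 7, the multifunction bit, set. It must then probe every function number, because populated functions need not be contiguous. The slot-15 device in the boot log has 15.0, 15.1 and 15.3 but no 15.2. Below is a sketch with a hypothetical config-space accessor and a mock board. The vendor value is a placeholder.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_VENDOR_ABSENT 0xffffu  /* reads from an empty function return all ones */
#define PCI_HDR_MULTIFUNC 0x80u    /* bit 7 of the header-type register (0x0e) */
#define PCI_FUNCMAX       7

/* Hypothetical accessor result: vendor ID and header type of slot/func. */
struct pci_func_info {
    uint16_t vendor;
    uint8_t  hdrtype;
};
typedef struct pci_func_info (*pci_read_fn)(int slot, int func);

/* Store the present function numbers of one slot in found[]; return count. */
int pci_scan_slot(pci_read_fn rd, int slot, int found[PCI_FUNCMAX + 1])
{
    struct pci_func_info f0 = rd(slot, 0);
    int n = 0;

    if (f0.vendor == PCI_VENDOR_ABSENT)
        return 0;                        /* empty slot */
    found[n++] = 0;

    /* Only multifunction devices have functions 1-7; probe each of them,
     * since a gap (like a missing 15.2) must not end the scan. */
    int maxfunc = (f0.hdrtype & PCI_HDR_MULTIFUNC) ? PCI_FUNCMAX : 0;
    for (int func = 1; func <= maxfunc; func++)
        if (rd(slot, func).vendor != PCI_VENDOR_ABSENT)
            found[n++] = func;
    return n;
}

/* Mock config space: slot 4 is single-function; slot 15 is multifunction
 * with functions 0, 1 and 3 present and function 2 absent. */
struct pci_func_info example_board(int slot, int func)
{
    struct pci_func_info f = { PCI_VENDOR_ABSENT, 0 };

    if (slot == 4 && func == 0)
        f.vendor = 0x10b9;
    if (slot == 15 && (func == 0 || func == 1 || func == 3)) {
        f.vendor = 0x10b9;
        f.hdrtype = (func == 0) ? PCI_HDR_MULTIFUNC : 0;
    }
    return f;
}
```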

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


Re: Possible EHCI bugs

2003-06-15 Bernd Walter
On Sun, Jun 15, 2003 at 07:40:21PM -0700, Terry Lambert wrote:
 Bernd Walter wrote:
  Note: we are strictly USB1.x at this time.
 
 There was a recent PCI attach patch that I thought fixed this?
 
 I know it hasn't been integrated yet (for God knows why), but
 it seemed to fix the problems.

The only one I remember was the pci_enable_busmaster change, which was
required for CardBus and is already committed.
The symptoms were different.
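For reference, pci_enable_busmaster() sets the Bus Master Enable bit in a device's PCI command register. Without that bit, a host controller cannot DMA its descriptor lists and data buffers from memory. Here is a minimal model of the bit manipulation only. The PCIM_CMD_* values follow the PCI command register layout. The two helper functions are hypothetical and touch no real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* PCI command register (config offset 0x04) bits. */
#define PCIM_CMD_PORTEN      0x0001u  /* respond to I/O space accesses */
#define PCIM_CMD_MEMEN       0x0002u  /* respond to memory space accesses */
#define PCIM_CMD_BUSMASTEREN 0x0004u  /* allow the device to initiate DMA */

/* The read-modify-write done when enabling bus mastering; other bits kept. */
uint16_t cmd_enable_busmaster(uint16_t cmd)
{
    return (uint16_t)(cmd | PCIM_CMD_BUSMASTEREN);
}

/* A memory-mapped USB host controller needs its registers decoded (MEMEN)
 * and must be able to fetch descriptors from RAM (BUSMASTEREN). */
int cmd_ready_for_usb_hc(uint16_t cmd)
{
    const uint16_t need = PCIM_CMD_MEMEN | PCIM_CMD_BUSMASTEREN;
    return (cmd & need) == need;
}
```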

-- 
B.Walter   BWCT   http://www.bwct.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]



Re: Possible EHCI bugs

2003-06-15 Terry Lambert
John-Mark Gurney wrote:
 Terry Lambert wrote this message on Sun, Jun 15, 2003 at 19:40 -0700:
  Bernd Walter wrote:
   Note: we are strictly USB1.x at this time.
 
  There was a recent PCI attach patch that I thought fixed this?
 
 Are you talking about the patch to check for multifunction devices?

Yes.  It got at least one USB 2.x device working.


  I know it hasn't been integrated yet (for God knows why), but
  it seemed to fix the problems.
 
 If this is the patch you are talking about, I asked for x86 testing,
 but I haven't heard any responses that it fixes (or even runs) on
 x86 systems.

Sorry, I really don't have any USB equipment.


 But also, I need to improve the sparc arch bit some too.  Jake has
 some suggestions to make it a bit cleaner.

Anything that works is better than anything that doesn't.  8-).

-- Terry


Re: Possible EHCI bugs

2003-06-15 John-Mark Gurney
Terry Lambert wrote this message on Sun, Jun 15, 2003 at 22:10 -0700:
 John-Mark Gurney wrote:
  Terry Lambert wrote this message on Sun, Jun 15, 2003 at 19:40 -0700:
   There was a recent PCI attach patch that I thought fixed this?
  
  Are you talking about the patch to check for multifunction devices?
 
 Yes.  It got at least one USB 2.x device working.
 
   I know it hasn't been integrated yet (for God knows why), but
   it seemed to fix the problems.
  
  If this is the patch you are talking about, I asked for x86 testing,
  but I haven't heard any responses that it fixes (or even runs) on
  x86 systems.
 
 Sorry, I really don't have any USB equipment.

Tis ok.  Hopefully someone else will speak up.

  But also, I need to improve the sparc arch bit some too.  Jake has
  some suggestions to make it a bit cleaner.
 
 Anything that works is better than anything that doesn't.  8-).

:)  True, but I don't want to catch a trap that shouldn't be caught.

I'm working on it right now, so it should be added in the next day or
two.

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.