Re: [linux-usb-devel] EHCI Regression in 2.6.23-rc2

2007-08-13 Thread Stuart_Hayes
Daniel Exner wrote:
 Jiri Kosina wrote:
 On Fri, 10 Aug 2007, Daniel Exner wrote:
 Please CC me, as I'm currently not subscribed to this list, thx.
 
 Please also don't forget to CC relevant people/lists when reporting
 bugs, thanks.
 Guess its ok, now? Thanks anyway :)
 
 After some serious hangs with 2.6.23-rc2 I did some bisects and
 this was the result: 196705c9bbc03540429b0f7cf9ee35c2f928a534 is
 first bad commit commit 196705c9bbc03540429b0f7cf9ee35c2f928a534
 Author: [EMAIL PROTECTED] [EMAIL PROTECTED]
 Date:   Thu May 3 08:58:49 2007 -0700
 
 I guess that the patch attached to bug 8535 in kernel.org bugzilla --
 http://bugzilla.kernel.org/attachment.cgi?id=12228action=view --
 solves your issues, right?
 Nope, this does _not_ fix my issue.
 
 Anything else I could try, or some files you need?
 I tried finding some clue in my logs, but without any results so far.
 
 Greetings
 Daniel Exner


It appears that the VIA controllers just ignore the inactivate bit
completely.

Normally, when I set the inactivate bit in the QH and then watch the
QH  overlay, I eventually see the controller clear the active bit in
the overlay token, and, of course, it doesn't do the transaction.

With the VIA controller I have, after I set the inactivate bit, I
eventually see the controller set bit 1 in the overlay token
(SplitXstate), indicating that it's running the transaction, and, a
couple microframes later, it clears that bit again.  The transaction is
not inactivated.

The problem occurs if a transaction completes when the inactivate bit
is set... qh_completions will ignore the transaction until the
inactivate bit is cleared, and then, when the transaction should be
re-activated, my patch will set the active bit back to 1 in the
overlay  qtd token, even though the transaction was already completed
by the controller...

To work around this, I'd have to re-write my patch so that it didn't
depend on the inactivate bit at all... I suppose it could possibly be
done just by directly manipulating the active bit in the overlay
token, since already the code doesn't mess with the overlay if there's
any chance that the transaction is alrady cached or in progress, but
that would be tricky.

Perhaps for now the best thing would just be to bypass the EHCI CPU
frequency notifier code (i.e., my patch) for VIA EHCI controllers, since
they are broken.  Would a hard-coded blacklist (just an if
(manufacturer==VIA)... type thing) be OK?

I've also acquired a card with an NEC EHCI controller on it, which I'm
going to look at while I'm into it...

Thanks
Stuart

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] EHCI Regression in 2.6.23-rc2

2007-08-10 Thread Stuart_Hayes
Jiri Kosina wrote:
 On Fri, 10 Aug 2007, Daniel Exner wrote:
 
 After some serious hangs with 2.6.23-rc2 I did some bisects and this
 was the result:
 196705c9bbc03540429b0f7cf9ee35c2f928a534 is first bad commit commit
 196705c9bbc03540429b0f7cf9ee35c2f928a534
 Author: [EMAIL PROTECTED] [EMAIL PROTECTED]
 Date:   Thu May 3 08:58:49 2007 -0700
 
 I guess that the patch attached to bug 8535 in kernel.org bugzilla --
 http://bugzilla.kernel.org/attachment.cgi?id=12228action=view --
 solves your issues, right?  
 
 Stuart, did you submit this fix for upstream already please?

Yes... http://marc.info/?l=linux-usb-develm=118598561010046w=2

However, I have not tested this with a VIA EHCI controller (though it's
been tested with Intel, Broadcom, and nVidia).  This patch uses the
inactivate bit in the QH, which wasn't previously used by the linux
kernel, and I found that the different vendors of EHCI controllers
(Intel, Broadcom, nVidia) all handle this a little differently.  There's
probably something about the way VIA controllers respond to seeing this
bit set that is breaking things.

I'll try to get my hands on a VIA EHCI controller so I can look at
this... if you happen to know of an add-in card that has one of these,
please let me know!  It would be a lot easier for me to debug this
myself here than to try to get someone else to run test kernels for
me...

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] [PATCH] bug with EHCI cpufreq patch on nVidia controllers

2007-08-01 Thread Stuart_Hayes
Greg KH wrote:
 On Tue, Jul 31, 2007 at 04:00:15PM -0700, Pete Zaitcev wrote:
 On Tue, 31 Jul 2007 17:00:04 -0500, [EMAIL PROTECTED] wrote:
 
 +   list_for_each_safe (entry, tmp, qh-qtd_list) {
 +   qtd = list_entry (entry, struct ehci_qtd, qtd_list);
 +   if (cpu_to_le32 (qtd-qtd_dma) == qh-hw_current) +
return qtd;
 +   }
 
 Why use list_for_each_safe when mere list_for_each would do?
 In fact, some might ask for list_for_each_entry.
 
 Yes, I think Rusty has an automated email that checks for every new
 list_for_each that doesn't use list_for_each_entry, you don't want to
 hit that :)  
 
 So could you change it to use this?  I don't think that the safe
 version is needed here either, do you?
 
 thanks,
 
 greg k-h

Yes, sorry, I'm not sure what I was thinking.  Here's the changed patch
along with the description (again).  Thanks!

This patch fixes a bug with the cpu frequency change notifier and nVidia
EHCI controllers.  The nVidia controllers write the transfer overlay
back to the qtd when they see the inactivate bit set in the qh, which
clears the active bit in qtd-hw_token.  When the qh was reactivated,
the active bit in the overlay's hw_token was turned back on, but not
the one in the qtd.  This caused qh_completions to think that qtd was
finished even though it wasn't.

This patch (against 2.6.23-rc1) fixes this bug.

Signed-off-by: Stuart Hayes [EMAIL PROTECTED]



ehci_cpufreq_nvidia_30july_c.patch
Description: ehci_cpufreq_nvidia_30july_c.patch
-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] USB-related instability on 2.6.23-rc1

2007-07-31 Thread Stuart_Hayes
Greg KH wrote:
 On Tue, Jul 31, 2007 at 03:06:07PM -0600, Grant Likely wrote:
 I've observed instability on my Macbook c2d with the Linus' latest
 tree.  Symptom is that USB devices start behaving badly and the
 kernel seems to be registering incorrect HID events; (ie. moving the
 mouse causes odd keyboard events).  If I continue to attempt to use
 the 
 mouse, it results in a system freeze (and I don't see any kind of
 backtrace in /var/log/kern.log).  To reproduce the problem, I just
 need to be using the keyboard/mouse.  If I leave the system idle, the
 problem does not seem to occur.  If I actively use the
 keyboard/mouse, the problem can occur at any time; usually within
 5-10 minutes. 
 
 In my particular case, I've got both a USB key/mouse pair on the
 laptop and an external key/mouse pair plugged in via a hub.
 Typically, it's the external mouse that exhibits badness first.
 
 USB HID devices are definitely affected.  I have not tested USB block
 devices.  Isochronous transfers to a USB headset appear to be
 unaffected, but that's under minimal testing.
 
 I've bisected down to commit
 196705c9bbc03540429b0f7cf9ee35c2f928a534, USB: EHCI cpufreq fix 
 
 Reverting the patch at the head of the tree makes the kernel stable
 again. 
 
 I've attached my kernel config file and a dmesg output
 
 Can you enable CONFIG_USB_DEBUG and send the linux-usb-devel list and
 us, the output when you have these problems? 
 
 thanks,
 
 greg k-h

I'm actually in the process right now of attaching a patch for this
problem (well, I hope it's a patch for the problem you're seeing) right
now on the bugzilla (bug 8535 at bugzilla.kernel.org).  If you'd like to
test it, that would be great.

Thanks,
Stuart

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] [PATCH] bug with EHCI cpufreq patch on nVidia controllers

2007-07-31 Thread Stuart_Hayes
This patch fixes a bug with the cpu frequency change notifier and nVidia
EHCI controllers.  The nVidia controllers write the transfer overlay
back to the qtd when they see the inactivate bit set in the qh, which
clears the active bit in qtd-hw_token.  When the qh was reactivated,
the active bit in the overlay's hw_token was turned back on, but not
the one in the qtd.  This caused qh_completions to think that qtd was
finished even though it wasn't.

This patch (against 2.6.23-rc1) fixes this bug.

Signed-off-by: Stuart Hayes [EMAIL PROTECTED]



ehci_cpufreq_nvidia_30july_b.patch
Description: ehci_cpufreq_nvidia_30july_b.patch
-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] Bug in EHCI split-interrupt handling

2007-07-25 Thread Stuart_Hayes

I don't see how my patch could cause a transfer to return an
actual_length of -4.

If it is my patch causing this problem, I suspect it would be because
the nVidia EHCI controller handles the inactivate bit in an unexpected
(and probably out of spec) way--I was able to test with Broadcom 
Intel, but I didn't have an nVidia EHCI controller to test on.  I guess
it's possible that, when the inactivate bit is set, the nVidia EHCI
controller sets the status to make it look like that transaction is
finished, and maybe the inactivate bit was set early enough that
actual_length never got initialized to anything and the -4 was just
leftover in that memory space...?  I suggest this without looking at the
code--I don't know if that's actually possible.

I'll try to locate a box with an nVidia EHCI controller today and see if
I can reproduce this.

Stuart
 

-Original Message-
From: Alan Stern [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 24, 2007 5:07 PM
To: Pete Zaitcev
Cc: Hayes, Stuart; USB development list
Subject: Re: Bug in EHCI split-interrupt handling

On Tue, 24 Jul 2007, Pete Zaitcev wrote:

 On Tue, 24 Jul 2007 15:44:23 -0400 (EDT), Alan Stern
[EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=8535
  
  contains logs suggestive of problems with EHCI split-interrupt 
  handling.  See in particular comment #33; the usbmon log in comment 
  #32 contains a line in which a low-speed interrupt URB returns with 
  status equal to 0 and actual_length set to -4.
 
 Nicolas sent me a trace out of band which had a similar entry:
 8100057f6100 1467276128 C Ii:2:004:1 0:8 -4
 
  Obviously this should never happen.  Can anybody offer tips on where

  to go searching through the driver code for a solution?
 
 I'm going to double-check that usbmon is not deceiving you. It sure 
 looks totally bogus. Maybe I mixed up fields in a union somewhere.

Maybe.  Stuart Hayes has just added comment #42 in which he suggests
this problem may be caused by the cpufreq adjustment patch he wrote.  
Stuart, could your patch cause a 4-byte low-speed interrupt transfer to
return an actual_length of -4?

Alan Stern

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] Bug in EHCI split-interrupt handling

2007-07-25 Thread Stuart_Hayes

OK, I'm seeing an issue that I'm pretty sure is the same thing... the
keyboard is all kind of goofy (loses keys, repeats keys), then quits
working, then the system locks up, only when my patch is enabled and I'm
getting (faked) cpu frequency transitions.  It definitely appears to be
some incompatibilty between the nVidia EHCI controller and my patch.

I'm debugging now.  (And struggling to remember how this all works!  I
don't work with the USB code every day.)

Thanks
Stuart 

-Original Message-
From: Hayes, Stuart 
Sent: Wednesday, July 25, 2007 1:50 PM
To: 'Pete Zaitcev'
Cc: [EMAIL PROTECTED]; linux-usb-devel@lists.sourceforge.net
Subject: RE: Bug in EHCI split-interrupt handling


I'm working on it, Pete.  I've got a system with an nVidia EHCI
controller (unfortunately it's an Intel box, not AMD, since the failing
systems are AMD), and I'm working to reproduce the issue.  I acknowledge
that this issue is probably caused by this patch.

I suspect that what's going on is that the nVidia EHCI controller isn't
responding to the inactivate bit correctly.  I think I will find the
answer more quickly (and I'll be more certain that I've got the right
answer) if I can work on a system that exhibits the problem.

Thanks
Stuart

-Original Message-
From: Pete Zaitcev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 25, 2007 1:45 PM
To: Hayes, Stuart
Cc: [EMAIL PROTECTED]; linux-usb-devel@lists.sourceforge.net;
[EMAIL PROTECTED]
Subject: Re: Bug in EHCI split-interrupt handling

On Wed, 25 Jul 2007 08:13:45 -0500, [EMAIL PROTECTED] wrote:

 [...] and maybe the inactivate bit was set early enough that 
 actual_length never got initialized to anything and the -4 was just 
 leftover in that memory space...?  I suggest this without looking at 
 the code--I don't know if that's actually possible.

No, Stuart, this won't do. I need you to look at the code, because:
 a) I have explored the obvious avenues already, and
 b) We know that unsetting CONFIG_CPU_FREQ clears the issue.
The initialization of actual_length is done unconditionally in
usb_submit_urb, it was the first thing I checked!

I have two vague hypotheses (-sii?) currently:
 #1 Somehow your patch conflicts with the insertion code which
moves dummy qTD around.
 #2 The length in QH gets desynched from length in QTD, and we
have a pice of code which takes the qh-hw_token and uses it
for length calculation against qtd-length.

-- Pete

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] Bug in EHCI split-interrupt handling

2007-07-25 Thread Stuart_Hayes

I'm working on it, Pete.  I've got a system with an nVidia EHCI
controller (unfortunately it's an Intel box, not AMD, since the failing
systems are AMD), and I'm working to reproduce the issue.  I acknowledge
that this issue is probably caused by this patch.

I suspect that what's going on is that the nVidia EHCI controller isn't
responding to the inactivate bit correctly.  I think I will find the
answer more quickly (and I'll be more certain that I've got the right
answer) if I can work on a system that exhibits the problem.

Thanks
Stuart

-Original Message-
From: Pete Zaitcev [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 25, 2007 1:45 PM
To: Hayes, Stuart
Cc: [EMAIL PROTECTED]; linux-usb-devel@lists.sourceforge.net;
[EMAIL PROTECTED]
Subject: Re: Bug in EHCI split-interrupt handling

On Wed, 25 Jul 2007 08:13:45 -0500, [EMAIL PROTECTED] wrote:

 [...] and maybe the inactivate bit was set early enough that 
 actual_length never got initialized to anything and the -4 was just 
 leftover in that memory space...?  I suggest this without looking at 
 the code--I don't know if that's actually possible.

No, Stuart, this won't do. I need you to look at the code, because:
 a) I have explored the obvious avenues already, and
 b) We know that unsetting CONFIG_CPU_FREQ clears the issue.
The initialization of actual_length is done unconditionally in
usb_submit_urb, it was the first thing I checked!

I have two vague hypotheses (-sii?) currently:
 #1 Somehow your patch conflicts with the insertion code which
moves dummy qTD around.
 #2 The length in QH gets desynched from length in QTD, and we
have a pice of code which takes the qh-hw_token and uses it
for length calculation against qtd-length.

-- Pete

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] [PATCH] bug removing ehci-hcd

2007-05-31 Thread Stuart_Hayes

I wasn't actually able to reproduce the bug myself, but I guess it is
pretty obvious that I shouldn't have called cpufreq_unregister_notifier
with a spinlock held.  I haven't been doing this long enough to know
exactly which kernel this patch should be against, so let me know if
this ins't good.  Thanks!


This patch (for the 2.6.21.3 kernel plus previously sent cpufreq
notifier patch) fixes a bug caused by calling
cpufreq_unregister_notifier (which can sleep) while holding a spinlock.

Signed-off-by: Stuart Hayes [EMAIL PROTECTED]


ehci_unregister_cpufreq_notifier.patch
Description: ehci_unregister_cpufreq_notifier.patch
-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] [PATCH] ehci errors during cpu frequency changes

2007-04-12 Thread Stuart_Hayes
Hayes, Stuart wrote:
 EHCI controllers that don't cache enough microframes can get MMF
 errors when CPU frequency changes occur between the start and
 completion of split interrupt transactions, due to delays in reading
 main memory (caused by CPU cache snoop delays).   
 
 This patch adds a cpufreq notifier to the EHCI driver that will
 inactivate split interrupt transactions during frequency transitions.
 It was tested on Intel ICH7 and Serverworks/Broadcom HT1000 EHCI
 controllers.   
 
 Signed-off-by: Stuart Hayes [EMAIL PROTECTED]


Does the lack of response mean that this patch won't be accepted?  Is
there anything I could do to make it more acceptable?  There's
definitely an issue that this addresses... if this patch isn't good for
some reason I'll be happy to try another approach.  The only other thing
I can think of--other than setting the I bit--is to unlink all of the
split interrupt QHs during CPU frequency transitions.

Or... I could add a kernel parameter so that this patch wouldn't be
enabled by default, so that only users who are seeing the problem could
turn it on... that way this could get some wider testing without risk of
everybody's USB subsystems failing mysteriously...

Thanks
Stuart

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] [PATCH] ehci errors during cpu frequency changes

2007-04-03 Thread Stuart_Hayes

EHCI controllers that don't cache enough microframes can get MMF errors
when CPU frequency changes occur between the start and completion of
split interrupt transactions, due to delays in reading main memory
(caused by CPU cache snoop delays).

This patch adds a cpufreq notifier to the EHCI driver that will
inactivate split interrupt transactions during frequency transitions.
It was tested on Intel ICH7 and Serverworks/Broadcom HT1000 EHCI
controllers.

Signed-off-by: Stuart Hayes [EMAIL PROTECTED]



take9_i_bit_b.patch
Description: take9_i_bit_b.patch
-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] [PATCH] EHCI interrupt transactions failing during CPU frequency changes

2007-03-26 Thread Stuart_Hayes

Sorry for the delayed response on this... I've been working on it and
other stuff at the same time.  I found a couple bugs with my last patch,
and have tested quite a bit on both Intel  ServerWorks systems.  Here's
the new patch (very similar to the previous one I sent), and responses
to your questions below.

David Brownell wrote:
 On Wednesday 07 March 2007 1:10 pm, [EMAIL PROTECTED] wrote:
 
 OK, I've made an attempt to solve this issue without using such a
 large hammer.  Instead of shutting off the entire periodic schedule
 during cpufreq changes, this patch will just inactivate (using the
 I 
 bit in
 qh-hw_info1, currently unused in the EHCI driver) any queue heads
 qh-that
 are to full/low speed interrupt endpoints, during the cpufreq
 changes. 
 
 Makes sense ... although I don't trust your safe_to_inactivate()
 logic, that's inherently racey.  You could check to see if the
 transaction has started, and decide it didn't (and thus set the I
 bit) just when the controller starts that transaction.  The race
 seems unavoidable
 

safe_to_inactivate() shouldn't be racy, because I wait one full
microframe after the transaction should have been started before
checking QTD_STS_STS... the controller should write this bit during the
microframe that the transaction was started.  I did actually see
problems (rarely) when I checked QTD_STS_STS in the microframe
immediately after the transaction should have started, since I guess
QTD_STS_STS sometimes gets updated right at the end of the microframe.

 I had way too much ... fun isn't quite the word ... chasing issues
 like that with fault handling in the async ring.  Such issues might
 be less common with the periodic schedule; maybe.  When the driver
 loses that race, this comes up as an MMF error right?   

 
Yes, that's what happened when I was checking QTD_STS_STS the uframe
immedately after the transaction should have started.

 
 I have found that the ICH7 works without this patch, because it
 caches 7 uframes of the periodic schedule, so it never has to read
 main 
 memory in order to complete a split transaction in time.  The
 Broadcom/Serverworks HT1000 only caches 1 uframe, so it fails
 without this patch. 
 
 I've always wondered how that caching is supposed to work.  I suspect
 that you could make the ICH7 fail too, if the periodic schedule were
 busy enough ... it's not like they'll allocate enough silicon to
 cache every possible QH in SRAM inside that controller!  That could
 mean a LOT of memory for caching.  Right?
 
 Not to poke too much at all your good work here, but how would ICH7
 fail in that case, given the logic in this patch? 
 

Good question... maybe the caching indicates how many microframes in
advance the controller COULD cache the transactions--but it won't
necessarily do so if there are too many of them.

So the thing to do is to assume that the controller might have the QH
cached like it claims (i.e., don't mess with the QH if we're within
ehci-i_thresh of starting the transaction), but don't assume it will
necessarily cache everything (i.e., register the cpufreq notifier even
if i_thresh appears to be big enough that you wouldn't need it), just in
case.

 
 I have tested this code on both the HT1000 and ICH7 (I tested the
 patch on the ICH7 with the cpufreq notifier registered, too, even
 though the patch won't even register it, just to make sure it would
 work correctly).
 
 How heavily were your periodic schedules loaded?  The
 /sys/class/usb_host debug file labeled periodic should provide a
 textual display.  It would be good to test both hardware configs with
 a lot of USB mice hooked up.   
 
 Not that the current EHCI periodic scheduler handles very many, but
 ISTR that three full/low speed devices per TT was realistic.  And
 high speed hubs with multiple TTs are easy enough to find, given that
 Cypress chip.  At Dell, I expect you have one or two extra USB mice
 and keyboards lying around that could be used to try abusing your
 patch.  :) 
 

I guess this isn't all that relevant if I enable the notifier even for
the controllers that say that they'll cache 7 uframes, but I ran for 4
days with this load with no problems (I added line feeds, and deleted
repeated QHs  high-speed QHs, in the text below).  (I ran this on both
an AMD/ServerWorks HT1000 system, and an Intel ICH7 system.)

   7:  qh8-0e01/810001665480 (l5 ep1in [1/2] q1 p8)
qh8-1c02/810001665780 (l7 ep1in [1/2] q1 p8)
qh8-0e01/810001665d80 (l12 ep1in [1/2] q1 p8)
qh8-0e01/810001665e40 (l12 ep2in [1/2] q1 p8)
qh8-7008/810037fee6c0 (l5 ep2in [1/2] q1 p8)
qh8-0e01/810037fee780 (l13 ep1in [1/2] q1 p4)
  15:  qh16-3804/810001665900 (f8 ep1in [1/2] q1 p8)
qh16-3804/810001665b40 (l10 ep1in [1/2] q1 p8)
qh16-7008/810037fee0c0 (f14 ep1in [1/2] q1 p8)
  31:  qh32-7008/810037fee180 (f14 ep2in [1/2] q1 p4)
 127:  qh128-0e01/810001665600 (f6 ep1in [1/2] q1 p1)

 

Re: [linux-usb-devel] [PATCH] EHCI interrupt transactions failing during CPU frequency changes

2007-03-09 Thread Stuart_Hayes
David Brownell wrote:
 On Wednesday 07 March 2007 1:10 pm, [EMAIL PROTECTED] wrote:
 
 OK, I've made an attempt to solve this issue without using such a
 large hammer.  Instead of shutting off the entire periodic schedule
 during cpufreq changes, this patch will just inactivate (using the
 I 
 bit in
 qh-hw_info1, currently unused in the EHCI driver) any queue heads
 qh-that
 are to full/low speed interrupt endpoints, during the cpufreq
 changes. 
 
 Makes sense ... although I don't trust your safe_to_inactivate()
 logic, that's inherently racey.  You could check to see if the
 transaction has started, and decide it didn't (and thus set the I
 bit) just when the controller starts that transaction.  The race
 seems unavoidable
 
 I had way too much ... fun isn't quite the word ... chasing issues
 like that with fault handling in the async ring.  Such issues might
 be less common with the periodic schedule; maybe.  When the driver
 loses that race, this comes up as an MMF error right?   
 

I think I am avoiding a race with the controller here, by not checking
QTD_STS_STS until the second uframe after the controller should have
started the split transaction.  If it HAD started the split transaction,
the controller should have written QTD_STS_STS back at the end of the
uframe in which it started it.  Originally I was checking QTD_STS_STS
the uframe immediately after it should have started the transaction, but
there was a tiny race there, though it usually took 12 hours running on
my system to show up.

I'm also not setting the I bit in any uframes that might already be
cached, or the uframe before that (i_thresh is set to the number of
cached uframes + 2).  So as long as the code can set the I bit within
one uframe after deciding that it's safe to do so, I don't think there
should be a race.  And, since interrupts are masked, and the CPU isn't
changing frequency, I would expect any system with an EHCI controller to
be able to do some minor math and set a bit within 125us.

 
 I have found that the ICH7 works without this patch, because it
 caches 7 uframes of the periodic schedule, so it never has to read
 main 
 memory in order to complete a split transaction in time.  The
 Broadcom/Serverworks HT1000 only caches 1 uframe, so it fails
 without this patch. 
 
 I've always wondered how that caching is supposed to work.  I suspect
 that you could make the ICH7 fail too, if the periodic schedule were
 busy enough ... it's not like they'll allocate enough silicon to
 cache every possible QH in SRAM inside that controller!  That could
 mean a LOT of memory for caching.  Right?
 
 Not to poke too much at all your good work here, but how would ICH7
 fail in that case, given the logic in this patch? 
 

Well, right now it would fail just like 2.6.20, because the patch won't
even register the cpufreq notifier if the EHCI controller says it caches
7 or more uframes!  :)  That is, it would get an MMF error and return
-EPROTO.

But, if we did register the cpufreq notifier even for the ICH7, I would
expect it not to fail regardless of whether it was caching or not, since
any split interrupt transactions would be inactivated before any CPU
frequency changes.  When the controller is caching 7 uframes, there's a
smaller window during which safe_to_inactivate() will return 1, but
there is still a window, unless you have a device with a polling period
of 1 frame (in which case the code will just leave the transaction
active, and we'll probably still get an MMF error).

 
 I have tested this code on both the HT1000 and ICH7 (I tested the
 patch on the ICH7 with the cpufreq notifier registered, too, even
 though the patch won't even register it, just to make sure it would
 work correctly).
 
 How heavily were your periodic schedules loaded?  The
 /sys/class/usb_host debug file labeled periodic should provide a
 textual display.  It would be good to test both hardware configs with
 a lot of USB mice hooked up.   
 
 Not that the current EHCI periodic scheduler handles very many, but
 ISTR that three full/low speed devices per TT was realistic.  And
 high speed hubs with multiple TTs are easy enough to find, given that
 Cypress chip.  At Dell, I expect you have one or two extra USB mice
 and keyboards lying around that could be used to try abusing your
 patch.  :) 
 

Yes, I'll load these systems up with lots of USB keyboards and mice, and
re-run.  I was able to see failures pretty often with no patch, or with
a patch that wasn't quite working, though, with the configs I have--I've
been testing with 3-4 keyboards/mice behind hub(s).

 
 This patch adds more code, but it avoids a couple of the issues that
 were brought up with the previous patch:
 
 1--This patch doesn't mess with isochronous transfers (or high speed
 interrupt transfers) at all.
 
 That seems to be a good resolution in both cases.  ISO transfers are
 best effort in any case, and you hadn't identified issues except
 with full/low speed interrupt 

Re: [linux-usb-devel] [PATCH] EHCI interrupt transactions failing during CPU frequency changes

2007-03-07 Thread Stuart_Hayes
David Brownell wrote:
 On Tuesday 20 February 2007 12:27 pm, [EMAIL PROTECTED] wrote:
 
 This patch turns off periodic list processing in the EHCI controller
 for the duration of processor frequency changes.
 
 Did you test this with any kind of isochronous device active, like a
 set of USB speakers playing?  ISO transfers work differently than the
 interrupt transfers you tested (with a HID device).  
 
 Also ... what kind of hardware did you test this with?  I wonder how
 much of the (potential) 1.125 msec it was spinning while waiting for
 the periodic schedule to actually turn off.  That seems like it
 should be modifying the latency metrics used by cpufreq... is there
 even a feedback mechanism whereby the system can say that _right now_
 it would take this much more time?  Last time I remember looking at
 cpufreq, it had static metrics basically relating only to the CPUs;
 so costs related to components like EHCI were hidden.  (Much less
 costs associated with re-clocking other peripherals, which is a big
 issue with embedded SOC chips!) 
 
 
 The patch looks mostly OK, but the issue with isochronous transfers
 is a difference in how transfer completion is handled in the
 hardware.  
 
 Rather than scanning a queue head node in the schedule tree, which
 will be re-scanned after the periodic list is re-activated.  Instead,
 isochronous transfers use an entirely different hardware mechanism.  
 Their ITDs go before that schedule tree, and won't get re-scanned
 after the relevant frame passes. 
 
 I'd not be sure that the current iso scanning logic would behave
 correctly with this kind of on/off mechanism.  In fact I'd kind of
 expect it to break.  
 
 That's in addition to the issue that it might not be a Good Thing to
 let cpufreq create audio (or video etc) dropouts ... 
 
 Maybe the best solution to this issue would be to reject the cpufreq
 change if ISO transfers are active on EHCI. 
 
 - Dave

OK, I've made an attempt to solve this issue without using such a large
hammer.  Instead of shutting off the entire periodic schedule during
cpufreq changes, this patch will just inactivate (using the I bit in
qh-hw_info1, currently unused in the EHCI driver) any queue heads that
are to full/low speed interrupt endpoints, during the cpufreq changes.

I have found that the ICH7 works without this patch, because it caches 7
uframes of the periodic schedule, so it never has to read main memory in
order to complete a split transaction in time.  The Broadcom/Serverworks
HT1000 only caches 1 uframe, so it fails without this patch.

I have tested this code on both the HT1000 and ICH7 (I tested the patch
on the ICH7 with the cpufreq notifier registered, too, even though the
patch won't even register it, just to make sure it would work
correctly).

This patch adds more code, but it avoids a couple of the issues that
were brought up with the previous patch:

1--This patch doesn't mess with isochronous transfers (or high speed
interrupt transfers) at all.
2--The notifier will take very little time unless the cpufreq change
happens while a split transaction is in progress (or is already cached
in the controller).

I can repost it with a more concise summarization of the problem/fix and
a signed-off-by if it looks ok.

Thanks!
Stuart






take7_i_bit.patch
Description: take7_i_bit.patch
-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] [PATCH] EHCI interrupt transactions failing during CPU frequency changes

2007-02-21 Thread Stuart_Hayes
David Brownell wrote:
 On Tuesday 20 February 2007 12:27 pm, [EMAIL PROTECTED] wrote:
 
 This patch turns off periodic list processing in the EHCI controller
 for the duration of processor frequency changes.
 
 Did you test this with any kind of isochronous device active, like a
 set of USB speakers playing?  ISO transfers work differently than the
 interrupt transfers you tested (with a HID device).  
 
 Also ... what kind of hardware did you test this with?  I wonder how
 much of the (potential) 1.125 msec it was spinning while waiting for
 the periodic schedule to actually turn off.  That seems like it
 should be modifying the latency metrics used by cpufreq... is there
 even a feedback mechanism whereby the system can say that _right now_
 it would take this much more time?  Last time I remember looking at
 cpufreq, it had static metrics basically relating only to the CPUs;
 so costs related to components like EHCI were hidden.  (Much less
 costs associated with re-clocking other peripherals, which is a big
 issue with embedded SOC chips!) 
 

No--I'll try that (with  without the patch).  I also only tried this on
a Serverworks EHCI controller with AMD processors... I'll test an Intel
system, too, and see if I see the MMF errors on it without this patch.

 
 The patch looks mostly OK, but the issue with isochronous transfers
 is a difference in how transfer completion is handled in the
 hardware.  
 
 Rather than scanning a queue head node in the schedule tree, which
 will be re-scanned after the periodic list is re-activated.  Instead,
 isochronous transfers use an entirely different hardware mechanism.  
 Their ITDs go before that schedule tree, and won't get re-scanned
 after the relevant frame passes. 
 
 I'd not be sure that the current iso scanning logic would behave
 correctly with this kind of on/off mechanism.  In fact I'd kind of
 expect it to break.  
 
 That's in addition to the issue that it might not be a Good Thing to
 let cpufreq create audio (or video etc) dropouts ... 
 
 Maybe the best solution to this issue would be to reject the cpufreq
 change if ISO transfers are active on EHCI. 
 
 - Dave

You might get audio/video dropouts during cpu frequency changes without
or with this patch.  But... if cpufreq changes are rejected when ISO
transfers are active, wouldn't that have the potential to lock the
processors into a low-frquency state, if an isochronous device starts up
while the CPUs are otherwise idle?  How would the user know that his
performance is crap because of his isochronous device(s)?


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] [PATCH] EHCI interrupt transactions failing during CPU frequency changes

2007-02-20 Thread Stuart_Hayes

Many interrupt transactions to low/full speed devices are getting
QTD_STS_MMF from the EHCI controller because the split transaction isn't
completed in time.  This happens when the EHCI controller tries to read
main memory during a CPU frequency change, and it can't get the data
back in time, presumably due to the processor not being able to complete
a cache snoop in a timely manner when changing frequency.

This results in ehci_hcd returning -EPROTO to drivers like hid-core, and
results in lost keystrokes, as well as messages like usb 3-3.1: reset
low speed USB device using ehci_hcd and address 7.  This could
presumably also affect non-HID devices.

This patch turns off periodic list processing in the EHCI controller for
the duration of processor frequency changes.

Signed-off-by: Stuart Hayes [EMAIL PROTECTED]



ehci_cpufreq_2.patch
Description: ehci_cpufreq_2.patch
-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] errors with split transactions powernow

2007-02-15 Thread Stuart_Hayes
Greg KH wrote:
 On Wed, Feb 14, 2007 at 04:49:11PM -0600, [EMAIL PROTECTED] wrote:
 
 Hello... While running minor stress on a system, I'm seeing messages
 like this quite frequently:
 
 What kernel version are you using?
 

2.6.20

 
 It seems to me that ehci-hcd itself should perhaps retry a
 transaction immediately--at least a couple of times--if it gets
 QTD_STS_MMF.  If 
 that is wrong for some reason, it might be nice if there was some way
 for hid-core (et al?) to know what's going on so it could opt to
 resubmit the URB right away--or at least not reset the device so
 hastily--if this is the cause of the URB failure... -EPROTO covers
 several other errors, too.
 
 Hm, care to make up a patch that would test this out?
 

Now that I've thought about it, I don't think that it makes sense for
ehci-hcd to automatically reactivate the failed TD... because, when we
get QTD_STS_MMF, it means that the transaction was already completed to
the device on the low/full speed bus.  If the broken split transaction
was reading/writing data from a hard drive, and we just re-issue the
read/write TD, I think we'll end up with data corruption... we'd need
the usb-storage or scsi driver to realize that something went wrong and
redo the whole CDB.

But, even worse than that, other types of devices might lose data
irretrievably.  For example, if the broken split transaction had just
read a keypress from a keyboard, and the TT threw away the keypress
because we didn't complete the transaction in time, we couldn't get it
back at all.  (I've actually verified that my typing gets corrupted if I
type for a few minutes during a kernel compile, though I'm sure most of
the broken split transactions got a NAK from the keyboard.)

So I think ideally linux needs some way of coordinating ehci-hcd and
powernow, so that the EHCI controller can be stopped temporarily while
the processors are changing speed... I'm going to try to hack something
to do that, just to verify that it will fix the problem, but I'm not
sure what the right way to implement that would be.

(Or maybe it isn't worth implementing that, if everything non-HID could
recover from this by re-doing stuff... I'm not familiar with all the USB
device types.)

The controller is getting quite a few QTD_STS_MMF errors,
though--several per minute while compiling a kernel on my system with 4
HID devices behind USB 2.0 hubs.  (I put in a printk so I could see
every time qtd_copy_status() sees QTD_STS_MMF.)

Stuart

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] errors with split transactions powernow

2007-02-14 Thread Stuart_Hayes

Hello... While running minor stress on a system, I'm seeing messages
like this quite frequently:

usb 3-1.1: reset full speed USB device using ehci_hcd and address 5

These messages are occurring for all 4 of my low/full speed HID devices
that are connected to a USB2.0 hub (so all are using split transactions
on an interrupt pipe).  This is occurring on an AMD system, and the
problem only occurs when the processors are changing speed
(powernow/cpufreq must be enabled).

I added a few printks, and the failures are all because the EHCI
controller is setting QTD_STS_MMF (missed micro-frame), which means that
the EHCI controller started a split transaction but wasn't able to
complete it in time.

I think this is happening because the EHCI controller is getting delayed
trying reading main memory while the processors are changing speed (this
is where the cpufreq/powernow stuff comes in).  If this happens between
the start  complete parts of a split transaction, the EHCI controller
can't complete the split transaction in time, and it sets QTD_STS_MMF,
causing ehci-hcd to return -EPROTO to the hid-core driver.

The function hid_io_error() in hid-core will in theory retry this
transaction every so often for up to a second, but, in reality, the
transaction isn't getting retried at all before the second is up,
because the system is loaded fairly heavily and the code that retries
the transaction is called by a timer and not run immediately in the
interrupt handler.

Could anyone recommend a good way to fix this?

It seems to me that ehci-hcd itself should perhaps retry a transaction
immediately--at least a couple of times--if it gets QTD_STS_MMF.  If
that is wrong for some reason, it might be nice if there was some way
for hid-core (et al?) to know what's going on so it could opt to
resubmit the URB right away--or at least not reset the device so
hastily--if this is the cause of the URB failure... -EPROTO covers
several other errors, too.

Thanks in advance for any advice!
Stuart

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] ehci HCRESET

2005-04-12 Thread Stuart_Hayes
 
I notice that ehci-hcd.c only waits up to 25ms after setting CMD_RESET
in the EHCI command register before timing out and giving up.

Is that 25ms based on observation, or is it actually specified
somewhere?  I couldn't find anything in the EHCI 1.0 spec that says how
long a controller can take to finish a reset, and I've got a controller
that seems to be taking a little bit longer than that.

Thanks!
Stuart


---
SF email is sponsored by - The IT Product Guide
Read honest  candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_ide95alloc_id396op=click
___
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


RE: [linux-usb-devel] URB,USB requests and CBW/CSW

2004-11-11 Thread Stuart_Hayes

I'm not the world's top authority on the subject, but my understanding
is that requests on endpoint 0 will either be standard device requests
(such as reading basic information about the device), or device specific
requests (things such as SET_IDLE for keyboard/mouse devices, which can
be used to them to not to send any reports unless something has changed,
or GET_MAX_LUN for usb bulk only transport storage devices).

CBWs and CSWs are part of the USB mass storage bulk only transport
device spec, and are transferred on the bulk pipe (not endpoint 0), and
are used to actually transfer data to and from the mass storage device,
rather than to configure the device.

Hope that helps (and is correct),
Stuart

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Hou
Xiang ZHU
Sent: Wednesday, November 10, 2004 11:00 PM
To: [EMAIL PROTECTED]
Subject: [linux-usb-devel] URB,USB requests and CBW/CSW

hello,
A simple question related to linux usb concept here:

I am reading the USB Spec2.0 , Mass Storage Spec. as well as linux usb
code. I am little confused by some concept concerning URB,USB requests
and CBW/CSW.

In my understanding, each time host want to send request or data to usb
device, it has to use usb_submit_urb(),

The USB request has standard device request and Class-specific request,
for standard request, it is used to get general USB device info.
but when it is going to use class-specific request?

what is the relationship between CBW/CSW and Class-specific request?

so a URB contains either USB standard request or class-specific request
or CBW? how about actual data(for example when writing a file to USB
stick)?

Thanks,
Houxiang Zhu



---
This SF.Net email is sponsored by:
Sybase ASE Linux Express Edition - download now for FREE LinuxWorld
Reader's Choice Award Winner for best database on Linux.
http://ads.osdn.com/?ad_idU88alloc_id065op=ick
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel




---
This SF.Net email is sponsored by:
Sybase ASE Linux Express Edition - download now for FREE
LinuxWorld Reader's Choice Award Winner for best database on Linux.
http://ads.osdn.com/?ad_idU88alloc_id065op=click
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


Re: [linux-usb-devel] ep0 timeout followed by device removal, ep0

2004-10-25 Thread Stuart_Hayes

 is in uninterruptible sleep state. I can reproduce the failure by
cat'ing 
 /proc/bus/usb/devices while performing a mount on the usbdrive. Below
is a 

This problem sounds a lot like one that I encountered a few months ago
with a Teac USB CD-ROM drive--the drive couldn't handle being accessed
through ep0 (like it is when you cat /proc/bus/usb/devices) at the same
time that it is getting storage requests through the other
endpoints--especially if one of the storage commands required clearing a
stalled pipe, which also has to be done on ep0.

I submitted a patch (in two parts) that uses an extra semaphore to keep
these accesses from overlapping, and Pete incorporated at least some of
these patches into the Red Hat RHEL3 kernel.  However, I don't think
this was accepted into the mainstream kernel.

That might be worth trying.  I don't think it was very difficult to
retool this patch to work with the 2.6 kernel (I'm sorry, I wasn't
following this issue from the start, and I don't know which kernel you
are using).

Here are the patches (again, these are for the 2.4 kernel).

--Stuart


diff -BurN linux-2.4.21-11.EL/drivers/usb/devio.c
linux-2.4.21-11-usbfix2-temp.EL/drivers/usb/devio.c
--- linux-2.4.21-11.EL/drivers/usb/devio.c  2004-03-08
22:15:09.0 -0600
+++ linux-2.4.21-11-usbfix2-temp.EL/drivers/usb/devio.c 2004-05-03
09:57:22.0 -0500
@@ -1171,6 +1171,12 @@
up_read(ps-devsem);
return -ENODEV;
}
+   /*
+* grab device's exclusive_access mutex to prevent driver from
+* using this device while it is being accessed by us
+*/
+   down ((ps-dev-exclusive_access));
+
switch (cmd) {
case USBDEVFS_CONTROL:
ret = proc_control(ps, (void *)arg);
@@ -1250,6 +1256,7 @@
ret = proc_ioctl(ps, (void *) arg);
break;
}
+   up ((ps-dev-exclusive_access));
up_read(ps-devsem);
if (ret = 0)
inode-i_atime = CURRENT_TIME;
diff -BurN linux-2.4.21-11.EL/drivers/usb/storage/transport.c
linux-2.4.21-11-usbfix2-temp.EL/drivers/usb/storage/transport.c
--- linux-2.4.21-11.EL/drivers/usb/storage/transport.c  2004-03-08
22:15:15.0 -0600
+++ linux-2.4.21-11-usbfix2-temp.EL/drivers/usb/storage/transport.c
2004-05-03 09:57:23.0 -0500
@@ -627,8 +627,17 @@
int need_auto_sense;
int result;
 
+   /*
+* Grab device's exclusive_access mutex to prevent /usbfs from
+* sending out a command in the middle of ours (if libusb sends
a
+* get_descriptor or something on pipe 0 after our CBW and
before
+* our CSW, and then we get a stall, we have trouble)
+*/
+   down ((us-pusb_dev-exclusive_access));
+
/* send the command to the transport layer */
result = us-transport(srb, us);
+   up ((us-pusb_dev-exclusive_access));
 
/* if the command gets aborted by the higher layers, we need to
 * short-circuit all other processing
@@ -748,7 +757,9 @@
srb-use_sg = 0;
 
/* issue the auto-sense command */
+   down ((us-pusb_dev-exclusive_access));
temp_result = us-transport(us-srb, us);
+   up ((us-pusb_dev-exclusive_access));
 
/* let's clean up right away */
srb-request_buffer = old_request_buffer;
diff -BurN linux-2.4.21-11.EL/drivers/usb/usb.c
linux-2.4.21-11-usbfix2-temp.EL/drivers/usb/usb.c
--- linux-2.4.21-11.EL/drivers/usb/usb.c2004-03-08
22:15:09.0 -0600
+++ linux-2.4.21-11-usbfix2-temp.EL/drivers/usb/usb.c   2004-05-03
09:57:23.0 -0500
@@ -997,6 +997,7 @@
INIT_LIST_HEAD(dev-filelist);
 
init_MUTEX(dev-serialize);
+   init_MUTEX(dev-exclusive_access);
 
dev-bus-op-allocate(dev);
 
diff -BurN linux-2.4.21-11.EL/include/linux/usb.h
linux-2.4.21-11-usbfix2-temp.EL/include/linux/usb.h
--- linux-2.4.21-11.EL/include/linux/usb.h  2004-03-08
22:15:40.0 -0600
+++ linux-2.4.21-11-usbfix2-temp.EL/include/linux/usb.h 2004-05-03
09:57:40.0 -0500
@@ -828,6 +828,8 @@
 
atomic_t refcnt;/* Reference count */
struct semaphore serialize;
+   struct semaphore exclusive_access; /* prevent driver  proc
accesses 
+ from overlapping cmds on
bus  */
 
unsigned int toggle[2]; /* one bit for each endpoint
([0] = IN, [1] = OUT) */
unsigned int halted[2]; /* endpoint halts; one bit per
endpoint #  direction; */



diff -urN linux-2.4.21-15.18.EL/drivers/usb/devices.c
linux-2.4.21-15.18.ELusb/drivers/usb/devices.c
--- linux-2.4.21-15.18.EL/drivers/usb/devices.c 2002-11-28
17:53:14.0 -0600
+++ linux-2.4.21-15.18.ELusb/drivers/usb/devices.c  2004-07-16
10:08:24.0 -0500
@@ -387,22 +387,32 @@
 
if (start  end)
return start;
-   
+
+   /*
+   

RE: [linux-usb-devel] PATCH: (as357) Set QH bit in UHCI framelist entries

2004-08-11 Thread Stuart_Hayes
Pete Zaitcev wrote:
 On Wed, 11 Aug 2004 11:42:48 -0400 (EDT) Alan Stern
 [EMAIL PROTECTED] wrote: 
 
 This patch fixes the error in the UHCI driver found by Stuart Hayes.
 It adds the UHCI_PTR_QH bit into the initial entries stored in the
 hardware framelist.  It's not entirely clear how the driver ever
 managed to work with these bits not set; apparently by coincidence
 the 
 QH entries resembled TD entries sufficiently closely to fool the
 hardware. 
 
  /* Only place we don't use the frame list routines */
 -uhci-fl-frame[i] =
cpu_to_le32(uhci-skelqh[irq]-dma_handle);
 +uhci-fl-frame[i] = UHCI_PTR_QH |
 +
cpu_to_le32(uhci-skelqh[irq]-dma_handle);
 
 What about uhci_insert_td_frame_list() and uhci_remove_td() ?
 
 -- Pete

My two cents worth (which is really not worth much more than that):

This patch only affects the initial entries in the framelist,
which are all QHs.  uhci_insert_td_frame_list() and uhci_remove_td()
don't need to add the UHCI_PTR_QH bit since they are putting a link 
to a TD (not a QH) into the frame list.  These routines don't delete
this bit in the original QH entry.




---
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink  Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] uhci host controller process error

2004-08-10 Thread Stuart_Hayes

I have several systems that are failing with the messages below when 
the uhci-hcd module is loaded.  This only seems to happen when the 
QHs and/or TDs (I haven't narrowed this down yet) are at a physical 
address above roughly 2.5GB.  If I limit the memory with a mem=2500m 
command line, or if I modify the driver to ensure that all of the 
QHs and TDs are mapped below that address, uhci-hcd loads fine.

I am seeing this problem with a Red Hat kernel based on a 2.6.5 
kernel, but I assume this would happen on a non-Red Hat kernel as 
well.

Since the host controller process error is supposed to indicate that
the host controller thinks there's something wrong with the TD, I'm 
guessing that maybe the HC can't access memory at certain regions, 
but I've checked the chipset registers, and I don't see anything.
Also, I'm seeing this on two completely different systems--one with
an ICH3, and one with an ICH5.

The error is seen on the first IRQ that comes back after the host
controller is started, and there are no USB devices plugged in.

Any idea what might be causing this?  Is anyone else seeing this?


--
May 25 11:54:20 discovery kernel: uhci_hcd :00:1d.0: new USB bus
registered, assigned bus number 1
May 25 11:54:20 discovery kernel: uhci_hcd :00:1d.0: host controller
process error, something bad happened!
May 25 11:54:20 discovery: uhci_hcd :00:1d.0: host controller
halted, very bad!

--

Thanks
Stuart




---
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink  Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] RE: uhci host controller process error

2004-08-10 Thread Stuart_Hayes

 Any idea what might be causing this?  Is anyone else seeing this?
 


--
 May 25 11:54:20 discovery kernel: uhci_hcd :00:1d.0: new USB bus
 registered, assigned bus number 1 May 25 11:54:20 discovery kernel:
 uhci_hcd :00:1d.0: host controller process error, something bad
 happened!   
 May 25 11:54:20 discovery: uhci_hcd :00:1d.0: host controller
 halted, very bad!


--
 

Never mind, I figured it out.  It looks like the uhci-hcd driver
doesn't add a | UHCI_PTR_QH to the pointers that it puts
in the frame list.  This causes the ICH to think that the frame list
is pointing to a bunch of TDs instead of QHs for purposes of
checking for TD errors.  I can only assume that the ICH
is actually treating the frame list entries as QH pointers in spite
of that bit not being set when it is actually executing the
schedule, or else I don't think it would work generally.

I guess the high addresses were just making the QH look like an
invalid TD instead of a valid TD... not sure exactly what the ICH
is checking for!






---
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink  Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


RE: [linux-usb-devel] [PATCH] proper bios handoff in ehci-hcd

2004-07-13 Thread Stuart_Hayes
Will Beers wrote:
   though maybe 500 msec is too short a period to wait.
   See if 5000 msec helps.
 
 I went all the way up to 2 msec and it still didn't help.  I'm
 sure it's a bad idea, but removing that whole if-block below it makes
 it work (which is effectively what switching the and/or did).  I
 don't know enough about it to judge whether it's correct, but what
 exactly is it checking for there?
 
 -Will

Without the patch, Linux would just ignore the BIOS handoff--Linux was
writing 0 to the bit that it was supposed to wait for the BIOS to
clear, so it never waited for the BIOS to let go of the controller.

I bet you have a bad BIOS that won't hand off, but I would try the other
thing David suggested--change the write to a byte write.  It seems
unlikely, but, since Linux is writing a 1 to the BIOS owns the
controller bit right now, you might be hitting something like this, if
the system is breaking up the write into multiple smaller writes:

the OS wants the controller bit is getting written to 1 (first part of
the Linux write, which the system broke into pieces)
the system BIOS (SMI handler) sees that bit set to 1, and clears the
BIOS owns bit
the BIOS owns bit is getting written back to a 1 (the second part of
the Linux write)
Linux waits in vain for BIOS to clear the BIOS owns bit\

Again, seems unlikely, but worth a try if you're recompiling and
testing.




---
This SF.Net email sponsored by Black Hat Briefings  Training.
Attend Black Hat Briefings  Training, Las Vegas July 24-29 -
digital self defense, top technical experts, no vendor pitches,
unmatched networking opportunities. Visit www.blackhat.com
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


RE: [linux-usb-devel] Re: [usb-storage] Re: Clear_feature doesn't happen when get_descriptor on pipe

2004-05-06 Thread Stuart_Hayes
David Brownell wrote:
 Wouldn't a plain old atomic BUSY bitflag work better?  If it's
 set, usbcore would reject urbs with -EAGAIN ... from all
 contexts, including the many that can't acquire semaphores!
 
 
 I thought (and I might be confused with other OSes here) that there
 was a way to attempt to acquire a semaphore from an 'unfriendly'
 context -- either you got the semaphore or you got an indication
 that you can't get it.
 
 As in down_trylock()?  Thing is, one can't necessarily know
 whether the current context is friendly or not.  So one API
 design strategy thus starts out assuming none of them are, and
 then sees how well that applies system-wide.  Hence a bit flag.
 

But wouldn't a bit flag require continuous retries on the part of
whomever gets the -EAGAIN when they try to submit an URB?  And, if so,
wouldn't that leave open the possibility of them missing the window
when the BUSY bit isn't set (unless they retry in a very tight loop,
in which case they could waste a lot of processor time)?

 
 The problem is in figuring out which devices need this treatment. 
 Since the only way to discover this is by crashing the device, it's
 not very friendly.
 
 But at that point you'll know, and can add it to the list.  It's also
 not friendly to re-define USB so that it won't work well with devices
 that don't have this rare bug ... :)
 
 - Dave

I don't understand why this would make USB not work well with devices
that don't have this problem.  I would think grabbing a semaphore would
be extremely quick unless someone else already has it, which should be
a very rare occurrance.




---
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson  Lucent use to
deliver higher performing products faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


RE: [linux-usb-devel] Re: [usb-storage] Re: Clear_feature doesn't happen when get_descriptor on pipe

2004-05-06 Thread Stuart_Hayes
David Brownell wrote:
 
 Wouldn't a plain old atomic BUSY bitflag work better?  If it's set,
 usbcore would reject urbs with -EAGAIN ... from all contexts,
 including the many that can't acquire semaphores!  
 

Going back a bit--what context were you thinking of that wouldn't be
able to grab the semaphore--are you talking about a userspace program
that talks to a USB device URB-by-URB, instead of using a driver like
usb-storage?



---
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson  Lucent use to
deliver higher performing products faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] RE: [usb-storage] Re: Clear_feature doesn't happen when get_descriptor on pipe

2004-05-03 Thread Stuart_Hayes
Matthew Dharm wrote:
 On Fri, Apr 30, 2004 at 06:32:14PM -0700, Pete Zaitcev wrote:
 On Fri, 30 Apr 2004 16:34:48 -0500
 [EMAIL PROTECTED] wrote:
 
 Here's the patch (for 2.4) if you're curious.
 
 Do you want me to include all relevant transport types, or you want
 to do it yourself?
 
 Or the lock could be moved to where the transport is invoked, instead
 of putting it in every transport.
 
 Matt

I guess that would be easier, assuming the only path to the transport
(usb_stor_Bulk_transport(), etc.) is through
usb_stor_invoke_transport(), which appears to be the case.

I'll try it.

Thanks
Stuart


---
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g.
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id149alloc_id66op=click
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


RE: [linux-usb-devel] Clear_feature doesn't happen when get_descriptor on pipe

2004-04-30 Thread Stuart_Hayes
My apologies--my bugzilla account doesn't have the right permissions.
It's not that I'm hiding...

I'm the one who made  submitted the patch, and I'm the one who posted 
the message about Robert having found an easy way to reproduce the 
issue.

--Stuart Hayes
[EMAIL PROTECTED]


-Original Message-
From: Pete Zaitcev [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 29, 2004 11:59 PM
To: Johannes Erdfelt
Cc: [EMAIL PROTECTED]; Hayes, Stuart;
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [linux-usb-devel] Clear_feature doesn't happen when
get_descriptor on pipe


On Thu, 29 Apr 2004 21:24:27 -0700
Johannes Erdfelt [EMAIL PROTECTED] wrote:

 Although, I'm curious what other thread is sending a GET_DESCRIPTOR
 query that usb-storage is conflicting with. Was this a rare occurance
or
 is this repeatable?

Dell people (who for some unimaginable reason prefer to hide behind
a corporate alias, so I cannot tell if it's Gary, Stuart, or other)
describe it this way:

 Oh, I forgot to mention--Robert figured out how to reproduce this 
 easily:  do a while true; do updfstab; done on one console, which 
 results in continuous accesses of the USB CD-ROM through the SCSI 
 layer.  Then, on a second console, do a lsusb -v, which will result 
 in a lot of GET_DESCRIPTOR commands being sent to pipe 0 through 
 the /proc filesystem.  You should immediately see the failure.

-- Pete


---
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g.
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id149alloc_id66op=click
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] RE: Clear_feature doesn't happen when get_descriptor on pipe

2004-04-30 Thread Stuart_Hayes
Pete Zaitcev wrote:
 On Wed, 28 Apr 2004 15:09:39 -0400 (EDT) Alan Stern
 [EMAIL PROTECTED] wrote: 
 
 Perhaps I could keep the proc_ioctls from running until the device
 isn't being used by usb-storage,
 
 That sounds feasible.
 
 It is feasible, but it runs into difficulties with locking order.
 I tried to fiddle with the patch Stuart dropped into RH Bugzilla.
 I do not have a confidence in it, and unfortunately, the 2.4
 development is closed. I do not want to push it to Marcelo and
 collect failure reports at this stage of development.  
 
  or maybe just make the transport layer of usb-storage retry the
 clear_feature command for a while if the pipe already has an URB
 going, instead of simply failing and moving on to the next step
 without clearing the stall...?
 
 You could do that.  Check for -ENXIO error codes and retry in a short
 loop with usb_stor_control_msg().
 
 Stuart, why don't we do something like the attached? Please back out
 your patch with semaphores and give it a try, tell me how it went. 
 
 Busy waits are bad, I know. HOWEVER, this particular thing is clearly
 local in its scope. THEREFORE, if it solves your problem, I can ship
 it with Red Hat and even push to Marcelo.  
 
 Critique and testing are appreciated.
 
 -- Pete
 
 diff -urN -X dontdiff linux-2.4.26/drivers/usb/storage/transport.c
 linux-2.4.26-nip/drivers/usb/storage/transport.c ---
 linux-2.4.26/drivers/usb/storage/transport.c  2004-02-26
 14:09:59.0 -0800 +++
   linux-2.4.26-nip/drivers/usb/storage/transport.c
2004-04-30
   11:07:18.0 -0700 @@ -471,6 +471,7 @@ int result; int
endp =
 usb_pipeendpoint(pipe) | (usb_pipein(pipe)  7); 
 
 +retry_devio:
   result = usb_stor_control_msg(us,
   usb_sndctrlpipe(us-pusb_dev, 0),
   USB_REQ_CLEAR_FEATURE, USB_RECIP_ENDPOINT, 0, @@ -478,8
+479,21 @@
   US_DEBUGP(usb_stor_clear_halt: result=%d\n, result);
 
   /* this is a failure case */
 - if (result  0)
 + if (result  0) {
 + if (result == -ENXIO) {
 + /*
 +  * usb-uhci cannot queue. If lsusb ran on this
device
 +  * (by a startup script or something), we lose.
 +  * But we cannot afford for halts not to clear
 +  * for a reason so silly, so we persist rudely.
 +  */
 + set_current_state(TASK_UNINTERRUPTIBLE);
 + schedule_timeout(2);
 + set_current_state(TASK_RUNNING);
 + goto retry_devio;
 + }
   return result;
 + }
 
   /* reset the toggles and endpoint flags */
   usb_endpoint_running(us-pusb_dev, usb_pipeendpoint(pipe),

I haven't had any trouble with the patch I submitted.  I rebooted the 
system repeatedly overnight with 2 empty CD-ROMs (lots of stalls--this
is 
the config that would lock up almost every boot before), and I also ran 
updfstab  lsusb simultaneously for a while.

I tried your patch (I actually had tried my own version of this a day or

two ago), and it does not work.  It does make it work a little bit
longer, 
but it eventually fails.  I'm not sure exactly why it fails.  What I see
on 
the USB bus when the failure occurs is this:

READ_CAPACITY CBW on endpoint 2
GET_DESCRIPTOR issued on endpoint 0
READ_CAPACITY  GET_DESCRIPTOR attempt data reads on endpoints 0  1
(NAKed)
endpoint 1 stalls
GET_DESCRIPTOR continues to attempt data reads on endpoint 0 until it
times out (100-ish ms later)
CLEAR_FEATURE is issued on endpoint 0
0 length data is read on endpoint 0 to ACK the CLEAR_FEATURE
GET_DESCRIPTOR is re-issued on endpoint 0 (I believe lsusb retries this
when it fails, not the driver)
the CSW from the GET_DESCRIPTOR is read from endpoint 1 (it failed of
course)
the data for the GET_DESCRIPTOR is read from endpoint 0
0 length data is written to endpoint 0 to ACK the GET_DESCRIPTOR data
a new GET_DESCRIPTOR is issued on endpoint 0
a REQUEST_SENSE CBW goes out on endpoint 2
the data for the GET_DESCRIPTOR is read from endpoint 0
0 length data is written to endpoint 0 to ACK the GET_DESCRIPTOR data
the sense data (the data for the REQUEST_SENSE) is read from endpoint 1
a new GET_DESCRIPTOR is issued on endpoint 0
the data for the GET_DESCRIPTOR is read from endpoint 0
0 length data is written to endpoint 0 to ACK the GET_DESCRIPTOR data
the CSW for the REQUEST_SENSE command is read from endpoint 1 -- *it has
status=04, which is reserved*
a BULK_ONLY_MASS_STORAGE_RESET is issued on endpoint 0 (because the
status was invalid)
0 length data is read from endpoint 0 to ACK the reset
a new GET_DESCRIPTOR is issued on endpoint 0, which is NAKed until it
times out 100ms later
(this happens many times in a row)

(Many NAKs are omitted for brevity.  I can send you the USB trace if you
want it.)

So, it looks like several things aren't working right:  First, I would 
*expect* the GET_DESCRIPTOR command to 

[linux-usb-devel] RE: Clear_feature doesn't happen when get_descriptor on pipe

2004-04-30 Thread Stuart_Hayes
Alan Stern wrote:
 On Fri, 30 Apr 2004 [EMAIL PROTECTED] wrote:
 
 So, it looks like several things aren't working right:  First, I
 would *expect* the GET_DESCRIPTOR command to finish executing on
 endpoint 0 even after the bulk in pipe stalls, but this doesn't
 appear to be the case. It
 looks like a stall on endpoint 1 also stalls endpoint 0.  I can't
 find anything in the USB spec that explicity says how this should
 work. 
 
 That should never happen; endpoints are supposed to be independent. 
 Also, endpoint 0 isn't supposed to stall unless the device is given a
 request that it can't carry out.  
 
 Second,
 the Teac drive is returning in invalid status of 4, which I'm
 assuming is related to all the overlapping commands, since it
 doesn't happen with the semaphore in place.  And, finally, the drive
 doesn't respond to GET_DESCRIPTOR commands at all after the bulk
 reset... but this may be caused by the same thing that caused it to
 return a status of 4. 
 
 Granted, it is possible that these problems *may* be caused by the
 Teac CD-ROM not being as robust as it should be.  However, if the
 Teac has this problem, I'm betting other devices do, too.  It looks
 like the semaphore patch may fix another problem we're seeing here
 with a USB key--Gary's still checking it out.
 
 I think you're right that the drive itself is not robust.  I've heard
 others mention devices that get mixed up when they receive control
 requests while executing a SCSI command.  
 
 We have this problem even under 2.6.  Ultimately the solution will
 have to be to make the device single-access only.  It's not clear how
 to do that, and usbfs doesn't enforce such limitations at the moment.
 
 Alan Stern

With Pete's patch, I think that is all that's going wrong.

My proposal to fix the issue (I didn't post the patch on here) was to
add a semaphore to struct usb_device, and make the usbdev_ioctl() (in
devio.c) grab that semaphore before doing the ioctl, and also make
transport.c's usb_stor_Bulk_transfer() grab the semaphore before
sending out the CBW (and release it only after getting back the CSW).
Same thing for the other mass storage transports (though I hadn't
actually included that in the patch I sent Pete).  As far as I can
tell, usbdev_ioctl is the only way that a control message would go
out to thet device at the same time that usb-storage is sending out
a SCSI/ATAPI command.

Stuart


---
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g.
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id149alloc_id66op=click
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] RE: Clear_feature doesn't happen when get_descriptor on pipe

2004-04-30 Thread Stuart_Hayes
Here's the patch (for 2.4) if you're curious.






2cdromfix2.patch
Description: 2cdromfix2.patch


[linux-usb-devel] Clear_feature doesn't happen when get_descriptor on pipe

2004-04-28 Thread Stuart_Hayes

Hello...

I'm seeing a problem where I have an empty CD-ROM drive (or two)
connected
 to the system, and, every now and then, I get hangs during boot (or
 issues with commands not completing once the system is up).

I've tracked down the issue to a sequence of events similar to the 
following:

---
Issue a READ_TOC CBW to bulk out pipe

Issue a GET_DESCRIPTOR to pipe 0 (a different thread does this,
presumably)

Try to read data for READ_TOC from bulk in pipe -- pipe stalls because 
there's no CD in the drive and therefore no TOC data

Try to issue a CLEAR_FEATURE to clear the stall to pipe 0 -- this fails 
because there's already an URB in progress to pipe 0

USB driver then tries to read the CSW (never mind that the stall wasn't 
cleared)--this also fails because pipe is stalled

Try to issue a CLEAR_FEATURE again, with similar results

USB bus then gets continuous READ_TOC commands on the USB bus, which all

get NAKed (pipe still stalled)
---

How should this get handled?  I would assume that the CLEAR_FEATURE
should 
get retried after the GET_DESCRIPTOR happens, but this certainly isn't 
what's happening.

I'm working with RedHat kernel 2.4.21-11, but I've tried 2.4.26, which
also 
fails at similar places doing similar things, though I saw kernel panics

instead of hangs, so I'm not sure exactly what's happening in 2.4.26.

Thanks!
Stuart



---
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g.
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id149alloc_id66op=click
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


RE: [linux-usb-devel] Clear_feature doesn't happen when get_descriptor on pipe

2004-04-28 Thread Stuart_Hayes
Alan Stern wrote:
 On Wed, 28 Apr 2004 [EMAIL PROTECTED] wrote:
 
 
 Hello...
 
 I'm seeing a problem where I have an empty CD-ROM drive (or two)
 connected  to the system, and, every now and then, I get hangs during
 boot (or  issues with commands not completing once the system is up).
 
 I've tracked down the issue to a sequence of events similar to the
 following: 
 
 ---
 Issue a READ_TOC CBW to bulk out pipe
 
 Issue a GET_DESCRIPTOR to pipe 0 (a different thread does this,
 presumably) 
 
 Try to read data for READ_TOC from bulk in pipe -- pipe stalls
 because there's no CD in the drive and therefore no TOC data
 
 Try to issue a CLEAR_FEATURE to clear the stall to pipe 0 -- this
 fails because there's already an URB in progress to pipe 0
 
 USB driver then tries to read the CSW (never mind that the stall
 wasn't cleared)--this also fails because pipe is stalled
 
 Try to issue a CLEAR_FEATURE again, with similar results
 
 USB bus then gets continuous READ_TOC commands on the USB bus, which
 all 
 
 get NAKed (pipe still stalled)
 ---
 
 How should this get handled?
 
 Under Linux 2.4, one way to handle this is by preventing the other
 thread from issuing the GET_DESCRIPTOR to pipe 0 while the CDROM
 drive is busy. You could find out what that other thread is and
 suspend or kill it.  
 
 Under Linux 2.6 there shouldn't be any problem.
 
  I would assume that the CLEAR_FEATURE should get retried after the
 GET_DESCRIPTOR happens, but this certainly isn't what's happening.
 
 The Host Controller drivers in 2.4 -- the UHCI drivers at least --
 can't handle multiple simultaneous requests to the same control
 endpoint. That's not going to change; if you need that capability
 then you have to use 2.6.  
 
 I'm working with RedHat kernel 2.4.21-11, but I've tried 2.4.26,
 which also fails at similar places doing similar things, though I
 saw kernel panics 
 
 instead of hangs, so I'm not sure exactly what's happening in 2.4.26.
 
 Thanks!
 Stuart
 
 Alan Stern

I think one of the threads is updfstab, lsusb, or kudzu, which access 
the device's pipe 0 through the /proc directory, while the other thread 
is just cdrom.c trying to do some stuff with the drive.  I'm not sure
how 
one could keep these things from running simultaneously.  But, once this

problem happens, the USB driver can't even be unloaded--one must reboot 
the system to get the USB subsystem working again!

I'm not looking to re-architect the drivers--I just want to do something
to keep the system from failing when this happens.

Perhaps I could keep the proc_ioctls from running until the device isn't
being used by usb-storage, or maybe just make the transport layer of
usb-storage retry the clear_feature command for a while if the pipe
already has an URB going, instead of simply failing and moving on to
the next step without clearing the stall...?

Stuart


---
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g.
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id149alloc_id66op=click
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


RE: [linux-usb-devel] PATCH for ehci-sched.c -- interrupt QHs are n't started right in E HCI driver (2.4 2.6)

2004-01-28 Thread Stuart_Hayes

That's what I was missing:  the queue advance is aborted if the QTD's active
bit isn't set.
Thanks again!
Stuart


-Original Message-
From: David Brownell [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 27, 2004 5:56 PM
To: Hayes, Stuart
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [linux-usb-devel] PATCH for ehci-sched.c -- interrupt QHs
are n't started right in E HCI driver (2.4  2.6)


[EMAIL PROTECTED] wrote:
 One question on that, though.  With the fix that was put into 2.4.22, the
qh
 is now initialized as live by qh_make (i.e., the halt bit isn't set).
Then,
 in intr_submit, the qh is scheduled with no QTDs, and the QTDs are added
to

Or as the comment says, scheduling errors are forced (in the rare case
the QH isn't yet scheduled, AND there's no space); periodic qTDs are
always added to a QH that's live (which can't fail).


 the live qh.  When the QTDs are inserted by qh_append_tds, it sets the QTD
 token to have the halt bit set, moves the QTD into position, and, finally,
 puts the correct token into the QTD.

That is, the old dummy is overwritten by the first QTD in the list,
and the original first QTD becomes the new dummy.

Why?  Because the dummy is already in position (but not ACTIVE),
and trying to substitute another QTD there gets racey.


 It seems like there would be a race condition in qh_append_tds between the
 software and host controller--if the host controller looks at this qh
after
 the QTD is added to the qh, but before the real token is copied into the
QTD
 (which is pretty much the last thing that qh_append_tds does), the host
 controller will copy the incomplete QTD into the overlay area, see that
the
 active bit isn't set (because it copied the token that just has the halt
bit
 set), and go on to the next qh.  The next time around, the host controller
 will look at the overlay, see the halt bit is set, and move on to the next
 qh, and the complete QTD will never be seen.

See section 4.10.2 of the EHCI spec, which says the controller won't
overlay any QTD unless its ACTIVE bit is set.

On the other hand ... I believe some EHCI 0.95 controllers don't always
do that.  I'm pretty sure I've seen both NEC and Philips 0.95 controllers
overlay the (inactive) dummy; see comments at the end of qh_completions().
That type of failure was pretty uncommon, at least in the system I was
using at that time and using the tests I had then, and I think it's got
a complete-enough workaround.



 Granted, this isn't very likely to happen, but it seems possible.  It
seems
 like there would be two ways to avoid this possibility:  One way would be
to
 leave the halt bit set in the overlay until all of the QTDs are added to
the
 qh (this is what I had proposed).  The second would be to change

And that's in fact similar to what the original driver version was doing;
as bugs got fixed, and deep queues got to behaving, the code evolved to the
current simpler approach.


 qh_append_tds to put a 0 (instead of a halt) in the QTD's token until it
is
 done adding the QTD--in this way, if the HC *did* copy the incomplete QTD
 into the overlay area, it would see the that the active bit isn't set and
 move on to the next QH, but the next time around it would see that neither
 halt nor active were set in the overlay, and it would go fetch the QTD
 again.
 
 Does this seem reasonable, or am I missing something?

I remember thinking leaving HALT in qh_append_tds() should pretty much
guarantee that qh_completions() would _always_ patch up after up that
EHCI 0.95 bug at the next IRQ (or watchdog).

Also, think of the other race:  the incomplete QTD would have hw_next
pointing to some rather irrelevant location in memory ... but that's
what the HC would fetch and try to use!  So it'd die after a DMA fault,
in at least some cases.

- Dave



 
 Thanks
 Stuart
 
 
 



---
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


[linux-usb-devel] PATCH for ehci-sched.c -- interrupt QHs aren't started right in E HCI driver (2.4 2.6)

2004-01-27 Thread Stuart_Hayes

David et al--

I've been working on a problem where devices plugged into a USB 2.0 hub
(which is plugged into an EHCI controller) aren't seen until there is some
other activity in the USB subsystem.  To reproduce this, I booted up a
system with a USB2.0 hub connected to my EHCI controller, and, after the
system was up, I tried plugging a device into the hub.  Linux did not see
the device.  If I then plugged a device directly into the EHCI controller,
Linux would see both of the devices that I plugged in.

I've found, in the EHCI driver, that queue heads set up for an interrupt
pipe (such as the one set up to check for connection changes on the USB2.0
hub) are set up correctly, but the halt (QTD_STS_HALT) bit in the hw_token
in the overlay area is never cleared after the QH and QTDs are set up and
scheduled.  With this halt bit set, the HC will ignore the QH, so it doesn't
get executed.  If the hw_token in the overlay is cleared to 0, the HC will
advance the queue (copy the first QTD into the overlay area) and do the
transaction.

The bit eventually gets cleared if there is other activity on the
bus--scan_periodic will see the halt bit set, and, among other things, clear
the bit.  However, if there is no other activity in the EHCI controller,
scan_periodic never gets called.

(For async transactions, this bit is cleared in qh_link_async after the QH
is set up.)

Attached is a patch.  It just modifies intr_submit (in ehci-sched.c) to
clear qh-hw_token to 0 after all of the TDs are added to the QH.

I believe this patch is applicable to both the 2.4 and 2.6 kernels.

Thanks
Stuart Hayes
[EMAIL PROTECTED]




--- ehci-sched.c.orig   2004-01-26 16:10:14.0 -0500
+++ ehci-sched.c2004-01-26 14:02:45.0 -0500
@@ -483,6 +483,10 @@
/* ... update usbfs periodic stats */
hcd_to_bus (ehci-hcd)-bandwidth_int_reqs++;
 
+   /* clear token in overlay so HC will advance queue and */
+   /* execute QH (QTD_STS_HALT is set when QH is created) */
+   qh-hw_token = 0;
+
 done:
spin_unlock_irqrestore (ehci-lock, flags);
if (status)



---
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


RE: [linux-usb-devel] PATCH for ehci-sched.c -- interrupt QHs are n't started right in E HCI driver (2.4 2.6)

2004-01-27 Thread Stuart_Hayes

Right, thanks.  Sorry I didn't see that--it was fixed it in a different way
than I was looking for.  I didn't actually test the latest kernels, but just
looked at the source.

Stuart


-Original Message-
From: David Brownell [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 27, 2004 2:27 PM
To: Hayes, Stuart
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [linux-usb-devel] PATCH for ehci-sched.c -- interrupt QHs
aren't started right in E HCI driver (2.4  2.6)


[EMAIL PROTECTED] wrote:
 
 I've found, in the EHCI driver, that queue heads set up for an interrupt
 pipe (such as the one set up to check for connection changes on the USB2.0
 hub) are set up correctly, but the halt (QTD_STS_HALT) bit in the
hw_token
 in the overlay area is never cleared after the QH and QTDs are set up and

On 2.4, that doesn't happen since about 20-June-2003; and on 2.6,
just a few days earlier (BK describes it as a micro-patch).


 scheduled.  With this halt bit set, the HC will ignore the QH, so it
doesn't
 get executed.  If the hw_token in the overlay is cleared to 0, the HC will
 advance the queue (copy the first QTD into the overlay area) and do the
 transaction.

Sounds to me like your base kernel is missing EHCI patches I'd call
essential.  More current kernels should be fine.

- Dave




---
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel


RE: [linux-usb-devel] PATCH for ehci-sched.c -- interrupt QHs are n't started right in E HCI driver (2.4 2.6)

2004-01-27 Thread Stuart_Hayes

One question on that, though.  With the fix that was put into 2.4.22, the qh
is now initialized as live by qh_make (i.e., the halt bit isn't set).  Then,
in intr_submit, the qh is scheduled with no QTDs, and the QTDs are added to
the live qh.  When the QTDs are inserted by qh_append_tds, it sets the QTD
token to have the halt bit set, moves the QTD into position, and, finally,
puts the correct token into the QTD.

It seems like there would be a race condition in qh_append_tds between the
software and host controller--if the host controller looks at this qh after
the QTD is added to the qh, but before the real token is copied into the QTD
(which is pretty much the last thing that qh_append_tds does), the host
controller will copy the incomplete QTD into the overlay area, see that the
active bit isn't set (because it copied the token that just has the halt bit
set), and go on to the next qh.  The next time around, the host controller
will look at the overlay, see the halt bit is set, and move on to the next
qh, and the complete QTD will never be seen.

Granted, this isn't very likely to happen, but it seems possible.  It seems
like there would be two ways to avoid this possibility:  One way would be to
leave the halt bit set in the overlay until all of the QTDs are added to the
qh (this is what I had proposed).  The second would be to change
qh_append_tds to put a 0 (instead of a halt) in the QTD's token until it is
done adding the QTD--in this way, if the HC *did* copy the incomplete QTD
into the overlay area, it would see the that the active bit isn't set and
move on to the next QH, but the next time around it would see that neither
halt nor active were set in the overlay, and it would go fetch the QTD
again.

Does this seem reasonable, or am I missing something?

Thanks
Stuart



-Original Message-
From: Hayes, Stuart 
Sent: Tuesday, January 27, 2004 3:05 PM
To: 'David Brownell'
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: [linux-usb-devel] PATCH for ehci-sched.c -- interrupt QHs
aren't started right in E HCI driver (2.4  2.6)



Right, thanks.  Sorry I didn't see that--it was fixed it in a different way
than I was looking for.  I didn't actually test the latest kernels, but just
looked at the source.

Stuart


-Original Message-
From: David Brownell [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 27, 2004 2:27 PM
To: Hayes, Stuart
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [linux-usb-devel] PATCH for ehci-sched.c -- interrupt QHs
aren't started right in E HCI driver (2.4  2.6)


[EMAIL PROTECTED] wrote:
 
 I've found, in the EHCI driver, that queue heads set up for an interrupt
 pipe (such as the one set up to check for connection changes on the USB2.0
 hub) are set up correctly, but the halt (QTD_STS_HALT) bit in the
hw_token
 in the overlay area is never cleared after the QH and QTDs are set up and

On 2.4, that doesn't happen since about 20-June-2003; and on 2.6,
just a few days earlier (BK describes it as a micro-patch).


 scheduled.  With this halt bit set, the HC will ignore the QH, so it
doesn't
 get executed.  If the hw_token in the overlay is cleared to 0, the HC will
 advance the queue (copy the first QTD into the overlay area) and do the
 transaction.

Sounds to me like your base kernel is missing EHCI patches I'd call
essential.  More current kernels should be fine.

- Dave




---
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
___
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel