Stuart, thanks for the feedback.

I'm mindful of the timeouts related to low bitrates, and of buffers potentially timing out before they're full, but thanks for pointing that out. What usually happens is that the buffer times out and nothing happens on the analyzer for a while, and then a short burst of reads works reliably and is marked as overrun. That suggests we aren't servicing the USB device fast enough, when in reality I have multiple read requests posted.
What I don't see on the analyzer is any traffic when the timeouts occur: no partially filled urbs (that's easy to spot), and I don't think I see a posted read either. It's as if the kernel has no queued read requests, even though I have N pending.

In general, my driver model has three URB lists, all mutex protected (all sanity checked): one for submitted urbs (readpipe), one for free unused urbs (ready for rescheduling), and one for completed urbs pending dequeue. The urbs move free -> submitted -> completed -> free between the lists as the state machine runs. All pretty standard stuff. The total number of urbs is defined statically and is typically 8.

During testing, each urb usually completes twice, meaning the device starts, the payload arrives reliably for a couple of seconds, and then I end up with a stall. No protocol mishandling in the analyzer; that's also easy to spot. All urbs (ReadPipeAsyncTO calls) are on the busy/submitted list, there is no activity on the analyzer, none of those urbs returned an error during submission (timeout is 500ms), and they eventually all time out. I resubmit completed urbs that contain an error, and usually (eventually) get data from the USB device and an overrun indicator.

It's as if the kernel has a new race condition (or slightly different timing in 10.10 vs 10.9) related to ReadPipeAsyncTO, and silently discards urbs without notifying the callback, so I'm left with a driver model that's in the right state but a kernel that has nothing queued.

My original assertion, that this was power related due to running over budget, has been ruled out.

If I reduce the maximum number of urbs to 1, everything runs perfectly. If I increase the maximum number of urbs past 8, into crazy land, the problem happens much sooner: almost no data is received before the issue occurs.
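For anyone who wants to picture the list handling described above, here is a minimal sketch of it. This is not the actual driver code: the names (struct urb, struct urb_pool, pool_submit() and so on) are hypothetical, and the point where the real driver would call ReadPipeAsyncTO() is only marked with a comment. It just shows the three mutex-protected lists and the free -> submitted -> completed -> free movement, plus the sanity check that no urb ever falls off a list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_URBS 8 /* static pool size; 8 is the typical value mentioned above */

/* Hypothetical names throughout; this only models the list bookkeeping. */
enum urb_state { URB_FREE, URB_SUBMITTED, URB_COMPLETED, URB_NSTATES };

struct urb {
    struct urb *next;
    enum urb_state state;
    int status; /* 0 = ok, nonzero = error or timeout */
};

struct urb_list {
    pthread_mutex_t lock;
    struct urb *head, *tail;
    int count;
};

struct urb_pool {
    struct urb urbs[MAX_URBS];
    struct urb_list lists[URB_NSTATES]; /* one list per state */
};

/* Append u to the tail of l. */
static void list_push(struct urb_list *l, struct urb *u)
{
    pthread_mutex_lock(&l->lock);
    u->next = NULL;
    if (l->tail) l->tail->next = u; else l->head = u;
    l->tail = u;
    l->count++;
    pthread_mutex_unlock(&l->lock);
}

/* Unlink u from l, or the head if u is NULL. Returns NULL if not found. */
static struct urb *list_take(struct urb_list *l, struct urb *u)
{
    struct urb *prev = NULL, *cur;
    pthread_mutex_lock(&l->lock);
    cur = l->head;
    while (cur && u && cur != u) { prev = cur; cur = cur->next; }
    if (cur) {
        if (prev) prev->next = cur->next; else l->head = cur->next;
        if (l->tail == cur) l->tail = prev;
        cur->next = NULL;
        l->count--;
    }
    pthread_mutex_unlock(&l->lock);
    return cur;
}

/* Move one urb between lists; it is briefly on no list in between. */
static struct urb *pool_move(struct urb_pool *p, enum urb_state from,
                             enum urb_state to, struct urb *u)
{
    u = list_take(&p->lists[from], u);
    if (u) { u->state = to; list_push(&p->lists[to], u); }
    return u;
}

static void pool_init(struct urb_pool *p)
{
    int i;
    for (i = 0; i < URB_NSTATES; i++) {
        pthread_mutex_init(&p->lists[i].lock, NULL);
        p->lists[i].head = p->lists[i].tail = NULL;
        p->lists[i].count = 0;
    }
    for (i = 0; i < MAX_URBS; i++) {
        p->urbs[i].state = URB_FREE;
        p->urbs[i].status = 0;
        list_push(&p->lists[URB_FREE], &p->urbs[i]);
    }
}

/* free -> submitted. The real driver would issue the async read here. */
static struct urb *pool_submit(struct urb_pool *p)
{
    struct urb *u = pool_move(p, URB_FREE, URB_SUBMITTED, NULL);
    if (u) u->status = 0;
    return u;
}

/* submitted -> completed, called from the completion callback. */
static struct urb *pool_complete(struct urb_pool *p, struct urb *u, int status)
{
    if (!u) return NULL;
    u = pool_move(p, URB_SUBMITTED, URB_COMPLETED, u);
    if (u) u->status = status;
    return u;
}

/* completed -> free, called by the consumer once the data is handled. */
static struct urb *pool_dequeue(struct urb_pool *p)
{
    return pool_move(p, URB_COMPLETED, URB_FREE, NULL);
}

static int pool_count(struct urb_pool *p, enum urb_state s)
{
    int n;
    pthread_mutex_lock(&p->lists[s].lock);
    n = p->lists[s].count;
    pthread_mutex_unlock(&p->lists[s].lock);
    return n;
}

/* Sanity check: every urb is on exactly one list (call when quiescent). */
static int pool_total(struct urb_pool *p)
{
    int s, n = 0;
    for (s = 0; s < URB_NSTATES; s++) n += pool_count(p, (enum urb_state)s);
    assert(n == MAX_URBS);
    return n;
}
```

With a model like this, the stall looks like every urb sitting on the submitted list with pool_total() still equal to MAX_URBS: the driver-side bookkeeping is consistent, so any lost request must have gone missing below the submit call.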
If I increase the maximum number of urbs to 512, then call abort on the pipe (figuring I'd check that the kernel hasn't lost track of a request), only 17 or so complete; the rest appear to be lost.

The behavior feels like ReadPipeAsyncTO() (and its underlying implementation) is now racy, and that submitting a read request during some critical window results in misbehaviour and no reads being posted to the physical bus.

In an earlier email I questioned my assumption that it is perfectly valid to post multiple ReadPipeAsyncTO() calls, and I'm still not sure whether it is, even though under the USB spec multiple bulk transfers can be queued with a full expectation of predictable behavior. My assumption is that ReadPipeAsyncTO() is a wrapper around that. Maybe it's changed recently. I did look at the IOKit implementation of this call, and it quickly burrows down into an IOn call which I assume ends up in a general USB IOKit user_client framework done by Apple, or directly in the kernel itself.

(Incidentally: I've had someone contact me off list with a libusb issue, where not all of his urbs are completing on error; some are being lost. Perhaps the same issue, or perhaps a simple list management problem in his driver.)

Grr.

- Steve

--
Steven Toth - Kernel Labs
http://www.kernellabs.com

On Sun, Feb 1, 2015 at 1:58 PM, Stuart Smith <[email protected]> wrote:
> Steve
> I'm not sure what is going on with aborted calls that don't return with an
> "aborted" error, but instead disappear. That happened to me some years ago
> with queued isoch calls, but it was due to a bug long since fixed, and
> there was a reasonable workaround. But I've never seen it happen to bulk
> calls.
> You say you're seeing timeouts from the controller driver, but the calls
> are not timing out on the bus.
> This might happen if you have very large
> buffers and a rather small amount of data coming from the device (say you
> size all your buffers for HD but you're capturing interlaced SD).
> You can dynamically resize the buffers and the number of them depending on
> the expected data rate. You can also deal gracefully with timeout errors,
> which don't necessarily mean that the hardware is not responding. If your
> hardware has no data (yet) to deliver, _all_ of your reads may time out,
> but that doesn't mean that you have to give up entirely.
> Stuart
>
>
> On 1/31/15, 12:10 PM, "Steven Toth" <[email protected]> wrote:
>
>>Stuart, thanks for the feedback.
>>
>>I looked at the issue with a fresh pair of eyes this morning and
>>indeed, you are partially correct, it's not a power issue. ... but
>>neither is it a protocol problem.
>>
>>I'm seeing a very reproducible case with ReadPipeAsyncTo() where
>>issuing multiple concurrent calls to it creates issues under OSX
>>10.10, but not 10.9.
>>
>>struct buf_s {
>>    unsigned char *ptr;
>>    int len; /* total size of allocation in ptr */
>>    int readlen; /* bytes returned from readpipeasyncto() */
>>/// other buffer stats
>>};
>>
>>I submit the buf->ptr and buf->len to ReadPipeAsyncTo() and pass the
>>buffer struct as the context. A fairly standard thing to do. My USB
>>interface is in the run loop so I get callbacks and timeouts as
>>expected.... Except that I've previously 'submitted' 8-16 of these
>>readPipeAsyncTO() calls concurrently (much like any driver would do
>>for usb bulk transfers, queue up a few).
>>
>>I'm finding that after a small number of completions, the callbacks
>>only time out (wire protocol to the hardware is perfect). Adjusting the
>>number of concurrent ReadPipeAsyncTo() calls varies the failure rate
>>dramatically.
>>
>>I've always had an assumption that calls to ReadPipeAsyncTO() were
>>queued by iokit or the kernel, as a thin wrapper around a more
>>standard usb_bulk_transfer() type implementation.
>>I'm starting to
>>doubt that now, or doubt that's how it's intended to work in 10.10.
>>
>>Also, interestingly, assuming I queue a large number of these (all
>>calls return success) and immediately abort the pipe, only a small
>>handful of those are returned to the completion handler, the rest
>>'disappear'. This also feels new and unexpected.
>>
>>Something's going on inside ReadPipeAsyncTo() that's new to 10.10. Grrr.
>>
>>Thanks again for your earlier comments.
>>
>>- Steve
>>
>>--
>>Steven Toth - Kernel Labs
>>http://www.kernellabs.com
>>
>>On Fri, Jan 30, 2015 at 6:23 PM, Stuart Smith <[email protected]> wrote:
>>> I don't think you're running into a power issue. If you consume more
>>> current than the port is able to deliver, the hardware current-limits and
>>> this is reported at a very low level to the OS - you'll see a "This device
>>> is drawing too much power" notification, and the port won't work at all
>>> until the offender is removed.
>>> You could also monitor the power supply to the device - if it stays above
>>> 4.75V, you should be fine (it will probably work well below that, but
>>> AFAIR the USB spec limit is down to 4.75V at the device power pins).
>>> You could also run the device from a USB hub which you know can provide
>>> more than 500mA per port (i.e. almost any powered USB hub).
>>>
>>> Although the USB 2.0 spec says that a USB device shouldn't consume more
>>> than 500mA, USB 3.0 devices are allowed to take up to 900mA and many Apple
>>> devices negotiate much more. The USB ports usually have a fixed current
>>> limit.
>>>
>>> I think that you probably need to look closer at the analyzer trace -
>>> something before the timeout caused your device to hang. Are you sure that
>>> your device is enumerated as a High-Speed device?
>>>
>>> hth, Stuart
>>>
>>>
>>> On 1/30/15, 12:00 PM, "[email protected]"
>>> <[email protected]> wrote:
>>>
>>>>Message: 1
>>>>Date: Thu, 29 Jan 2015 15:48:08 -0500
>>>>From: Steven Toth <[email protected]>
>>>>To: [email protected]
>>>>Subject: USB power budget - New issues with 10.10 and/or new iMacs?
>>>>Message-ID:
>>>><CALzAhNWeh3_Zh0vmSJgN=K_2OO0ZfbT_ae7q2OMrHF-cBSJR=w...@mail.gmail.com>
>>>>Content-Type: text/plain; charset=UTF-8
>>>>
>>>>Hey folks,
>>>>
>>>>I'd welcome some feedback on this, before we're forced to withdraw our
>>>>software product from general sale. Yes, today is a bad day. :(
>>>>
>>>>We produce a retail s/w application that provides support for a USB HD
>>>>H.264 video compressor device. It works well on OSX 10.7/8/9 on
>>>>multiple systems including older Mac Pros, MBP's, MBA's etc.
>>>>
>>>>It's not working well on all the 10.10 based Macs we have, namely an
>>>>iMac 5K and a MBP 13" retina, both (probably) using usb3 controllers,
>>>>older machines above are probably USB2 controllers. We have customers
>>>>in the field reporting the same issue "Used to work great, upgraded to
>>>>10.10 now it hangs".
>>>>
>>>>The USB2.0 device we're controlling has always run (overbudget) at
>>>>around 560ma during peak use, idling around 420ma. (Same power
>>>>measurements under windows also). We have no issues with the device
>>>>when it's running around 420ma on 10.10, although the video compressor
>>>>is not running at this point, we're doing basic status calls.
>>>>
>>>>The behavior we see under 10.10 is that when the device starts to
>>>>compress video, and the power starts to peak, climbing to 530ma and
>>>>potentially beyond, we start to see our urbs timing out, the device
>>>>stops responding to AsyncBulkAsync reads. Rarely does an urb complete
>>>>without error, and if it does it's marked as overrun. The important
>>>>point to note is that the device never gets to 560ma.
>>>>
>>>>I've noticed on the Macs running 10.10 that the current never seems to
>>>>go beyond 530, suggesting some kind of operating system USB current
>>>>limit, or physical USB3 port current limit that doesn't occur on
>>>>slightly older systems (or on 10.9).
>>>>
>>>>Looking at the usb analyzer we see no protocol issues, only timeouts
>>>>waiting for posted urbs to be filled. No resets, no failed control
>>>>transfers, no visible errors other than timeouts.
>>>>
>>>>I should point out that the application works very well with other USB
>>>>Capture devices on 10.10, all of which run at less than 500ma, I'm
>>>>confident the application is fine.
>>>>
>>>>Are there any known differences between 10.9 and 10.10 with regards to
>>>>allowable current that can be drawn from either a USB2 or USB3 port? I
>>>>realize the device runs overbudget, but is the OS (or USB controllers)
>>>>starting to enforce 500ma limits - that we're only just seeing?
>>>>
>>>>Many thanks,
>>>>
>>>>- Steve
>>>>
>>>>--
>>>>Steven Toth - Kernel Labs
>>>>http://www.kernellabs.com
