Stuart,

We don't use realtime threads, but we do submit larger reads so that
the kernel gets to do more per submit if/when our process is being
heavily context-switched in a stressed environment. I will look into
realtime threads though, interesting.
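For reference, one way to reconcile the two sizing rules in the thread below (whole 188-byte TS packets on one side, whole 4K pages on the other) is to round buffer lengths to a multiple of their least common multiple. This is only a rough sketch: the helper names are made up for illustration, and the ceiling used in the note underneath is just the 1394 * 188 size that happened to work in the 10.10 tests, not a documented limit.

```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm). */
unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two non-zero values. */
unsigned long lcm_ul(unsigned long a, unsigned long b)
{
    assert(a != 0 && b != 0);
    return a / gcd_ul(a, b) * b;
}

/*
 * Largest length <= limit that is a whole number of 'unit' bytes.
 * Returns 0 if not even one unit fits under the limit.
 */
unsigned long largest_multiple(unsigned long unit, unsigned long limit)
{
    assert(unit != 0);
    return (limit / unit) * unit;
}
```

With unit = lcm_ul(188, 4096) = 192512 bytes, largest_multiple(unit, 1394 * 188) returns a single unit: 1024 TS packets in exactly 47 pages.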
Thanks again for your support.

- Steve

--
Steven Toth - Kernel Labs
http://www.kernellabs.com
+1.646.355.8490

On Mon, Feb 2, 2015 at 7:41 PM, Stuart Smith <[email protected]> wrote:
> Steve,
> we run our USB requests in a real-time thread. Something did change in
> 10.10 regarding process and thread scheduling, perhaps that has bitten you?
> Our buffers are generally about 32KB in size, so we have between 256KB and
> 512KB of outstanding reads. We're usually asking for a round number of
> pages (i.e. an integer multiple of 4K, which is also an integer multiple
> of the pipe's maxPacketSize). We de-block into chunks of integral numbers
> of TS packets on receipt.
>
> Stuart
>
> On 2/2/15, 3:20 PM, "Steven Toth" <[email protected]> wrote:
>
>>Stuart, thanks again for your feedback.
>>
>>Comments inline.
>>
>>On Mon, Feb 2, 2015 at 3:02 PM, Stuart Smith <[email protected]> wrote:
>>> Steve,
>>> a couple of things.
>>> First is, where are you specifying a timeout of just 500ms, and for what
>>> call? Although the completion and no-data timeout parameters to
>>> ReadPipeAsyncTO are in milliseconds, the granularity of timeout handling
>>> is 1 second, so there's nothing gained by specifying a timeout less than
>>> one second.
>>
>>500ms for NoDataTimeout and CompletionTimeout.
>>
>>Understood. I tried 1500ms but the effect didn't change, that being
>>said - some interesting news below.
>>
>>>
>>> I don't think you should ever see overrun errors for bulk transactions.
>>> An overrun doesn't mean that your device has more data to deliver, it
>>> means that you supplied a buffer smaller than, or not an integer
>>> multiple of the endpoint size. Are your buffers page-aligned?
>>
>>They're not page aligned. I've routinely passed buffers to usb
>>controllers that are less than the maxpacketsize, the usb controller
>>typically breaks up the read request length into multiple transactions
>>of (pipe) MaxPacketSize, as witnessed in the bus analyzer.
>>In my particular case, it's pretty common to see 440 bytes in the last
>>packet of a very large transaction, for example. It's also pretty
>>common to have buffer sizes that are a multiple of 188. Transfers
>>to/from USB devices don't need to be exact multiples.
>>
>>>
>>> We routinely queue up 8 to 16 bulk requests (generally calculated to
>>> keep the controller busy for about a second), and have no problems on
>>> 10.10. I suspect that the behavior of ReadPipeAsyncTO has not changed.
>>> Your device is misbehaving immediately after initialization - perhaps
>>> it is being initialized differently? I know this isn't much help, it
>>> is a bit of a head-scratcher.
>>
>>I too generally size for 8-16 pending urbs at typical user-expected
>>app defaults. So lower bitrates fill and complete more slowly, higher
>>bitrates complete more quickly. I don't dynamically adjust the sizes
>>based on bitrate as I don't have to meet any latency/delivery
>>requirements for downstream apps.
>>
>>This driver is configured for 8 large urbs, giving around 2.5 seconds
>>of latency.
>>
>>Here's the interesting news. I've isolated the issue to the size of
>>the ReadPipeAsyncTO() buffer length requested.
>>
>>My buffers were sized as a multiple of 188 bytes (188 * 3120 bytes,
>>i.e. 3120 packets). That's a large transfer, but when running at 20Mbps
>>that's only 4-5 URBs per second worst case, very light lifting in a
>>worst-case scenario. This works fine in 10.9 (along with timeouts of
>>500ms).
>>
>>In 10.10, this leads to catastrophic failure. Time-outs, underruns and
>>all sorts of trouble. I went back and resized the buffers at different
>>sizes and see odd behaviors depending on whether I'm using multiples of
>>MaxPacketSize or 188. I went small to very large.
>>
>>Generally, anything over 250KB is now completely non-functional for me
>>on 10.10 (10.9 it was fine).
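To put numbers on the sizes quoted above, here is a small sketch of the arithmetic. The function names are hypothetical, and the 20Mbps figure is simply the bitrate mentioned in the quoted message.

```c
#include <assert.h>

#define TS_PACKET_SIZE 188u  /* bytes per MPEG-TS packet */

/* Bytes in a buffer holding 'packets' whole TS packets. */
unsigned long urb_bytes(unsigned long packets)
{
    return packets * TS_PACKET_SIZE;
}

/* URB completions per second for a stream of bitrate_bps bits per second. */
double urbs_per_second(unsigned long bitrate_bps, unsigned long urb_len)
{
    assert(urb_len > 0);
    return (bitrate_bps / 8.0) / (double)urb_len;
}
```

urb_bytes(3120) is 586,560 bytes, well above the roughly 250KB point where things fall over on 10.10, and urbs_per_second(20000000, 586560) works out to a little over 4 completions per second, consistent with the 4-5 quoted.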
>>
>>I've reduced the buffering to 1394 * 188 and it's running well (still
>>using 500ms timeouts).
>>
>>Just to clarify, I checked out the production code, made a one-line
>>change to reduce the buffer sizes, recompiled, and the problem goes
>>away. In short, I can live with smaller buffers as they're still
>>reasonably sized for my needs.
>>
>>Buffer sizes that worked in 10.9 should have worked in 10.10.
>>
>>Side note: I will bump up the timeout from 500 to 1500ms as it feels
>>like a good thing to do, if the minimum practical value is 1000.
>>
>>>
>>> We assign our buffers an index so we can keep track of them. We don't
>>> lose any under various circumstances (unplugging devices, calling
>>> Abort on the pipe, timeouts because the device has hung).
>>
>>Yes, I also assign a unique int/id to each buffer, it simplifies and
>>eases debug when I have items moving between lists.
>>
>>>
>>> What version of the IOUSBInterfaceInterface and IOUSBDeviceInterface are
>>> you talking to?
>>
>>245. I did switch to 500 when testing some Request/ReturnExtraPower()
>>calls that I initially started investigating, but that turned out to
>>be a red herring.
>>
>>Stuart, once again, thank you for your time (and patience).
>>
>>- Steve
>>
>>>
>>> Stuart
>>>
>>> On 2/2/15, 5:48 AM, "Steven Toth" <[email protected]> wrote:
>>>
>>>>Stuart, thanks for the feedback.
>>>>
>>>>I'm mindful of the timeouts related to low bitrates and buffers
>>>>potentially timing out before they're full - although thanks for
>>>>pointing that out. What usually happens is that the buffer times out
>>>>and nothing happens in the analyzer for a while, and then a short
>>>>burst of reads work reliably and are marked as overrun. Suggesting we
>>>>aren't servicing the usb device fast enough - when in reality I have
>>>>multiple read requests posted.
>>>>
>>>>What I don't see on the analyzer is any traffic when the timeouts
>>>>occur, no partially filled urbs (that's easy to spot), and I don't
>>>>think I see a posted read either. It's as if the kernel has no queued
>>>>read requests, even though I have N pending.
>>>>
>>>>In general, my driver model has three URB lists, all mutex protected
>>>>(all sanity checked): one for submitted urbs (readpipe), one for free
>>>>unused urbs (ready for re-scheduling), the other for completed urbs
>>>>pending dequeue. The urbs move between the free -> submitted ->
>>>>completed -> free lists as the state machine runs. All pretty
>>>>standard stuff.
>>>>
>>>>The total number of urbs is defined statically and is typically 8.
>>>>During testing, each urb completes usually twice, meaning the device
>>>>starts for a couple of seconds, payload is working reliably, and then
>>>>I end up with a stall. No protocol mishandling in the analyzer,
>>>>that's also easy to spot. All urbs (ReadPipeAsyncTO calls) are on the
>>>>busy/submitted list, no activity on the analyzer, none of those urbs
>>>>returned an error during submission (timeout is 500ms), they
>>>>eventually all time out. I resubmit completed urbs that contain an
>>>>error, and usually (eventually) get data from the usb device and an
>>>>overrun indicator.
>>>>
>>>>It's as if the kernel has a new race condition (or slightly different
>>>>timing in 10.10 vs 10.9) related to ReadPipeAsyncTO, and either
>>>>silently discards urbs without notifying the callback, and I'm left
>>>>with a driver model that's in the right state - but the kernel has
>>>>nothing to queue.
>>>>
>>>>My original assertion was power related due to running over budget,
>>>>this was ruled out.
>>>>
>>>>If I reduce the maximum urbs to 1, everything runs perfectly, if I
>>>>increase the maximum number of urbs past 8, into crazy land, the
>>>>problem happens much sooner, almost no data is received before the
>>>>issue occurs.
>>>>If I increase the maximum number of urbs to 512, then call abort on
>>>>the pipe (figure I'll check that the kernel hasn't lost track of a
>>>>request), only 17 or so complete, the rest appear to be lost.
>>>>
>>>>The behavior feels like ReadPipeAsyncTO() (and its underlying
>>>>implementation) is now racy, and that submitting a read request during
>>>>some critical time results in misbehaviour and no reads being
>>>>posted to the physical bus.
>>>>
>>>>In an earlier email I questioned whether my assertion that it was
>>>>perfectly valid to post multiple ReadPipeAsyncTO calls was valid, I'm
>>>>still not sure if this is truly the case..... Even though in the USB
>>>>spec, multiple bulk transfer calls can easily be placed with a full
>>>>expectation of expected behavior. My assumption is ReadPipeAsyncTO is
>>>>a wrapper around that. Maybe it's changed recently. I did look at the
>>>>IOKit implementation of this call and it quickly buries down into a
>>>>IOn call which I assume ends up in a general USB IOKit user_client
>>>>framework done by Apple, or directly into the kernel itself.
>>>>
>>>>(Incidentally: I've had someone contact me off list with a libusb
>>>>issue, related to an issue where not all of his urbs are completing on
>>>>error, some are being lost. Perhaps the same issue, or perhaps a
>>>>simple list management problem in his driver).
>>>>
>>>>Grr.
>>>>
>>>>- Steve
>>>>
>>>>--
>>>>Steven Toth - Kernel Labs
>>>>http://www.kernellabs.com
>>>>
>>>>
>>>>On Sun, Feb 1, 2015 at 1:58 PM, Stuart Smith <[email protected]> wrote:
>>>>> Steve
>>>>> I'm not sure what is going on with aborted calls that don't return
>>>>> with an "aborted" error, but instead disappear. That happened to me
>>>>> some years ago with queued isoch calls, but it was due to a bug long
>>>>> since fixed, and there was a reasonable workaround. But I've never
>>>>> seen it happen to bulk calls.
>>>>> You say you're seeing timeouts from the controller driver, but the
>>>>> calls are not timing out on the bus. This might happen if you have
>>>>> very large buffers and a rather small amount of data coming from the
>>>>> device (say you size all your buffers for HD but you're capturing
>>>>> interlaced SD).
>>>>> You can dynamically resize the buffers and the number of them
>>>>> depending on the expected data rate. You can also deal gracefully
>>>>> with timeout errors, which don't necessarily mean that the hardware
>>>>> is not responding. If your hardware has no data (yet) to deliver,
>>>>> _all_ of your reads may time out, but that doesn't mean that you
>>>>> have to give up entirely.
>>>>> Stuart
>>>>>
>>>>>
>>>>> On 1/31/15, 12:10 PM, "Steven Toth" <[email protected]> wrote:
>>>>>
>>>>>>Stuart, thanks for the feedback.
>>>>>>
>>>>>>I looked at the issue with a fresh pair of eyes this morning and
>>>>>>indeed, you are partially correct, it's not a power issue. ... but
>>>>>>neither is it a protocol problem.
>>>>>>
>>>>>>I'm seeing a very reproducible case with ReadPipeAsyncTO() where
>>>>>>issuing multiple concurrent calls to this creates issues under OSX
>>>>>>10.10, but not 10.9.
>>>>>>
>>>>>>struct buf_s {
>>>>>>    unsigned char *ptr;
>>>>>>    int len;     /* total size of allocation in ptr */
>>>>>>    int readlen; /* bytes returned from readpipeasyncto() */
>>>>>>    /// other buffer stats
>>>>>>};
>>>>>>
>>>>>>I submit the buf->ptr and buf->len to ReadPipeAsyncTO() and pass the
>>>>>>buffer struct as the context. A fairly standard thing to do. My USB
>>>>>>interface is in the run loop so I get callbacks and timeouts as
>>>>>>expected.... Except that I've previously 'submitted' 8-16 of these
>>>>>>ReadPipeAsyncTO() calls concurrently (much like any driver would do
>>>>>>for usb bulk transfers, queue up a few).
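The "buffer struct as the context" pattern described above can be sketched without IOKit as follows. Everything outside the original struct buf_s fields is a stand-in for illustration: async_cb_t only mirrors the (refcon, result, arg0) shape of an IOKit async callback, fake_complete() plays the role of the run loop delivering a completion, and the id and status fields are additions. The assumption that arg0 carries the byte count should be checked against the IOUSBLib headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct buf_s {
    unsigned char *ptr;
    int len;      /* total size of allocation in ptr */
    int readlen;  /* bytes returned by the last completed read */
    int id;       /* hypothetical: unique id, handy when debugging lists */
    int status;   /* hypothetical: result code from the last completion */
};

/* Stand-in for an IOKit-style async callback: (refcon, result, arg0). */
typedef void (*async_cb_t)(void *refcon, int result, void *arg0);

/* Completion handler: recover the buffer from the context pointer. */
void read_complete(void *refcon, int result, void *arg0)
{
    struct buf_s *buf = (struct buf_s *)refcon;

    assert(buf != NULL);
    buf->status = result;
    /* Assumption: arg0 carries the number of bytes transferred. */
    buf->readlen = (int)(uintptr_t)arg0;
}

/* Stand-in for the run loop delivering one completion to the callback. */
void fake_complete(struct buf_s *buf, async_cb_t cb, int result, size_t nbytes)
{
    assert(nbytes <= (size_t)buf->len);
    cb(buf, result, (void *)(uintptr_t)nbytes);
}
```

Because the context pointer is the buffer itself, the handler needs no lookup table to find which of the 8-16 outstanding reads has just completed.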
>>>>>>
>>>>>>I'm finding that after a small number of completions, the callbacks
>>>>>>only time out (wire protocol to the hardware is perfect). Adjusting
>>>>>>the number of concurrent ReadPipeAsyncTO() calls varies the failure
>>>>>>rate dramatically.
>>>>>>
>>>>>>I've always had an assumption that calls to ReadPipeAsyncTO() were
>>>>>>queued by iokit or the kernel, as a thin wrapper around a more
>>>>>>standard usb_bulk_transfer() type implementation. I'm starting to
>>>>>>doubt that now, or doubt that's how it's intended to work in 10.10.
>>>>>>
>>>>>>Also, interestingly, assuming I queue a large number of these (all
>>>>>>calls return success) and immediately abort the pipe, only a small
>>>>>>handful of those are returned to the completion handler, the rest
>>>>>>'disappear'. This also feels new and unexpected.
>>>>>>
>>>>>>Something's going on inside ReadPipeAsyncTO() that's new to 10.10.
>>>>>>Grrr.
>>>>>>
>>>>>>Thanks again for your earlier comments.
>>>>>>
>>>>>>- Steve
>>>>>>
>>>>>>--
>>>>>>Steven Toth - Kernel Labs
>>>>>>http://www.kernellabs.com
>>>>>>
>>>>>>On Fri, Jan 30, 2015 at 6:23 PM, Stuart Smith <[email protected]>
>>>>>>wrote:
>>>>>>> I don't think you're running into a power issue. If you consume more
>>>>>>> current than the port is able to deliver, the hardware current-limits
>>>>>>> and this is reported at a very low level to the OS - you'll see a
>>>>>>> "This device is drawing too much power" notification, and the port
>>>>>>> won't work at all until the offender is removed.
>>>>>>> You could also monitor the power supply to the device - if it stays
>>>>>>> above 4.75V, you should be fine (it will probably work well below
>>>>>>> that, but AFAIR the USB spec limit is down to 4.75V at the device
>>>>>>> power pins).
>>>>>>> You could also run the device from a USB hub which you know can
>>>>>>> provide more than 500mA per port (i.e. almost any powered USB hub).
>>>>>>>
>>>>>>> Although the USB 2.0 spec says that a USB device shouldn't consume
>>>>>>> more than 500mA, USB 3.0 devices are allowed to take up to 900mA and
>>>>>>> many Apple devices negotiate much more. The USB ports usually have a
>>>>>>> fixed current limit.
>>>>>>>
>>>>>>> I think that you probably need to look closer at the analyzer trace -
>>>>>>> something before the timeout caused your device to hang. Are you sure
>>>>>>> that your device is enumerated as a High-Speed device?
>>>>>>>
>>>>>>> hth, Stuart
>>>>>>>
>>>>>>>
>>>>>>> On 1/30/15, 12:00 PM, "[email protected]"
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>>Message: 1
>>>>>>>>Date: Thu, 29 Jan 2015 15:48:08 -0500
>>>>>>>>From: Steven Toth <[email protected]>
>>>>>>>>To: [email protected]
>>>>>>>>Subject: USB power budget - New issues with 10.10 and/or new iMacs?
>>>>>>>>Message-ID:
>>>>>>>>
>>>>>>>><CALzAhNWeh3_Zh0vmSJgN=K_2OO0ZfbT_ae7q2OMrHF-cBSJR=w...@mail.gmail.com>
>>>>>>>>Content-Type: text/plain; charset=UTF-8
>>>>>>>>
>>>>>>>>Hey folks,
>>>>>>>>
>>>>>>>>I'd welcome some feedback on this, before we're forced to withdraw
>>>>>>>>our software product from general sale. Yes, today is a bad day. :(
>>>>>>>>
>>>>>>>>We produce a retail s/w application that provides support for a USB
>>>>>>>>HD H.264 video compressor device. It works well on OSX 10.7/8/9 on
>>>>>>>>multiple systems including older Mac Pros, MBPs, MBAs etc.
>>>>>>>>
>>>>>>>>It's not working well on all the 10.10 based Macs we have, namely an
>>>>>>>>iMac 5K and a MBP 13" retina, both (probably) using usb3
>>>>>>>>controllers; the older machines above are probably USB2 controllers.
>>>>>>>>We have customers in the field reporting the same issue: "Used to
>>>>>>>>work great, upgraded to 10.10 now it hangs".
>>>>>>>>
>>>>>>>>The USB2.0 device we're controlling has always run (overbudget) at
>>>>>>>>around 560mA during peak use, idling around 420mA.
>>>>>>>>(Same power measurements under Windows also.) We have no issues with
>>>>>>>>the device when it's running around 420mA on 10.10, although the
>>>>>>>>video compressor is not running at this point, we're doing basic
>>>>>>>>status calls.
>>>>>>>>
>>>>>>>>The behavior we see under 10.10 is that when the device starts to
>>>>>>>>compress video, and the power starts to peak, climbing to 530mA and
>>>>>>>>potentially beyond, we start to see our urbs timing out, the device
>>>>>>>>stops responding to AsyncBulkAsync reads. Rarely does an urb complete
>>>>>>>>without error, and if it does it's marked as overrun. The important
>>>>>>>>point to note is that the device never gets to 560mA.
>>>>>>>>
>>>>>>>>I've noticed on the Macs running 10.10 that the current never seems
>>>>>>>>to go beyond 530mA, suggesting some kind of operating system USB
>>>>>>>>current limit, or physical USB3 port current limit that doesn't occur
>>>>>>>>on slightly older systems (or on 10.9).
>>>>>>>>
>>>>>>>>Looking at the usb analyzer we see no protocol issues, only timeouts
>>>>>>>>waiting for posted urbs to be filled. No resets, no failed control
>>>>>>>>transfers, no visible errors other than timeouts.
>>>>>>>>
>>>>>>>>I should point out that the application works very well with other
>>>>>>>>USB capture devices on 10.10, all of which run at less than 500mA,
>>>>>>>>I'm confident the application is fine.
>>>>>>>>
>>>>>>>>Are there any known differences between 10.9 and 10.10 with regards
>>>>>>>>to allowable current that can be drawn from either a USB2 or USB3
>>>>>>>>port? I realize the device runs overbudget, but is the OS (or USB
>>>>>>>>controllers) starting to enforce 500mA limits - that we're only just
>>>>>>>>seeing?
>>>>>>>>
>>>>>>>>Many thanks,
>>>>>>>>
>>>>>>>>- Steve
>>>>>>>>
>>>>>>>>--
>>>>>>>>Steven Toth - Kernel Labs
>>>>>>>>http://www.kernellabs.com
>>
>>--
>>Steven Toth - Kernel Labs
>>http://www.kernellabs.com
>>+1.646.355.8490

_______________________________________________
Usb mailing list ([email protected])
