Re: USB power budget - New issues with 10.10 and/or new iMacs?

Steven Toth Tue, 03 Feb 2015 04:29:14 -0800

Stuart,

We don't use realtime threads, but we do submit larger reads so that
the kernel gets to do more per submit if/when our process is being
heavily context-switched in a stressed environment. I will look into
realtime threads, though; interesting.


Thanks again for your support.

- Steve

-- 
Steven Toth - Kernel Labs
http://www.kernellabs.com
+1.646.355.8490

On Mon, Feb 2, 2015 at 7:41 PM, Stuart Smith <[email protected]> wrote:
> Steve,
> We run our USB requests in a real-time thread. Something did change in
> 10.10 regarding process and thread scheduling; perhaps that has bitten
> you? Our buffers are generally about 32KB in size, so we have between
> 256KB and 512KB of outstanding reads. We're usually asking for a round
> number of pages (i.e. an integer multiple of 4K, which is also an
> integer multiple of the pipe's maxPacketSize). We de-block into chunks
> of integral numbers of TS packets on receipt.
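
Stuart's "de-block into chunks of integral numbers of TS packets" step can be sketched in C as below. This is a minimal illustration under stated assumptions, not his code: `ts_deblocker`, `ts_deblock()` and `ts_count_packet()` are hypothetical names, and each completed read is assumed to arrive as a byte pointer plus a length. Whole 188-byte packets go to a callback, and any trailing partial packet is carried over into the next read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TS_PACKET_SIZE 188 /* MPEG transport stream packet */

/* Hypothetical de-blocker state: holds an incomplete TS packet that was
 * split across two USB reads. */
typedef struct {
    unsigned char partial[TS_PACKET_SIZE];
    size_t partial_len; /* bytes of the held-over packet */
} ts_deblocker;

typedef void (*ts_packet_cb)(const unsigned char *pkt, void *ctx);

/* Example callback: counts packets into the size_t that ctx points to. */
static void ts_count_packet(const unsigned char *pkt, void *ctx)
{
    (void)pkt;
    ++*(size_t *)ctx;
}

/* Feeds one completed read into the de-blocker, calling cb once per
 * whole 188-byte packet. Returns the number of packets delivered. */
static size_t ts_deblock(ts_deblocker *d, const unsigned char *data,
                         size_t len, ts_packet_cb cb, void *ctx)
{
    size_t delivered = 0;

    assert(d->partial_len < TS_PACKET_SIZE);

    /* Complete a packet left over from the previous read, if any. */
    if (d->partial_len > 0) {
        size_t need = TS_PACKET_SIZE - d->partial_len;
        size_t take = len < need ? len : need;
        memcpy(d->partial + d->partial_len, data, take);
        d->partial_len += take;
        data += take;
        len -= take;
        if (d->partial_len < TS_PACKET_SIZE)
            return 0; /* this read was too short to finish it */
        cb(d->partial, ctx);
        d->partial_len = 0;
        delivered++;
    }

    /* Deliver whole packets straight from the read buffer. */
    while (len >= TS_PACKET_SIZE) {
        cb(data, ctx);
        data += TS_PACKET_SIZE;
        len -= TS_PACKET_SIZE;
        delivered++;
    }

    /* Keep any trailing bytes for the next read. */
    if (len > 0)
        memcpy(d->partial, data, len);
    d->partial_len = len;
    return delivered;
}
```

Carrying the remainder is what lets page-multiple read sizes coexist with 188-byte framing. The sketch does not resynchronise on the 0x47 TS sync byte; a real de-blocker would probably want to.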
>
> Stuart
>
> On 2/2/15, 3:20 PM, "Steven Toth" <[email protected]> wrote:
>
>>Stuart, thanks again for your feedback.
>>
>>Comments inline.
>>
>>On Mon, Feb 2, 2015 at 3:02 PM, Stuart Smith <[email protected]> wrote:
>>> Steve,
>>> A couple of things.
>>> First, where are you specifying a timeout of just 500ms, and for what
>>> call? Although the completion and no-data timeout parameters to
>>> ReadPipeAsyncTO are in milliseconds, the granularity of timeout handling
>>> is 1 second, so there's nothing gained by specifying a timeout less than
>>> one second.
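
If timeouts really are serviced with roughly one-second granularity, it can help to make that explicit in your own code so that a 500ms setting doesn't look more precise than it is. A hypothetical helper that rounds a requested timeout up to whole seconds; the rounding direction is an assumption, since the thread doesn't say how the USB stack itself rounds.

```c
#include <stdint.h>

/* Assumption from the thread: timeouts are handled with ~1 s granularity. */
#define TIMEOUT_GRANULARITY_MS 1000u

/* Hypothetical helper: rounds a requested timeout (ms) up to a whole
 * number of seconds, so your own code doesn't imply sub-second precision.
 * 0 stays 0; results that would overflow are clamped to UINT32_MAX. */
static uint32_t effective_timeout_ms(uint32_t requested_ms)
{
    uint64_t secs = ((uint64_t)requested_ms + TIMEOUT_GRANULARITY_MS - 1)
                    / TIMEOUT_GRANULARITY_MS;
    uint64_t ms = secs * TIMEOUT_GRANULARITY_MS;
    return ms > UINT32_MAX ? UINT32_MAX : (uint32_t)ms;
}
```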
>>
>>500ms for NoDataTimeout and CompletionTimeout.
>>
>>Understood. I tried 1500ms but the effect didn't change. That being
>>said, there's some interesting news below.
>>
>>>
>>> I don't think you should ever see overrun errors for bulk transactions.
>>> An overrun doesn't mean that your device has more data to deliver; it
>>> means that you supplied a buffer smaller than, or not an integer
>>> multiple of, the endpoint size. Are your buffers page-aligned?
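
A sketch of allocating read buffers the way Stuart describes: page-aligned, and a whole number of pages, so the length is also a multiple of the endpoint's maxPacketSize. `alloc_bulk_buffer()` and `round_up()` are hypothetical helpers built on POSIX `posix_memalign()` and `sysconf(_SC_PAGESIZE)`; the sketch assumes the page size is a multiple of the max packet size (true for 4096- or 16384-byte pages and a 512-byte high-speed bulk endpoint) and refuses to allocate otherwise.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Rounds len up to the next multiple of unit. */
static size_t round_up(size_t len, size_t unit)
{
    assert(unit > 0);
    return (len + unit - 1) / unit * unit;
}

/* Hypothetical helper: returns a page-aligned buffer whose length is a
 * whole number of pages (and therefore of max_packet, checked below).
 * Stores the rounded length in *out_len; returns NULL on failure.
 * Release the buffer with free(). */
static unsigned char *alloc_bulk_buffer(size_t want, size_t max_packet,
                                        size_t *out_len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = NULL;
    size_t len;

    if (want == 0 || max_packet == 0 || page <= 0 ||
        (size_t)page % max_packet != 0)
        return NULL;
    len = round_up(want, (size_t)page);
    if (posix_memalign(&p, (size_t)page, len) != 0)
        return NULL;
    *out_len = len;
    return p;
}
```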
>>
>>They're not page aligned. I've routinely passed buffers to USB
>>controllers that aren't an exact multiple of the MaxPacketSize; the
>>controller typically breaks the read request length up into multiple
>>transactions of the (pipe) MaxPacketSize, as witnessed in the bus
>>analyzer. In my particular case, it's pretty common to see 440 bytes
>>in the last packet of a very large transaction, for example, and
>>pretty common to have buffer sizes that are a multiple of 188.
>>Transfers to/from USB devices don't need to be exact multiples.
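
A little arithmetic may reconcile the two positions. A bulk IN read is carried in maxPacketSize packets, and only the last one can be short. With Steve's 188 * 3120-byte buffer and a 512-byte endpoint, the final packet slot has room for only 320 bytes; if the device sends a full-size packet at that point, the host controller has nowhere to put the extra bytes, which is typically what gets reported as an overrun. `bulk_layout_for()` and `tail_can_overrun()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* How a bulk IN read of len bytes maps onto max_packet-sized packets. */
typedef struct {
    size_t full_packets; /* packets of exactly max_packet bytes */
    size_t tail_bytes;   /* room left for a final short packet (0 = none) */
} bulk_layout;

static bulk_layout bulk_layout_for(size_t len, size_t max_packet)
{
    bulk_layout l;
    assert(max_packet > 0);
    l.full_packets = len / max_packet;
    l.tail_bytes = len % max_packet;
    return l;
}

/* Nonzero if a device that only ever sends full-size packets could send
 * more bytes in the last packet than the buffer has room for. */
static int tail_can_overrun(size_t len, size_t max_packet)
{
    return bulk_layout_for(len, max_packet).tail_bytes != 0;
}
```

For example, `bulk_layout_for(188 * 3120, 512)` gives 1145 full packets plus a 320-byte tail, while a 32 KiB buffer divides evenly into 64 packets.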
>>
>>>
>>> We routinely queue up 8 to 16 bulk requests (generally calculated to
>>> keep the controller busy for about a second), and have no problems on
>>> 10.10. I suspect that the behavior of ReadPipeAsyncTO has not changed.
>>> Your device is misbehaving immediately after initialization; perhaps
>>> it is being initialized differently? I know this isn't much help; it
>>> is a bit of a head-scratcher.
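
One way to turn "enough requests to keep the controller busy for about a second" into code, as a sketch only: `requests_for_duration()` is a hypothetical helper, and the clamp to 8-16 requests simply mirrors the range both posters mention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizing rule: how many buffers of buf_bytes are needed to
 * hold target_ms of stream at bitrate_bps, clamped to [min_reqs, max_reqs]. */
static unsigned requests_for_duration(uint64_t bitrate_bps, size_t buf_bytes,
                                      uint64_t target_ms, unsigned min_reqs,
                                      unsigned max_reqs)
{
    uint64_t bytes_needed, n;

    assert(buf_bytes > 0 && min_reqs <= max_reqs);
    bytes_needed = bitrate_bps * target_ms / 8000u; /* bits->bytes, ms->s */
    n = (bytes_needed + buf_bytes - 1) / buf_bytes; /* ceiling division */
    if (n < min_reqs)
        n = min_reqs;
    if (n > max_reqs)
        n = max_reqs;
    return (unsigned)n;
}
```

With 32 KiB buffers, one second at 3Mbps needs 12 requests, but at 20Mbps it would need 77, so the clamp to 16 wins; that is the kind of case where resizing the buffers, not just their count, becomes attractive.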
>>
>>I too generally size for 8-16 pending urbs at typical user-expected
>>app defaults, so lower bitrates fill and complete more slowly and
>>higher bitrates complete more quickly. I don't dynamically adjust the
>>sizes based on bitrate, as I don't have to meet any latency/delivery
>>requirements for downstream apps.
>>
>>This driver is configured for 8 large urbs, giving around 2.5 seconds
>>of latency.
>>
>>Here's the interesting news. I've isolated the issue to the size of
>>the ReadPipeAsyncTO() buffer length requested.
>>
>>My buffers were sized as a multiple of 188 bytes (3120 packets, i.e.
>>188 * 3120 = 586,560 bytes). That's a large transfer, but when running
>>at 20Mbps it's only 4-5 URBs per second: very light lifting, even in a
>>worst-case scenario. This works fine in 10.9 (along with timeouts of
>>500ms).
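
Steve's figures can be checked with two small helpers (hypothetical names). One 586,560-byte buffer fills in about 0.23 s at 20Mbps, i.e. roughly 4.3 completions per second, and eight such buffers hold about 1.9 s of stream. The "around 2.5 seconds" quoted earlier corresponds to roughly 15Mbps, so it presumably assumes a lower average bitrate.

```c
#include <stddef.h>

/* Seconds needed to fill one buffer of buf_bytes at bitrate_bps. */
static double fill_time_s(size_t buf_bytes, double bitrate_bps)
{
    return (double)buf_bytes * 8.0 / bitrate_bps;
}

/* Seconds of stream held by n_bufs outstanding buffers. */
static double queue_depth_s(size_t n_bufs, size_t buf_bytes,
                            double bitrate_bps)
{
    return (double)n_bufs * fill_time_s(buf_bytes, bitrate_bps);
}
```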
>>
>>In 10.10, this leads to catastrophic failure: timeouts, underruns and
>>all sorts of trouble. I went back and tried a range of buffer sizes,
>>from small to very large, and see odd behaviors depending on whether
>>I'm using multiples of MaxPacketSize or of 188.
>>
>>Generally, anything over 250KB is now completely non-functional for me
>>on 10.10 (10.9 it was fine).
>>
>>I've reduced the buffering to 1394 * 188 bytes and it's running well
>>(still using 500ms timeouts).
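
Incidentally, 1394 * 188 = 262,072 bytes is the largest whole number of TS packets that fits in 256 KiB (262,144 bytes), which lines up with the "anything over 250KB" observation; whether 256 KiB is a real threshold in 10.10 is not established in this thread. If you want sizes that are multiples of both 188 and a 512-byte maxPacketSize (avoiding a short final packet slot entirely), the unit is lcm(188, 512) = 24,064 bytes, i.e. 128 TS packets or 47 USB packets; under a 256 KiB cap that gives 10 units, or 240,640 bytes. A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Greatest common divisor (Euclid). */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: largest buffer length <= cap that is a whole
 * number of both TS packets (ts_size) and USB packets (max_packet).
 * Returns 0 if not even one common unit fits. */
static size_t buffer_len_under_cap(size_t cap, size_t ts_size,
                                   size_t max_packet)
{
    size_t unit;

    assert(ts_size > 0 && max_packet > 0);
    unit = ts_size / gcd_size(ts_size, max_packet) * max_packet; /* lcm */
    return cap / unit * unit;
}
```

Passing a `max_packet` of 1 drops the USB constraint and reproduces the 1394-packet figure.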
>>
>>Just to clarify: I checked out the production code, made a one-line
>>change to reduce the buffer sizes, recompiled, and the problem went
>>away. In short, I can live with smaller buffers, as they're still
>>reasonably sized for my needs.
>>
>>Buffer sizes that worked in 10.9 should have worked in 10.10.
>>
>>Side note: I will bump up the timeout from 500 to 1500ms as it feels
>>like a good thing to do, if the minimum practical value is 1000.
>>
>>>
>>> We assign our buffers an index so we can keep track of them. We don't
>>> lose any under various circumstances (unplugging devices, calling
>>> Abort on the pipe, timeouts because the device has hung).
>>
>>Yes, I also assign a unique int/id to each buffer, it simplifies and
>>eases debug when I have items moving between lists.
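
Per-buffer indices make "lost" completions easy to detect. A minimal sketch, single-threaded, with hypothetical names and a limit of 64 buffers: set a bit on submit, clear it in the completion callback whatever the status, and after aborting the pipe and letting the run loop drain, any bits still set are requests whose completions never arrived, which is the symptom described further down this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-flight tracker for buffers indexed 0..63.
 * A set bit means "submitted, completion not yet seen". */
typedef struct {
    uint64_t in_flight;
} inflight_tracker;

static void tracker_submit(inflight_tracker *t, unsigned id)
{
    assert(id < 64);
    assert((t->in_flight & (UINT64_C(1) << id)) == 0); /* not already queued */
    t->in_flight |= UINT64_C(1) << id;
}

/* Call from the completion callback for every status (success, timeout,
 * aborted). Returns 0 for an unknown or duplicate completion. */
static int tracker_complete(inflight_tracker *t, unsigned id)
{
    uint64_t bit;

    if (id >= 64)
        return 0;
    bit = UINT64_C(1) << id;
    if ((t->in_flight & bit) == 0)
        return 0;
    t->in_flight &= ~bit;
    return 1;
}

/* Number of buffers still unaccounted for. */
static unsigned tracker_missing(const inflight_tracker *t)
{
    unsigned n = 0;
    uint64_t v = t->in_flight;

    while (v != 0) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}
```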
>>
>>>
>>> What version of the IOUSBInterfaceInterface and IOUSBDeviceInterface are
>>> you talking to?
>>
>>245. I did switch to 500 when testing some Request/ReturnExtraPower()
>>calls that I initially started investigating, but that turned out to
>>be a red herring.
>>
>>Stuart, once again, thank you for your time (and patience).
>>
>>- Steve
>>
>>>
>>> Stuart
>>>
>>> On 2/2/15, 5:48 AM, "Steven Toth" <[email protected]> wrote:
>>>
>>>>Stuart, thanks for the feedback.
>>>>
>>>>I'm mindful of the timeouts related to low bitrates and buffers
>>>>potentially timing out before they're full, although thanks for
>>>>pointing that out. What usually happens is that the buffer times out,
>>>>nothing happens in the analyzer for a while, and then a short burst
>>>>of reads work reliably and are marked as overrun. This suggests we
>>>>aren't servicing the USB device fast enough, when in reality I have
>>>>multiple read requests posted.
>>>>
>>>>What I don't see on the analyzer is any traffic when the timeouts
>>>>occur: no partially filled urbs (that's easy to spot), and I don't
>>>>think I see a posted read either. It's as if the kernel has no queued
>>>>read requests, even though I have N pending.
>>>>
>>>>In general, my driver model has three URB lists, all mutex protected
>>>>(all sanity checked): one for submitted urbs (ReadPipeAsyncTO), one
>>>>for free unused urbs (ready for rescheduling), and one for completed
>>>>urbs pending dequeue. The urbs move free -> submitted -> completed ->
>>>>free as the state machine runs. All pretty standard stuff.
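
The free -> submitted -> completed -> free cycle can be sketched as below. This is not Steve's driver: it models the three lists as a per-URB state plus a count per list (rather than real linked lists), guarded by one pthread mutex, and rejects any move that skips a step, as a stand-in for the sanity checks mentioned above. `urb_pool` and `urb_move()` are hypothetical names; `MAX_URBS` is 8 to match the text.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_URBS 8 /* statically defined pool size, as in the text */

/* Each state stands for one of the three lists. */
enum urb_state { URB_FREE = 0, URB_SUBMITTED = 1, URB_COMPLETED = 2 };

typedef struct {
    pthread_mutex_t lock;
    enum urb_state state[MAX_URBS];
    unsigned count[3]; /* how many URBs are on each "list" */
} urb_pool;

static void urb_pool_init(urb_pool *p)
{
    unsigned i;

    pthread_mutex_init(&p->lock, NULL);
    for (i = 0; i < MAX_URBS; i++)
        p->state[i] = URB_FREE;
    p->count[URB_FREE] = MAX_URBS;
    p->count[URB_SUBMITTED] = 0;
    p->count[URB_COMPLETED] = 0;
}

static void urb_pool_destroy(urb_pool *p)
{
    pthread_mutex_destroy(&p->lock);
}

/* Moves URB id from list 'from' to list 'to' under the lock. Only
 * free -> submitted -> completed -> free is allowed; an illegal move, or
 * a URB that is not actually on 'from', returns 0 and changes nothing. */
static int urb_move(urb_pool *p, unsigned id, enum urb_state from,
                    enum urb_state to)
{
    int ok;

    if (id >= MAX_URBS)
        return 0;
    pthread_mutex_lock(&p->lock);
    ok = p->state[id] == from && (unsigned)to == ((unsigned)from + 1u) % 3u;
    if (ok) {
        p->state[id] = to;
        p->count[from]--;
        p->count[to]++;
    }
    /* Sanity check: every URB is on exactly one list. */
    assert(p->count[URB_FREE] + p->count[URB_SUBMITTED] +
           p->count[URB_COMPLETED] == MAX_URBS);
    pthread_mutex_unlock(&p->lock);
    return ok;
}
```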
>>>>
>>>>The total number of urbs is defined statically and is typically 8.
>>>>During testing, each urb usually completes twice: the device starts,
>>>>the payload works reliably for a couple of seconds, and then I end up
>>>>with a stall. There's no protocol mishandling in the analyzer; that's
>>>>also easy to spot. All urbs (ReadPipeAsyncTO() calls) are on the
>>>>busy/submitted list, there's no activity on the analyzer, none of
>>>>those urbs returned an error at submission (timeout is 500ms), and
>>>>they eventually all time out. I resubmit completed urbs that contain
>>>>an error, and usually (eventually) get data from the USB device and an
>>>>overrun indicator.
>>>>
>>>>It's as if the kernel has a new race condition (or slightly different
>>>>timing in 10.10 vs 10.9) related to ReadPipeAsyncTO, and silently
>>>>discards urbs without notifying the callback, leaving me with a
>>>>driver model that's in the right state while the kernel has nothing
>>>>queued.
>>>>
>>>>My original assertion was that this was power related, due to running
>>>>over budget; that was ruled out.
>>>>
>>>>If I reduce the maximum number of urbs to 1, everything runs
>>>>perfectly. If I increase it past 8, into crazy land, the problem
>>>>happens much sooner, and almost no data is received before the issue
>>>>occurs. If I increase the maximum number of urbs to 512 and then call
>>>>abort on the pipe (to check that the kernel hasn't lost track of a
>>>>request), only 17 or so complete; the rest appear to be lost.
>>>>
>>>>The behavior feels like ReadPipeAsyncTO() (and its underlying
>>>>implementation) is now racy, and that submitting a read request
>>>>during some critical window results in misbehavior and no reads
>>>>being posted to the physical bus.
>>>>
>>>>In an earlier email I questioned whether it was really valid to post
>>>>multiple concurrent ReadPipeAsyncTO() calls, and I'm still not sure
>>>>that it is, even though, as far as the USB spec is concerned, queuing
>>>>multiple bulk transfers is normal and should behave as expected. My
>>>>assumption is that ReadPipeAsyncTO() is a wrapper around that; maybe
>>>>that has changed recently. I did look at the IOKit implementation of
>>>>this call, and it quickly drops down into an IOn call, which I assume
>>>>ends up in a general USB IOKit user_client framework done by Apple,
>>>>or directly in the kernel itself.
>>>>
>>>>(Incidentally: someone contacted me off list with a libusb issue
>>>>where not all of his urbs complete on error; some are being lost.
>>>>Perhaps it's the same issue, or perhaps a simple list management
>>>>problem in his driver.)
>>>>
>>>>Grr.
>>>>
>>>>- Steve
>>>>
>>>>--
>>>>Steven Toth - Kernel Labs
>>>>http://www.kernellabs.com
>>>>
>>>>
>>>>On Sun, Feb 1, 2015 at 1:58 PM, Stuart Smith <[email protected]> wrote:
>>>>> Steve
>>>>> I'm not sure what is going on with aborted calls that don't return
>>>>> with an "aborted" error, but instead disappear. That happened to me
>>>>> some years ago with queued isoch calls, but it was due to a bug long
>>>>> since fixed, and there was a reasonable workaround. But I've never
>>>>> seen it happen to bulk calls.
>>>>> You say you're seeing timeouts from the controller driver, but the
>>>>> calls are not timing out on the bus. This might happen if you have
>>>>> very large buffers and a rather small amount of data coming from the
>>>>> device (say you size all your buffers for HD but you're capturing
>>>>> interlaced SD).
>>>>> You can dynamically resize the buffers and the number of them
>>>>> depending on the expected data rate. You can also deal gracefully
>>>>> with timeout errors, which don't necessarily mean that the hardware
>>>>> is not responding. If your hardware has no data (yet) to deliver,
>>>>> _all_ of your reads may time out, but that doesn't mean that you
>>>>> have to give up entirely.
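
A sketch of "dealing gracefully" with timeouts in the completion path. The statuses and actions are hypothetical and simplified (a real driver would map the IOReturn it receives onto them), and the assumption that a timeout may still report a partial byte count should be checked against your stack's actual behavior. A no-data timeout just requeues the read; only a run of consecutive timeouts is treated as a hung device.

```c
#include <assert.h>

/* Hypothetical, simplified completion statuses; a real driver would map
 * the IOReturn passed to its callback onto these. */
enum read_status { READ_OK, READ_TIMEOUT, READ_ABORTED };

enum read_action {
    ACTION_DELIVER_AND_RESUBMIT, /* hand data downstream, requeue buffer */
    ACTION_RESUBMIT,             /* no data yet: just requeue */
    ACTION_PARK,                 /* pipe is being torn down: keep it free */
    ACTION_RESET_DEVICE          /* too many no-data timeouts in a row */
};

typedef struct {
    unsigned consecutive_timeouts;
    unsigned timeout_limit; /* how many no-data timeouts to tolerate */
} read_policy;

static enum read_action on_read_complete(read_policy *p,
                                         enum read_status st,
                                         unsigned bytes)
{
    assert(p->timeout_limit > 0);
    switch (st) {
    case READ_OK:
        p->consecutive_timeouts = 0;
        return ACTION_DELIVER_AND_RESUBMIT;
    case READ_TIMEOUT:
        if (bytes > 0) { /* assumption: partial data can accompany a timeout */
            p->consecutive_timeouts = 0;
            return ACTION_DELIVER_AND_RESUBMIT;
        }
        if (++p->consecutive_timeouts < p->timeout_limit)
            return ACTION_RESUBMIT; /* no data (yet) is not a dead device */
        p->consecutive_timeouts = 0;
        return ACTION_RESET_DEVICE;
    case READ_ABORTED:
    default:
        return ACTION_PARK;
    }
}
```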
>>>>> Stuart
>>>>>
>>>>>
>>>>> On 1/31/15, 12:10 PM, "Steven Toth" <[email protected]> wrote:
>>>>>
>>>>>>Stuart, thanks for the feedback.
>>>>>>
>>>>>>I looked at the issue with a fresh pair of eyes this morning and,
>>>>>>indeed, you are partially correct: it's not a power issue... but
>>>>>>neither is it a protocol problem.
>>>>>>
>>>>>>I'm seeing a very reproducible case with ReadPipeAsyncTO() where
>>>>>>issuing multiple concurrent calls creates issues under OS X 10.10,
>>>>>>but not 10.9.
>>>>>>
>>>>>>struct buf_s {
>>>>>> unsigned char *ptr;
>>>>>> int len; /* total size of allocation in ptr */
>>>>>> int readlen; /* bytes returned from ReadPipeAsyncTO() */
>>>>>> /* ... other buffer stats */
>>>>>>};
>>>>>>
>>>>>>I submit the buf->ptr and buf->len to ReadPipeAsyncTO() and pass the
>>>>>>buffer struct as the context, a fairly standard thing to do. My USB
>>>>>>interface is on the run loop, so I get callbacks and timeouts as
>>>>>>expected... except that I've previously 'submitted' 8-16 of these
>>>>>>ReadPipeAsyncTO() calls concurrently (much like any driver would do
>>>>>>for USB bulk transfers: queue up a few).
>>>>>>
>>>>>>I'm finding that after a small number of completions, the callbacks
>>>>>>only time out (the wire protocol to the hardware is perfect).
>>>>>>Adjusting the number of concurrent ReadPipeAsyncTO() calls varies the
>>>>>>failure rate dramatically.
>>>>>>
>>>>>>I've always assumed that calls to ReadPipeAsyncTO() were queued by
>>>>>>IOKit or the kernel, as a thin wrapper around a more standard
>>>>>>usb_bulk_transfer() type implementation. I'm starting to doubt that
>>>>>>now, or doubt that's how it's intended to work in 10.10.
>>>>>>
>>>>>>Also, interestingly, if I queue a large number of these (all calls
>>>>>>return success) and immediately abort the pipe, only a small handful
>>>>>>of those are returned to the completion handler; the rest
>>>>>>'disappear'. This also feels new and unexpected.
>>>>>>
>>>>>>Something's going on inside ReadPipeAsyncTo() that's new to 10.10.
>>>>>>Grrr.
>>>>>>
>>>>>>Thanks again for your earlier comments.
>>>>>>
>>>>>>- Steve
>>>>>>
>>>>>>--
>>>>>>Steven Toth - Kernel Labs
>>>>>>http://www.kernellabs.com
>>>>>>
>>>>>>On Fri, Jan 30, 2015 at 6:23 PM, Stuart Smith <[email protected]> wrote:
>>>>>>> I don't think you're running into a power issue. If you consume
>>>>>>> more current than the port is able to deliver, the hardware
>>>>>>> current-limits and this is reported at a very low level to the OS:
>>>>>>> you'll see a "This device is drawing too much power" notification,
>>>>>>> and the port won't work at all until the offender is removed.
>>>>>>> You could also monitor the power supply to the device; if it stays
>>>>>>> above 4.75V, you should be fine (it will probably work well below
>>>>>>> that, but AFAIR the USB spec limit is down to 4.75V at the device
>>>>>>> power pins).
>>>>>>> You could also run the device from a USB hub which you know can
>>>>>>> provide more than 500mA per port (i.e. almost any powered USB hub).
>>>>>>>
>>>>>>> Although the USB 2.0 spec says that a USB device shouldn't consume
>>>>>>> more than 500mA, USB 3.0 devices are allowed to take up to 900mA,
>>>>>>> and many Apple devices negotiate much more. The USB ports usually
>>>>>>> have a fixed current limit.
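
The power numbers in this exchange are easy to make concrete. A sketch with hypothetical helpers, using 500mA as the USB 2.0 per-port budget, 900mA for a configured USB 3.0 SuperSpeed device, and Stuart's 4.75V rule of thumb. The device described in the original report below peaks at about 560mA, i.e. 60mA over the USB 2.0 budget; as a High-Speed device it would normally be budgeted 500mA even on a USB 3.0 port, since the 900mA figure applies to SuperSpeed operation.

```c
#include <stdbool.h>

/* Nominal per-port budgets: 500 mA for USB 2.0, 900 mA for a configured
 * USB 3.0 SuperSpeed device. */
static unsigned port_budget_ma(bool superspeed)
{
    return superspeed ? 900u : 500u;
}

/* mA drawn above the nominal budget (0 if within budget). */
static unsigned overdraw_ma(unsigned measured_ma, bool superspeed)
{
    unsigned budget = port_budget_ma(superspeed);
    return measured_ma > budget ? measured_ma - budget : 0u;
}

/* Stuart's rule of thumb: VBUS at the device should stay >= 4.75 V. */
static bool vbus_ok(unsigned millivolts)
{
    return millivolts >= 4750u;
}
```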
>>>>>>>
>>>>>>> I think that you probably need to look closer at the analyzer
>>>>>>> trace: something before the timeout caused your device to hang.
>>>>>>> Are you sure that your device is enumerated as a High-Speed device?
>>>>>>>
>>>>>>> hth, Stuart
>>>>>>>
>>>>>>>
>>>>>>> On 1/30/15, 12:00 PM, "[email protected]"
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>>Message: 1
>>>>>>>>Date: Thu, 29 Jan 2015 15:48:08 -0500
>>>>>>>>From: Steven Toth <[email protected]>
>>>>>>>>To: [email protected]
>>>>>>>>Subject: USB power budget - New issues with 10.10 and/or new iMacs?
>>>>>>>>Message-ID:
>>>>>>>>
>>>>>>>><CALzAhNWeh3_Zh0vmSJgN=K_2OO0ZfbT_ae7q2OMrHF-cBSJR=w...@mail.gmail.com>
>>>>>>>>Content-Type: text/plain; charset=UTF-8
>>>>>>>>
>>>>>>>>Hey folks,
>>>>>>>>
>>>>>>>>I'd welcome some feedback on this before we're forced to withdraw
>>>>>>>>our software product from general sale. Yes, today is a bad day. :(
>>>>>>>>
>>>>>>>>We produce a retail s/w application that provides support for a
>>>>>>>>USB HD H.264 video compressor device. It works well on OS X
>>>>>>>>10.7/8/9 on multiple systems, including older Mac Pros, MBPs, MBAs,
>>>>>>>>etc.
>>>>>>>>
>>>>>>>>It's not working well on any of the 10.10-based Macs we have, namely
>>>>>>>>an iMac 5K and a 13" Retina MBP, both (probably) using USB3
>>>>>>>>controllers; the older machines above probably have USB2
>>>>>>>>controllers. We have customers in the field reporting the same
>>>>>>>>issue: "Used to work great, upgraded to 10.10, now it hangs".
>>>>>>>>
>>>>>>>>The USB 2.0 device we're controlling has always run (over budget) at
>>>>>>>>around 560mA during peak use, idling around 420mA (same power
>>>>>>>>measurements under Windows). We have no issues with the device when
>>>>>>>>it's running around 420mA on 10.10, although the video compressor
>>>>>>>>is not running at that point; we're only doing basic status calls.
>>>>>>>>
>>>>>>>>The behavior we see under 10.10 is that when the device starts to
>>>>>>>>compress video and the power starts to peak, climbing to 530mA and
>>>>>>>>potentially beyond, our urbs start timing out and the device stops
>>>>>>>>responding to AsyncBulkAsync reads. Rarely does an urb complete
>>>>>>>>without error, and if it does, it's marked as overrun. The important
>>>>>>>>point to note is that the device never gets to 560mA.
>>>>>>>>
>>>>>>>>I've noticed on the Macs running 10.10 that the current never seems
>>>>>>>>to go beyond 530mA, suggesting some kind of operating system USB
>>>>>>>>current limit, or physical USB3 port current limit, that doesn't
>>>>>>>>occur on slightly older systems (or on 10.9).
>>>>>>>>
>>>>>>>>Looking at the USB analyzer we see no protocol issues, only timeouts
>>>>>>>>waiting for posted urbs to be filled. No resets, no failed control
>>>>>>>>transfers, no visible errors other than timeouts.
>>>>>>>>
>>>>>>>>I should point out that the application works very well with other
>>>>>>>>USB capture devices on 10.10, all of which run at less than 500mA,
>>>>>>>>so I'm confident the application is fine.
>>>>>>>>
>>>>>>>>Are there any known differences between 10.9 and 10.10 with regard
>>>>>>>>to the allowable current that can be drawn from either a USB2 or
>>>>>>>>USB3 port? I realize the device runs over budget, but is the OS (or
>>>>>>>>the USB controllers) starting to enforce 500mA limits that we're
>>>>>>>>only just seeing?
>>>>>>>>
>>>>>>>>Many thanks,
>>>>>>>>
>>>>>>>>- Steve
>>>>>>>>
>>>>>>>>--
>>>>>>>>Steven Toth - Kernel Labs
>>>>>>>>http://www.kernellabs.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>--
>>Steven Toth - Kernel Labs
>>http://www.kernellabs.com
>>+1.646.355.8490
>>
>>
>
>
>
 _______________________________________________
Usb mailing list      ([email protected])