Re: fake bufmgr and aperture sizing.
Hi Eric, others,

This may be a larger problem (I'd be interested in how TTM solves this also). I've hit a problem with the fake bufmgr where the total size of the objects referenced by a batchbuffer is bigger than the current aperture. When we have a batchbuffer and are emitting a number of operations, the list of referenced buffers grows to 32MB for me and we fall down in a heap (compiz can do this really easily...).

My first hack involved failing the relocation emit and having the caller fall back: flush the current batchbuffer and retry. However, this fails on i915, as the only place we emit the relocs is in the middle of a state emission, so we would need to roll back and re-do the state emit.

I'm still thinking of ways to fix it. We could pre-validate all the buffers we reference in i915_vtbl.c, keep a list of the buffers that are on the list, and flush and fall back at that point; however, this involves some changes to the bufmgr interface and the fake internals...

Okay, because I'm nice and really, really need this working, I've committed a fix for the fake bufmgr with support for 915. Eric, please review this, and if you get time please explain how I could make 965 work, as I'll be trying that tomorrow.

Dave.

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference.
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
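[Editor's note: a minimal sketch of the pre-validation idea described above — track the total size of buffers referenced by the current batch and refuse a reference that would exceed a safe fraction of the aperture, so the caller can flush and retry before it is stuck mid-state-emit. All names here (batch_state, batch_check_space) are illustrative, not the real bufmgr API.]

```c
#include <stddef.h>

/* Toy model of the per-batch aperture accounting being discussed.
 * batch_state and batch_check_space are hypothetical names. */
struct batch_state {
    size_t aperture_size;   /* total manageable aperture in bytes */
    size_t referenced;      /* bytes referenced so far by this batch */
};

/* Returns 1 if buf_size more bytes still fit under a 3/4 safety margin
 * and records them; returns 0 if the caller should flush the batch and
 * retry before emitting any state that references this buffer. */
static int batch_check_space(struct batch_state *b, size_t buf_size)
{
    size_t limit = b->aperture_size / 4 * 3;

    if (b->referenced + buf_size > limit)
        return 0;
    b->referenced += buf_size;
    return 1;
}
```

The key point of doing the check before state emission (rather than failing the relocation emit) is that the caller still has a clean point to flush and retry at.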
Re: fake bufmgr and aperture sizing.
Dave Airlie wrote:
> Hi Eric, others,
>
> This may be a larger problem (I'd be interested in how TTM solves this
> also). I've hit a problem with the fake bufmgr where the total size of
> the objects referenced by a batchbuffer is bigger than the current
> aperture. When we have a batchbuffer and are emitting a number of
> operations, the list of referenced buffers grows to 32MB for me and we
> fall down in a heap (compiz can do this really easily...).
>
> My first hack involved failing the relocation emit and having the caller
> fall back: flush the current batchbuffer and retry. However, this fails
> on i915, as the only place we emit the relocs is in the middle of a
> state emission, so we would need to roll back and re-do the state emit.
>
> I'm still thinking of ways to fix it. We could pre-validate all the
> buffers we reference in i915_vtbl.c, keep a list of the buffers that are
> on the list, and flush and fall back at that point; however, this
> involves some changes to the bufmgr interface and the fake internals...
>
> Dave.

Dave,

At least for TTM this is part of a larger problem: you can hit failures both when the pinned page quota is hit and when you can't fit an object in the aperture.

Exceeding the pinned page quota is a nasty error, since it can happen anywhere from the driver's point of view and is thus highly undesirable. The proposed solution is to pre-allocate all pinned quota associated with a buffer object and only fail during buffer creation, fence creation and mapping. The only non-fatal reasons for execbuf failure are then a received signal and running out of aperture space. Fence creation failures are dealt with by idling the hardware and returning a NULL fence.

The other problem is the one you mention. Since we're dealing with multiple clients, only evict one buffer at a time on aperture space shortage, and may even have pinned buffers scattered in the aperture, there is a probability that the execbuf call will fail with -ENOMEM.
I guess before doing that, the kernel could retry and evict all evictable buffers before starting validation. That would eliminate all fragmentation issues except those arising from pinned buffers.

The problem remains of how to avoid this situation completely. I guess the drm driver can reserve a global safe aperture size and communicate that to the 3D client, but the current TTM drivers don't deal with this situation. My first idea would probably be your first alternative: flush and re-do the state emit if the combined buffer size is larger than the safe aperture size.

/Thomas
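[Editor's note: a toy sketch of the retry described above — if validating the batch's buffer list fails with -ENOMEM, evict every evictable buffer (emptying the aperture of everything not pinned, which removes fragmentation) and validate once more before reporting failure. validate_list and evict_all_evictable are illustrative stand-ins, not real TTM entry points.]

```c
#include <errno.h>

/* Toy model: validation fails while the aperture is fragmented. */
struct exec_ctx {
    int fragmented;
};

static int validate_list(struct exec_ctx *ctx)
{
    return ctx->fragmented ? -ENOMEM : 0;
}

/* Evicting everything evictable defragments the (toy) aperture. */
static void evict_all_evictable(struct exec_ctx *ctx)
{
    ctx->fragmented = 0;
}

/* The retry Thomas suggests: one full eviction pass before giving up. */
static int execbuf_validate(struct exec_ctx *ctx)
{
    int ret = validate_list(ctx);

    if (ret != -ENOMEM)
        return ret;
    evict_all_evictable(ctx);
    return validate_list(ctx);
}
```

After this retry, a residual -ENOMEM can only come from pinned buffers or a genuinely oversized working set, which is what makes the remaining failure cases tractable.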
Re: fake bufmgr and aperture sizing.
> At least for TTM this is part of a larger problem where you can hit
> problems both when the pinned page quota is hit, and when you can't fit
> an object in the aperture.
>
> The other problem is the one you mention. Since we're dealing with
> multiple clients and only evict one buffer at a time at aperture
> space-shortage and even may have pinned buffers scattered in the
> aperture, there is a probability that the execbuf call will fail with
> -ENOMEM.
>
> I guess before doing that, the kernel could retry and evict all
> evictable buffers before starting validation. That would eliminate all
> fragmentation issues except those arising from pinned buffers.

IMHO with a complete kernel driver we can avoid fragmentation issues, at a cost: if only the kernel can pin buffers (scanout/cursor etc.), we should be able to fence and move them by having special pinned-move handlers that would only be used in extreme situations. These handlers would know how to turn cursors off and even move the display base address. It may have to flicker the screen, but really anything is better than failing due to fragmented memory.

> The problem remains how to avoid this situation completely. I guess the
> drm driver can reserve a global safe aperture size, and communicate that
> to the 3D client, but the current TTM drivers don't deal with this
> situation. My first idea would probably be your first alternative. Flush
> and re-do the state-emit if the combined buffer size is larger than the
> safe aperture size.

I think a dynamically sized safe aperture limit that applies per batch submission is probably the best plan. This might also allow throttling in multi-app situations to help avoid thrashing, by reducing the per-app limits. For cards with per-process apertures we could make it the size of the per-process aperture.

The case where an app manages to submit a working set for a single operation that is larger than the GPU can deal with should be considered a bug in the driver, I suppose.

Dave.
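[Editor's note: one way the dynamically sized per-batch limit above could be computed — divide the manageable aperture (minus pinned memory) among active clients, with a floor so a single batch always has a workable minimum. The function and the 1/4 floor are hypothetical choices for illustration, not anything in TTM or the fake bufmgr.]

```c
#include <stddef.h>

/* Hypothetical per-client safe-aperture computation: each active client
 * gets an equal share of what is not pinned, but never less than a
 * quarter of it, so throttling degrades gracefully rather than starving
 * any one client outright. */
static size_t per_client_limit(size_t aperture, size_t pinned,
                               unsigned active_clients)
{
    size_t avail = aperture - pinned;
    size_t min_limit = avail / 4;
    size_t share = avail / (active_clients ? active_clients : 1);

    return share > min_limit ? share : min_limit;
}
```

With many clients the share shrinks toward the floor, which is exactly the throttling effect described: per-app limits go down, batches get smaller, and the aperture is contended for in smaller pieces.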
Re: fake bufmgr and aperture sizing.
>> The problem remains how to avoid this situation completely. I guess the
>> drm driver can reserve a global safe aperture size, and communicate
>> that to the 3D client, but the current TTM drivers don't deal with this
>> situation. My first idea would probably be your first alternative.
>> Flush and re-do the state-emit if the combined buffer size is larger
>> than the safe aperture size.
>
> I think a dynamically sized safe aperture size that can be used per
> batch submission, is probably the best plan, this might also allow
> throttling in multi-app situations to help avoid thrashing, by reducing
> the per-app limits. For cards with per-process we could make it the size
> of the per-process aperture.
>
> The case where an app manages to submit a working set for a single
> operation that is larger than the GPU can deal with, should be
> considered a bug in the driver I suppose.

The trouble with the safe limit is that it can change in a timeframe that is inconvenient for the driver -- i.e., if it changes when a driver has already constructed most of a scene, what happens? This is a lot like the old cliprect problem, where driver choices can be invalidated later on, leaving it in a difficult position. Trying to chop an already-constructed command stream up after the fact is unappealing, even on simple architectures like the i915 in classic mode. Add zone rendering or some other wrinkle and it loses appeal fast.

What about two limits -- hard and soft? If the hard limit can avoid changing, that makes things a lot nicer for the driver. When the soft one changes, the driver can respect that next frame, but submit the current command stream as is.

Keith
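[Editor's note: a sketch of the two-limit scheme proposed above, under the assumption that the driver latches the soft limit when it starts a batch. The hard limit is fixed for the life of the context; a soft-limit change arriving mid-frame cannot invalidate work in progress, and is only picked up by the next batch. All names are hypothetical.]

```c
#include <stddef.h>

struct aperture_limits {
    size_t hard;   /* never changes; exceeding it is a driver bug */
    size_t soft;   /* may shrink between batches (throttling) */
};

struct batch {
    size_t budget; /* soft limit captured when the batch was started */
    size_t used;
};

/* Latch the current soft limit for the lifetime of this batch. */
static void batch_start(struct batch *b, const struct aperture_limits *l)
{
    b->budget = l->soft;
    b->used = 0;
}

/* While emitting, only the latched budget and the hard limit matter, so a
 * concurrent soft-limit change never forces chopping up the command
 * stream already under construction. */
static int batch_fits(const struct batch *b, const struct aperture_limits *l,
                      size_t more)
{
    if (b->used + more > l->hard)
        return 0;
    return b->used + more <= b->budget;
}
```

This is essentially why the hard/soft split helps: the invalidation problem Keith describes disappears because the only limit that can move is, by construction, never consulted mid-frame.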
Re: fake bufmgr and aperture sizing.
Dave Airlie wrote:
>> At least for TTM this is part of a larger problem where you can hit
>> problems both when the pinned page quota is hit, and when you can't fit
>> an object in the aperture.
>>
>> The other problem is the one you mention. Since we're dealing with
>> multiple clients and only evict one buffer at a time at aperture
>> space-shortage and even may have pinned buffers scattered in the
>> aperture, there is a probability that the execbuf call will fail with
>> -ENOMEM.
>>
>> I guess before doing that, the kernel could retry and evict all
>> evictable buffers before starting validation. That would eliminate all
>> fragmentation issues except those arising from pinned buffers.
>
> IMHO with a complete kernel driver we can avoid fragmentation issues
> with a cost, i.e. if only the kernel can pin buffers (scanout/cursor
> etc) we should be able to fence and move them by having special pinned
> move handlers that would only be used in extreme situations, these
> handlers would know how to turn cursors off and even move the display
> base address, it may have to flicker the screen but really anything is
> better than failing due to fragged memory.

I agree. It's probably possible to come up with a clever scheme for this, and even to update the display base address during vblank. User space doesn't need to care, since the virtual address will stay the same, but in the end we need to do something about locking in the fb layer: we need to be able to modify the kernel virtual address and GPU offset while fb is running.

>> The problem remains how to avoid this situation completely. I guess the
>> drm driver can reserve a global safe aperture size, and communicate
>> that to the 3D client, but the current TTM drivers don't deal with this
>> situation. My first idea would probably be your first alternative.
>> Flush and re-do the state-emit if the combined buffer size is larger
>> than the safe aperture size.
> I think a dynamically sized safe aperture size that can be used per
> batch submission, is probably the best plan, this might also allow
> throttling in multi-app situations to help avoid thrashing, by reducing
> the per-app limits. For cards with per-process we could make it the size
> of the per-process aperture.

Actually, thrashing TT memory shouldn't be that horribly bad, as there is generally no caching-attribute flipping going on, but it will temporarily stall the driver from working ahead with a new batch and thus drain the pipeline. Thrashing will go on anyway in the multi-app situation, since the driver needs to throttle due to an aperture space shortage, but it will be more driver-induced and perhaps a bit more efficient.

> The case where an app manages to submit a working set for a single
> operation that is larger than the GPU can deal with, should be
> considered a bug in the driver I suppose.

Yes, I agree, but we must make sure the kernel can _really_ honor the advertised working-set size, because otherwise it's an OOM situation we can't recover from other than by perhaps skipping a frame. This is increasingly important with binning hardware that likes to submit a whole scene in a single batch, but OTOH such hardware usually has a very large aperture / GPU virtual space.

> Dave.

/Thomas
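[Editor's note: a toy sketch of what "really honoring" the advertised working-set size could mean — the kernel only advertises what it can account for up front, so a client that stays within its grant can never be promised space that does not exist. The accounting scheme and names are hypothetical, purely to illustrate the invariant Thomas is asking for.]

```c
#include <stddef.h>

/* Toy aperture accounting: 'reserved' is the sum of working-set sizes
 * already promised to clients. */
struct aperture {
    size_t total;
    size_t reserved;
};

/* Grant a new client a working-set size, never promising more than what
 * remains after earlier promises. Returns the granted size, 0 if nothing
 * can be honored. A client that stays within its grant cannot push the
 * kernel into an unrecoverable OOM. */
static size_t advertise_working_set(struct aperture *a, size_t requested)
{
    size_t remaining = a->total - a->reserved;
    size_t grant = requested < remaining ? requested : remaining;

    if (grant == 0)
        return 0;
    a->reserved += grant;
    return grant;
}
```

Real drivers would of course reclaim grants when clients exit and overlap grants with evictable memory; the point is only that the advertised number must be backed by something the kernel can always deliver.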
fake bufmgr and aperture sizing.
Hi Eric, others,

This may be a larger problem (I'd be interested in how TTM solves this also). I've hit a problem with the fake bufmgr where the total size of the objects referenced by a batchbuffer is bigger than the current aperture. When we have a batchbuffer and are emitting a number of operations, the list of referenced buffers grows to 32MB for me and we fall down in a heap (compiz can do this really easily...).

My first hack involved failing the relocation emit and having the caller fall back: flush the current batchbuffer and retry. However, this fails on i915, as the only place we emit the relocs is in the middle of a state emission, so we would need to roll back and re-do the state emit.

I'm still thinking of ways to fix it. We could pre-validate all the buffers we reference in i915_vtbl.c, keep a list of the buffers that are on the list, and flush and fall back at that point; however, this involves some changes to the bufmgr interface and the fake internals...

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG