Re: fake bufmgr and aperture sizing.

2008-04-16 Thread Dave Airlie

 
 Hi Eric,
 
 others: This may be a larger problem (I'd be interested in how TTM solves 
 this also).
 
 So I've hit a problem with the fake bufmgr and the size of the objects 
 referenced by a batchbuffer being bigger than the current aperture. So 
 when we have a batchbuffer and we are emitting a number of operations, the 
 list of references buffers becomes  32MB for me and we fall down in a 
 heap (compiz can do this really easy...)
 
 My first hack involved failing the relocation emit and falling back and 
 having the caller flush the current batchbuffer and retry, however this 
 fails on i915 as the only place we emit the relocs is in the middle of a 
 state emission, so we would need to rollback and re-do the state emit.
 
 I'm just thinking of ways to fix it, we could pre-validate all the buffers 
 in i915_vtbl.c that we reference and keep a list of buffers that are on 
 the list and flush and fallback at that point, however this involves some 
 changes to the bufmgr interface, and fake internals...

Okay because I'm nice and really really need this working I've committed a 
fix for fake bufmgr with support for 915, Eric please review this, and if 
you get time please explain how I could make 965 work, as I'll be trying 
that tomorrow.

Dave.

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: fake bufmgr and aperture sizing.

2008-04-16 Thread Thomas Hellström
Dave Airlie wrote:
 Hi Eric,

 others: This may be a larger problem (I'd be interested in how TTM solves 
 this also).

 So I've hit a problem with the fake bufmgr and the size of the objects 
 referenced by a batchbuffer being bigger than the current aperture. So 
 when we have a batchbuffer and we are emitting a number of operations, the 
 list of references buffers becomes  32MB for me and we fall down in a 
 heap (compiz can do this really easy...)

 My first hack involved failing the relocation emit and falling back and 
 having the caller flush the current batchbuffer and retry, however this 
 fails on i915 as the only place we emit the relocs is in the middle of a 
 state emission, so we would need to rollback and re-do the state emit.

 I'm just thinking of ways to fix it, we could pre-validate all the buffers 
 in i915_vtbl.c that we reference and keep a list of buffers that are on 
 the list and flush and fallback at that point, however this involves some 
 changes to the bufmgr interface, and fake internals...

 Dave.

   
Dave,
At least for TTM this is part of a larger problem where you can hit 
problems both when the pinned page quota is hit, and when
you can't fit an object in the aperture.

The pinned page quota exceeded is a nasty error, since it can happen 
anywhere from the driver's point of view and is thus highly undesirable. 
The proposed solution is to pre-allocate all pinned quota associated 
with a buffer object and only fail during buffer creation, fence 
creation and mapping.  The only non-fatal reason for execbuf failure is 
signal received and out-of-aperture space. Fence creation failures are 
dealt with by idling the hardware and return a NULL fence.

The other problem is the one you mention. Since we're dealing with 
multiple clients and only evict one buffer at a time at aperture 
space-shortage and even may have pinned buffers scattered in the 
aperture, there is a probability that the execbuf call will fail with 
-ENOMEM. I guess before doing that, the kernel could retry and evict all 
evictable buffers before starting validation. That would eliminate all 
fragmentation issues except those arising from pinned buffers.

The problem remains how to avoid this situation completely. I guess the 
drm driver can reserve a global safe aperture size, and communicate 
that to the 3D client, but the current TTM drivers don't deal with this 
situation.
My first idea would probably be your first alternative. Flush and re-do 
the state-emit if the combined buffer size is larger than the safe 
aperture size.

/Thomas




-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: fake bufmgr and aperture sizing.

2008-04-16 Thread Dave Airlie

 At least for TTM this is part of a larger problem where you can hit 
 problems both when the pinned page quota is hit, and when
 you can't fit an object in the aperture.
 
 The other problem is the one you mention. Since we're dealing with 
 multiple clients and only evict one buffer at a time at aperture 
 space-shortage and even may have pinned buffers scattered in the 
 aperture, there is a probability that the execbuf call will fail with 
 -ENOMEM. I guess before doing that, the kernel could retry and evict all 
 evictable buffers before starting validation. That would eliminate all 
 fragmentation issues except those arising from pinned buffers.
 

IMHO with a complete kernel driver we can avoid fragmentation issues with 
a cost, i.e. if only the kernel can pin buffers (scanout/cursor etc) we 
should be able to fence and move them by having special pinned move 
handlers that would only be used in extreme situations, these handlers 
would know how to turn cursors off and even move the display base address, 
it may have to flicker the screen but really anything is better than 
failing due to fragged memory.

 The problem remains how to avoid this situation completely. I guess the 
 drm driver can reserve a global safe aperture size, and communicate 
 that to the 3D client, but the current TTM drivers don't deal with this 
 situation.
 My first idea would probably be your first alternative. Flush and re-do 
 the state-emit if the combined buffer size is larger than the safe 
 aperture size.

I think a dynamically sized safe aperture size that can be used per batch 
submission, is probably the best plan, this might also allow throttling in 
multi-app situations to help avoid thrashing, by reducing the per-app 
limits. For cards with per-process we could make it the size of the 
per-process aperture.

The case where an app manages to submit a working set for a single 
operation that is larger than the GPU can deal with, should be considered 
a bug in the driver I suppose.

Dave.

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: fake bufmgr and aperture sizing.

2008-04-16 Thread Keith Whitwell

  The problem remains how to avoid this situation completely. I guess the 
  drm driver can reserve a global safe aperture size, and communicate 
  that to the 3D client, but the current TTM drivers don't deal with this 
  situation.
  My first idea would probably be your first alternative. Flush and re-do 
  the state-emit if the combined buffer size is larger than the safe 
  aperture size.
 
 I think a dynamically sized safe aperture size that can be used per batch 
 submission, is probably the best plan, this might also allow throttling in 
 multi-app situations to help avoid thrashing, by reducing the per-app 
 limits. For cards with per-process we could make it the size of the 
 per-process aperture.
 
 The case where an app manages to submit a working set for a single 
 operation that is larger than the GPU can deal with, should be considered 
 a bug in the driver I suppose.


The trouble with the safe limit is that it can change in a timeframe that is 
inconvenient for the driver -- ie, if it changes when a driver has already 
constructed most of a scene, what happens?  This is a lot like the old cliprect 
problem, where driver choices can be invalidated later on, leaving it in a 
difficult position.

Trying to chop an already-constructed command stream up after the fact is 
unappealing, even on simple architectures like the i915 in classic mode.  Add 
zone rendering or some other wrinkle  it looses appeal fast.  

What about two limits -- hard  soft?  If the hard limit can avoid changing, 
that makes things a lot nicer for the driver.  When the soft one changes, the 
driver can respect that next frame, but submit the current command stream as is.

Keith










-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: fake bufmgr and aperture sizing.

2008-04-16 Thread Thomas Hellström
Dave Airlie wrote:
 At least for TTM this is part of a larger problem where you can hit 
 problems both when the pinned page quota is hit, and when
 you can't fit an object in the aperture.

 The other problem is the one you mention. Since we're dealing with 
 multiple clients and only evict one buffer at a time at aperture 
 space-shortage and even may have pinned buffers scattered in the 
 aperture, there is a probability that the execbuf call will fail with 
 -ENOMEM. I guess before doing that, the kernel could retry and evict all 
 evictable buffers before starting validation. That would eliminate all 
 fragmentation issues except those arising from pinned buffers.

 

 IMHO with a complete kernel driver we can avoid fragmentation issues with 
 a cost, i.e. if only the kernel can pin buffers (scanout/cursor etc) we 
 should be able to fence and move them by having special pinned move 
 handlers that would only be used in extreme situations, these handlers 
 would know how to turn cursors off and even move the display base address, 
 it may have to flicker the screen but really anything is better than 
 failing due to fragged memory.
   
I agree. It's probably possible to come up with a clever scheme for this, and 
even to update the display base address during vblank. User space doesn't need 
to care, since the virtual address will stay the same, but in the end we need 
to do something about locking in the fb layer. We need to be able to modify  
the kernel virtual address and GPU offset while fb is running.

 The problem remains how to avoid this situation completely. I guess the 
 drm driver can reserve a global safe aperture size, and communicate 
 that to the 3D client, but the current TTM drivers don't deal with this 
 situation.
 My first idea would probably be your first alternative. Flush and re-do 
 the state-emit if the combined buffer size is larger than the safe 
 aperture size.
 

 I think a dynamically sized safe aperture size that can be used per batch 
 submission, is probably the best plan, this might also allow throttling in 
 multi-app situations to help avoid thrashing, by reducing the per-app 
 limits. For cards with per-process we could make it the size of the 
 per-process aperture.
   
Actually, thrashing TT memory shouldn't be that horribly bad, as there 
is generally no caching attribute flipping going on, but it will 
temporarily stall the driver from working ahead with a new batch and 
thus drain the pipeline.
Thrashing will go on anyway in the multi-app situation, since the driver 
needs to throttle due to an aperture space shortage, but it will be more 
driver-induced and perhaps a bit more efficient.


 The case where an app manages to submit a working set for a single 
 operation that is larger than the GPU can deal with, should be considered 
 a bug in the driver I suppose.
   
Yes, I agree, but we must make sure the kernel can _really_ honor the 
advertized working set size, because otherwise it's an OOM situation we 
can't recover from other than by perhaps skipping a frame. This is 
increasingly important with binning hardware that likes to submit a 
whole scene in a single batch, but OTOH they usually have a very large 
aperture / GPU virtual space.

 Dave.
   
/Thomas




-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


fake bufmgr and aperture sizing.

2008-04-15 Thread Dave Airlie

Hi Eric,

others: This may be a larger problem (I'd be interested in how TTM solves 
this also).

So I've hit a problem with the fake bufmgr and the size of the objects 
referenced by a batchbuffer being bigger than the current aperture. So 
when we have a batchbuffer and we are emitting a number of operations, the 
list of references buffers becomes  32MB for me and we fall down in a 
heap (compiz can do this really easy...)

My first hack involved failing the relocation emit and falling back and 
having the caller flush the current batchbuffer and retry, however this 
fails on i915 as the only place we emit the relocs is in the middle of a 
state emission, so we would need to rollback and re-do the state emit.

I'm just thinking of ways to fix it, we could pre-validate all the buffers 
in i915_vtbl.c that we reference and keep a list of buffers that are on 
the list and flush and fallback at that point, however this involves some 
changes to the bufmgr interface, and fake internals...

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
Linux kernel - DRI, VAX / pam_smb / ILUG


-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel