Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Wed, 6 Dec 2017 16:57:45 -0800 James Joneswrote: > On 12/06/2017 03:25 AM, Nicolai Hähnle wrote: > > On 06.12.2017 08:07, James Jones wrote: > > [snip] > >> So lets say you have a setup where both display and GPU supported > >> FOO/tiled, but only GPU supported compressed (FOO/CC) and cached > >> (FOO/cached). But the GPU supported the following transitions: > >> > >> trans_a: FOO/CC -> null > >> trans_b: FOO/cached -> null > >> > >> Then the sets for each device (in order of preference): > >> > >> GPU: > >> 1: caps(FOO/tiled, FOO/CC, FOO/cached); > >> constraints(alignment=32k) > >> 2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k) > >> 3: caps(FOO/tiled); constraints(alignment=32k) > >> > >> Display: > >> 1: caps(FOO/tiled); constraints(alignment=64k) > >> > >> Merged Result: > >> 1: caps(FOO/tiled, FOO/CC, FOO/cached); > >> constraints(alignment=64k); > >> transition(GPU->display: trans_a, trans_b; display->GPU: none) > >> 2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k); > >> transition(GPU->display: trans_a; display->GPU: none) > >> 3: caps(FOO/tiled); constraints(alignment=64k); > >> transition(GPU->display: none; display->GPU: none) > > > > > > We definitely don't want to expose a way of getting uncached rendering > > surfaces for radeonsi. I mean, I think we are supposed to be able > > to program > > our hardware so that the backend bypasses all caches, but (a) nobody > > validates that and (b) it's basically suicide in terms of > > performance. Let's > > build fewer footguns :) > > sure, this was just a hypothetical example. 
But to take this case as > another example, if you didn't want to expose uncached rendering (or > cached w/ cache flushes after each draw), you would exclude the entry > from the GPU set which didn't have FOO/cached (I'm adding back a > cached but not CC config just to make it interesting), and end up > with: > > trans_a: FOO/CC -> null > trans_b: FOO/cached -> null > > GPU: > 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) > 2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k) > > Display: > 1: caps(FOO/tiled); constraints(alignment=64k) > > Merged Result: > 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); > transition(GPU->display: trans_a, trans_b; display->GPU: none) > 2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k); > transition(GPU->display: trans_b; display->GPU: none) > > So there isn't anything in the result set that doesn't have GPU cache, > and the cache-flush transition is always in the set of required > transitions going from GPU -> display > > Hmm, I guess this does require the concept of a required cap.. > >>> > >>> Which we already introduced to the allocator API when we realized we > >>> would need them as we were prototyping. > >> > >> Note I also posed the question of whether things like cached (and > >> similarly compression, since I view compression as roughly an > >> equivalent mechanism to a cache) in one of the open issues on my XDC > >> 2017 slides because of this very problem of over-pruning it causes. > >> It's on slide 15, as "No device-local capabilities". You'll have to > >> listen to my coverage of it in the recorded presentation for that > >> slide to make any sense, but it's the same thing Nicolai has laid out > >> here. 
> >> > >> As I continued working through our prototype driver support, I found I > >> didn't actually need to include cached or compressed as capabilities: > >> The GPU just applies them as needed and the usage transitions make it > >> transparent to the non-GPU engines. That does mean the GPU driver > >> currently needs to be the one to realize the allocation from the > >> capability set to get optimal behavior. We could fix that by > >> reworking our driver though. At this point, not including > >> device-local properties like on-device caching in capabilities seems > >> like the right solution to me. I'm curious whether this applies > >> universally though, or if other hardware doesn't fit the "compression > >> and stuff all behaves like a cache" idiom. > > > > Compression is a part of the memory layout for us: framebuffer > > compression uses an additional "meta surface". At the most basic level, > > an allocation with loss-less compression support is by necessity bigger > > than an allocation without. > > > > We can allocate this meta surface separately, but then we're forced to > > decompress when passing the surface around (e.g. to a compositor.) > > > > Consider also the example I gave elsewhere, where a
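The merge in Rob's example can be sketched in a few lines. This is only an illustration, not the allocator prototype's API: `CapSet`, `merge`, and the `TRANSITIONS` table are hypothetical names. The rule it applies is an assumption read off the example: the consumer's caps must be a subset of the producer's, every producer-only cap needs a transition to null, and the merged alignment is the larger of the two (assuming power-of-two alignments).

```python
from dataclasses import dataclass

KIB = 1024


@dataclass(frozen=True)
class CapSet:
    caps: frozenset   # capability names, e.g. "FOO/tiled"
    alignment: int    # required alignment in bytes (assumed power of two)


# Hypothetical table: capability -> transition that strips it (cap -> null).
TRANSITIONS = {"FOO/CC": "trans_a", "FOO/cached": "trans_b"}


def merge(producer, consumer, transitions):
    """Merge two preference-ordered lists of CapSets.

    Returns (caps, alignment, producer->consumer transitions) tuples,
    keeping the producer's order of preference.
    """
    merged = []
    for p in producer:
        for c in consumer:
            if not c.caps <= p.caps:
                continue  # no common base layout
            extra = p.caps - c.caps  # caps the consumer does not understand
            if any(cap not in transitions for cap in extra):
                continue  # cap cannot be stripped, so drop this entry
            needed = sorted(transitions[cap] for cap in extra)
            merged.append((p.caps, max(p.alignment, c.alignment), needed))
    return merged


gpu = [
    CapSet(frozenset({"FOO/tiled", "FOO/CC", "FOO/cached"}), 32 * KIB),
    CapSet(frozenset({"FOO/tiled", "FOO/CC"}), 32 * KIB),
    CapSet(frozenset({"FOO/tiled"}), 32 * KIB),
]
display = [CapSet(frozenset({"FOO/tiled"}), 64 * KIB)]

for caps, alignment, trans in merge(gpu, display, TRANSITIONS):
    print(sorted(caps), f"alignment={alignment // KIB}k", trans)
```

Replacing the GPU list with only entries that contain FOO/cached gives the second example from the thread, where trans_b is always among the GPU->display transitions.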
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 12/06/2017 03:25 AM, Nicolai Hähnle wrote: On 06.12.2017 08:07, James Jones wrote: [snip] So lets say you have a setup where both display and GPU supported FOO/tiled, but only GPU supported compressed (FOO/CC) and cached (FOO/cached). But the GPU supported the following transitions: trans_a: FOO/CC -> null trans_b: FOO/cached -> null Then the sets for each device (in order of preference): GPU: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) 2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k) 3: caps(FOO/tiled); constraints(alignment=32k) Display: 1: caps(FOO/tiled); constraints(alignment=64k) Merged Result: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_a, trans_b; display->GPU: none) 2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k); transition(GPU->display: trans_a; display->GPU: none) 3: caps(FOO/tiled); constraints(alignment=64k); transition(GPU->display: none; display->GPU: none) We definitely don't want to expose a way of getting uncached rendering surfaces for radeonsi. I mean, I think we are supposed to be able to program our hardware so that the backend bypasses all caches, but (a) nobody validates that and (b) it's basically suicide in terms of performance. Let's build fewer footguns :) sure, this was just a hypothetical example. 
But to take this case as another example, if you didn't want to expose uncached rendering (or cached w/ cache flushes after each draw), you would exclude the entry from the GPU set which didn't have FOO/cached (I'm adding back a cached but not CC config just to make it interesting), and end up with: trans_a: FOO/CC -> null trans_b: FOO/cached -> null GPU: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) 2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k) Display: 1: caps(FOO/tiled); constraints(alignment=64k) Merged Result: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_a, trans_b; display->GPU: none) 2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_b; display->GPU: none) So there isn't anything in the result set that doesn't have GPU cache, and the cache-flush transition is always in the set of required transitions going from GPU -> display Hmm, I guess this does require the concept of a required cap.. Which we already introduced to the allocator API when we realized we would need them as we were prototyping. Note I also posed the question of whether things like cached (and similarly compression, since I view compression as roughly an equivalent mechanism to a cache) in one of the open issues on my XDC 2017 slides because of this very problem of over-pruning it causes. It's on slide 15, as "No device-local capabilities". You'll have to listen to my coverage of it in the recorded presentation for that slide to make any sense, but it's the same thing Nicolai has laid out here. As I continued working through our prototype driver support, I found I didn't actually need to include cached or compressed as capabilities: The GPU just applies them as needed and the usage transitions make it transparent to the non-GPU engines. 
That does mean the GPU driver currently needs to be the one to realize the allocation from the capability set to get optimal behavior. We could fix that by reworking our driver though. At this point, not including device-local properties like on-device caching in capabilities seems like the right solution to me. I'm curious whether this applies universally though, or if other hardware doesn't fit the "compression and stuff all behaves like a cache" idiom. Compression is a part of the memory layout for us: framebuffer compression uses an additional "meta surface". At the most basic level, an allocation with loss-less compression support is by necessity bigger than an allocation without. We can allocate this meta surface separately, but then we're forced to decompress when passing the surface around (e.g. to a compositor.) Consider also the example I gave elsewhere, where a cross-vendor tiling layout is combined with vendor-specific compression: Device 1, rendering: caps(BASE/foo-tiling, VND1/compression) Device 2, sampling/scanout: caps(BASE/foo-tiling, VND2/compression) Some more thoughts on caching or "device-local" properties below. Compression requires extra resources for us as well. That's probably universal. I think the distinction between the two approaches is whether the allocating driver deduces that compression can be used with a given capability set and hence adds the resources implicitly, or whether the capability set indicates it explicitly. My theory is that the implicit path is possible, but it has downsides. The explicit path is attractive due to its exact nature, as I
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 06.12.2017 14:25, Rob Clark wrote:
On Wed, Dec 6, 2017 at 2:07 AM, James Jones wrote:

Note I also posed the question of whether things like cached (and similarly compression, since I view compression as roughly an equivalent mechanism to a cache) in one of the open issues on my XDC 2017 slides because of this very problem of over-pruning it causes. It's on slide 15, as "No device-local capabilities". You'll have to listen to my coverage of it in the recorded presentation for that slide to make any sense, but it's the same thing Nicolai has laid out here.

As I continued working through our prototype driver support, I found I didn't actually need to include cached or compressed as capabilities: The GPU just applies them as needed and the usage transitions make it transparent to the non-GPU engines. That does mean the GPU driver currently needs to be the one to realize the allocation from the capability set to get optimal behavior. We could fix that by reworking our driver though. At this point, not including device-local properties like on-device caching in capabilities seems like the right solution to me. I'm curious whether this applies universally though, or if other hardware doesn't fit the "compression and stuff all behaves like a cache" idiom.

Possibly a SoC(ish) type device which has a "system" cache that some but not all devices fall into. I *think* the Intel chips w/ EDRAM might fall into this category. I know the idea has come up elsewhere, although not sure if anything like that ended up in production. It seems like something we'd at least want to have an idea how to deal with, even if it isn't used for device internal caches.

Not sure if similar situation could come up w/ discrete GPU and video decode/encode engines on the same die?

It definitely could. Our GPUs currently don't have shared caches between gfx and video engines, but moving more and more clients under a shared L2 cache has been a theme over the last few generations. I doubt that's going to happen for the video engines any time soon, but you never know.

I don't think we really need caches as a capability for our current GPUs, but it may change, and in any case, we do want compression as a capability.

[snip]

I think I like the idea of having transitions being part of the per-device/engine cap sets, so that such information can be used upon merging to know which capabilities may remain or have to be dropped.

I think James's proposal for usage transitions was intended to work with flows like:

1. App gets GPU caps for RENDER usage
2. App allocates GPU memory using a layout from (1)
3. App now decides it wants to use the buffer for SCANOUT
4. App queries usage transition metadata from RENDER to SCANOUT, given the current memory layout.
5. Do the transition and hand the buffer off to display

No, all usages the app intends to transition to must be specified up front when initially querying caps in the model I assumed. The app then specifies some subset (up to the full set) of the specified usages as a src and dst when querying transition metadata.

The problem I see with this is that it isn't guaranteed that there will be a chain of transitions for the buffer to be usable by display.

hmm, I guess if a buffer *can* be shared across all uses, there by definition has to be a chain of transitions to go from any usage+device to any other usage+device. Possibly a separate step to query transitions avoids solving for every possible transition when merging the caps set.. although until you do that query I don't think you know the resulting merged caps set is valid.

Maybe in practice for every cap FOO there exists a FOO->null (or FOO->generic if you prefer) transition, i.e. compressed->uncompressed, cached->clean, etc. I suppose that makes the problem easier to solve.

It really would, to the extent that I would prefer if we could bake it into the system as an assumption. I have my doubts about how to manage calculating transitions cleanly at all without it. The metadata stuff is very vague to me.

I hadn't thought hard about it, but my initial thoughts were that it would be required that the driver support transitioning to any single usage given the capabilities returned. However, transitioning to multiple usages (e.g., to simultaneously rendering and scanning out) could fail to produce a valid transition, in which case the app would have to fall back to a copy, or avoid that simultaneous usage combination in some other way.

Adding transition metadata to the original capability sets, and using that information when merging could give us a compatible memory layout that would be usable by both GPU and display. I'll look into extending the current merging logic to also take into account transitions.

Yes, it'll be good to see whether this can be made to work. I agree Rob's example outcomes above are ideal, but it's not clear to me
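The model James describes, where every usage is declared up front and transitions are later queried between subsets of those usages, might look roughly like the toy below. It is a sketch under stated assumptions, not the prototype's interface: `Allocation`, `usage_caps`, and `query_transition` are hypothetical names, and it bakes in the "every cap FOO has a FOO->null transition" assumption discussed above, so a transition is just the list of caps to strip.

```python
class Allocation:
    """Toy model: the set of usages is fixed when the caps are queried."""

    def __init__(self, caps, usage_caps):
        # caps: capabilities of the realized allocation.
        # usage_caps: declared usage -> caps that usage's engine understands.
        self.caps = frozenset(caps)
        self.usage_caps = {u: frozenset(c) for u, c in usage_caps.items()}

    def query_transition(self, src, dst):
        """Return the caps to strip when going from usages `src` to `dst`.

        In this toy, `src` is only validated; the result depends on `dst`.
        """
        src, dst = set(src), set(dst)
        if not src or not dst:
            raise ValueError("src and dst must each name at least one usage")
        undeclared = (src | dst) - set(self.usage_caps)
        if undeclared:
            raise ValueError(
                f"usages not declared up front: {sorted(undeclared)}")
        # Assumption: every cap FOO has a FOO -> null transition, so anything
        # a destination usage does not understand can simply be stripped.
        understood = frozenset.intersection(
            *(self.usage_caps[u] for u in dst))
        return sorted(self.caps - understood)


buf = Allocation(
    caps={"FOO/tiled", "FOO/CC"},
    usage_caps={
        "RENDER": {"FOO/tiled", "FOO/CC"},
        "SCANOUT": {"FOO/tiled"},
    },
)
print(buf.query_transition({"RENDER"}, {"SCANOUT"}))
```

Without the FOO->null assumption, `query_transition` would have to search for a chain of transitions and could fail, which is the concern raised in the thread.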
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Wed, Dec 6, 2017 at 6:25 AM, Nicolai Hähnlewrote: > On 06.12.2017 08:07, James Jones wrote: > [snip] > >> So lets say you have a setup where both display and GPU supported >> FOO/tiled, but only GPU supported compressed (FOO/CC) and cached >> (FOO/cached). But the GPU supported the following transitions: >> >> trans_a: FOO/CC -> null >> trans_b: FOO/cached -> null >> >> Then the sets for each device (in order of preference): >> >> GPU: >> 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) >> 2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k) >> 3: caps(FOO/tiled); constraints(alignment=32k) >> >> Display: >> 1: caps(FOO/tiled); constraints(alignment=64k) >> >> Merged Result: >> 1: caps(FOO/tiled, FOO/CC, FOO/cached); >> constraints(alignment=64k); >>transition(GPU->display: trans_a, trans_b; display->GPU: none) >> 2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k); >>transition(GPU->display: trans_a; display->GPU: none) >> 3: caps(FOO/tiled); constraints(alignment=64k); >>transition(GPU->display: none; display->GPU: none) > > > > We definitely don't want to expose a way of getting uncached rendering > surfaces for radeonsi. I mean, I think we are supposed to be able to > program > our hardware so that the backend bypasses all caches, but (a) nobody > validates that and (b) it's basically suicide in terms of performance. > Let's > build fewer footguns :) sure, this was just a hypothetical example. 
But to take this case as another example, if you didn't want to expose uncached rendering (or cached w/ cache flushes after each draw), you would exclude the entry from the GPU set which didn't have FOO/cached (I'm adding back a cached but not CC config just to make it interesting), and end up with: trans_a: FOO/CC -> null trans_b: FOO/cached -> null GPU: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) 2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k) Display: 1: caps(FOO/tiled); constraints(alignment=64k) Merged Result: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_a, trans_b; display->GPU: none) 2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_b; display->GPU: none) So there isn't anything in the result set that doesn't have GPU cache, and the cache-flush transition is always in the set of required transitions going from GPU -> display Hmm, I guess this does require the concept of a required cap.. >>> >>> >>> Which we already introduced to the allocator API when we realized we >>> would need them as we were prototyping. >> >> >> Note I also posed the question of whether things like cached (and >> similarly compression, since I view compression as roughly an equivalent >> mechanism to a cache) in one of the open issues on my XDC 2017 slides >> because of this very problem of over-pruning it causes. It's on slide 15, >> as "No device-local capabilities". You'll have to listen to my coverage of >> it in the recorded presentation for that slide to make any sense, but it's >> the same thing Nicolai has laid out here. >> >> As I continued working through our prototype driver support, I found I >> didn't actually need to include cached or compressed as capabilities: The >> GPU just applies them as needed and the usage transitions make it >> transparent to the non-GPU engines. 
That does mean the GPU driver currently >> needs to be the one to realize the allocation from the capability set to get >> optimal behavior. We could fix that by reworking our driver though. At >> this point, not including device-local properties like on-device caching in >> capabilities seems like the right solution to me. I'm curious whether this >> applies universally though, or if other hardware doesn't fit the >> "compression and stuff all behaves like a cache" idiom. > > > Compression is a part of the memory layout for us: framebuffer compression > uses an additional "meta surface". At the most basic level, an allocation > with loss-less compression support is by necessity bigger than an allocation > without. > > We can allocate this meta surface separately, but then we're forced to > decompress when passing the surface around (e.g. to a compositor.) > side note: I think this is pretty typical.. although afaict for adreno at least, when you start getting into sampling from things with multiple layers/levels, the meta surface needs to be interleaved with the "main" surface, so it can't really be allocated after the fact. Also for depth buffer, there is potentially an additional meta
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Wed, Dec 6, 2017 at 2:07 AM, James Joneswrote: > On 12/01/2017 01:52 PM, Miguel Angel Vico wrote: >> >> >> >> On Fri, 1 Dec 2017 13:38:41 -0500 >> Rob Clark wrote: >> >>> >>> sure, this was just a hypothetical example. But to take this case as >>> another example, if you didn't want to expose uncached rendering (or >>> cached w/ cache flushes after each draw), you would exclude the entry >>> from the GPU set which didn't have FOO/cached (I'm adding back a >>> cached but not CC config just to make it interesting), and end up >>> with: >>> >>> trans_a: FOO/CC -> null >>> trans_b: FOO/cached -> null >>> >>> GPU: >>>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) >>>2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k) >>> >>> Display: >>>1: caps(FOO/tiled); constraints(alignment=64k) >>> >>> Merged Result: >>>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); >>> transition(GPU->display: trans_a, trans_b; display->GPU: none) >>>2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k); >>> transition(GPU->display: trans_b; display->GPU: none) >>> >>> So there isn't anything in the result set that doesn't have GPU cache, >>> and the cache-flush transition is always in the set of required >>> transitions going from GPU -> display >>> >>> Hmm, I guess this does require the concept of a required cap.. >> >> >> Which we already introduced to the allocator API when we realized we >> would need them as we were prototyping. > > > Note I also posed the question of whether things like cached (and similarly > compression, since I view compression as roughly an equivalent mechanism to > a cache) in one of the open issues on my XDC 2017 slides because of this > very problem of over-pruning it causes. It's on slide 15, as "No > device-local capabilities". You'll have to listen to my coverage of it in > the recorded presentation for that slide to make any sense, but it's the > same thing Nicolai has laid out here. 
> > As I continued working through our prototype driver support, I found I > didn't actually need to include cached or compressed as capabilities: The > GPU just applies them as needed and the usage transitions make it > transparent to the non-GPU engines. That does mean the GPU driver currently > needs to be the one to realize the allocation from the capability set to get > optimal behavior. We could fix that by reworking our driver though. At > this point, not including device-local properties like on-device caching in > capabilities seems like the right solution to me. I'm curious whether this > applies universally though, or if other hardware doesn't fit the > "compression and stuff all behaves like a cache" idiom. > Possibly a SoC(ish) type device which has a "system" cache that some but not all devices fall into. I *think* the intel chips w/ EDRAM might fall into this category. I know the idea has come up elsewhere, although not sure if anything like that ended up in production. It seems like something we'd at least want to have an idea how to deal with, even if it isn't used for device internal caches. Not sure if similar situation could come up w/ discrete GPU and video decode/encode engines on the same die? [snip] >> I think I like the idea of having transitions being part of the >> per-device/engine cap sets, so that such information can be used upon >> merging to know which capabilities may remain or have to be dropped. >> >> I think James's proposal for usage transitions was intended to work >> with flows like: >> >>1. App gets GPU caps for RENDER usage >>2. App allocates GPU memory using a layout from (1) >>3. App now decides it wants use the buffer for SCANOUT >>4. App queries usage transition metadata from RENDER to SCANOUT, >> given the current memory layout. >>5. Do the transition and hand the buffer off to display > > > No, all usages the app intends to transition to must be specified up front > when initially querying caps in the model I assumed. 
The app then specifies > some subset (up to the full set) of the specified usages as a src and dst > when querying transition metadata. > >> The problem I see with this is that it isn't guaranteed that there will >> be a chain of transitions for the buffer to be usable by display. > hmm, I guess if a buffer *can* be shared across all uses, there by definition has to be a chain of transitions to go from any usage+device to any other usage+device. Possibly a separate step to query transitions avoids solving for every possible transition when merging the caps set.. although until you do that query I don't think you know the resulting merged caps set is valid. Maybe in practice for every cap FOO there exists a FOO->null (or FOO->generic if you prefer) transition, ie. compressed->uncompressed, cached->clean, etc. I suppose that makes the problem easier to solve. > > I hadn't thought
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Wed, Dec 6, 2017 at 12:52 AM, James Joneswrote: > On 11/30/2017 10:48 AM, Rob Clark wrote: >> >> On Thu, Nov 30, 2017 at 1:28 AM, James Jones wrote: >>> >>> On 11/29/2017 01:10 PM, Rob Clark wrote: On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand wrote: > > > On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark wrote: >> >> >> >> On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand >> >> wrote: >>> >>> >>> I'm not quite some sure what I think about this. I think I would >>> like >>> to >>> see $new_thing at least replace the guts of GBM. Whether GBM becomes >>> a >>> wrapper around $new_thing or $new_thing implements the GBM API, I'm >>> not >>> sure. What I don't think I want is to see GBM development continuing >>> on >>> it's own so we have two competing solutions. >> >> >> >> I don't really view them as competing.. there is *some* overlap, ie. >> allocating a buffer.. but even if you are using GBM w/out $new_thing >> you could allocate a buffer externally and import it. I don't see >> $new_thing as that much different from GBM PoV. >> >> But things like surfaces (aka swap chains) seem a bit out of place >> when you are thinking about implementing $new_thing for non-gpu >> devices. Plus EGL<->GBM tie-ins that seem out of place when talking >> about a (for ex.) camera. I kinda don't want to throw out the baby >> with the bathwater here. > > > > > Agreed. GBM is very EGLish and we don't want the new allocator to be > that. > >> >> *maybe* GBM could be partially implemented on top of $new_thing. I >> don't quite see how that would work. Possibly we could deprecate >> parts of GBM that are no longer needed? idk.. Either way, I fully >> expect that GBM and mesa's implementation of $new_thing could perhaps >> sit on to of some of the same set of internal APIs. The public >> interface can be decoupled from the internal implementation. > > > > > Maybe I should restate things a bit. 
My real point was that modifiers > + > $new_thing + Kernel blob should be a complete and more powerful > replacement > for GBM. I don't know that we really can implement GBM on top of it > because > GBM has lots of wishy-washy concepts such as "cursor plane" which may > not > map well at least not without querying the kernel about specifc display > planes. In particular, I don't want someone to feel like they need to > use > $new_thing and GBM at the same time or together. Ideally, I'd like > them > to > never do that unless we decide gbm_bo is a useful abstraction for > $new_thing. > (just to repeat what I mentioned on irc) I think main thing is how do you create a swapchain/surface and know which is current front buffer after SwapBuffers().. that is the only bits of GBM that seem like there would still be useful. idk, maybe there is some other idea. >>> >>> >>> >>> I don't view this as terribly useful except for legacy apps that need an >>> EGL >>> window surface and can't be updated to use new methods. Wayland >>> compositors >>> certainly don't fall in that category. I don't know that any GBM apps >>> do. >> >> >> kmscube doesn't count? :-P >> >> Hmm, I assumed weston and the other wayland compositors where still >> using gbm to create EGL surfaces, but I confess to have not actually >> looked at weston src code for quite a few years now. >> >> Anyways, I think it is perfectly fine for GBM to stay as-is in it's >> current form. It can already import dma-buf fd's, and those can >> certainly come from $new_thing. >> >> So I guess we want an EGL extension to return the allocator device >> instance for the GPU. That also takes care of the non-bare-metal >> case. 
>> >>> Rather, I think the way forward for the classes of apps that need >>> something >>> like GBM or the generic allocator is more or less the path ChromeOS took >>> with their graphics architecture: Render to individual buffers (using >>> FBOs >>> bound to imported buffers in GL) and manage buffer exchanges/blits >>> manually. >>> >>> The useful abstraction surfaces provide isn't so much deciding which >>> buffer >>> is currently "front" and "back", but rather handling the >>> transition/hand-off >>> to the window system/display device/etc. in SwapBuffers(), and the whole >>> idea of the allocator proposals is to make that something the application >>> or >>> at least some non-driver utility library handles explicitly based on >>> where >>> exactly the buffer is being handed off to. >> >> >> Hmm, ok.. I guess the transition will need some hook into the driver. >> For freedreno and vc4 (and I suspect this is not uncommon for
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 06.12.2017 08:07, James Jones wrote: [snip] So lets say you have a setup where both display and GPU supported FOO/tiled, but only GPU supported compressed (FOO/CC) and cached (FOO/cached). But the GPU supported the following transitions: trans_a: FOO/CC -> null trans_b: FOO/cached -> null Then the sets for each device (in order of preference): GPU: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) 2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k) 3: caps(FOO/tiled); constraints(alignment=32k) Display: 1: caps(FOO/tiled); constraints(alignment=64k) Merged Result: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_a, trans_b; display->GPU: none) 2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k); transition(GPU->display: trans_a; display->GPU: none) 3: caps(FOO/tiled); constraints(alignment=64k); transition(GPU->display: none; display->GPU: none) We definitely don't want to expose a way of getting uncached rendering surfaces for radeonsi. I mean, I think we are supposed to be able to program our hardware so that the backend bypasses all caches, but (a) nobody validates that and (b) it's basically suicide in terms of performance. Let's build fewer footguns :) sure, this was just a hypothetical example. 
But to take this case as another example, if you didn't want to expose uncached rendering (or cached w/ cache flushes after each draw), you would exclude the entry from the GPU set which didn't have FOO/cached (I'm adding back a cached but not CC config just to make it interesting), and end up with: trans_a: FOO/CC -> null trans_b: FOO/cached -> null GPU: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) 2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k) Display: 1: caps(FOO/tiled); constraints(alignment=64k) Merged Result: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_a, trans_b; display->GPU: none) 2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_b; display->GPU: none) So there isn't anything in the result set that doesn't have GPU cache, and the cache-flush transition is always in the set of required transitions going from GPU -> display Hmm, I guess this does require the concept of a required cap.. Which we already introduced to the allocator API when we realized we would need them as we were prototyping. Note I also posed the question of whether things like cached (and similarly compression, since I view compression as roughly an equivalent mechanism to a cache) in one of the open issues on my XDC 2017 slides because of this very problem of over-pruning it causes. It's on slide 15, as "No device-local capabilities". You'll have to listen to my coverage of it in the recorded presentation for that slide to make any sense, but it's the same thing Nicolai has laid out here. As I continued working through our prototype driver support, I found I didn't actually need to include cached or compressed as capabilities: The GPU just applies them as needed and the usage transitions make it transparent to the non-GPU engines. 
That does mean the GPU driver currently needs to be the one to realize the allocation from the capability set to get optimal behavior. We could fix that by reworking our driver though. At this point, not including device-local properties like on-device caching in capabilities seems like the right solution to me. I'm curious whether this applies universally though, or if other hardware doesn't fit the "compression and stuff all behaves like a cache" idiom. Compression is a part of the memory layout for us: framebuffer compression uses an additional "meta surface". At the most basic level, an allocation with loss-less compression support is by necessity bigger than an allocation without. We can allocate this meta surface separately, but then we're forced to decompress when passing the surface around (e.g. to a compositor.) Consider also the example I gave elsewhere, where a cross-vendor tiling layout is combined with vendor-specific compression: Device 1, rendering: caps(BASE/foo-tiling, VND1/compression) Device 2, sampling/scanout: caps(BASE/foo-tiling, VND2/compression) Some more thoughts on caching or "device-local" properties below. [snip] I think I like the idea of having transitions being part of the per-device/engine cap sets, so that such information can be used upon merging to know which capabilities may remain or have to be dropped. I think James's proposal for usage transitions was intended to work with flows like: 1. App gets GPU caps for RENDER usage 2. App allocates GPU memory using a layout from (1) 3. App now decides it wants use the buffer for SCANOUT 4. App queries usage transition metadata from RENDER to SCANOUT,
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 06.12.2017 08:01, James Jones wrote:
On 12/01/2017 10:34 AM, Nicolai Hähnle wrote:
On 01.12.2017 18:09, Nicolai Hähnle wrote:

[snip]

As for the actual transition API, I accept that some metadata may be required, and the metadata probably needs to depend on the memory layout, which is often vendor-specific. But even linear layouts need some transitions for caches. We probably need at least some generic "off-device usage" bit.

I've started thinking of cached as a capability with a transition.. I think that helps. Maybe it needs to somehow be more specific (i.e. if you have two devices both with their own cache with no coherency between the two)

As I wrote above, I'd prefer not to think of "cached" as a capability at least for radeonsi. From the desktop perspective, I would say let's ignore caches, the drivers know which caches they need to flush to make data visible to other devices on the system.

On the other hand, there are probably SoC cases where non-coherent caches are shared between some but not all devices, and in that case perhaps we do need to communicate this.

So perhaps we should have two kinds of "capabilities". The first, like framebuffer compression, is a capability of the allocated memory layout (because the compression requires a meta surface), and devices that expose it may opportunistically use it. The second, like caches, is a capability that the device/driver will use and you don't get a say in it, but other devices/drivers also don't need to be aware of them.

So then you could theoretically have a system that gives you:

GPU: FOO/tiled(layout-caps=FOO/cc, dev-caps=FOO/gpu-cache)
Display: FOO/tiled(layout-caps=FOO/cc)
Video: FOO/tiled(dev-caps=FOO/vid-cache)
Camera: FOO/tiled(dev-caps=FOO/vid-cache)

[snip]

FWIW, I think all that stuff about different caches quite likely over-complicates things.
At the end of each "command submission" of whichever type of engine, the buffer must be in a state where the kernel is free to move it around for memory management purposes. This already puts a big constraint on the kind of (non-coherent) caches that can be supported anyway, so I wouldn't be surprised if we could get away with a *much* simpler approach.

I'd rather not depend on this type of cleverness if possible. Other kernels/OSes may not behave this way, and I'd like the allocator mechanism to be something we can use across all or at least most of the POSIX and POSIX-like OSes we support. Also, this particular example is not true of our proprietary Linux driver, and I suspect it won't always be the case for other drivers. If a particular driver or OS fits this assumption, the driver is always free to return no-op transitions in that case.

Agreed. (What I wrote about memory management should be true for all systems, but the kernel could use an engine that goes through the relevant caches for memory management-related buffer moves. It just so happens that it doesn't do that on our hardware, but that's by no means universal.)

Cheers,
Nicolai
--
Learn how the world really is,
but never forget how it should be.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
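The split into two kinds of capabilities discussed above can be sketched roughly as follows. This is only one possible reading of the idea, not the allocator prototype's API: `DeviceCaps`, `share`, and the decision to allocate with the union of layout caps are all assumptions made for illustration. Layout caps travel with the allocation, so a device that lacks one needs a transition before it touches the buffer; device caps (caches) stay private to their device and never show up in the shared result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCaps:
    """Hypothetical per-device report: one layout plus two kinds of caps."""
    layout: str
    layout_caps: frozenset = frozenset()  # part of the allocation (e.g. compression)
    dev_caps: frozenset = frozenset()     # device-local (e.g. caches), never shared


def share(devices):
    """Return (layout, allocated layout caps, caps each device cannot consume).

    Assumption: the buffer is allocated with every layout cap any device
    offers, and devices lacking a cap rely on a transition to resolve it.
    dev_caps are deliberately ignored: other devices need not know about them.
    """
    layouts = {d.layout for d in devices.values()}
    if len(layouts) != 1:
        return None  # no common memory layout
    allocated = frozenset().union(*(d.layout_caps for d in devices.values()))
    unsupported = {name: allocated - d.layout_caps for name, d in devices.items()}
    return layouts.pop(), allocated, unsupported


gpu = DeviceCaps("FOO/tiled", frozenset({"FOO/cc"}), frozenset({"FOO/gpu-cache"}))
display = DeviceCaps("FOO/tiled", frozenset({"FOO/cc"}))
video = DeviceCaps("FOO/tiled", dev_caps=frozenset({"FOO/vid-cache"}))

gpu_display = share({"GPU": gpu, "Display": display})
gpu_video = share({"GPU": gpu, "Video": video})
print(gpu_display)
print(gpu_video)
```

With the example system from the mail, GPU and Display can exchange the compressed buffer directly, while the GPU to Video hand-off reports FOO/cc as something Video cannot consume, which is where a decompression transition would be needed.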
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 12/01/2017 01:52 PM, Miguel Angel Vico wrote:
On Fri, 1 Dec 2017 13:38:41 -0500 Rob Clark wrote:
On Fri, Dec 1, 2017 at 12:09 PM, Nicolai Hähnle wrote:
On 01.12.2017 16:06, Rob Clark wrote:
On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle wrote:

Hi,

I've had a chance to look a bit more closely at the allocator prototype repository now. There's a whole bunch of low-level API design feedback, but for now let's focus on the high-level stuff first.

Thanks for taking a look.

Going by the 4.5 major object types (as also seen on slide 5 of your presentation [0]), assertions and usages make sense to me.

Capabilities and capability sets should be cleaned up in my opinion, as the status quo is overly obfuscating things. What capability sets really represent, as far as I understand them, is *memory layouts*, and so that's what they should be called.

This conceptually simplifies `derive_capabilities` significantly without any loss of expressiveness as far as I can see. Given two lists of memory layouts, we simply look for which memory layouts appear in both lists, and then merge their constraints and capabilities.

Merging constraints looks good to me.

Capabilities need some more thought. The prototype removes capabilities when merging layouts, but I'd argue that that is often undesirable. (In fact, I cannot think of capabilities which we'd always want to remove.)

A typical example for this is compression (i.e. DCC in our case). For rendering usage, we'd return something like:

Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)

For display usage, we might return (depending on hardware):

Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)

Merging these in the prototype would remove the DCC capability, even though it might well make sense to keep it there for rendering. Dealing with the fact that display usage does not have this capability is precisely one of the two things that transitions are about!
The other thing that transitions are about is caches.

I think this is kind of what Rob was saying in one of his mails.

Perhaps "layout" is a better name than "caps".. either way I think of both AMD/tiled and AMD/DCC as the same type of "thing".. the difference between AMD/tiled and AMD/DCC is that a transition can be provided for AMD/DCC. Other than that they are both things describing the layout.

The reason that a transition can be provided is that they aren't quite the same thing, though. In a very real sense, AMD/DCC is a "child" property of AMD/tiled: DCC is implemented as a meta surface whose memory layout depends on the layout of the main surface.

I suppose this is six-of-one, half-dozen of the other.. what you are calling a layout is what I'm calling a cap that just happens not to have an associated transition

Although, if there are GPUs that can do an in-place "transition" between different tiling layouts, then the distinction is perhaps really not as clear-cut. I guess that would only apply to tiled renderers.

I suppose the advantage of just calling both layout and caps the same thing, and just saying that a "cap" (or "layout" if you prefer that name) can optionally have one or more associated transitions, is that you can deal with cases where sometimes a tiled format might actually have an in-place transition ;-)

So let's say you have a setup where both display and GPU supported FOO/tiled, but only GPU supported compressed (FOO/CC) and cached (FOO/cached).
But the GPU supported the following transitions:

trans_a: FOO/CC -> null
trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
  1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
  2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
  3: caps(FOO/tiled); constraints(alignment=32k)

Display:
  1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
  1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
     transition(GPU->display: trans_a, trans_b; display->GPU: none)
  2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
     transition(GPU->display: trans_a; display->GPU: none)
  3: caps(FOO/tiled); constraints(alignment=64k);
     transition(GPU->display: none; display->GPU: none)

We definitely don't want to expose a way of getting uncached rendering surfaces for radeonsi. I mean, I think we are supposed to be able to program our hardware so that the backend bypasses all caches, but (a) nobody validates that and (b) it's basically suicide in terms of performance. Let's build fewer footguns :)

sure, this was just a hypothetical example. But to take this case as another example, if you didn't want to expose uncached rendering (or cached w/ cache flushes after each draw), you would exclude the entry from the GPU set which didn't have FOO/cached (I'm adding back a cached but not CC config just to make it
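The three-set GPU/display merge in the hypothetical example above can be reproduced with a small sketch. Everything here is an assumption made for illustration (`CapSet`, `merge`, and the rule that a producer-only cap survives only if a "cap -> null" transition exists for it); it is not the prototype's `derive_capabilities`. Taking the larger alignment is only valid because the alignments are powers of two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapSet:
    """Hypothetical capability set: caps plus a single alignment constraint."""
    caps: frozenset
    alignment: int  # bytes, assumed to be a power of two


def merge(producer_sets, consumer_sets, transitions):
    """Merge producer (GPU) sets against consumer (display) sets.

    transitions maps a cap to the name of a transition that removes it
    (cap -> null). A producer set is kept if the consumer's caps are all
    present and every extra cap can be transitioned away before hand-off.
    Returns (merged set, producer->consumer transitions) in producer order.
    """
    merged = []
    for p in producer_sets:
        for c in consumer_sets:
            if not c.caps <= p.caps:
                continue  # consumer needs something the producer set lacks
            extra = p.caps - c.caps
            if any(cap not in transitions for cap in extra):
                continue  # no way to hide this cap from the consumer
            needed = sorted(transitions[cap] for cap in extra)
            merged.append((CapSet(p.caps, max(p.alignment, c.alignment)), needed))
    return merged


K = 1024
transitions = {"FOO/CC": "trans_a", "FOO/cached": "trans_b"}
gpu = [
    CapSet(frozenset({"FOO/tiled", "FOO/CC", "FOO/cached"}), 32 * K),
    CapSet(frozenset({"FOO/tiled", "FOO/CC"}), 32 * K),
    CapSet(frozenset({"FOO/tiled"}), 32 * K),
]
display = [CapSet(frozenset({"FOO/tiled"}), 64 * K)]

result = merge(gpu, display, transitions)
for capset, needed in result:
    print(sorted(capset.caps), capset.alignment // K, "k", needed)
```

Dropping the GPU entries that lack FOO/cached from the input list would be one way to model the "required cap" variant of the example, since every remaining merged entry then carries trans_b.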
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 12/01/2017 10:34 AM, Nicolai Hähnle wrote: On 01.12.2017 18:09, Nicolai Hähnle wrote: [snip] As for the actual transition API, I accept that some metadata may be required, and the metadata probably needs to depend on the memory layout, which is often vendor-specific. But even linear layouts need some transitions for caches. We probably need at least some generic "off-device usage" bit. I've started thinking of cached as a capability with a transition.. I think that helps. Maybe it needs to somehow be more specific (ie. if you have two devices both with there own cache with no coherency between the two) As I wrote above, I'd prefer not to think of "cached" as a capability at least for radeonsi. From the desktop perspective, I would say let's ignore caches, the drivers know which caches they need to flush to make data visible to other devices on the system. On the other hand, there are probably SoC cases where non-coherent caches are shared between some but not all devices, and in that case perhaps we do need to communicate this. So perhaps we should have two kinds of "capabilities". The first, like framebuffer compression, is a capability of the allocated memory layout (because the compression requires a meta surface), and devices that expose it may opportunistically use it. The second, like caches, is a capability that the device/driver will use and you don't get a say in it, but other devices/drivers also don't need to be aware of them. So then you could theoretically have a system that gives you: GPU: FOO/tiled(layout-caps=FOO/cc, dev-caps=FOO/gpu-cache) Display: FOO/tiled(layout-caps=FOO/cc) Video: FOO/tiled(dev-caps=FOO/vid-cache) Camera: FOO/tiled(dev-caps=FOO/vid-cache) [snip] FWIW, I think all that stuff about different caches quite likely over-complicates things. At the end of each "command submission" of whichever type of engine, the buffer must be in a state where the kernel is free to move it around for memory management purposes. 
This already puts a big constraint on the kind of (non-coherent) caches that can be supported anyway, so I wouldn't be surprised if we could get away with a *much* simpler approach. I'd rather not depend on this type of cleverness if possible. Other kernels/OS's may not behave this way, and I'd like the allocator mechanism to be something we can use across all or at least most of the POSIX and POSIX-like OS's we support. Also, this particular example is not true of our proprietary Linux driver, and I suspect it won't always be the case for other drivers. If a particular driver or OS fits this assumption, the driver is always free to return no-op transitions in that case. Thanks, -James Cheers, Nicolai
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 11/30/2017 12:06 PM, Lyude Paul wrote:
On Thu, 2017-11-30 at 13:20 -0500, Rob Clark wrote:
On Thu, Nov 30, 2017 at 12:59 AM, James Jones wrote:
On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:
On Wed, 29 Nov 2017 16:28:15 -0500 Rob Clark wrote:

Do we need to define both in-place and copy transitions? I.e. what if GPU is still reading a tiled or compressed texture (i.e. sampling from previous frame for some reason), but we need to untile/uncompress for display.. or maybe there are some other cases like that we should think about.. Maybe you already have some thoughts about that?

This is the next thing I'll be working on. I haven't given it much thought myself so far, but I think James might have had some insights. I'll read through some of his notes to double-check.

A couple of notes on usage transitions: While chatting about transitions, a few assertions were made by others that I've come to accept, despite the fact that they reduce the generality of the allocator mechanisms:

- GPUs are the only things that actually need usage transitions as far as I know thus far. Other engines either share the GPU representations of data, or use more limited representations; the latter being the reason non-GPU usage transitions are a useful thing.

- It's reasonable to assume that a GPU is required to perform a usage transition. This follows from the above postulate. If only GPUs are using more advanced representations, you don't need any transitions unless you have a GPU available.

This seems reasonable. I can't think of any non-gpu related case where you would need a transition, other than perhaps cache flush/inv.

From that, I derived the rough API proposal for transitions presented on my XDC 2017 slides. Transition "metadata" is queried from the allocator given a pair of usages (which may refer to more than one device), but the realization of the transition is left to existing GPU APIs.
I think I put Vulkan-like pseudo-code in the slides, but the GL external objects extensions (GL_EXT_memory_object and GL_EXT_semaphore) would work as well.

I haven't quite wrapped my head around how this would work in the cross-device case.. I mean from the API standpoint for the user, it seems straightforward enough. Just not sure how to implement that and what the driver interface would look like. I guess we need a capability-conversion (?).. I mean take for example the fb compression capability from your slide #12[1]. If we knew there was an available transition to go from "Dev2 FB compression" to "normal", then we could have allowed the "Dev2 FB compression" valid set?

[1] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf

Regarding in-place vs. copy: To me a transition is something that happens in-place, at least semantically. If you need to make copies, that's a format conversion blit not a transition, and graphics APIs are already capable of expressing that without any special transitions or help from the allocator. However, I understand some chipsets perform transitions using something that looks kind of like a blit using on-chip caches and constrained usage semantics. There's probably some work to do to see whether those need to be accommodated as conversion blits or usage transitions.

I guess part of what I was thinking of, is what happens if the producing device is still reading from the buffer. For example, viddec -> gpu use case, where the video decoder is also still hanging on to the frame to use as a reference frame to decode future frames? I guess if transition from devA -> devB can be done in parallel with devA still reading the buffer, it isn't a problem. I guess that limits (non-blit) transitions to decompression and cache ops? Maybe that is ok.. I don't know of a real case where it would be a problem.
Note you can transition to multiple usages in the proposed API, so for the video decoder example, you would transition from [video decode target] to [video decode target, GPU sampler source] for simultaneous texturing and reference frame usage. For our hardware's purposes, transitions are just various levels of decompression or compression reconfiguration and potentially cache flushing/invalidation, so our transition metadata will just be some bits signaling which compression operation is needed, if any. That's the sort of operation I modeled the API around, so if things are much more exotic than that for others, it will probably require some adjustments. [snip] Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing is one of my primary goals. However, it's a pretty heavy thing to prototype. If someone has the time though, I think it would be a great experiment. It would help flesh out the paltry list of usages, constraints, and capabilities in the existing prototype codebase. The kmscube example really should have added at least a "render" usage, but I got lazy and just re-used texture for now. That
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 11/30/2017 10:48 AM, Rob Clark wrote:
On Thu, Nov 30, 2017 at 1:28 AM, James Jones wrote:
On 11/29/2017 01:10 PM, Rob Clark wrote:
On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand wrote:
On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark wrote:
On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand wrote:

I'm not quite sure what I think about this. I think I would like to see $new_thing at least replace the guts of GBM. Whether GBM becomes a wrapper around $new_thing or $new_thing implements the GBM API, I'm not sure. What I don't think I want is to see GBM development continuing on its own so we have two competing solutions.

I don't really view them as competing.. there is *some* overlap, i.e. allocating a buffer.. but even if you are using GBM w/out $new_thing you could allocate a buffer externally and import it. I don't see $new_thing as that much different from GBM PoV. But things like surfaces (aka swap chains) seem a bit out of place when you are thinking about implementing $new_thing for non-gpu devices. Plus EGL<->GBM tie-ins that seem out of place when talking about a (for ex.) camera. I kinda don't want to throw out the baby with the bathwater here.

Agreed. GBM is very EGLish and we don't want the new allocator to be that.

*maybe* GBM could be partially implemented on top of $new_thing. I don't quite see how that would work. Possibly we could deprecate parts of GBM that are no longer needed? idk.. Either way, I fully expect that GBM and mesa's implementation of $new_thing could perhaps sit on top of some of the same set of internal APIs. The public interface can be decoupled from the internal implementation.

Maybe I should restate things a bit. My real point was that modifiers + $new_thing + Kernel blob should be a complete and more powerful replacement for GBM.
I don't know that we really can implement GBM on top of it because GBM has lots of wishy-washy concepts such as "cursor plane" which may not map well, at least not without querying the kernel about specific display planes. In particular, I don't want someone to feel like they need to use $new_thing and GBM at the same time or together. Ideally, I'd like them to never do that unless we decide gbm_bo is a useful abstraction for $new_thing.

(just to repeat what I mentioned on irc) I think the main thing is how do you create a swapchain/surface and know which is the current front buffer after SwapBuffers().. those are the only bits of GBM that seem like they would still be useful. idk, maybe there is some other idea.

I don't view this as terribly useful except for legacy apps that need an EGL window surface and can't be updated to use new methods. Wayland compositors certainly don't fall in that category. I don't know that any GBM apps do.

kmscube doesn't count? :-P

Hmm, I assumed weston and the other wayland compositors were still using gbm to create EGL surfaces, but I confess to have not actually looked at weston src code for quite a few years now. Anyways, I think it is perfectly fine for GBM to stay as-is in its current form. It can already import dma-buf fd's, and those can certainly come from $new_thing. So I guess we want an EGL extension to return the allocator device instance for the GPU. That also takes care of the non-bare-metal case.

Rather, I think the way forward for the classes of apps that need something like GBM or the generic allocator is more or less the path ChromeOS took with their graphics architecture: Render to individual buffers (using FBOs bound to imported buffers in GL) and manage buffer exchanges/blits manually. The useful abstraction surfaces provide isn't so much deciding which buffer is currently "front" and "back", but rather handling the transition/hand-off to the window system/display device/etc.
in SwapBuffers(), and the whole idea of the allocator proposals is to make that something the application or at least some non-driver utility library handles explicitly based on where exactly the buffer is being handed off to.

Hmm, ok.. I guess the transition will need some hook into the driver. For freedreno and vc4 (and I suspect this is not uncommon for tiler GPUs), switching FBOs doesn't necessarily flush rendering to hw. Maybe it would work out if you requested the sync fd file descriptor from an EGL fence before passing things to next device, as that would flush rendering.

This "flush" is exactly what usage transitions are for:

1) Perform rendering or texturing

2) Insert a transition into command stream using metadata extracted from allocator library into the rendering/texturing API using a new entry point. This instructs the driver to perform any flushes/decompressions/etc. needed to transition to the next usage in the pipeline.

3) Insert/extract your fence (potentially this is combined with above entry point like it is in GL_EXT_semaphore).

I wonder a bit
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Fri, 1 Dec 2017 13:38:41 -0500 Rob Clarkwrote: > On Fri, Dec 1, 2017 at 12:09 PM, Nicolai Hähnle wrote: > > On 01.12.2017 16:06, Rob Clark wrote: > >> > >> On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle > >> wrote: > >>> > >>> Hi, > >>> > >>> I've had a chance to look a bit more closely at the allocator prototype > >>> repository now. There's a whole bunch of low-level API design feedback, > >>> but > >>> for now let's focus on the high-level stuff first. > >>> > >>> Going by the 4.5 major object types (as also seen on slide 5 of your > >>> presentation [0]), assertions and usages make sense to me. > >>> > >>> Capabilities and capability sets should be cleaned up in my opinion, as > >>> the > >>> status quo is overly obfuscating things. What capability sets really > >>> represent, as far as I understand them, is *memory layouts*, and so > >>> that's > >>> what they should be called. > >>> > >>> This conceptually simplifies `derive_capabilities` significantly without > >>> any > >>> loss of expressiveness as far as I can see. Given two lists of memory > >>> layouts, we simply look for which memory layouts appear in both lists, > >>> and > >>> then merge their constraints and capabilities. > >>> > >>> Merging constraints looks good to me. > >>> > >>> Capabilities need some more thought. The prototype removes capabilities > >>> when > >>> merging layouts, but I'd argue that that is often undesirable. (In fact, > >>> I > >>> cannot think of capabilities which we'd always want to remove.) > >>> > >>> A typical example for this is compression (i.e. DCC in our case). 
For > >>> rendering usage, we'd return something like: > >>> > >>> Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC) > >>> > >>> For display usage, we might return (depending on hardware): > >>> > >>> Memory layout: AMD/tiled; constraints(alignment=64k); caps(none) > >>> > >>> Merging these in the prototype would remove the DCC capability, even > >>> though > >>> it might well make sense to keep it there for rendering. Dealing with the > >>> fact that display usage does not have this capability is precisely one of > >>> the two things that transitions are about! The other thing that > >>> transitions > >>> are about is caches. > >>> > >>> I think this is kind of what Rob was saying in one of his mails. > >> > >> > >> Perhaps "layout" is a better name than "caps".. either way I think of > >> both AMD/tiled and AMD/DCC as the same type of "thing".. the > >> difference between AMD/tiled and AMD/DCC is that a transition can be > >> provided for AMD/DCC. Other than that they are both things describing > >> the layout. > > > > > > The reason that a transition can be provided is that they aren't quite the > > same thing, though. In a very real sense, AMD/DCC is a "child" property of > > AMD/tiled: DCC is implemented as a meta surface whose memory layout depends > > on the layout of the main surface. > > I suppose this is six-of-one, half-dozen of the other.. > > what you are calling a layout is what I'm calling a cap that just > happens not to have an associated transition > > > Although, if there are GPUs that can do an in-place "transition" between > > different tiling layouts, then the distinction is perhaps really not as > > clear-cut. I guess that would only apply to tiled renderers. 
> > I suppose the advantage of just calling both layout and caps the same > thing, and just saying that a "cap" (or "layout" if you prefer that > name) can optionally have one or more associated transitions, is that > you can deal with cases where sometimes a tiled format might actually > have an in-place transition ;-) > > > > >> So lets say you have a setup where both display and GPU supported > >> FOO/tiled, but only GPU supported compressed (FOO/CC) and cached > >> (FOO/cached). But the GPU supported the following transitions: > >> > >>trans_a: FOO/CC -> null > >>trans_b: FOO/cached -> null > >> > >> Then the sets for each device (in order of preference): > >> > >> GPU: > >>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) > >>2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k) > >>3: caps(FOO/tiled); constraints(alignment=32k) > >> > >> Display: > >>1: caps(FOO/tiled); constraints(alignment=64k) > >> > >> Merged Result: > >>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); > >> transition(GPU->display: trans_a, trans_b; display->GPU: none) > >>2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k); > >> transition(GPU->display: trans_a; display->GPU: none) > >>3: caps(FOO/tiled); constraints(alignment=64k); > >> transition(GPU->display: none; display->GPU: none) > > > > > > We definitely don't want to expose a way of getting uncached rendering > > surfaces for radeonsi. I mean, I think we are supposed to be able to program > > our hardware so that the backend
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Fri, Dec 1, 2017 at 12:09 PM, Nicolai Hähnlewrote: > On 01.12.2017 16:06, Rob Clark wrote: >> >> On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle >> wrote: >>> >>> Hi, >>> >>> I've had a chance to look a bit more closely at the allocator prototype >>> repository now. There's a whole bunch of low-level API design feedback, >>> but >>> for now let's focus on the high-level stuff first. >>> >>> Going by the 4.5 major object types (as also seen on slide 5 of your >>> presentation [0]), assertions and usages make sense to me. >>> >>> Capabilities and capability sets should be cleaned up in my opinion, as >>> the >>> status quo is overly obfuscating things. What capability sets really >>> represent, as far as I understand them, is *memory layouts*, and so >>> that's >>> what they should be called. >>> >>> This conceptually simplifies `derive_capabilities` significantly without >>> any >>> loss of expressiveness as far as I can see. Given two lists of memory >>> layouts, we simply look for which memory layouts appear in both lists, >>> and >>> then merge their constraints and capabilities. >>> >>> Merging constraints looks good to me. >>> >>> Capabilities need some more thought. The prototype removes capabilities >>> when >>> merging layouts, but I'd argue that that is often undesirable. (In fact, >>> I >>> cannot think of capabilities which we'd always want to remove.) >>> >>> A typical example for this is compression (i.e. DCC in our case). For >>> rendering usage, we'd return something like: >>> >>> Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC) >>> >>> For display usage, we might return (depending on hardware): >>> >>> Memory layout: AMD/tiled; constraints(alignment=64k); caps(none) >>> >>> Merging these in the prototype would remove the DCC capability, even >>> though >>> it might well make sense to keep it there for rendering. 
Dealing with the >>> fact that display usage does not have this capability is precisely one of >>> the two things that transitions are about! The other thing that >>> transitions >>> are about is caches. >>> >>> I think this is kind of what Rob was saying in one of his mails. >> >> >> Perhaps "layout" is a better name than "caps".. either way I think of >> both AMD/tiled and AMD/DCC as the same type of "thing".. the >> difference between AMD/tiled and AMD/DCC is that a transition can be >> provided for AMD/DCC. Other than that they are both things describing >> the layout. > > > The reason that a transition can be provided is that they aren't quite the > same thing, though. In a very real sense, AMD/DCC is a "child" property of > AMD/tiled: DCC is implemented as a meta surface whose memory layout depends > on the layout of the main surface. I suppose this is six-of-one, half-dozen of the other.. what you are calling a layout is what I'm calling a cap that just happens not to have an associated transition > Although, if there are GPUs that can do an in-place "transition" between > different tiling layouts, then the distinction is perhaps really not as > clear-cut. I guess that would only apply to tiled renderers. I suppose the advantage of just calling both layout and caps the same thing, and just saying that a "cap" (or "layout" if you prefer that name) can optionally have one or more associated transitions, is that you can deal with cases where sometimes a tiled format might actually have an in-place transition ;-) > >> So lets say you have a setup where both display and GPU supported >> FOO/tiled, but only GPU supported compressed (FOO/CC) and cached >> (FOO/cached). 
But the GPU supported the following transitions: >> >>trans_a: FOO/CC -> null >>trans_b: FOO/cached -> null >> >> Then the sets for each device (in order of preference): >> >> GPU: >>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) >>2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k) >>3: caps(FOO/tiled); constraints(alignment=32k) >> >> Display: >>1: caps(FOO/tiled); constraints(alignment=64k) >> >> Merged Result: >>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); >> transition(GPU->display: trans_a, trans_b; display->GPU: none) >>2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k); >> transition(GPU->display: trans_a; display->GPU: none) >>3: caps(FOO/tiled); constraints(alignment=64k); >> transition(GPU->display: none; display->GPU: none) > > > We definitely don't want to expose a way of getting uncached rendering > surfaces for radeonsi. I mean, I think we are supposed to be able to program > our hardware so that the backend bypasses all caches, but (a) nobody > validates that and (b) it's basically suicide in terms of performance. Let's > build fewer footguns :) sure, this was just a hypothetical example. But to take this case as another example, if you didn't want to expose uncached rendering (or cached w/ cache flushes after each draw), you
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 01.12.2017 18:09, Nicolai Hähnle wrote:
[snip]

As for the actual transition API, I accept that some metadata may be required, and the metadata probably needs to depend on the memory layout, which is often vendor-specific. But even linear layouts need some transitions for caches. We probably need at least some generic "off-device usage" bit.

I've started thinking of cached as a capability with a transition.. I think that helps. Maybe it needs to somehow be more specific (i.e. if you have two devices, both with their own cache, with no coherency between the two)

As I wrote above, I'd prefer not to think of "cached" as a capability at least for radeonsi.

From the desktop perspective, I would say let's ignore caches, the drivers know which caches they need to flush to make data visible to other devices on the system.

On the other hand, there are probably SoC cases where non-coherent caches are shared between some but not all devices, and in that case perhaps we do need to communicate this.

So perhaps we should have two kinds of "capabilities".

The first, like framebuffer compression, is a capability of the allocated memory layout (because the compression requires a meta surface), and devices that expose it may opportunistically use it.

The second, like caches, is a capability that the device/driver will use and you don't get a say in it, but other devices/drivers also don't need to be aware of it.

So then you could theoretically have a system that gives you:

GPU: FOO/tiled(layout-caps=FOO/cc, dev-caps=FOO/gpu-cache)
Display: FOO/tiled(layout-caps=FOO/cc)
Video: FOO/tiled(dev-caps=FOO/vid-cache)
Camera: FOO/tiled(dev-caps=FOO/vid-cache)

[snip]

FWIW, I think all that stuff about different caches quite likely over-complicates things. At the end of each "command submission" of whichever type of engine, the buffer must be in a state where the kernel is free to move it around for memory management purposes.

This already puts a big constraint on the kind of (non-coherent) caches that can be supported anyway, so I wouldn't be surprised if we could get away with a *much* simpler approach.

Cheers,
Nicolai
--
Learn how the world really is, but never forget how it should be.
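To make the layout-caps versus dev-caps split above a bit more concrete, here is a minimal sketch in Python. It is not the allocator prototype's API: `DeviceLayout`, `handoff`, and the rule that a dev-cap shared by both devices needs no flush are all assumptions made for illustration, loosely following the "caches shared between some but not all devices" remark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceLayout:
    """Hypothetical per-device description of one memory layout."""
    layout: str
    layout_caps: frozenset = frozenset()  # properties of the allocation itself
    dev_caps: frozenset = frozenset()     # device-private state such as caches


def handoff(src, dst):
    """Return what src would have to do before dst can use the buffer.

    Assumption: a layout-cap the consumer lacks must be resolved
    (e.g. decompressed), and a dev-cap the consumer does not share
    must be flushed. Returns None if the base layouts differ.
    """
    if src.layout != dst.layout:
        return None
    return {
        "resolve": sorted(src.layout_caps - dst.layout_caps),
        "flush": sorted(src.dev_caps - dst.dev_caps),
    }


# The four devices from the example in the mail above.
GPU = DeviceLayout("FOO/tiled", frozenset({"FOO/cc"}), frozenset({"FOO/gpu-cache"}))
DISPLAY = DeviceLayout("FOO/tiled", frozenset({"FOO/cc"}))
VIDEO = DeviceLayout("FOO/tiled", dev_caps=frozenset({"FOO/vid-cache"}))
CAMERA = DeviceLayout("FOO/tiled", dev_caps=frozenset({"FOO/vid-cache"}))

print(handoff(GPU, DISPLAY))
print(handoff(GPU, VIDEO))
print(handoff(VIDEO, CAMERA))
```

Under these assumptions the GPU only has to flush its cache before scanout, has to both decompress and flush before handing a buffer to the video engine, and video and camera can exchange buffers without any work because they share the same cache.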
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 01.12.2017 16:06, Rob Clark wrote: On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnlewrote: Hi, I've had a chance to look a bit more closely at the allocator prototype repository now. There's a whole bunch of low-level API design feedback, but for now let's focus on the high-level stuff first. Going by the 4.5 major object types (as also seen on slide 5 of your presentation [0]), assertions and usages make sense to me. Capabilities and capability sets should be cleaned up in my opinion, as the status quo is overly obfuscating things. What capability sets really represent, as far as I understand them, is *memory layouts*, and so that's what they should be called. This conceptually simplifies `derive_capabilities` significantly without any loss of expressiveness as far as I can see. Given two lists of memory layouts, we simply look for which memory layouts appear in both lists, and then merge their constraints and capabilities. Merging constraints looks good to me. Capabilities need some more thought. The prototype removes capabilities when merging layouts, but I'd argue that that is often undesirable. (In fact, I cannot think of capabilities which we'd always want to remove.) A typical example for this is compression (i.e. DCC in our case). For rendering usage, we'd return something like: Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC) For display usage, we might return (depending on hardware): Memory layout: AMD/tiled; constraints(alignment=64k); caps(none) Merging these in the prototype would remove the DCC capability, even though it might well make sense to keep it there for rendering. Dealing with the fact that display usage does not have this capability is precisely one of the two things that transitions are about! The other thing that transitions are about is caches. I think this is kind of what Rob was saying in one of his mails. Perhaps "layout" is a better name than "caps".. 
either way I think of both AMD/tiled and AMD/DCC as the same type of "thing".. the difference between AMD/tiled and AMD/DCC is that a transition can be provided for AMD/DCC. Other than that they are both things describing the layout. The reason that a transition can be provided is that they aren't quite the same thing, though. In a very real sense, AMD/DCC is a "child" property of AMD/tiled: DCC is implemented as a meta surface whose memory layout depends on the layout of the main surface. Although, if there are GPUs that can do an in-place "transition" between different tiling layouts, then the distinction is perhaps really not as clear-cut. I guess that would only apply to tiled renderers. So lets say you have a setup where both display and GPU supported FOO/tiled, but only GPU supported compressed (FOO/CC) and cached (FOO/cached). But the GPU supported the following transitions: trans_a: FOO/CC -> null trans_b: FOO/cached -> null Then the sets for each device (in order of preference): GPU: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) 2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k) 3: caps(FOO/tiled); constraints(alignment=32k) Display: 1: caps(FOO/tiled); constraints(alignment=64k) Merged Result: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_a, trans_b; display->GPU: none) 2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k); transition(GPU->display: trans_a; display->GPU: none) 3: caps(FOO/tiled); constraints(alignment=64k); transition(GPU->display: none; display->GPU: none) We definitely don't want to expose a way of getting uncached rendering surfaces for radeonsi. I mean, I think we are supposed to be able to program our hardware so that the backend bypasses all caches, but (a) nobody validates that and (b) it's basically suicide in terms of performance. 
Let's build fewer footguns :) So at least for radeonsi, we wouldn't want to have an AMD/cached bit, but we'd still want to have a transition between the GPU and display precisely to flush caches. Two interesting questions: 1. If we query for multiple usages on the same device, can we get a capability which can only be used for a subset of those usages? I think the original idea was, "no".. perhaps that could restriction could be lifted if transitions where part of the result. Or maybe you just query independently the same device for multiple different usages, and then merge that cap-set. (Do we need to care about intra-device transitions? Or can we just let the driver care about that, same as it always has?) 2. What happens when we merge memory layouts with sets of capabilities where neither is a subset of the other? I think this is a case where no zero-copy sharing is possible, right? Not necessarily. Let's say we have some industry-standard tiling layout foo, and vendors support their own proprietary framebuffer compression on top of
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnlewrote: > Hi, > > I've had a chance to look a bit more closely at the allocator prototype > repository now. There's a whole bunch of low-level API design feedback, but > for now let's focus on the high-level stuff first. > > Going by the 4.5 major object types (as also seen on slide 5 of your > presentation [0]), assertions and usages make sense to me. > > Capabilities and capability sets should be cleaned up in my opinion, as the > status quo is overly obfuscating things. What capability sets really > represent, as far as I understand them, is *memory layouts*, and so that's > what they should be called. > > This conceptually simplifies `derive_capabilities` significantly without any > loss of expressiveness as far as I can see. Given two lists of memory > layouts, we simply look for which memory layouts appear in both lists, and > then merge their constraints and capabilities. > > Merging constraints looks good to me. > > Capabilities need some more thought. The prototype removes capabilities when > merging layouts, but I'd argue that that is often undesirable. (In fact, I > cannot think of capabilities which we'd always want to remove.) > > A typical example for this is compression (i.e. DCC in our case). For > rendering usage, we'd return something like: > > Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC) > > For display usage, we might return (depending on hardware): > > Memory layout: AMD/tiled; constraints(alignment=64k); caps(none) > > Merging these in the prototype would remove the DCC capability, even though > it might well make sense to keep it there for rendering. Dealing with the > fact that display usage does not have this capability is precisely one of > the two things that transitions are about! The other thing that transitions > are about is caches. > > I think this is kind of what Rob was saying in one of his mails. Perhaps "layout" is a better name than "caps".. 
either way I think of both AMD/tiled and AMD/DCC as the same type of "thing".. the difference between AMD/tiled and AMD/DCC is that a transition can be provided for AMD/DCC. Other than that they are both things describing the layout. So lets say you have a setup where both display and GPU supported FOO/tiled, but only GPU supported compressed (FOO/CC) and cached (FOO/cached). But the GPU supported the following transitions: trans_a: FOO/CC -> null trans_b: FOO/cached -> null Then the sets for each device (in order of preference): GPU: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k) 2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k) 3: caps(FOO/tiled); constraints(alignment=32k) Display: 1: caps(FOO/tiled); constraints(alignment=64k) Merged Result: 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k); transition(GPU->display: trans_a, trans_b; display->GPU: none) 2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k); transition(GPU->display: trans_a; display->GPU: none) 3: caps(FOO/tiled); constraints(alignment=64k); transition(GPU->display: none; display->GPU: none) > Two interesting questions: > > 1. If we query for multiple usages on the same device, can we get a > capability which can only be used for a subset of those usages? I think the original idea was, "no".. perhaps that could restriction could be lifted if transitions where part of the result. Or maybe you just query independently the same device for multiple different usages, and then merge that cap-set. (Do we need to care about intra-device transitions? Or can we just let the driver care about that, same as it always has?) > 2. What happens when we merge memory layouts with sets of capabilities where > neither is a subset of the other? I think this is a case where no zero-copy sharing is possible, right? 
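For what it's worth, the merged result in the FOO example above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: `CapSet`, `merge`, and `TRANSITIONS` are hypothetical names, not the prototype's `derive_capabilities`, and the rule used (keep a producer set if every cap the consumer lacks has a transition that removes it, and take the larger alignment) is one reading of the example rather than a settled design.

```python
from dataclasses import dataclass

K = 1024

# Hypothetical table: capability -> transition that removes it.
TRANSITIONS = {"FOO/CC": "trans_a", "FOO/cached": "trans_b"}


@dataclass(frozen=True)
class CapSet:
    caps: frozenset
    alignment: int  # in bytes


def merge(producer_sets, consumer_sets, transitions):
    """Merge cap sets, keeping producer-only caps that can be transitioned away.

    Returns a list of (caps, alignment, producer_to_consumer_transitions),
    in the producer's order of preference.
    """
    merged = []
    for p in producer_sets:
        for c in consumer_sets:
            if not c.caps <= p.caps:
                continue  # consumer needs a cap the producer set lacks
            extra = p.caps - c.caps
            if any(cap not in transitions for cap in extra):
                continue  # no way to get rid of a producer-only cap
            needed = sorted(transitions[cap] for cap in extra)
            merged.append((p.caps, max(p.alignment, c.alignment), needed))
    return merged


GPU = [
    CapSet(frozenset({"FOO/tiled", "FOO/CC", "FOO/cached"}), 32 * K),
    CapSet(frozenset({"FOO/tiled", "FOO/CC"}), 32 * K),
    CapSet(frozenset({"FOO/tiled"}), 32 * K),
]
DISPLAY = [CapSet(frozenset({"FOO/tiled"}), 64 * K)]

for caps, alignment, needed in merge(GPU, DISPLAY, TRANSITIONS):
    print(sorted(caps), alignment // K, "k; GPU->display:", needed or "none")
```

The display->GPU direction is left out because the display set has no caps the GPU lacks, which matches the "display->GPU: none" entries in the example.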
> As for the actual transition API, I accept that some metadata may be > required, and the metadata probably needs to depend on the memory layout, > which is often vendor-specific. But even linear layouts need some > transitions for caches. We probably need at least some generic "off-device > usage" bit. I've started thinking of cached as a capability with a transition.. I think that helps. Maybe it needs to somehow be more specific (ie. if you have two devices both with there own cache with no coherency between the two) BR, -R > > Cheers, > Nicolai > > [0] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf > > > On 21.11.2017 02:11, James Jones wrote: >> >> As many here know at this point, I've been working on solving issues >> related to DMA-capable memory allocation for various devices for some time >> now. I'd like to take this opportunity to apologize for the way I handled >> the EGL stream proposals. I understand now that the development process >> followed there was unacceptable to the community and likely offended many >> great engineers. >> >> Moving forward, I attempted
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
Hi,

I've had a chance to look a bit more closely at the allocator prototype repository now. There's a whole bunch of low-level API design feedback, but for now let's focus on the high-level stuff first.

Going by the 4.5 major object types (as also seen on slide 5 of your presentation [0]), assertions and usages make sense to me.

Capabilities and capability sets should be cleaned up in my opinion, as the status quo is overly obfuscating things. What capability sets really represent, as far as I understand them, is *memory layouts*, and so that's what they should be called.

This conceptually simplifies `derive_capabilities` significantly without any loss of expressiveness as far as I can see. Given two lists of memory layouts, we simply look for which memory layouts appear in both lists, and then merge their constraints and capabilities.

Merging constraints looks good to me.

Capabilities need some more thought. The prototype removes capabilities when merging layouts, but I'd argue that that is often undesirable. (In fact, I cannot think of capabilities which we'd always want to remove.)

A typical example for this is compression (i.e. DCC in our case). For rendering usage, we'd return something like:

Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)

For display usage, we might return (depending on hardware):

Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)

Merging these in the prototype would remove the DCC capability, even though it might well make sense to keep it there for rendering. Dealing with the fact that display usage does not have this capability is precisely one of the two things that transitions are about! The other thing that transitions are about is caches.

I think this is kind of what Rob was saying in one of his mails.

Two interesting questions:

1. If we query for multiple usages on the same device, can we get a capability which can only be used for a subset of those usages?

2. What happens when we merge memory layouts with sets of capabilities where neither is a subset of the other?

As for the actual transition API, I accept that some metadata may be required, and the metadata probably needs to depend on the memory layout, which is often vendor-specific. But even linear layouts need some transitions for caches. We probably need at least some generic "off-device usage" bit.

Cheers,
Nicolai

[0] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf

On 21.11.2017 02:11, James Jones wrote:

As many here know at this point, I've been working on solving issues related to DMA-capable memory allocation for various devices for some time now. I'd like to take this opportunity to apologize for the way I handled the EGL stream proposals. I understand now that the development process followed there was unacceptable to the community and likely offended many great engineers.

Moving forward, I attempted to reboot talks in a more constructive manner with the generic allocator library proposals & discussion forum at XDC 2016. Some great design ideas came out of that, and I've since been prototyping some code to prove them out before bringing them back as official proposals. Again, I understand some people are growing concerned that I've been doing this off on the side in a github project that has primarily NVIDIA contributors. My goal was only to avoid wasting everyone's time with unproven ideas. The intent was never to dump the prototype code as-is on the community and presume acceptance. It's just a public research project.

Now the prototyping is nearing completion, and I'd like to renew discussion on whether and how the new mechanisms can be integrated with the Linux graphics stack. I'd be interested to know if more work is needed to demonstrate the usefulness of the new mechanisms, or whether people think they have value at this point.
After talking with people on the hallway track at XDC this year, I've heard several proposals for incorporating the new mechanisms:

-Include ideas from the generic allocator design into GBM. This could take the form of designing a "GBM 2.0" API, or incrementally adding to the existing GBM API.

-Develop a library to replace GBM. The allocator prototype code could be massaged into something production worthy to jump start this process.

-Develop a library that sits beside or on top of GBM, using GBM for low-level graphics buffer allocation, while supporting non-graphics kernel APIs directly. The additional cross-device negotiation and sorting of capabilities would be handled in this slightly higher-level API before handing off to GBM and other APIs for actual allocation somehow.

-I have also heard some general comments that regardless of the relationship between GBM and the new allocator mechanisms, it might be time to move GBM out of Mesa so it can be developed as a stand-alone project. I'd be interested what others think about that,
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Thu, 2017-11-30 at 13:20 -0500, Rob Clark wrote: > On Thu, Nov 30, 2017 at 12:59 AM, James Joneswrote: > > On 11/29/2017 04:09 PM, Miguel Angel Vico wrote: > > > > > > On Wed, 29 Nov 2017 16:28:15 -0500 > > > Rob Clark wrote: > > > > > > > > Do we need to define both in-place and copy transitions? Ie. what if > > > > GPU is still reading a tiled or compressed texture (ie. sampling from > > > > previous frame for some reason), but we need to untile/uncompress for > > > > display.. of maybe there are some other cases like that we should > > > > think about.. > > > > > > > > Maybe you already have some thoughts about that? > > > > > > > > > This is the next thing I'll be working on. I haven't given it much > > > thought myself so far, but I think James might have had some insights. > > > I'll read through some of his notes to double-check. > > > > > > A couple of notes on usage transitions: > > > > While chatting about transitions, a few assertions were made by others > > that > > I've come to accept, despite the fact that they reduce the generality of > > the > > allocator mechanisms: > > > > -GPUs are the only things that actually need usage transitions as far as I > > know thus far. Other engines either share the GPU representations of > > data, > > or use more limited representations; the latter being the reason non-GPU > > usage transitions are a useful thing. > > > > -It's reasonable to assume that a GPU is required to perform a usage > > transition. This follows from the above postulate. If only GPUs are > > using > > more advanced representations, you don't need any transitions unless you > > have a GPU available. > > This seems reasonable. I can't think of any non-gpu related case > where you would need a transition, other than perhaps cache flush/inv. > > > From that, I derived the rough API proposal for transitions presented on > > my > > XDC 2017 slides. 
Transition "metadata" is queried from the allocator > > given > > a pair of usages (which may refer to more than one device), but the > > realization of the transition is left to existing GPU APIs. I think I put > > Vulkan-like pseudo-code in the slides, but the GL external objects > > extensions (GL_EXT_memory_object and GL_EXT_semaphore) would work as well. > > I haven't quite wrapped my head around how this would work in the > cross-device case.. I mean from the API standpoint for the user, it > seems straightforward enough. Just not sure how to implement that and > what the driver interface would look like. > > I guess we need a capability-conversion (?).. I mean take for example > the the fb compression capability from your slide #12[1]. If we knew > there was an available transition to go from "Dev2 FB compression" to > "normal", then we could have allowed the "Dev2 FB compression" valid > set? > > [1] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf > > > Regarding in-place Vs. copy: To me a transition is something that happens > > in-place, at least semantically. If you need to make copies, that's a > > format conversion blit not a transition, and graphics APIs are already > > capable of expressing that without any special transitions or help from > > the > > allocator. However, I understand some chipsets perform transitions using > > something that looks kind of like a blit using on-chip caches and > > constrained usage semantics. There's probably some work to do to see > > whether those need to be accommodated as conversion blits or usgae > > transitions. > > I guess part of what I was thinking of, is what happens if the > producing device is still reading from the buffer. For example, > viddec -> gpu use case, where the video decoder is also still hanging > on to the frame to use as a reference frame to decode future frames? 
> > I guess if transition from devA -> devB can be done in parallel with > devA still reading the buffer, it isn't a problem. I guess that > limits (non-blit) transitions to decompression and cache op's? Maybe > that is ok.. > > > For our hardware's purposes, transitions are just various levels of > > decompression or compression reconfiguration and potentially cache > > flushing/invalidation, so our transition metadata will just be some bits > > signaling which compression operation is needed, if any. That's the sort > > of > > operation I modeled the API around, so if things are much more exotic than > > that for others, it will probably require some adjustments. > > > > > [snip] > > > > > Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing is one of my > > primary goals. However, it's a pretty heavy thing to prototype. If > > someone > > has the time though, I think it would be a great experiment. It would > > help > > flesh out the paltry list of usages, constraints, and capabilities in the > > existing prototype codebase. The kmscube example really should have added > > at least a "render" usage, but I
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Thu, 2017-11-30 at 14:20 -0500, Alex Deucher wrote: > On Thu, Nov 30, 2017 at 2:10 PM, Nicolai Hähnlewrote: > > On 30.11.2017 19:52, Rob Clark wrote: > > > > > > On Thu, Nov 30, 2017 at 4:21 AM, Nicolai Hähnle > > > wrote: > > > > > > > > On 30.11.2017 01:09, Miguel Angel Vico wrote: > > > > > > > > > > > > > > > > > > > > > It seems to me that $new_thing should grow as a separate thing > > > > > > > whether > > > > > > > it ends up replacing GBM or GBM internals are somewhat rewritten > > > > > > > on > > > > > > > top > > > > > > > of it. If I'm reading you both correctly, you agree with that, > > > > > > > so in > > > > > > > order to move forward, should we go ahead and create a project > > > > > > > in > > > > > > > fd.o? > > > > > > > > > > > > > > Before filing the new project request though, we should find an > > > > > > > appropriate name for $new_thing. Creativity isn't one of my > > > > > > > strengths, > > > > > > > but I'll go ahead and start the bikeshedding with "Generic > > > > > > > Device > > > > > > > Memory Allocator" or "Generic Device Memory Manager". > > > > > > > > > > > > > > > > > > > > > > > > liballoc - Generic Device Memory Allocator ... seems reasonable to > > > > > > me.. > > > > > > > > > > > > > > > > > > > > Cool. If there aren't better suggestions, we can go with that. We > > > > > should also namespace all APIs and structures. Is 'galloc' > > > > > distinctive > > > > > enough to be used as namespace? Being an 'r' away from gralloc maybe > > > > > it's a bit confusing? > > > > > > > > > > > > > > > > libgalloc with a galloc prefix seems fine. > > > > > > > > > > I keep reading "galloc" as "gralloc".. I suspect that will be > > > confusing. Maybe libgal/gal_.. or just liballoc/al_? > > > > > > True, but liballoc is *very* generic. > > > > libimagealloc? > > libsurfacealloc? > > contractions thereof? > > libdevicealloc? 
libhwalloc > > Alex
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Thu, Nov 30, 2017 at 2:10 PM, Nicolai Hähnle wrote: > On 30.11.2017 19:52, Rob Clark wrote: >> >> On Thu, Nov 30, 2017 at 4:21 AM, Nicolai Hähnle >> wrote: >>> >>> On 30.11.2017 01:09, Miguel Angel Vico wrote: >> >> >> It seems to me that $new_thing should grow as a separate thing whether >> it ends up replacing GBM or GBM internals are somewhat rewritten on >> top >> of it. If I'm reading you both correctly, you agree with that, so in >> order to move forward, should we go ahead and create a project in >> fd.o? >> >> Before filing the new project request though, we should find an >> appropriate name for $new_thing. Creativity isn't one of my strengths, >> but I'll go ahead and start the bikeshedding with "Generic Device >> Memory Allocator" or "Generic Device Memory Manager". > > > > liballoc - Generic Device Memory Allocator ... seems reasonable to me.. Cool. If there aren't better suggestions, we can go with that. We should also namespace all APIs and structures. Is 'galloc' distinctive enough to be used as namespace? Being an 'r' away from gralloc maybe it's a bit confusing? >>> >>> >>> >>> libgalloc with a galloc prefix seems fine. >>> >> >> I keep reading "galloc" as "gralloc".. I suspect that will be >> confusing. Maybe libgal/gal_.. or just liballoc/al_? > > > True, but liballoc is *very* generic. > > libimagealloc? > libsurfacealloc? > contractions thereof? libdevicealloc? Alex
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 30.11.2017 19:52, Rob Clark wrote:
On Thu, Nov 30, 2017 at 4:21 AM, Nicolai Hähnle wrote:
On 30.11.2017 01:09, Miguel Angel Vico wrote:

It seems to me that $new_thing should grow as a separate thing whether it ends up replacing GBM or GBM internals are somewhat rewritten on top of it. If I'm reading you both correctly, you agree with that, so in order to move forward, should we go ahead and create a project in fd.o?

Before filing the new project request though, we should find an appropriate name for $new_thing. Creativity isn't one of my strengths, but I'll go ahead and start the bikeshedding with "Generic Device Memory Allocator" or "Generic Device Memory Manager".

liballoc - Generic Device Memory Allocator ... seems reasonable to me..

Cool. If there aren't better suggestions, we can go with that. We should also namespace all APIs and structures. Is 'galloc' distinctive enough to be used as namespace? Being an 'r' away from gralloc maybe it's a bit confusing?

libgalloc with a galloc prefix seems fine.

I keep reading "galloc" as "gralloc".. I suspect that will be confusing. Maybe libgal/gal_.. or just liballoc/al_?

True, but liballoc is *very* generic.

libimagealloc?
libsurfacealloc?
contractions thereof?

Cheers,
Nicolai

BR,
-R
--
Learn how the world really is, but never forget how it should be.
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Thu, Nov 30, 2017 at 4:21 AM, Nicolai Hähnle wrote: > On 30.11.2017 01:09, Miguel Angel Vico wrote: It seems to me that $new_thing should grow as a separate thing whether it ends up replacing GBM or GBM internals are somewhat rewritten on top of it. If I'm reading you both correctly, you agree with that, so in order to move forward, should we go ahead and create a project in fd.o? Before filing the new project request though, we should find an appropriate name for $new_thing. Creativity isn't one of my strengths, but I'll go ahead and start the bikeshedding with "Generic Device Memory Allocator" or "Generic Device Memory Manager". >>> >>> >>> liballoc - Generic Device Memory Allocator ... seems reasonable to me.. >> >> >> Cool. If there aren't better suggestions, we can go with that. We >> should also namespace all APIs and structures. Is 'galloc' distinctive >> enough to be used as namespace? Being an 'r' away from gralloc maybe >> it's a bit confusing? > > > libgalloc with a galloc prefix seems fine. > I keep reading "galloc" as "gralloc".. I suspect that will be confusing. Maybe libgal/gal_.. or just liballoc/al_? BR, -R
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Thu, Nov 30, 2017 at 1:28 AM, James Joneswrote: > On 11/29/2017 01:10 PM, Rob Clark wrote: >> >> On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand >> wrote: >>> >>> On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark wrote: On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand wrote: > > I'm not quite some sure what I think about this. I think I would like > to > see $new_thing at least replace the guts of GBM. Whether GBM becomes a > wrapper around $new_thing or $new_thing implements the GBM API, I'm not > sure. What I don't think I want is to see GBM development continuing > on > it's own so we have two competing solutions. I don't really view them as competing.. there is *some* overlap, ie. allocating a buffer.. but even if you are using GBM w/out $new_thing you could allocate a buffer externally and import it. I don't see $new_thing as that much different from GBM PoV. But things like surfaces (aka swap chains) seem a bit out of place when you are thinking about implementing $new_thing for non-gpu devices. Plus EGL<->GBM tie-ins that seem out of place when talking about a (for ex.) camera. I kinda don't want to throw out the baby with the bathwater here. >>> >>> >>> >>> Agreed. GBM is very EGLish and we don't want the new allocator to be >>> that. >>> *maybe* GBM could be partially implemented on top of $new_thing. I don't quite see how that would work. Possibly we could deprecate parts of GBM that are no longer needed? idk.. Either way, I fully expect that GBM and mesa's implementation of $new_thing could perhaps sit on to of some of the same set of internal APIs. The public interface can be decoupled from the internal implementation. >>> >>> >>> >>> Maybe I should restate things a bit. My real point was that modifiers + >>> $new_thing + Kernel blob should be a complete and more powerful >>> replacement >>> for GBM. 
I don't know that we really can implement GBM on top of it >>> because >>> GBM has lots of wishy-washy concepts such as "cursor plane" which may not >>> map well at least not without querying the kernel about specifc display >>> planes. In particular, I don't want someone to feel like they need to >>> use >>> $new_thing and GBM at the same time or together. Ideally, I'd like them >>> to >>> never do that unless we decide gbm_bo is a useful abstraction for >>> $new_thing. >>> >> >> (just to repeat what I mentioned on irc) >> >> I think main thing is how do you create a swapchain/surface and know >> which is current front buffer after SwapBuffers().. that is the only >> bits of GBM that seem like there would still be useful. idk, maybe >> there is some other idea. > > > I don't view this as terribly useful except for legacy apps that need an EGL > window surface and can't be updated to use new methods. Wayland compositors > certainly don't fall in that category. I don't know that any GBM apps do. kmscube doesn't count? :-P Hmm, I assumed weston and the other wayland compositors where still using gbm to create EGL surfaces, but I confess to have not actually looked at weston src code for quite a few years now. Anyways, I think it is perfectly fine for GBM to stay as-is in it's current form. It can already import dma-buf fd's, and those can certainly come from $new_thing. So I guess we want an EGL extension to return the allocator device instance for the GPU. That also takes care of the non-bare-metal case. > Rather, I think the way forward for the classes of apps that need something > like GBM or the generic allocator is more or less the path ChromeOS took > with their graphics architecture: Render to individual buffers (using FBOs > bound to imported buffers in GL) and manage buffer exchanges/blits manually. 
> > The useful abstraction surfaces provide isn't so much deciding which buffer > is currently "front" and "back", but rather handling the transition/hand-off > to the window system/display device/etc. in SwapBuffers(), and the whole > idea of the allocator proposals is to make that something the application or > at least some non-driver utility library handles explicitly based on where > exactly the buffer is being handed off to. Hmm, ok.. I guess the transition will need some hook into the driver. For freedreno and vc4 (and I suspect this is not uncommon for tiler GPUs), switching FBOs doesn't necessarily flush rendering to hw. Maybe it would work out if you requested the sync fd file descriptor from an EGL fence before passing things to next device, as that would flush rendering. I wonder a bit about perf tools and related things.. gallium HUD and apitrace use SwapBuffers() as a frame marker.. > The one other useful information provided by EGL surfaces that I suspect > only our hardware cares about is whether the app is potentially
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Thu, Nov 30, 2017 at 12:59 AM, James Joneswrote: > On 11/29/2017 04:09 PM, Miguel Angel Vico wrote: >> >> On Wed, 29 Nov 2017 16:28:15 -0500 >> Rob Clark wrote: >>> >>> Do we need to define both in-place and copy transitions? Ie. what if >>> GPU is still reading a tiled or compressed texture (ie. sampling from >>> previous frame for some reason), but we need to untile/uncompress for >>> display.. of maybe there are some other cases like that we should >>> think about.. >>> >>> Maybe you already have some thoughts about that? >> >> >> This is the next thing I'll be working on. I haven't given it much >> thought myself so far, but I think James might have had some insights. >> I'll read through some of his notes to double-check. > > > A couple of notes on usage transitions: > > While chatting about transitions, a few assertions were made by others that > I've come to accept, despite the fact that they reduce the generality of the > allocator mechanisms: > > -GPUs are the only things that actually need usage transitions as far as I > know thus far. Other engines either share the GPU representations of data, > or use more limited representations; the latter being the reason non-GPU > usage transitions are a useful thing. > > -It's reasonable to assume that a GPU is required to perform a usage > transition. This follows from the above postulate. If only GPUs are using > more advanced representations, you don't need any transitions unless you > have a GPU available. This seems reasonable. I can't think of any non-gpu related case where you would need a transition, other than perhaps cache flush/inv. > From that, I derived the rough API proposal for transitions presented on my > XDC 2017 slides. Transition "metadata" is queried from the allocator given > a pair of usages (which may refer to more than one device), but the > realization of the transition is left to existing GPU APIs. 
> I think I put Vulkan-like pseudo-code in the slides, but the GL external
> objects extensions (GL_EXT_memory_object and GL_EXT_semaphore) would work
> as well.

I haven't quite wrapped my head around how this would work in the cross-device case.. I mean from the API standpoint for the user, it seems straightforward enough. Just not sure how to implement that and what the driver interface would look like.

I guess we need a capability-conversion (?).. I mean take for example the fb compression capability from your slide #12[1]. If we knew there was an available transition to go from "Dev2 FB compression" to "normal", then we could have allowed the "Dev2 FB compression" valid set?

[1] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf

> Regarding in-place Vs. copy: To me a transition is something that happens
> in-place, at least semantically. If you need to make copies, that's a
> format conversion blit not a transition, and graphics APIs are already
> capable of expressing that without any special transitions or help from the
> allocator. However, I understand some chipsets perform transitions using
> something that looks kind of like a blit using on-chip caches and
> constrained usage semantics. There's probably some work to do to see
> whether those need to be accommodated as conversion blits or usage
> transitions.

I guess part of what I was thinking of, is what happens if the producing device is still reading from the buffer. For example, viddec -> gpu use case, where the video decoder is also still hanging on to the frame to use as a reference frame to decode future frames?

I guess if transition from devA -> devB can be done in parallel with devA still reading the buffer, it isn't a problem. I guess that limits (non-blit) transitions to decompression and cache op's? Maybe that is ok..
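The capability-conversion question above can be made concrete with a small sketch. This is not code from the allocator prototype; the capability bits, `struct transition`, and `merge_with_transitions()` are hypothetical names invented for illustration, borrowing the FOO/tiled, FOO/CC, FOO/cached example used elsewhere in this thread. It assumes a transition is an in-place operation that strips one capability (decompression, cache flush), and that a producer capability set can stay in the merged result if every capability the consumer lacks has such a transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, named after the FOO example in this thread. */
enum {
    CAP_TILED  = 1u << 0,
    CAP_CC     = 1u << 1, /* compressed */
    CAP_CACHED = 1u << 2, /* device-local cache in use */
};

/* Hypothetical transition: an in-place operation run by the producer that
 * removes the given capability bits from a buffer (e.g. decompress, flush). */
struct transition {
    uint32_t removes;
};

/*
 * Decide whether a producer capability set is usable when the buffer is
 * handed to a consumer. Each capability the consumer lacks must be removable
 * by one of the producer's transitions. On return, *needed holds a bitmask of
 * indices into trans[] that have to run at hand-off time.
 */
bool
merge_with_transitions(uint32_t producer_caps, uint32_t consumer_caps,
                       const struct transition *trans, unsigned n_trans,
                       uint32_t *needed)
{
    uint32_t unsupported = producer_caps & ~consumer_caps;

    assert(needed != NULL);
    *needed = 0;

    /* The result bitmask has 32 bits, so only consider the first 32 entries. */
    for (unsigned i = 0; i < n_trans && i < 32; i++) {
        if (unsupported & trans[i].removes) {
            unsupported &= ~trans[i].removes;
            *needed |= 1u << i;
        }
    }

    /* Anything left over cannot be converted, so the set must be pruned. */
    return unsupported == 0;
}
```

With transitions for FOO/CC and FOO/cached available, a GPU set of tiled + CC + cached survives the merge against a display that only supports tiled, and both transitions are reported as required for GPU -> display. Without any transitions the same set would be pruned.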
> For our hardware's purposes, transitions are just various levels of
> decompression or compression reconfiguration and potentially cache
> flushing/invalidation, so our transition metadata will just be some bits
> signaling which compression operation is needed, if any. That's the sort of
> operation I modeled the API around, so if things are much more exotic than
> that for others, it will probably require some adjustments.
>
> [snip]
>
> Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing is one of my
> primary goals. However, it's a pretty heavy thing to prototype. If someone
> has the time though, I think it would be a great experiment. It would help
> flesh out the paltry list of usages, constraints, and capabilities in the
> existing prototype codebase. The kmscube example really should have added
> at least a "render" usage, but I got lazy and just re-used texture for now.
> That won't actually work on our HW in all cases, but it's good enough for
> kmscube.

btw, I did start looking at it.. I guess this gets a bit into the other side of this thread (ie. where/if GBM fits in). So far I don't think mesa has EGL_EXT_device_base, but I'm guessing
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 30.11.2017 07:28, James Jones wrote:
> This is all a really long-winded way of saying yeah I think it would be
> technically feasible to implement GBM on top of the generic allocator
> mechanisms, but I don't think that's a very interesting undertaking. It'd
> just be an ABI-compatibility thing for a bunch of open-source apps, which
> seems unnecessary in the long run since the apps can just be patched
> instead. Maybe it's useful as a transition mechanism though.
>
> However, if the generic allocator is going to be something separate from
> GBM, I think the idea of modernizing & adapting the existing GBM backend
> infrastructure in Mesa to serve as a backend for the allocator is a good
> idea. Maybe it's easier to just let GBM sit on that same updated backend
> beside the allocator API. For GBM, all the interesting stuff happens in
> the backend anyway.

That's precisely why I brought up the libgalloc <-> driver interface in another mail. If the libgalloc <-> driver interface uses the same extension mechanism that is in place for libgbm <-> driver today, just with different extensions, the transition can be made very seamless.

For example, I think we could let whatever "device handle" we use in that interface simply be an alias for __DRIscreen as far as drivers from Mesa are concerned. Other drivers (which won't implement the DRI_XXX extensions) won't have to concern themselves with that if they don't want to.

Cheers,
Nicolai
-- 
Learn how the world really is,
but never forget how it ought to be.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
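As an illustration of the libgalloc <-> driver idea in the message above, here is a minimal sketch of a name-plus-version extension lookup. It is only loosely modeled on the kind of extension list Mesa drivers expose today; every identifier in it (`galloc_extension`, `galloc_caps_extension`, `galloc_find_extension`, the "GALLOC_caps" name) is hypothetical and does not exist in Mesa or in the allocator prototype. The opaque `device_handle` stands in for whatever handle the interface would settle on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical base record: every extension starts with a name and version. */
struct galloc_extension {
    const char *name;
    int version;
};

/* Hypothetical extension a driver might export for constraint queries.
 * device_handle is opaque here; a Mesa driver could treat it as its screen. */
struct galloc_caps_extension {
    struct galloc_extension base;
    int (*get_alignment)(void *device_handle);
};

/*
 * Search a NULL-terminated extension list for `name` with at least
 * `min_version`. Returns NULL when the driver does not provide it, which
 * lets a driver that implements none of these extensions opt out cleanly.
 */
const struct galloc_extension *
galloc_find_extension(const struct galloc_extension *const *list,
                      const char *name, int min_version)
{
    assert(name != NULL);

    if (list == NULL)
        return NULL;

    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcmp(list[i]->name, name) == 0 &&
            list[i]->version >= min_version)
            return list[i];
    }
    return NULL;
}
```

The point of the pattern is that the library and the drivers only have to agree on the base record. New extensions can be added without breaking drivers that do not know about them.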
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 30.11.2017 01:09, Miguel Angel Vico wrote:

It seems to me that $new_thing should grow as a separate thing whether it ends up replacing GBM or GBM internals are somewhat rewritten on top of it. If I'm reading you both correctly, you agree with that, so in order to move forward, should we go ahead and create a project in fd.o?

Before filing the new project request though, we should find an appropriate name for $new_thing. Creativity isn't one of my strengths, but I'll go ahead and start the bikeshedding with "Generic Device Memory Allocator" or "Generic Device Memory Manager".

liballoc - Generic Device Memory Allocator ... seems reasonable to me..

Cool. If there aren't better suggestions, we can go with that. We should also namespace all APIs and structures. Is 'galloc' distinctive enough to be used as namespace? Being an 'r' away from gralloc maybe it's a bit confusing?

libgalloc with a galloc prefix seems fine.

I think it is reasonable to live on github until we figure out how transitions work.. or in particular are there any thread restrictions or interactions w/ gl context if transitions are done on the gpu or anything like that? Or can we just make it more vulkan like w/ explicit ctx ptr, and pass around fence fd's to synchronize everyone??

I haven't thought about the transition part too much but I guess we should have a reasonable idea for how that should work before we start getting too many non-toy users, lest we find big API changes are needed..

Seems fine, but I would like to get people other than NVIDIANs involved giving feedback on the design as we move forward with the prototype. Due to lack of a better list, is it okay to start sending patches to mesa-dev? If that's too broad an audience, should I just CC specific individuals that have somewhat contributed to the project?

Keeping it on mesa-dev seems like the best way to ensure the relevant people actually see it.
Cheers,
Nicolai
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 11/29/2017 01:10 PM, Rob Clark wrote:
On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand wrote:
On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark wrote:
On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand wrote:
On November 24, 2017 09:29:43 Rob Clark wrote:
On Mon, Nov 20, 2017 at 8:11 PM, James Jones wrote:

As many here know at this point, I've been working on solving issues related to DMA-capable memory allocation for various devices for some time now. I'd like to take this opportunity to apologize for the way I handled the EGL stream proposals. I understand now that the development process followed there was unacceptable to the community and likely offended many great engineers.

Moving forward, I attempted to reboot talks in a more constructive manner with the generic allocator library proposals & discussion forum at XDC 2016. Some great design ideas came out of that, and I've since been prototyping some code to prove them out before bringing them back as official proposals. Again, I understand some people are growing concerned that I've been doing this off on the side in a github project that has primarily NVIDIA contributors. My goal was only to avoid wasting everyone's time with unproven ideas. The intent was never to dump the prototype code as-is on the community and presume acceptance. It's just a public research project.

Now the prototyping is nearing completion, and I'd like to renew discussion on whether and how the new mechanisms can be integrated with the Linux graphics stack.

I'd be interested to know if more work is needed to demonstrate the usefulness of the new mechanisms, or whether people think they have value at this point.

After talking with people on the hallway track at XDC this year, I've heard several proposals for incorporating the new mechanisms:

-Include ideas from the generic allocator design into GBM. This could take the form of designing a "GBM 2.0" API, or incrementally adding to the existing GBM API.

-Develop a library to replace GBM. The allocator prototype code could be massaged into something production-worthy to jump-start this process.

-Develop a library that sits beside or on top of GBM, using GBM for low-level graphics buffer allocation, while supporting non-graphics kernel APIs directly. The additional cross-device negotiation and sorting of capabilities would be handled in this slightly higher-level API before handing off to GBM and other APIs for actual allocation somehow.

tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is still the "winsys" for running on "bare metal" (ie. kms). And we don't want to saddle $new_thing with aspects of that, but rather have it focus on being the thing that in multiple-"device"[1] scenarios figures out what sort of buffer can be allocated by who for sharing. Ie $new_thing should really not care about winsys level things like cursors or surfaces.. only buffers.

The mesa implementation of $new_thing could sit on top of GBM, although it could also just sit on top of the same internal APIs that GBM sits on top of. That is an implementation detail. It could be that GBM grows an API to return an instance of $new_thing for use-cases that involve sharing a buffer with the GPU. Or perhaps that is exposed via some sort of EGL extension. (We probably also need a way to get an instance from libdrm (?) for display-only KMS drivers, to cover cases like etnaviv sharing a buffer with a separate display driver.)

[1] where "devices" could be multiple GPUs or multiple APIs for one or more GPUs, but also includes non-GPU devices like camera, video decoder, "image processor" (which may or may not be part of camera), etc, etc

I'm not quite sure what I think about this. I think I would like to see $new_thing at least replace the guts of GBM. Whether GBM becomes a wrapper around $new_thing or $new_thing implements the GBM API, I'm not sure. What I don't think I want is to see GBM development continuing on its own so we have two competing solutions.

I don't really view them as competing.. there is *some* overlap, ie. allocating a buffer.. but even if you are using GBM w/out $new_thing you could allocate a buffer externally and import it. I don't see $new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place when you are thinking about implementing $new_thing for non-gpu devices. Plus EGL<->GBM tie-ins that seem out of place when talking about a (for ex.) camera. I kinda don't want to throw out the baby with the bathwater here.

Agreed. GBM is very EGLish and we don't want the new allocator to be that.

*maybe* GBM could be partially implemented on top of $new_thing. I don't quite see how that would work. Possibly we could deprecate parts of GBM that are no longer needed? idk..

Either way, I fully expect that GBM and mesa's
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:
On Wed, 29 Nov 2017 16:28:15 -0500 Rob Clark wrote:
On Wed, Nov 29, 2017 at 2:41 PM, Miguel Angel Vico wrote:

Many of you may already know, but James is going to be out for a few weeks and I'll be taking over this in the meantime.

Sorry for the unfortunate timing. I am indeed on paternity leave at the moment. Some quick comments below. I'll be trying to follow the discussion as time allows while I'm out.

See inline for comments.
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
Many of you may already know, but James is going to be out for a few weeks and I'll be taking over this in the meantime.

See inline for comments.
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Wed, Nov 29, 2017 at 4:19 AM, Nicolai Hähnle wrote:
> On 25.11.2017 18:46, Jason Ekstrand wrote:
>> I'm not quite sure what I think about this. I think I would like to
>> see $new_thing at least replace the guts of GBM. Whether GBM becomes a
>> wrapper around $new_thing or $new_thing implements the GBM API, I'm not
>> sure. What I don't think I want is to see GBM development continuing on
>> its own so we have two competing solutions.
>>
>> I *think* I like the idea of having $new_thing implement GBM as a
>> deprecated legacy API. Whether that means we start by pulling GBM out into
>> its own project or we start over, I don't know. My feeling is that the
>> current dri_interface is *not* what we want, which is why starting with GBM
>> makes me nervous.
>
> Why not?
>
> The most basic part of the dri_interface is just a
> __driDriverGetExtensions_xxx function that returns an array of pointers to
> extension structures derived from __DRIextension.
>
> That is *perfectly fine*.

Fair enough. I'm perfectly happy to re-use a well-tested API extension
mechanism.

> I completely agree if you limit your statement to saying that the current
> *set of extensions* that are exposed by this interface are full of X-isms,
> and it's a good idea to do a thorough house-cleaning in there. This can go
> all the way up to eventually phasing out the DRI_Core "extension" as far as
> I'm concerned.

That's more of what I was getting at. In particular, I don't want the
design of $new_thing to be constrained by trying to cram into the current
DRI extensions, nor do I want it to attempt to have exactly the same set of
functionality as the current DRI extensions (or GBM) support.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Sat, Nov 25, 2017 at 1:20 PM, Rob Clarkwrote: > On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand > wrote: > > On November 24, 2017 09:29:43 Rob Clark wrote: > >> > >> > >> On Mon, Nov 20, 2017 at 8:11 PM, James Jones > wrote: > >>> > >>> As many here know at this point, I've been working on solving issues > >>> related > >>> to DMA-capable memory allocation for various devices for some time now. > >>> I'd > >>> like to take this opportunity to apologize for the way I handled the > EGL > >>> stream proposals. I understand now that the development process > followed > >>> there was unacceptable to the community and likely offended many great > >>> engineers. > >>> > >>> Moving forward, I attempted to reboot talks in a more constructive > manner > >>> with the generic allocator library proposals & discussion forum at XDC > >>> 2016. > >>> Some great design ideas came out of that, and I've since been > prototyping > >>> some code to prove them out before bringing them back as official > >>> proposals. > >>> Again, I understand some people are growing concerned that I've been > >>> doing > >>> this off on the side in a github project that has primarily NVIDIA > >>> contributors. My goal was only to avoid wasting everyone's time with > >>> unproven ideas. The intent was never to dump the prototype code as-is > on > >>> the community and presume acceptance. It's just a public research > >>> project. > >>> > >>> Now the prototyping is nearing completion, and I'd like to renew > >>> discussion > >>> on whether and how the new mechanisms can be integrated with the Linux > >>> graphics stack. > >>> > >>> I'd be interested to know if more work is needed to demonstrate the > >>> usefulness of the new mechanisms, or whether people think they have > value > >>> at > >>> this point. 
> >>> > >>> After talking with people on the hallway track at XDC this year, I've > >>> heard > >>> several proposals for incorporating the new mechanisms: > >>> > >>> -Include ideas from the generic allocator design into GBM. This could > >>> take > >>> the form of designing a "GBM 2.0" API, or incrementally adding to the > >>> existing GBM API. > >>> > >>> -Develop a library to replace GBM. The allocator prototype code could > be > >>> massaged into something production worthy to jump start this process. > >>> > >>> -Develop a library that sits beside or on top of GBM, using GBM for > >>> low-level graphics buffer allocation, while supporting non-graphics > >>> kernel > >>> APIs directly. The additional cross-device negotiation and sorting of > >>> capabilities would be handled in this slightly higher-level API before > >>> handing off to GBM and other APIs for actual allocation somehow. > >> > >> > >> tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is > >> still the "winsys" for running on "bare metal" (ie. kms). And we > >> don't want to saddle $new_thing with aspects of that, but rather have > >> it focus on being the thing that in multiple-"device"[1] scenarious > >> figures out what sort of buffer can be allocated by who for sharing. > >> Ie $new_thing should really not care about winsys level things like > >> cursors or surfaces.. only buffers. > >> > >> The mesa implementation of $new_thing could sit on top of GBM, > >> although it could also just sit on top of the same internal APIs that > >> GBM sits on top of. That is an implementation detail. It could be > >> that GBM grows an API to return an instance of $new_thing for > >> use-cases that involve sharing a buffer with the GPU. Or perhaps that > >> is exposed via some sort of EGL extension. (We probably also need a > >> way to get an instance from libdrm (?) for display-only KMS drivers, > >> to cover cases like etnaviv sharing a buffer with a separate display > >> driver.) 
> >> > >> [1] where "devices" could be multiple GPUs or multiple APIs for one or > >> more GPUs, but also includes non-GPU devices like camera, video > >> decoder, "image processor" (which may or may not be part of camera), > >> etc, etc > > > > > > I'm not quite some sure what I think about this. I think I would like to > > see $new_thing at least replace the guts of GBM. Whether GBM becomes a > > wrapper around $new_thing or $new_thing implements the GBM API, I'm not > > sure. What I don't think I want is to see GBM development continuing on > > it's own so we have two competing solutions. > > I don't really view them as competing.. there is *some* overlap, ie. > allocating a buffer.. but even if you are using GBM w/out $new_thing > you could allocate a buffer externally and import it. I don't see > $new_thing as that much different from GBM PoV. > > But things like surfaces (aka swap chains) seem a bit out of place > when you are thinking about implementing $new_thing for non-gpu > devices. Plus EGL<->GBM tie-ins that seem out of place when talking > about a (for ex.) camera. I kinda
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On 25.11.2017 18:46, Jason Ekstrand wrote:
> I'm not quite sure what I think about this. I think I would like to
> see $new_thing at least replace the guts of GBM. Whether GBM becomes a
> wrapper around $new_thing or $new_thing implements the GBM API, I'm not
> sure. What I don't think I want is to see GBM development continuing on
> its own so we have two competing solutions.
>
> I *think* I like the idea of having $new_thing implement GBM as a
> deprecated legacy API. Whether that means we start by pulling GBM out into
> its own project or we start over, I don't know. My feeling is that the
> current dri_interface is *not* what we want, which is why starting with GBM
> makes me nervous.

Why not?

The most basic part of the dri_interface is just a
__driDriverGetExtensions_xxx function that returns an array of pointers to
extension structures derived from __DRIextension.

That is *perfectly fine*.

I completely agree if you limit your statement to saying that the current
*set of extensions* that are exposed by this interface are full of X-isms,
and it's a good idea to do a thorough house-cleaning in there. This can go
all the way up to eventually phasing out the DRI_Core "extension" as far as
I'm concerned.

I know it's tempting to reinvent the world every couple of years, but it's
just *better* to find an evolutionary path that makes sense.

Cheers,
Nicolai
--
Learn how the world really is,
but never forget how it should be.
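To make the extension mechanism described above concrete, here is a minimal C sketch of the pattern: a driver entry point returns a NULL-terminated array of pointers to structs that all begin with a common name/version header, and the caller searches it. Every identifier below (ext_base, ext_image, demo_get_extensions, find_extension) is invented for illustration and is not taken from dri_interface.h; the real structures and entry-point names may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for a common extension header: every
 * extension struct starts with a name and a version. */
struct ext_base {
    const char *name;
    int version;
};

/* Hypothetical extension that "derives" from ext_base by embedding it
 * as its first member. */
struct ext_image {
    struct ext_base base;
    int (*create_image)(int width, int height);
};

static int demo_create_image(int width, int height)
{
    return width * height; /* placeholder result for the sketch */
}

static const struct ext_image demo_image_ext = {
    .base = { "DEMO_image", 2 },
    .create_image = demo_create_image,
};

static const struct ext_base demo_flush_ext = { "DEMO_flush", 1 };

/* NULL-terminated array, as a driver entry point might return it. */
static const struct ext_base *const demo_extensions[] = {
    &demo_image_ext.base,
    &demo_flush_ext,
    NULL,
};

const struct ext_base *const *demo_get_extensions(void)
{
    return demo_extensions;
}

/* Return the named extension if it is present with at least
 * min_version, otherwise NULL. */
const struct ext_base *find_extension(const struct ext_base *const *exts,
                                      const char *name, int min_version)
{
    assert(exts != NULL && name != NULL);
    for (size_t i = 0; exts[i] != NULL; i++) {
        if (strcmp(exts[i]->name, name) == 0 &&
            exts[i]->version >= min_version)
            return exts[i];
    }
    return NULL;
}
```

A caller would look up an extension by name, check for NULL, and then cast the returned pointer to the concrete struct (here ext_image) before using its function pointers. Pruning the *set* of extensions, as suggested above, leaves this lookup untouched.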
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand wrote:
> On November 24, 2017 09:29:43 Rob Clark wrote:
>> On Mon, Nov 20, 2017 at 8:11 PM, James Jones wrote:
>>> As many here know at this point, I've been working on solving issues
>>> related to DMA-capable memory allocation for various devices for some
>>> time now. I'd like to take this opportunity to apologize for the way I
>>> handled the EGL stream proposals. I understand now that the development
>>> process followed there was unacceptable to the community and likely
>>> offended many great engineers.
>>>
>>> Moving forward, I attempted to reboot talks in a more constructive manner
>>> with the generic allocator library proposals & discussion forum at XDC
>>> 2016. Some great design ideas came out of that, and I've since been
>>> prototyping some code to prove them out before bringing them back as
>>> official proposals. Again, I understand some people are growing concerned
>>> that I've been doing this off on the side in a github project that has
>>> primarily NVIDIA contributors. My goal was only to avoid wasting
>>> everyone's time with unproven ideas. The intent was never to dump the
>>> prototype code as-is on the community and presume acceptance. It's just
>>> a public research project.
>>>
>>> Now the prototyping is nearing completion, and I'd like to renew
>>> discussion on whether and how the new mechanisms can be integrated with
>>> the Linux graphics stack.
>>>
>>> I'd be interested to know if more work is needed to demonstrate the
>>> usefulness of the new mechanisms, or whether people think they have value
>>> at this point.
>>>
>>> After talking with people on the hallway track at XDC this year, I've
>>> heard several proposals for incorporating the new mechanisms:
>>>
>>> -Include ideas from the generic allocator design into GBM. This could
>>> take the form of designing a "GBM 2.0" API, or incrementally adding to
>>> the existing GBM API.
>>>
>>> -Develop a library to replace GBM. The allocator prototype code could be
>>> massaged into something production worthy to jump start this process.
>>>
>>> -Develop a library that sits beside or on top of GBM, using GBM for
>>> low-level graphics buffer allocation, while supporting non-graphics
>>> kernel APIs directly. The additional cross-device negotiation and
>>> sorting of capabilities would be handled in this slightly higher-level
>>> API before handing off to GBM and other APIs for actual allocation
>>> somehow.
>>
>> tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
>> still the "winsys" for running on "bare metal" (ie. kms). And we
>> don't want to saddle $new_thing with aspects of that, but rather have
>> it focus on being the thing that in multiple-"device"[1] scenarios
>> figures out what sort of buffer can be allocated by who for sharing.
>> Ie $new_thing should really not care about winsys level things like
>> cursors or surfaces.. only buffers.
>>
>> The mesa implementation of $new_thing could sit on top of GBM,
>> although it could also just sit on top of the same internal APIs that
>> GBM sits on top of. That is an implementation detail. It could be
>> that GBM grows an API to return an instance of $new_thing for
>> use-cases that involve sharing a buffer with the GPU. Or perhaps that
>> is exposed via some sort of EGL extension. (We probably also need a
>> way to get an instance from libdrm (?) for display-only KMS drivers,
>> to cover cases like etnaviv sharing a buffer with a separate display
>> driver.)
>>
>> [1] where "devices" could be multiple GPUs or multiple APIs for one or
>> more GPUs, but also includes non-GPU devices like camera, video
>> decoder, "image processor" (which may or may not be part of camera),
>> etc, etc
>
> I'm not quite sure what I think about this. I think I would like to
> see $new_thing at least replace the guts of GBM. Whether GBM becomes a
> wrapper around $new_thing or $new_thing implements the GBM API, I'm not
> sure. What I don't think I want is to see GBM development continuing on
> its own so we have two competing solutions.

I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it. I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
when you are thinking about implementing $new_thing for non-gpu
devices. Plus EGL<->GBM tie-ins that seem out of place when talking
about a (for ex.) camera. I kinda don't want to throw out the baby
with the bathwater here.

*maybe* GBM could be partially implemented on top of $new_thing. I
don't quite see how that would work. Possibly we could deprecate
parts of GBM that are no longer needed? idk.. Either way, I fully
expect that GBM and
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On November 24, 2017 09:29:43 Rob Clark wrote:
> On Mon, Nov 20, 2017 at 8:11 PM, James Jones wrote:
>> As many here know at this point, I've been working on solving issues
>> related to DMA-capable memory allocation for various devices for some
>> time now. I'd like to take this opportunity to apologize for the way I
>> handled the EGL stream proposals. I understand now that the development
>> process followed there was unacceptable to the community and likely
>> offended many great engineers.
>>
>> Moving forward, I attempted to reboot talks in a more constructive manner
>> with the generic allocator library proposals & discussion forum at XDC
>> 2016. Some great design ideas came out of that, and I've since been
>> prototyping some code to prove them out before bringing them back as
>> official proposals. Again, I understand some people are growing concerned
>> that I've been doing this off on the side in a github project that has
>> primarily NVIDIA contributors. My goal was only to avoid wasting
>> everyone's time with unproven ideas. The intent was never to dump the
>> prototype code as-is on the community and presume acceptance. It's just
>> a public research project.
>>
>> Now the prototyping is nearing completion, and I'd like to renew
>> discussion on whether and how the new mechanisms can be integrated with
>> the Linux graphics stack.
>>
>> I'd be interested to know if more work is needed to demonstrate the
>> usefulness of the new mechanisms, or whether people think they have value
>> at this point.
>>
>> After talking with people on the hallway track at XDC this year, I've
>> heard several proposals for incorporating the new mechanisms:
>>
>> -Include ideas from the generic allocator design into GBM. This could
>> take the form of designing a "GBM 2.0" API, or incrementally adding to
>> the existing GBM API.
>>
>> -Develop a library to replace GBM. The allocator prototype code could be
>> massaged into something production worthy to jump start this process.
>>
>> -Develop a library that sits beside or on top of GBM, using GBM for
>> low-level graphics buffer allocation, while supporting non-graphics
>> kernel APIs directly. The additional cross-device negotiation and
>> sorting of capabilities would be handled in this slightly higher-level
>> API before handing off to GBM and other APIs for actual allocation
>> somehow.
>
> tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
> still the "winsys" for running on "bare metal" (ie. kms). And we
> don't want to saddle $new_thing with aspects of that, but rather have
> it focus on being the thing that in multiple-"device"[1] scenarios
> figures out what sort of buffer can be allocated by who for sharing.
> Ie $new_thing should really not care about winsys level things like
> cursors or surfaces.. only buffers.
>
> The mesa implementation of $new_thing could sit on top of GBM,
> although it could also just sit on top of the same internal APIs that
> GBM sits on top of. That is an implementation detail. It could be
> that GBM grows an API to return an instance of $new_thing for
> use-cases that involve sharing a buffer with the GPU. Or perhaps that
> is exposed via some sort of EGL extension. (We probably also need a
> way to get an instance from libdrm (?) for display-only KMS drivers,
> to cover cases like etnaviv sharing a buffer with a separate display
> driver.)
>
> [1] where "devices" could be multiple GPUs or multiple APIs for one or
> more GPUs, but also includes non-GPU devices like camera, video
> decoder, "image processor" (which may or may not be part of camera),
> etc, etc

I'm not quite sure what I think about this. I think I would like to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure. What I don't think I want is to see GBM development continuing on
its own so we have two competing solutions.

I *think* I like the idea of having $new_thing implement GBM as a
deprecated legacy API. Whether that means we start by pulling GBM out into
its own project or we start over, I don't know. My feeling is that the
current dri_interface is *not* what we want, which is why starting with GBM
makes me nervous.

I need to go read through your code before I can provide a stronger or
more nuanced opinion. That's not going to happen before the end of the
year.

>> -I have also heard some general comments that regardless of the
>> relationship between GBM and the new allocator mechanisms, it might be
>> time to move GBM out of Mesa so it can be developed as a stand-alone
>> project. I'd be interested in what others think about that, as it would
>> be something worth coordinating with any other new development based on
>> or inside of GBM.
>
> +1
>
> We already have at least a couple different non-mesa implementations
> of GBM (which afaict tend to lag behind mesa's GBM and cause
> headaches).
>
> The extracted part probably isn't much more than a header and shim.
> But probably does need to grow some versioning for the backend to know
> if, for example,
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On November 24, 2017 09:45:07 Jason Ekstrand wrote:
> On November 23, 2017 09:00:05 Emil Velikov wrote:
>> Hi James,
>>
>> On 21 November 2017 at 01:11, James Jones wrote:
>>> -I have also heard some general comments that regardless of the
>>> relationship between GBM and the new allocator mechanisms, it might be
>>> time to move GBM out of Mesa so it can be developed as a stand-alone
>>> project. I'd be interested in what others think about that, as it would
>>> be something worth coordinating with any other new development based on
>>> or inside of GBM.
>>
>> Having a GBM frontend is one thing I've been pondering as well.
>>
>> Regardless of exact solution wrt the new allocator, having a clear
>> frontend/backend separation for GBM will be beneficial.
>> I'll be giving it a stab these days.
>
> I'm not sure what you mean by that. It currently has something that looks
> like separation but it's a joke. Unless we have a real reason to have
> anything other than a dri_interface back-end, I'd rather we just stop
> pretending and drop the extra layer of function pointer indirection
> entirely.

Gah! I didn't read Rob's email before writing this. It looks like there is
a use-case for this. I'm still a bit skeptical about whether or not we
really want to extend what we have or if it would be better to start over
and just require that the new thing also support the current GBM ABI.

--Jason

>> Disclaimer: Mostly thinking out loud, so please take the following
>> with a grain of salt.
>>
>> On the details wrt the new allocator project, I think that having a
>> new lean library would be a good idea.
>> One could borrow ideas from GBM, but by default no connection between
>> the two should be required.
>>
>> That might make the initial hurdle of porting a bit harder, but
>> it will allow for more efficient driver implementation.
>>
>> HTH
>> Emil
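For the frontend/backend separation debated above, here is a minimal C sketch of what the "layer of function pointer indirection" buys when more than one backend exists: a thin frontend that dispatches through a backend table and returns an error when a backend lacks a newer entry point. All names (backend_ops, frontend_create_buffer, frontend_map_buffer, the toy backends) are hypothetical and do not come from GBM; the version field and the -ENOSYS convention are assumptions of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend interface; none of these names come from GBM. */
struct backend_ops {
    int version;                         /* interface revision of the backend */
    int (*create_buffer)(int width, int height);
    int (*map_buffer)(int handle);       /* optional, assumed added in revision 2 */
};

/* Mandatory entry point: always present in the frontend, so callers
 * link against it no matter which backend is loaded. */
int frontend_create_buffer(const struct backend_ops *ops, int width, int height)
{
    assert(ops != NULL && ops->create_buffer != NULL);
    return ops->create_buffer(width, height);
}

/* Optional entry point: report an error at run time when the backend
 * predates it, instead of leaving the symbol missing. */
int frontend_map_buffer(const struct backend_ops *ops, int handle)
{
    assert(ops != NULL);
    if (ops->version < 2 || ops->map_buffer == NULL)
        return -ENOSYS;
    return ops->map_buffer(handle);
}

/* Two toy backends for illustration only. */
static int toy_create(int width, int height) { return width * height; }
static int toy_map(int handle) { return handle + 1; }

const struct backend_ops old_backend = { 1, toy_create, NULL };
const struct backend_ops new_backend = { 2, toy_create, toy_map };
```

With only one backend the table is pure overhead, which is the objection raised in the quoted reply; with several independently shipped backends, the frontend is where version checks like the one in frontend_map_buffer can live.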
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On November 23, 2017 09:00:05 Emil Velikov wrote:
> Hi James,
>
> On 21 November 2017 at 01:11, James Jones wrote:
>> -I have also heard some general comments that regardless of the
>> relationship between GBM and the new allocator mechanisms, it might be
>> time to move GBM out of Mesa so it can be developed as a stand-alone
>> project. I'd be interested in what others think about that, as it would
>> be something worth coordinating with any other new development based on
>> or inside of GBM.
>
> Having a GBM frontend is one thing I've been pondering as well.
>
> Regardless of exact solution wrt the new allocator, having a clear
> frontend/backend separation for GBM will be beneficial.
> I'll be giving it a stab these days.

I'm not sure what you mean by that. It currently has something that looks
like separation but it's a joke. Unless we have a real reason to have
anything other than a dri_interface back-end, I'd rather we just stop
pretending and drop the extra layer of function pointer indirection
entirely.

--Jason

> Disclaimer: Mostly thinking out loud, so please take the following
> with a grain of salt.
>
> On the details wrt the new allocator project, I think that having a
> new lean library would be a good idea.
> One could borrow ideas from GBM, but by default no connection between
> the two should be required.
>
> That might make the initial hurdle of porting a bit harder, but
> it will allow for more efficient driver implementation.
>
> HTH
> Emil
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
On Mon, Nov 20, 2017 at 8:11 PM, James Jones wrote:
> As many here know at this point, I've been working on solving issues related
> to DMA-capable memory allocation for various devices for some time now. I'd
> like to take this opportunity to apologize for the way I handled the EGL
> stream proposals. I understand now that the development process followed
> there was unacceptable to the community and likely offended many great
> engineers.
>
> Moving forward, I attempted to reboot talks in a more constructive manner
> with the generic allocator library proposals & discussion forum at XDC 2016.
> Some great design ideas came out of that, and I've since been prototyping
> some code to prove them out before bringing them back as official proposals.
> Again, I understand some people are growing concerned that I've been doing
> this off on the side in a github project that has primarily NVIDIA
> contributors. My goal was only to avoid wasting everyone's time with
> unproven ideas. The intent was never to dump the prototype code as-is on
> the community and presume acceptance. It's just a public research project.
>
> Now the prototyping is nearing completion, and I'd like to renew discussion
> on whether and how the new mechanisms can be integrated with the Linux
> graphics stack.
>
> I'd be interested to know if more work is needed to demonstrate the
> usefulness of the new mechanisms, or whether people think they have value at
> this point.
>
> After talking with people on the hallway track at XDC this year, I've heard
> several proposals for incorporating the new mechanisms:
>
> -Include ideas from the generic allocator design into GBM. This could take
> the form of designing a "GBM 2.0" API, or incrementally adding to the
> existing GBM API.
>
> -Develop a library to replace GBM. The allocator prototype code could be
> massaged into something production worthy to jump start this process.
>
> -Develop a library that sits beside or on top of GBM, using GBM for
> low-level graphics buffer allocation, while supporting non-graphics kernel
> APIs directly. The additional cross-device negotiation and sorting of
> capabilities would be handled in this slightly higher-level API before
> handing off to GBM and other APIs for actual allocation somehow.

tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
still the "winsys" for running on "bare metal" (ie. kms). And we
don't want to saddle $new_thing with aspects of that, but rather have
it focus on being the thing that in multiple-"device"[1] scenarios
figures out what sort of buffer can be allocated by who for sharing.
Ie $new_thing should really not care about winsys level things like
cursors or surfaces.. only buffers.

The mesa implementation of $new_thing could sit on top of GBM,
although it could also just sit on top of the same internal APIs that
GBM sits on top of. That is an implementation detail. It could be
that GBM grows an API to return an instance of $new_thing for
use-cases that involve sharing a buffer with the GPU. Or perhaps that
is exposed via some sort of EGL extension. (We probably also need a
way to get an instance from libdrm (?) for display-only KMS drivers,
to cover cases like etnaviv sharing a buffer with a separate display
driver.)

[1] where "devices" could be multiple GPUs or multiple APIs for one or
more GPUs, but also includes non-GPU devices like camera, video
decoder, "image processor" (which may or may not be part of camera),
etc, etc

> -I have also heard some general comments that regardless of the relationship
> between GBM and the new allocator mechanisms, it might be time to move GBM
> out of Mesa so it can be developed as a stand-alone project. I'd be
> interested in what others think about that, as it would be something worth
> coordinating with any other new development based on or inside of GBM.

+1

We already have at least a couple different non-mesa implementations
of GBM (which afaict tend to lag behind mesa's GBM and cause
headaches).

The extracted part probably isn't much more than a header and shim.
But it probably does need to grow some versioning for the backend to know
if, for example, gbm->bo_map() is supported.. at least it could
provide stubs that return an error, rather than failing at link time
when building something w/ $vendor's old gbm implementation.

> And of course I'm open to any other ideas for integration. Beyond just
> where this code would live, there is much to debate about the mechanisms
> themselves and all the implementation details. I was just hoping to kick
> things off with something high level to start.

My $0.02 is that the place where devel happens and the place to go for
releases could be different. Either way, I would like to see the git tree
for tagged release versions live on fd.o and use the common release
process[2] for generating/uploading release tarballs that distros can
use.

[2]
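The role sketched for $new_thing in this message, working out what sort of buffer can be allocated by whom for sharing, can be illustrated with a deliberately simplified C example. It assumes each device reports a capability bitmask and a power-of-two alignment, and merges them by intersecting the capabilities and taking the largest alignment. The type and function names (device_req, merge_reqs, the CAP_* flags) are hypothetical, and the sketch ignores transitions, device-local capabilities, and preference ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability flags, for illustration only. */
enum {
    CAP_LINEAR     = 1u << 0,
    CAP_TILED      = 1u << 1,
    CAP_COMPRESSED = 1u << 2,
};

/* What one device can accept for a shared buffer. */
struct device_req {
    uint32_t caps;      /* bitmask of CAP_* layouts the device supports */
    uint32_t alignment; /* required alignment in bytes, assumed power of two */
};

/* Merge the requirements of `count` devices. Capabilities are
 * intersected; with power-of-two alignments the largest one satisfies
 * all of them. Returns false when no common capability remains. */
bool merge_reqs(const struct device_req *reqs, int count,
                struct device_req *out)
{
    assert(reqs != NULL && out != NULL && count > 0);

    uint32_t caps = reqs[0].caps;
    uint32_t alignment = reqs[0].alignment;

    for (int i = 1; i < count; i++) {
        caps &= reqs[i].caps;
        if (reqs[i].alignment > alignment)
            alignment = reqs[i].alignment;
    }

    if (caps == 0)
        return false;

    out->caps = caps;
    out->alignment = alignment;
    return true;
}
```

For instance, a GPU that supports tiled and compressed layouts at 32k alignment merged with a display that supports only tiled at 64k yields tiled at 64k, while a linear-only camera and a tiled-only display have nothing in common and the merge fails. Only the merged result would then be handed to GBM or another allocation API.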
Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals
Hi James,

On 21 November 2017 at 01:11, James Jones wrote:
> -I have also heard some general comments that regardless of the relationship
> between GBM and the new allocator mechanisms, it might be time to move GBM
> out of Mesa so it can be developed as a stand-alone project. I'd be
> interested in what others think about that, as it would be something worth
> coordinating with any other new development based on or inside of GBM.
>
Having a GBM frontend is one thing I've been pondering as well.

Regardless of exact solution wrt the new allocator, having a clear
frontend/backend separation for GBM will be beneficial.
I'll be giving it a stab these days.

Disclaimer: Mostly thinking out loud, so please take the following
with a grain of salt.

On the details wrt the new allocator project, I think that having a
new lean library would be a good idea.
One could borrow ideas from GBM, but by default no connection between
the two should be required.

That might make the initial hurdle of porting a bit harder, but
it will allow for more efficient driver implementation.

HTH
Emil