Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-08 Thread Miguel Angel Vico


On Wed, 6 Dec 2017 16:57:45 -0800
James Jones  wrote:

> On 12/06/2017 03:25 AM, Nicolai Hähnle wrote:
> > On 06.12.2017 08:07, James Jones wrote:
> > [snip]  
> >> So let's say you have a setup where both display and GPU supported
> >> FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
> >> (FOO/cached).  But the GPU supported the following transitions:
> >>
> >>     trans_a: FOO/CC -> null
> >>     trans_b: FOO/cached -> null
> >>
> >> Then the sets for each device (in order of preference):
> >>
> >> GPU:
> >>     1: caps(FOO/tiled, FOO/CC, FOO/cached); 
> >> constraints(alignment=32k)
> >>     2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
> >>     3: caps(FOO/tiled); constraints(alignment=32k)
> >>
> >> Display:
> >>     1: caps(FOO/tiled); constraints(alignment=64k)
> >>
> >> Merged Result:
> >>     1: caps(FOO/tiled, FOO/CC, FOO/cached); 
> >> constraints(alignment=64k);
> >>    transition(GPU->display: trans_a, trans_b; display->GPU: none)
> >>     2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
> >>    transition(GPU->display: trans_a; display->GPU: none)
> >>     3: caps(FOO/tiled); constraints(alignment=64k);
> >>    transition(GPU->display: none; display->GPU: none)  
> >
> >
> > We definitely don't want to expose a way of getting uncached rendering
> > surfaces for radeonsi. I mean, I think we are supposed to be able 
> > to program
> > our hardware so that the backend bypasses all caches, but (a) nobody
> > validates that and (b) it's basically suicide in terms of 
> > performance. Let's
> > build fewer footguns :)  
> 
>  sure, this was just a hypothetical example.  But to take this case as
>  another example, if you didn't want to expose uncached rendering (or
>  cached w/ cache flushes after each draw), you would exclude the entry
>  from the GPU set which didn't have FOO/cached (I'm adding back a
>  cached but not CC config just to make it interesting), and end up
>  with:
> 
>      trans_a: FOO/CC -> null
>      trans_b: FOO/cached -> null
> 
>  GPU:
>     1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
>     2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k)
> 
>  Display:
>     1: caps(FOO/tiled); constraints(alignment=64k)
> 
>  Merged Result:
>     1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
>    transition(GPU->display: trans_a, trans_b; display->GPU: none)
>     2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k);
>    transition(GPU->display: trans_b; display->GPU: none)
> 
>  So there isn't anything in the result set that doesn't have GPU cache,
>  and the cache-flush transition is always in the set of required
>  transitions going from GPU -> display
> 
>  Hmm, I guess this does require the concept of a required cap..  
> >>>
> >>> Which we already introduced to the allocator API when we realized we
> >>> would need them as we were prototyping.  
> >>
> >> Note I also posed the question of whether things like cached (and
> >> similarly compression, since I view compression as roughly an
> >> equivalent mechanism to a cache) should be capabilities at all, in
> >> one of the open issues on my XDC 2017 slides, because of this very
> >> over-pruning problem it causes.  It's on slide 15, as "No device-local
> >> capabilities".  You'll have to listen to my coverage of it in the
> >> recorded presentation for that slide to make any sense, but it's the
> >> same thing Nicolai has laid out here.
> >>
> >> As I continued working through our prototype driver support, I found I 
> >> didn't actually need to include cached or compressed as capabilities: 
> >> The GPU just applies them as needed and the usage transitions make it 
> >> transparent to the non-GPU engines.  That does mean the GPU driver 
> >> currently needs to be the one to realize the allocation from the 
> >> capability set to get optimal behavior.  We could fix that by 
> >> reworking our driver though.  At this point, not including 
> >> device-local properties like on-device caching in capabilities seems 
> >> like the right solution to me.  I'm curious whether this applies 
> >> universally though, or if other hardware doesn't fit the "compression 
> >> and stuff all behaves like a cache" idiom.  
> > 
> > Compression is a part of the memory layout for us: framebuffer 
> > compression uses an additional "meta surface". At the most basic level, 
> > an allocation with loss-less compression support is by necessity bigger 
> > than an allocation without.
> > 
> > We can allocate this meta surface separately, but then we're forced to 
> > decompress when passing the surface around (e.g. to a compositor.)
> > 
> > Consider also the example I gave elsewhere, where a 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-06 Thread James Jones

On 12/06/2017 03:25 AM, Nicolai Hähnle wrote:

On 06.12.2017 08:07, James Jones wrote:
[snip]

So let's say you have a setup where both display and GPU supported
FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
(FOO/cached).  But the GPU supported the following transitions:

    trans_a: FOO/CC -> null
    trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
    1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
    2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
    3: caps(FOO/tiled); constraints(alignment=32k)

Display:
    1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
    1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
       transition(GPU->display: trans_a, trans_b; display->GPU: none)
    2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
       transition(GPU->display: trans_a; display->GPU: none)
    3: caps(FOO/tiled); constraints(alignment=64k);
       transition(GPU->display: none; display->GPU: none)
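
To make that merging rule concrete, here is a minimal sketch of one way it
could be computed.  All names below (cap_set, merge_cap_sets, and so on) are
hypothetical and do not correspond to the prototype allocator's API; the
assumptions are that a GPU-only capability survives the merge only if the GPU
provides a transition out of it (the trans_a/trans_b style above), and that
constraints merge by taking the stricter value, e.g. the larger alignment.
Capabilities that are neither shared nor transitionable are simply dropped
here, which is the over-pruning behavior discussed later in the thread.

    /* Hypothetical sketch of capability-set merging; not the allocator API. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define MAX_CAPS 8

    struct cap_set {
        const char *caps[MAX_CAPS];   /* e.g. "FOO/tiled", "FOO/CC"          */
        size_t      num_caps;
        size_t      alignment;        /* the only constraint modeled here    */
    };

    static bool has_cap(const struct cap_set *s, const char *cap)
    {
        for (size_t i = 0; i < s->num_caps; i++)
            if (!strcmp(s->caps[i], cap))
                return true;
        return false;
    }

    /* Merge one GPU set with one display set: caps survive if the display
     * also has them or the GPU can transition them away (cap -> null, like
     * trans_a/trans_b); constraints take the stricter (larger) alignment.   */
    static bool merge_cap_sets(const struct cap_set *gpu,
                               const struct cap_set *display,
                               bool (*gpu_can_transition_away)(const char *),
                               struct cap_set *out)
    {
        out->num_caps  = 0;
        out->alignment = gpu->alignment > display->alignment
                       ? gpu->alignment : display->alignment;

        for (size_t i = 0; i < gpu->num_caps; i++) {
            const char *cap = gpu->caps[i];
            if (has_cap(display, cap) || gpu_can_transition_away(cap))
                out->caps[out->num_caps++] = cap;
            /* else: dropped -- the over-pruning case discussed below        */
        }
        return out->num_caps > 0;   /* empty result: sets can't be merged    */
    }

Applied to the example above, merged set 1 keeps FOO/CC and FOO/cached because
trans_a and trans_b exist, and the merged alignment becomes 64k.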



We definitely don't want to expose a way of getting uncached rendering
surfaces for radeonsi. I mean, I think we are supposed to be able to
program our hardware so that the backend bypasses all caches, but (a)
nobody validates that and (b) it's basically suicide in terms of
performance. Let's build fewer footguns :)


sure, this was just a hypothetical example.  But to take this case as
another example, if you didn't want to expose uncached rendering (or
cached w/ cache flushes after each draw), you would exclude the entry
from the GPU set which didn't have FOO/cached (I'm adding back a
cached but not CC config just to make it interesting), and end up
with:

    trans_a: FOO/CC -> null
    trans_b: FOO/cached -> null

GPU:
   1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
   2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k)

Display:
   1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
   1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
  transition(GPU->display: trans_a, trans_b; display->GPU: none)
   2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k);
  transition(GPU->display: trans_b; display->GPU: none)

So there isn't anything in the result set that doesn't have GPU cache,
and the cache-flush transition is always in the set of required
transitions going from GPU -> display

Hmm, I guess this does require the concept of a required cap..


Which we already introduced to the allocator API when we realized we
would need them as we were prototyping.
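
Building on the sketch above (same includes), a "required" capability could be
as simple as a flag that the merge logic checks before accepting a result set;
again, these names are made up for illustration rather than taken from the
prototype.

    /* Hypothetical "required" capability and a post-merge filter.           */
    struct capability {
        const char *name;       /* e.g. "FOO/cached"                         */
        bool        required;   /* must survive merging for this device      */
    };

    static bool merged_set_acceptable(const struct capability *dev_caps,
                                      size_t num_dev_caps,
                                      const struct cap_set *merged)
    {
        for (size_t i = 0; i < num_dev_caps; i++) {
            if (dev_caps[i].required && !has_cap(merged, dev_caps[i].name))
                return false;   /* e.g. a GPU that refuses to render uncached
                                   rejects any merged set lacking FOO/cached */
        }
        return true;
    }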


Note I also posed the question of whether things like cached (and
similarly compression, since I view compression as roughly an
equivalent mechanism to a cache) should be capabilities at all, in one
of the open issues on my XDC 2017 slides, because of this very
over-pruning problem it causes.  It's on slide 15, as "No device-local
capabilities".  You'll have to listen to my coverage of it in the
recorded presentation for that slide to make any sense, but it's the
same thing Nicolai has laid out here.


As I continued working through our prototype driver support, I found I 
didn't actually need to include cached or compressed as capabilities: 
The GPU just applies them as needed and the usage transitions make it 
transparent to the non-GPU engines.  That does mean the GPU driver 
currently needs to be the one to realize the allocation from the 
capability set to get optimal behavior.  We could fix that by 
reworking our driver though.  At this point, not including 
device-local properties like on-device caching in capabilities seems 
like the right solution to me.  I'm curious whether this applies 
universally though, or if other hardware doesn't fit the "compression 
and stuff all behaves like a cache" idiom.


Compression is a part of the memory layout for us: framebuffer 
compression uses an additional "meta surface". At the most basic level, 
an allocation with loss-less compression support is by necessity bigger 
than an allocation without.


We can allocate this meta surface separately, but then we're forced to 
decompress when passing the surface around (e.g. to a compositor.)


Consider also the example I gave elsewhere, where a cross-vendor tiling 
layout is combined with vendor-specific compression:


Device 1, rendering: caps(BASE/foo-tiling, VND1/compression)
Device 2, sampling/scanout: caps(BASE/foo-tiling, VND2/compression)

Some more thoughts on caching or "device-local" properties below.


Compression requires extra resources for us as well.  That's probably 
universal.  I think the distinction between the two approaches is 
whether the allocating driver deduces that compression can be used with 
a given capability set and hence adds the resources implicitly, or 
whether the capability set indicates it explicitly.  My theory is that 
the implicit path is possible, but it has downsides.  The explicit path 
is attractive due to its exact nature, as I 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-06 Thread Nicolai Hähnle

On 06.12.2017 14:25, Rob Clark wrote:

On Wed, Dec 6, 2017 at 2:07 AM, James Jones  wrote:

Note I also posed the question of whether things like cached (and similarly
compression, since I view compression as roughly an equivalent mechanism to
a cache) in one of the open issues on my XDC 2017 slides because of this
very problem of over-pruning it causes.  It's on slide 15, as "No
device-local capabilities".  You'll have to listen to my coverage of it in
the recorded presentation for that slide to make any sense, but it's the
same thing Nicolai has laid out here.

As I continued working through our prototype driver support, I found I
didn't actually need to include cached or compressed as capabilities: The
GPU just applies them as needed and the usage transitions make it
transparent to the non-GPU engines.  That does mean the GPU driver currently
needs to be the one to realize the allocation from the capability set to get
optimal behavior.  We could fix that by reworking our driver though.  At
this point, not including device-local properties like on-device caching in
capabilities seems like the right solution to me.  I'm curious whether this
applies universally though, or if other hardware doesn't fit the
"compression and stuff all behaves like a cache" idiom.



Possibly a SoC(ish) type device which has a "system" cache that some
but not all devices fall into.  I *think* the intel chips w/ EDRAM
might fall into this category.  I know the idea has come up elsewhere,
although not sure if anything like that ended up in production.  It
seems like something we'd at least want to have an idea how to deal
with, even if it isn't used for device internal caches.

Not sure if similar situation could come up w/ discrete GPU and video
decode/encode engines on the same die?


It definitely could. Our GPUs currently don't have shared caches between 
gfx and video engines, but moving more and more clients under a shared 
L2 cache has been a theme over the last few generations. I doubt that's 
going to happen for the video engines any time soon, but you never know.


I don't think we really need caches as a capability for our current 
GPUs, but it may change, and in any case, we do want compression as a 
capability.




[snip]

I think I like the idea of having transitions being part of the
per-device/engine cap sets, so that such information can be used upon
merging to know which capabilities may remain or have to be dropped.

I think James's proposal for usage transitions was intended to work
with flows like:

1. App gets GPU caps for RENDER usage
2. App allocates GPU memory using a layout from (1)
3. App now decides it wants to use the buffer for SCANOUT
4. App queries usage transition metadata from RENDER to SCANOUT,
   given the current memory layout.
5. Do the transition and hand the buffer off to display



No, all usages the app intends to transition to must be specified up front
when initially querying caps in the model I assumed.  The app then specifies
some subset (up to the full set) of the specified usages as a src and dst
when querying transition metadata.
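
In other words, the intended flow looks roughly like the sketch below.  Every
type and function name in it (alloc_query_caps, alloc_query_transition,
usage_t, ...) is invented for illustration and does not match the prototype's
actual entry points; the only point being made is that all usages are declared
when capabilities are queried, and the later transition query names a src/dst
subset of those same usages.

    /* Hypothetical pseudo-API; none of these names match the real prototype. */
    #include <stddef.h>

    typedef enum { USAGE_RENDER, USAGE_SCANOUT } usage_t;

    struct alloc_device;     /* one per device/driver                         */
    struct cap_set;          /* a merged capability set + constraints         */
    struct allocation;       /* the allocated buffer                          */
    struct transition_md;    /* opaque metadata handed to the GPU API         */

    size_t alloc_query_caps(struct alloc_device *devs[], size_t ndevs,
                            const usage_t *usages, size_t nusages,
                            struct cap_set ***out_sets);
    struct allocation *alloc_create(struct alloc_device *dev,
                                    const struct cap_set *set);
    struct transition_md *alloc_query_transition(const struct allocation *buf,
                                                 const usage_t *src, size_t nsrc,
                                                 const usage_t *dst, size_t ndst);

    static void example_flow(struct alloc_device *gpu,
                             struct alloc_device *display)
    {
        struct alloc_device *devs[] = { gpu, display };

        /* All usages the buffer will ever transition to are named up front.  */
        const usage_t usages[] = { USAGE_RENDER, USAGE_SCANOUT };
        struct cap_set **sets = NULL;
        size_t nsets = alloc_query_caps(devs, 2, usages, 2, &sets);
        if (nsets == 0)
            return;                     /* no compatible layout at all        */

        /* Allocate from the most-preferred merged set.                       */
        struct allocation *buf = alloc_create(gpu, sets[0]);

        /* Later: query metadata for going from rendering to scanout; src and
         * dst are subsets of the usages declared above, not new ones.        */
        const usage_t src[] = { USAGE_RENDER };
        const usage_t dst[] = { USAGE_SCANOUT };
        struct transition_md *md = alloc_query_transition(buf, src, 1, dst, 1);
        (void)md;   /* handed to the rendering API to emit the actual barrier */
    }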


The problem I see with this is that it isn't guaranteed that there will
be a chain of transitions for the buffer to be usable by display.




hmm, I guess if a buffer *can* be shared across all uses, then by
definition there has to be a chain of transitions to go from any
usage+device to any other usage+device.

Possibly a separate step to query transitions avoids solving for every
possible transition when merging the caps set.. although until you do
that query I don't think you know the resulting merged caps set is
valid.

Maybe in practice for every cap FOO there exists a FOO->null (or
FOO->generic if you prefer) transition, ie. compressed->uncompressed,
cached->clean, etc.  I suppose that makes the problem easier to solve.


It really would, to the extent that I would prefer if we could bake it 
into the system as an assumption.


I have my doubts about how to manage calculating transitions cleanly at 
all without it. The metadata stuff is very vague to me.




I hadn't thought hard about it, but my initial thoughts were that it would
be required that the driver support transitioning to any single usage given
the capabilities returned.  However, transitioning to multiple usages (e.g.,
simultaneously rendering and scanning out) could fail to produce a valid
transition, in which case the app would have to fall back to a copy, or
avoid that simultaneous usage combination in some other way.


Adding transition metadata to the original capability sets, and using
that information when merging could give us a compatible memory layout
that would be usable by both GPU and display.

I'll look into extending the current merging logic to also take into
account transitions.



Yes, it'll be good to see whether this can be made to work.  I agree Rob's
example outcomes above are ideal, but it's not clear to me 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-06 Thread Rob Clark
On Wed, Dec 6, 2017 at 6:25 AM, Nicolai Hähnle  wrote:
> On 06.12.2017 08:07, James Jones wrote:
> [snip]
>
>> So let's say you have a setup where both display and GPU supported
>> FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
>> (FOO/cached).  But the GPU supported the following transitions:
>>
>> trans_a: FOO/CC -> null
>> trans_b: FOO/cached -> null
>>
>> Then the sets for each device (in order of preference):
>>
>> GPU:
>> 1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
>> 2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
>> 3: caps(FOO/tiled); constraints(alignment=32k)
>>
>> Display:
>> 1: caps(FOO/tiled); constraints(alignment=64k)
>>
>> Merged Result:
>> 1: caps(FOO/tiled, FOO/CC, FOO/cached);
>> constraints(alignment=64k);
>>transition(GPU->display: trans_a, trans_b; display->GPU: none)
>> 2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
>>transition(GPU->display: trans_a; display->GPU: none)
>> 3: caps(FOO/tiled); constraints(alignment=64k);
>>transition(GPU->display: none; display->GPU: none)
>
>
>
> We definitely don't want to expose a way of getting uncached rendering
> surfaces for radeonsi. I mean, I think we are supposed to be able to
> program
> our hardware so that the backend bypasses all caches, but (a) nobody
> validates that and (b) it's basically suicide in terms of performance.
> Let's
> build fewer footguns :)


 sure, this was just a hypothetical example.  But to take this case as
 another example, if you didn't want to expose uncached rendering (or
 cached w/ cache flushes after each draw), you would exclude the entry
 from the GPU set which didn't have FOO/cached (I'm adding back a
 cached but not CC config just to make it interesting), and end up
 with:

 trans_a: FOO/CC -> null
 trans_b: FOO/cached -> null

 GPU:
1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k)

 Display:
1: caps(FOO/tiled); constraints(alignment=64k)

 Merged Result:
1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
   transition(GPU->display: trans_a, trans_b; display->GPU: none)
2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k);
   transition(GPU->display: trans_b; display->GPU: none)

 So there isn't anything in the result set that doesn't have GPU cache,
 and the cache-flush transition is always in the set of required
 transitions going from GPU -> display

 Hmm, I guess this does require the concept of a required cap..
>>>
>>>
>>> Which we already introduced to the allocator API when we realized we
>>> would need them as we were prototyping.
>>
>>
>> Note I also posed the question of whether things like cached (and
>> similarly compression, since I view compression as roughly an equivalent
>> mechanism to a cache) should be capabilities at all, in one of the open
>> issues on my XDC 2017 slides, because of this very over-pruning problem
>> it causes.  It's on slide 15, as "No device-local capabilities".  You'll
>> have to listen to my coverage of it in the recorded presentation for that
>> slide to make any sense, but it's the same thing Nicolai has laid out here.
>>
>> As I continued working through our prototype driver support, I found I
>> didn't actually need to include cached or compressed as capabilities: The
>> GPU just applies them as needed and the usage transitions make it
>> transparent to the non-GPU engines.  That does mean the GPU driver currently
>> needs to be the one to realize the allocation from the capability set to get
>> optimal behavior.  We could fix that by reworking our driver though.  At
>> this point, not including device-local properties like on-device caching in
>> capabilities seems like the right solution to me.  I'm curious whether this
>> applies universally though, or if other hardware doesn't fit the
>> "compression and stuff all behaves like a cache" idiom.
>
>
> Compression is a part of the memory layout for us: framebuffer compression
> uses an additional "meta surface". At the most basic level, an allocation
> with loss-less compression support is by necessity bigger than an allocation
> without.
>
> We can allocate this meta surface separately, but then we're forced to
> decompress when passing the surface around (e.g. to a compositor.)
>

side note:  I think this is pretty typical.. although afaict for
adreno at least, when you start getting into sampling from things with
multiple layers/levels, the meta surface needs to be interleaved with
the "main" surface, so it can't really be allocated after the fact.

Also for depth buffer, there is potentially an additional meta 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-06 Thread Rob Clark
On Wed, Dec 6, 2017 at 2:07 AM, James Jones  wrote:
> On 12/01/2017 01:52 PM, Miguel Angel Vico wrote:
>>
>>
>>
>> On Fri, 1 Dec 2017 13:38:41 -0500
>> Rob Clark  wrote:
>>
>>>
>>> sure, this was just a hypothetical example.  But to take this case as
>>> another example, if you didn't want to expose uncached rendering (or
>>> cached w/ cache flushes after each draw), you would exclude the entry
>>> from the GPU set which didn't have FOO/cached (I'm adding back a
>>> cached but not CC config just to make it interesting), and end up
>>> with:
>>>
>>> trans_a: FOO/CC -> null
>>> trans_b: FOO/cached -> null
>>>
>>> GPU:
>>>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
>>>2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k)
>>>
>>> Display:
>>>1: caps(FOO/tiled); constraints(alignment=64k)
>>>
>>> Merged Result:
>>>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
>>>   transition(GPU->display: trans_a, trans_b; display->GPU: none)
>>>2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k);
>>>   transition(GPU->display: trans_b; display->GPU: none)
>>>
>>> So there isn't anything in the result set that doesn't have GPU cache,
>>> and the cache-flush transition is always in the set of required
>>> transitions going from GPU -> display
>>>
>>> Hmm, I guess this does require the concept of a required cap..
>>
>>
>> Which we already introduced to the allocator API when we realized we
>> would need them as we were prototyping.
>
>
> Note I also posed the question of whether things like cached (and similarly
> compression, since I view compression as roughly an equivalent mechanism to
> a cache) should be capabilities at all, in one of the open issues on my XDC
> 2017 slides, because of this very over-pruning problem it causes.  It's on
> slide 15, as "No device-local capabilities".  You'll have to listen to my
> coverage of it in the recorded presentation for that slide to make any
> sense, but it's the same thing Nicolai has laid out here.
>
> As I continued working through our prototype driver support, I found I
> didn't actually need to include cached or compressed as capabilities: The
> GPU just applies them as needed and the usage transitions make it
> transparent to the non-GPU engines.  That does mean the GPU driver currently
> needs to be the one to realize the allocation from the capability set to get
> optimal behavior.  We could fix that by reworking our driver though.  At
> this point, not including device-local properties like on-device caching in
> capabilities seems like the right solution to me.  I'm curious whether this
> applies universally though, or if other hardware doesn't fit the
> "compression and stuff all behaves like a cache" idiom.
>

Possibly a SoC(ish) type device which has a "system" cache that some
but not all devices fall into.  I *think* the intel chips w/ EDRAM
might fall into this category.  I know the idea has come up elsewhere,
although not sure if anything like that ended up in production.  It
seems like something we'd at least want to have an idea how to deal
with, even if it isn't used for device internal caches.

Not sure if similar situation could come up w/ discrete GPU and video
decode/encode engines on the same die?

[snip]

>> I think I like the idea of having transitions being part of the
>> per-device/engine cap sets, so that such information can be used upon
>> merging to know which capabilities may remain or have to be dropped.
>>
>> I think James's proposal for usage transitions was intended to work
>> with flows like:
>>
>>1. App gets GPU caps for RENDER usage
>>2. App allocates GPU memory using a layout from (1)
>>3. App now decides it wants to use the buffer for SCANOUT
>>4. App queries usage transition metadata from RENDER to SCANOUT,
>>   given the current memory layout.
>>5. Do the transition and hand the buffer off to display
>
>
> No, all usages the app intends to transition to must be specified up front
> when initially querying caps in the model I assumed.  The app then specifies
> some subset (up to the full set) of the specified usages as a src and dst
> when querying transition metadata.
>
>> The problem I see with this is that it isn't guaranteed that there will
>> be a chain of transitions for the buffer to be usable by display.
>

hmm, I guess if a buffer *can* be shared across all uses, then by
definition there has to be a chain of transitions to go from any
usage+device to any other usage+device.

Possibly a separate step to query transitions avoids solving for every
possible transition when merging the caps set.. although until you do
that query I don't think you know the resulting merged caps set is
valid.

Maybe in practice for every cap FOO there exists a FOO->null (or
FOO->generic if you prefer) transition, ie. compressed->uncompressed,
cached->clean, etc.  I suppose that makes the problem easier to solve.

>
> I hadn't thought 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-06 Thread Rob Clark
On Wed, Dec 6, 2017 at 12:52 AM, James Jones  wrote:
> On 11/30/2017 10:48 AM, Rob Clark wrote:
>>
>> On Thu, Nov 30, 2017 at 1:28 AM, James Jones  wrote:
>>>
>>> On 11/29/2017 01:10 PM, Rob Clark wrote:


 On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand 
 wrote:
>
>
> On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:
>>
>>
>>
>> On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand
>> 
>> wrote:
>>>
>>>
>>> I'm not quite sure what I think about this.  I think I would like to
>>> see $new_thing at least replace the guts of GBM. Whether GBM becomes a
>>> wrapper around $new_thing or $new_thing implements the GBM API, I'm not
>>> sure.  What I don't think I want is to see GBM development continuing
>>> on its own so we have two competing solutions.
>>
>>
>>
>> I don't really view them as competing.. there is *some* overlap, ie.
>> allocating a buffer.. but even if you are using GBM w/out $new_thing
>> you could allocate a buffer externally and import it.  I don't see
>> $new_thing as that much different from GBM PoV.
>>
>> But things like surfaces (aka swap chains) seem a bit out of place
>> when you are thinking about implementing $new_thing for non-gpu
>> devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
>> about a (for ex.) camera.  I kinda don't want to throw out the baby
>> with the bathwater here.
>
>
>
>
> Agreed.  GBM is very EGLish and we don't want the new allocator to be
> that.
>
>>
>> *maybe* GBM could be partially implemented on top of $new_thing.  I
>> don't quite see how that would work.  Possibly we could deprecate
>> parts of GBM that are no longer needed?  idk..  Either way, I fully
>> expect that GBM and mesa's implementation of $new_thing could perhaps
>> sit on top of some of the same set of internal APIs.  The public
>> interface can be decoupled from the internal implementation.
>
>
>
>
> Maybe I should restate things a bit.  My real point was that modifiers +
> $new_thing + Kernel blob should be a complete and more powerful
> replacement for GBM.  I don't know that we really can implement GBM on
> top of it because GBM has lots of wishy-washy concepts such as "cursor
> plane" which may not map well, at least not without querying the kernel
> about specific display planes.  In particular, I don't want someone to
> feel like they need to use $new_thing and GBM at the same time or
> together.  Ideally, I'd like them to never do that unless we decide
> gbm_bo is a useful abstraction for $new_thing.
>

 (just to repeat what I mentioned on irc)

 I think the main thing is how you create a swapchain/surface and know
 which is the current front buffer after SwapBuffers()..  those are the
 only bits of GBM that seem like they would still be useful.  idk, maybe
 there is some other idea.
>>>
>>>
>>>
>>> I don't view this as terribly useful except for legacy apps that need an
>>> EGL
>>> window surface and can't be updated to use new methods.  Wayland
>>> compositors
>>> certainly don't fall in that category.  I don't know that any GBM apps
>>> do.
>>
>>
>> kmscube doesn't count?  :-P
>>
>> Hmm, I assumed weston and the other wayland compositors were still
>> using gbm to create EGL surfaces, but I confess to have not actually
>> looked at weston src code for quite a few years now.
>>
>> Anyways, I think it is perfectly fine for GBM to stay as-is in its
>> current form.  It can already import dma-buf fd's, and those can
>> certainly come from $new_thing.
>>
>> So I guess we want an EGL extension to return the allocator device
>> instance for the GPU.  That also takes care of the non-bare-metal
>> case.
>>
>>> Rather, I think the way forward for the classes of apps that need
>>> something like GBM or the generic allocator is more or less the path
>>> ChromeOS took with their graphics architecture: Render to individual
>>> buffers (using FBOs bound to imported buffers in GL) and manage buffer
>>> exchanges/blits manually.
>>>
>>> The useful abstraction surfaces provide isn't so much deciding which
>>> buffer is currently "front" and "back", but rather handling the
>>> transition/hand-off to the window system/display device/etc. in
>>> SwapBuffers(), and the whole idea of the allocator proposals is to make
>>> that something the application or at least some non-driver utility
>>> library handles explicitly based on where exactly the buffer is being
>>> handed off to.
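
For reference, a minimal sketch of that ChromeOS-style pattern using the
existing EGL_EXT_image_dma_buf_import and GL_OES_EGL_image extensions; error
handling, eglGetProcAddress lookups, and the actual transition/fence step at
hand-off are omitted, and the dma-buf fd, stride, and format are assumed to
come from the allocator:

    #define EGL_EGLEXT_PROTOTYPES
    #define GL_GLEXT_PROTOTYPES
    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>

    /* Import a dma-buf (e.g. from the new allocator) and bind it to an FBO
     * so the application can render into it and hand it off explicitly.    */
    static GLuint import_as_fbo(EGLDisplay dpy, int dmabuf_fd, int width,
                                int height, int fourcc, int stride)
    {
        const EGLint attrs[] = {
            EGL_WIDTH,                     width,
            EGL_HEIGHT,                    height,
            EGL_LINUX_DRM_FOURCC_EXT,      fourcc,
            EGL_DMA_BUF_PLANE0_FD_EXT,     dmabuf_fd,
            EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
            EGL_DMA_BUF_PLANE0_PITCH_EXT,  stride,
            EGL_NONE
        };
        EGLImageKHR img = eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                                            EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

        GLuint tex, fbo;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, (GLeglImageOES)img);

        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, tex, 0);

        /* The app draws here, then performs the usage transition and fencing
         * before passing dmabuf_fd on to the display/compositor.             */
        return fbo;
    }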
>>
>>
>> Hmm, ok..  I guess the transition will need some hook into the driver.
>> For freedreno and vc4 (and I suspect this is not uncommon for 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-06 Thread Nicolai Hähnle

On 06.12.2017 08:07, James Jones wrote:
[snip]

So let's say you have a setup where both display and GPU supported
FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
(FOO/cached).  But the GPU supported the following transitions:

    trans_a: FOO/CC -> null
    trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
    1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
    2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
    3: caps(FOO/tiled); constraints(alignment=32k)

Display:
    1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
    1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
       transition(GPU->display: trans_a, trans_b; display->GPU: none)
    2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
       transition(GPU->display: trans_a; display->GPU: none)
    3: caps(FOO/tiled); constraints(alignment=64k);
       transition(GPU->display: none; display->GPU: none)



We definitely don't want to expose a way of getting uncached rendering
surfaces for radeonsi. I mean, I think we are supposed to be able to
program our hardware so that the backend bypasses all caches, but (a)
nobody validates that and (b) it's basically suicide in terms of
performance. Let's build fewer footguns :)


sure, this was just a hypothetical example.  But to take this case as
another example, if you didn't want to expose uncached rendering (or
cached w/ cache flushes after each draw), you would exclude the entry
from the GPU set which didn't have FOO/cached (I'm adding back a
cached but not CC config just to make it interesting), and end up
with:

    trans_a: FOO/CC -> null
    trans_b: FOO/cached -> null

GPU:
   1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
   2: caps(FOO/tiled, FOO/cached); constraints(alignment=32k)

Display:
   1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
   1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
  transition(GPU->display: trans_a, trans_b; display->GPU: none)
   2: caps(FOO/tiled, FOO/cached); constraints(alignment=64k);
  transition(GPU->display: trans_b; display->GPU: none)

So there isn't anything in the result set that doesn't have GPU cache,
and the cache-flush transition is always in the set of required
transitions going from GPU -> display

Hmm, I guess this does require the concept of a required cap..


Which we already introduced to the allocator API when we realized we
would need them as we were prototyping.


Note I also posed the question of whether things like cached (and
similarly compression, since I view compression as roughly an equivalent
mechanism to a cache) should be capabilities at all, in one of the open
issues on my XDC 2017 slides, because of this very over-pruning problem
it causes.  It's on slide 15, as "No device-local capabilities".  You'll
have to listen to my coverage of it in the recorded presentation for that
slide to make any sense, but it's the same thing Nicolai has laid out here.


As I continued working through our prototype driver support, I found I 
didn't actually need to include cached or compressed as capabilities: 
The GPU just applies them as needed and the usage transitions make it 
transparent to the non-GPU engines.  That does mean the GPU driver 
currently needs to be the one to realize the allocation from the 
capability set to get optimal behavior.  We could fix that by reworking 
our driver though.  At this point, not including device-local properties 
like on-device caching in capabilities seems like the right solution to 
me.  I'm curious whether this applies universally though, or if other 
hardware doesn't fit the "compression and stuff all behaves like a 
cache" idiom.


Compression is a part of the memory layout for us: framebuffer 
compression uses an additional "meta surface". At the most basic level, 
an allocation with loss-less compression support is by necessity bigger 
than an allocation without.


We can allocate this meta surface separately, but then we're forced to 
decompress when passing the surface around (e.g. to a compositor.)


Consider also the example I gave elsewhere, where a cross-vendor tiling 
layout is combined with vendor-specific compression:


Device 1, rendering: caps(BASE/foo-tiling, VND1/compression)
Device 2, sampling/scanout: caps(BASE/foo-tiling, VND2/compression)

Some more thoughts on caching or "device-local" properties below.


[snip]

I think I like the idea of having transitions being part of the
per-device/engine cap sets, so that such information can be used upon
merging to know which capabilities may remain or have to be dropped.

I think James's proposal for usage transitions was intended to work
with flows like:

   1. App gets GPU caps for RENDER usage
   2. App allocates GPU memory using a layout from (1)
   3. App now decides it wants to use the buffer for SCANOUT
   4. App queries usage transition metadata from RENDER to SCANOUT,
 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-06 Thread Nicolai Hähnle

On 06.12.2017 08:01, James Jones wrote:

On 12/01/2017 10:34 AM, Nicolai Hähnle wrote:

On 01.12.2017 18:09, Nicolai Hähnle wrote:
[snip]

As for the actual transition API, I accept that some metadata may be
required, and the metadata probably needs to depend on the memory layout,
which is often vendor-specific. But even linear layouts need some
transitions for caches. We probably need at least some generic
"off-device usage" bit.


I've started thinking of cached as a capability with a transition.. I
think that helps.  Maybe it needs to somehow be more specific (ie. if
you have two devices both with their own cache with no coherency
between the two)


As I wrote above, I'd prefer not to think of "cached" as a capability 
at least for radeonsi.


From the desktop perspective, I would say let's ignore caches, the
drivers know which caches they need to flush to make data visible to
other devices on the system.


On the other hand, there are probably SoC cases where non-coherent 
caches are shared between some but not all devices, and in that case 
perhaps we do need to communicate this.


So perhaps we should have two kinds of "capabilities".

The first, like framebuffer compression, is a capability of the 
allocated memory layout (because the compression requires a meta 
surface), and devices that expose it may opportunistically use it.


The second, like caches, is a capability that the device/driver will 
use and you don't get a say in it, but other devices/drivers also 
don't need to be aware of them.


So then you could theoretically have a system that gives you:

GPU: FOO/tiled(layout-caps=FOO/cc, dev-caps=FOO/gpu-cache)
Display: FOO/tiled(layout-caps=FOO/cc)
Video:   FOO/tiled(dev-caps=FOO/vid-cache)
Camera:  FOO/tiled(dev-caps=FOO/vid-cache)
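
A rough sketch of that split as a data model (the names are hypothetical, not
anything from the prototype): layout capabilities describe the allocation
itself and must be reconciled across devices, while device-local capabilities
only matter to the device that reports them and can be ignored by everyone
else when merging.

    /* Hypothetical data model for the two kinds of capabilities above.      */
    enum cap_kind {
        CAP_LAYOUT,   /* affects the allocation itself (e.g. FOO/cc); must be
                         agreed on, or transitioned away, by every device     */
        CAP_DEVICE,   /* device-local (e.g. FOO/gpu-cache, FOO/vid-cache);
                         used internally by the reporting device only         */
    };

    struct capability {
        const char   *name;
        enum cap_kind kind;
    };

    /* Only CAP_LAYOUT entries participate in merging; CAP_DEVICE entries are
     * carried along for the reporting device and never pruned.              */
    static int participates_in_merge(const struct capability *cap)
    {
        return cap->kind == CAP_LAYOUT;
    }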

[snip]

FWIW, I think all that stuff about different caches quite likely 
over-complicates things. At the end of each "command submission" of 
whichever type of engine, the buffer must be in a state where the 
kernel is free to move it around for memory management purposes. This 
already puts a big constraint on the kind of (non-coherent) caches 
that can be supported anyway, so I wouldn't be surprised if we could 
get away with a *much* simpler approach.


I'd rather not depend on this type of cleverness if possible.  Other 
kernels/OS's may not behave this way, and I'd like the allocator 
mechanism to be something we can use across all or at least most of the 
POSIX and POSIX-like OS's we support.  Also, this particular example is 
not true of our proprietary Linux driver, and I suspect it won't always 
be the case for other drivers.  If a particular driver or OS fits this 
assumption, the driver is always free to return no-op transitions in 
that case.


Agreed.

(What I wrote about memory management should be true for all systems, 
but the kernel could use an engine that goes through the relevant caches 
for memory management-related buffer moves. It just so happens that it 
doesn't do that on our hardware, but that's by no means universal.)


Cheers,
Nicolai
--
Learn how the world really is,
but never forget how it ought to be.


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 12/01/2017 01:52 PM, Miguel Angel Vico wrote:



On Fri, 1 Dec 2017 13:38:41 -0500
Rob Clark  wrote:


On Fri, Dec 1, 2017 at 12:09 PM, Nicolai Hähnle  wrote:

On 01.12.2017 16:06, Rob Clark wrote:


On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle 
wrote:


Hi,

I've had a chance to look a bit more closely at the allocator prototype
repository now. There's a whole bunch of low-level API design feedback,
but for now let's focus on the high-level stuff first.


Thanks for taking a look.


Going by the 4.5 major object types (as also seen on slide 5 of your
presentation [0]), assertions and usages make sense to me.

Capabilities and capability sets should be cleaned up in my opinion, as
the status quo is overly obfuscating things. What capability sets really
represent, as far as I understand them, is *memory layouts*, and so
that's what they should be called.

This conceptually simplifies `derive_capabilities` significantly without
any loss of expressiveness as far as I can see. Given two lists of memory
layouts, we simply look for which memory layouts appear in both lists, and
then merge their constraints and capabilities.

Merging constraints looks good to me.

Capabilities need some more thought. The prototype removes capabilities
when merging layouts, but I'd argue that that is often undesirable. (In
fact, I cannot think of capabilities which we'd always want to remove.)

A typical example for this is compression (i.e. DCC in our case). For
rendering usage, we'd return something like:

Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)

For display usage, we might return (depending on hardware):

Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)

Merging these in the prototype would remove the DCC capability, even though
it might well make sense to keep it there for rendering. Dealing with the
fact that display usage does not have this capability is precisely one of
the two things that transitions are about! The other thing that transitions
are about is caches.

I think this is kind of what Rob was saying in one of his mails.



Perhaps "layout" is a better name than "caps".. either way I think of
both AMD/tiled and AMD/DCC as the same type of "thing".. the
difference between AMD/tiled and AMD/DCC is that a transition can be
provided for AMD/DCC.  Other than that they are both things describing
the layout.



The reason that a transition can be provided is that they aren't quite the
same thing, though. In a very real sense, AMD/DCC is a "child" property of
AMD/tiled: DCC is implemented as a meta surface whose memory layout depends
on the layout of the main surface.


I suppose this is six-of-one, half-dozen of the other..

what you are calling a layout is what I'm calling a cap that just
happens not to have an associated transition


Although, if there are GPUs that can do an in-place "transition" between
different tiling layouts, then the distinction is perhaps really not as
clear-cut. I guess that would only apply to tiled renderers.


I suppose the advantage of just calling both layout and caps the same
thing, and just saying that a "cap" (or "layout" if you prefer that
name) can optionally have one or more associated transitions, is that
you can deal with cases where sometimes a tiled format might actually
have an in-place transition ;-)
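
As a data structure, that framing might look something like the following
(hypothetical names): a capability describes part of the layout and may carry
zero or more transitions; a cap with none attached is what would otherwise be
called a pure layout property.

    /* Hypothetical: a capability with optional transitions out of it.       */
    struct transition {
        const char *name;     /* e.g. "trans_a"                              */
        const char *target;   /* resulting cap, or NULL for "plain"          */
    };

    struct layout_cap {
        const char               *name;         /* "FOO/tiled", "AMD/DCC", ...*/
        const struct transition  *transitions;  /* NULL/empty means the cap
                                                    cannot be left in place --
                                                    a pure layout property    */
        unsigned                   num_transitions;
    };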

  

So let's say you have a setup where both display and GPU supported
FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
(FOO/cached).  But the GPU supported the following transitions:

trans_a: FOO/CC -> null
trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
3: caps(FOO/tiled); constraints(alignment=32k)

Display:
1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
   transition(GPU->display: trans_a, trans_b; display->GPU: none)
2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
   transition(GPU->display: trans_a; display->GPU: none)
3: caps(FOO/tiled); constraints(alignment=64k);
   transition(GPU->display: none; display->GPU: none)



We definitely don't want to expose a way of getting uncached rendering
surfaces for radeonsi. I mean, I think we are supposed to be able to program
our hardware so that the backend bypasses all caches, but (a) nobody
validates that and (b) it's basically suicide in terms of performance. Let's
build fewer footguns :)


sure, this was just a hypothetical example.  But to take this case as
another example, if you didn't want to expose uncached rendering (or
cached w/ cache flushes after each draw), you would exclude the entry
from the GPU set which didn't have FOO/cached (I'm adding back a
cached but not CC config just to make it 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 12/01/2017 10:34 AM, Nicolai Hähnle wrote:

On 01.12.2017 18:09, Nicolai Hähnle wrote:
[snip]

As for the actual transition API, I accept that some metadata may be
required, and the metadata probably needs to depend on the memory layout,
which is often vendor-specific. But even linear layouts need some
transitions for caches. We probably need at least some generic
"off-device usage" bit.


I've started thinking of cached as a capability with a transition.. I
think that helps.  Maybe it needs to somehow be more specific (ie. if
you have two devices both with their own cache with no coherency
between the two)


As I wrote above, I'd prefer not to think of "cached" as a capability 
at least for radeonsi.


From the desktop perspective, I would say let's ignore caches, the
drivers know which caches they need to flush to make data visible to
other devices on the system.


On the other hand, there are probably SoC cases where non-coherent 
caches are shared between some but not all devices, and in that case 
perhaps we do need to communicate this.


So perhaps we should have two kinds of "capabilities".

The first, like framebuffer compression, is a capability of the 
allocated memory layout (because the compression requires a meta 
surface), and devices that expose it may opportunistically use it.


The second, like caches, is a capability that the device/driver will 
use and you don't get a say in it, but other devices/drivers also 
don't need to be aware of them.


So then you could theoretically have a system that gives you:

GPU: FOO/tiled(layout-caps=FOO/cc, dev-caps=FOO/gpu-cache)
Display: FOO/tiled(layout-caps=FOO/cc)
Video:   FOO/tiled(dev-caps=FOO/vid-cache)
Camera:  FOO/tiled(dev-caps=FOO/vid-cache)

[snip]

FWIW, I think all that stuff about different caches quite likely 
over-complicates things. At the end of each "command submission" of 
whichever type of engine, the buffer must be in a state where the kernel 
is free to move it around for memory management purposes. This already 
puts a big constraint on the kind of (non-coherent) caches that can be 
supported anyway, so I wouldn't be surprised if we could get away with a 
*much* simpler approach.


I'd rather not depend on this type of cleverness if possible.  Other 
kernels/OS's may not behave this way, and I'd like the allocator 
mechanism to be something we can use across all or at least most of the 
POSIX and POSIX-like OS's we support.  Also, this particular example is 
not true of our proprietary Linux driver, and I suspect it won't always 
be the case for other drivers.  If a particular driver or OS fits this 
assumption, the driver is always free to return no-op transitions in 
that case.


Thanks,
-James


Cheers,
Nicolai




Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 11/30/2017 12:06 PM, Lyude Paul wrote:

On Thu, 2017-11-30 at 13:20 -0500, Rob Clark wrote:

On Thu, Nov 30, 2017 at 12:59 AM, James Jones  wrote:

On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:


On Wed, 29 Nov 2017 16:28:15 -0500
Rob Clark  wrote:


Do we need to define both in-place and copy transitions?  Ie. what if
GPU is still reading a tiled or compressed texture (ie. sampling from
previous frame for some reason), but we need to untile/uncompress for
display.. or maybe there are some other cases like that we should
think about..

Maybe you already have some thoughts about that?



This is the next thing I'll be working on. I haven't given it much
thought myself so far, but I think James might have had some insights.
I'll read through some of his notes to double-check.



A couple of notes on usage transitions:

While chatting about transitions, a few assertions were made by others
that
I've come to accept, despite the fact that they reduce the generality of
the
allocator mechanisms:

-GPUs are the only things that actually need usage transitions as far as I
know thus far.  Other engines either share the GPU representations of data,
or use more limited representations; the latter being the reason transitions
to non-GPU usages are a useful thing.

-It's reasonable to assume that a GPU is required to perform a usage
transition.  This follows from the above postulate.  If only GPUs are
using
more advanced representations, you don't need any transitions unless you
have a GPU available.


This seems reasonable.  I can't think of any non-gpu related case
where you would need a transition, other than perhaps cache flush/inv.


From that, I derived the rough API proposal for transitions presented on my
XDC 2017 slides.  Transition "metadata" is queried from the allocator given
a pair of usages (which may refer to more than one device), but the
realization of the transition is left to existing GPU APIs.  I think I put
Vulkan-like pseudo-code in the slides, but the GL external objects
extensions (GL_EXT_memory_object and GL_EXT_semaphore) would work as well.
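
The shape of that Vulkan-like pseudo-code would be roughly as below.
vkCmdAllocatorTransitionEXTX is an invented placeholder for whatever entry
point would actually consume the allocator's metadata, and the
alloc_query_transition and usage_t names are the same hypothetical ones
sketched earlier in the thread:

    /* Vulkan-flavored pseudo-code.  vkCmdAllocatorTransitionEXTX and the
     * alloc_query_transition / usage_t names are invented placeholders.     */
    #include <stddef.h>
    #include <vulkan/vulkan.h>

    typedef enum { USAGE_RENDER, USAGE_SCANOUT } usage_t;
    struct allocation;
    struct transition_md;

    struct transition_md *alloc_query_transition(const struct allocation *buf,
                                                 const usage_t *src, size_t nsrc,
                                                 const usage_t *dst, size_t ndst);
    void vkCmdAllocatorTransitionEXTX(VkCommandBuffer cmd,
                                      struct transition_md *md);

    void record_handoff(VkCommandBuffer cmd, const struct allocation *buf)
    {
        /* Ask the allocator how to get from GPU rendering to scanout usage.  */
        const usage_t src[] = { USAGE_RENDER };
        const usage_t dst[] = { USAGE_SCANOUT };
        struct transition_md *md = alloc_query_transition(buf, src, 1, dst, 1);

        /* The driver turns the opaque metadata into whatever decompression,
         * cache flush, or compression reconfiguration the hardware needs;
         * semantically this is an in-place transition, not a copy.           */
        vkCmdAllocatorTransitionEXTX(cmd, md);

        /* Synchronization with the consuming device is left to the existing
         * external semaphore/fence mechanisms at queue-submit time.          */
    }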


I haven't quite wrapped my head around how this would work in the
cross-device case.. I mean from the API standpoint for the user, it
seems straightforward enough.  Just not sure how to implement that and
what the driver interface would look like.

I guess we need a capability-conversion (?).. I mean take for example
the fb compression capability from your slide #12[1].  If we knew
there was an available transition to go from "Dev2 FB compression" to
"normal", then we could have allowed the "Dev2 FB compression" valid
set?

[1] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf


Regarding in-place Vs. copy: To me a transition is something that happens
in-place, at least semantically.  If you need to make copies, that's a
format conversion blit not a transition, and graphics APIs are already
capable of expressing that without any special transitions or help from
the
allocator.  However, I understand some chipsets perform transitions using
something that looks kind of like a blit using on-chip caches and
constrained usage semantics.  There's probably some work to do to see
whether those need to be accommodated as conversion blits or usage
transitions.


I guess part of what I was thinking of, is what happens if the
producing device is still reading from the buffer.  For example,
viddec -> gpu use case, where the video decoder is also still hanging
on to the frame to use as a reference frame to decode future frames?

I guess if transition from devA -> devB can be done in parallel with
devA still reading the buffer, it isn't a problem.  I guess that
limits (non-blit) transitions to decompression and cache ops?  Maybe
that is ok..


I don't know of a real case it would be a problem.  Note you can 
transition to multiple usages in the proposed API, so for the video 
decoder example, you would transition from [video decode target] to 
[video decode target, GPU sampler source] for simultaneous texturing and 
reference frame usage.



For our hardware's purposes, transitions are just various levels of
decompression or compression reconfiguration and potentially cache
flushing/invalidation, so our transition metadata will just be some bits
signaling which compression operation is needed, if any.  That's the sort
of
operation I modeled the API around, so if things are much more exotic than
that for others, it will probably require some adjustments.




[snip]



Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing, is one of my
primary goals.  However, it's a pretty heavy thing to prototype.  If
someone has the time though, I think it would be a great experiment.  It
would help flesh out the paltry list of usages, constraints, and
capabilities in the existing prototype codebase.  The kmscube example
really should have added at least a "render" usage, but I got lazy and
just re-used texture for now.
That 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-05 Thread James Jones

On 11/30/2017 10:48 AM, Rob Clark wrote:

On Thu, Nov 30, 2017 at 1:28 AM, James Jones  wrote:

On 11/29/2017 01:10 PM, Rob Clark wrote:


On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand 
wrote:


On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:



On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
wrote:


I'm not quite sure what I think about this.  I think I would like to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing
on its own so we have two competing solutions.



I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it.  I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
when you are thinking about implementing $new_thing for non-gpu
devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
about a (for ex.) camera.  I kinda don't want to throw out the baby
with the bathwater here.




Agreed.  GBM is very EGLish and we don't want the new allocator to be
that.



*maybe* GBM could be partially implemented on top of $new_thing.  I
don't quite see how that would work.  Possibly we could deprecate
parts of GBM that are no longer needed?  idk..  Either way, I fully
expect that GBM and mesa's implementation of $new_thing could perhaps
sit on top of some of the same set of internal APIs.  The public
interface can be decoupled from the internal implementation.




Maybe I should restate things a bit.  My real point was that modifiers +
$new_thing + Kernel blob should be a complete and more powerful replacement
for GBM.  I don't know that we really can implement GBM on top of it
because GBM has lots of wishy-washy concepts such as "cursor plane" which
may not map well, at least not without querying the kernel about specific
display planes.  In particular, I don't want someone to feel like they
need to use $new_thing and GBM at the same time or together.  Ideally,
I'd like them to never do that unless we decide gbm_bo is a useful
abstraction for $new_thing.



(just to repeat what I mentioned on irc)

I think the main thing is how you create a swapchain/surface and know
which is the current front buffer after SwapBuffers()..  those are the
only bits of GBM that seem like they would still be useful.  idk, maybe
there is some other idea.



I don't view this as terribly useful except for legacy apps that need an EGL
window surface and can't be updated to use new methods.  Wayland compositors
certainly don't fall in that category.  I don't know that any GBM apps do.


kmscube doesn't count?  :-P

Hmm, I assumed weston and the other wayland compositors were still
using gbm to create EGL surfaces, but I confess to have not actually
looked at weston src code for quite a few years now.

Anyways, I think it is perfectly fine for GBM to stay as-is in its
current form.  It can already import dma-buf fd's, and those can
certainly come from $new_thing.

So I guess we want an EGL extension to return the allocator device
instance for the GPU.  That also takes care of the non-bare-metal
case.


Rather, I think the way forward for the classes of apps that need something
like GBM or the generic allocator is more or less the path ChromeOS took
with their graphics architecture: Render to individual buffers (using FBOs
bound to imported buffers in GL) and manage buffer exchanges/blits manually.

The useful abstraction surfaces provide isn't so much deciding which buffer
is currently "front" and "back", but rather handling the transition/hand-off
to the window system/display device/etc. in SwapBuffers(), and the whole
idea of the allocator proposals is to make that something the application or
at least some non-driver utility library handles explicitly based on where
exactly the buffer is being handed off to.


Hmm, ok..  I guess the transition will need some hook into the driver.
For freedreno and vc4 (and I suspect this is not uncommon for tiler
GPUs), switching FBOs doesn't necessarily flush rendering to hw.
Maybe it would work out if you requested the sync fd file descriptor
from an EGL fence before passing things to the next device, as that would
flush rendering.


This "flush" is exactly what usage transitions are for:

1) Perform rendering or texturing
2) Insert a transition into command stream using metadata extracted from 
allocator library into the rendering/texturing API using a new entry 
point.  This instructs the driver to perform any 
flushes/decompressions/etc. needed to transition to the next usage in
the pipeline.
3) Insert/extract your fence (potentially this is combined with above 
entry point like it is in GL_EXT_semaphore).
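
A rough sketch of those three steps in GL/EGL terms.
glInsertAllocatorTransitionEXTX is an invented placeholder for the new entry
point in step 2, while the fence in step 3 uses the existing
EGL_ANDROID_native_fence_sync extension so the next device gets a sync fd to
wait on (extracting the fd also forces the preceding commands, including the
transition, to be submitted):

    #define EGL_EGLEXT_PROTOTYPES
    #include <EGL/egl.h>
    #include <EGL/eglext.h>

    /* Invented placeholder for the "new entry point" of step 2.             */
    void glInsertAllocatorTransitionEXTX(const void *metadata, int size);

    int render_and_handoff(EGLDisplay dpy, const void *md, int md_size)
    {
        /* 1) Rendering/texturing has been recorded before this point.        */

        /* 2) Queue the usage transition; the driver emits whatever flushes or
         *    decompressions the next usage in the pipeline needs.            */
        glInsertAllocatorTransitionEXTX(md, md_size);

        /* 3) Create a native fence and extract a sync fd; this forces the
         *    queued work (including the transition) to be submitted.         */
        EGLSyncKHR sync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
                                           NULL);
        int fence_fd = eglDupNativeFenceFDANDROID(dpy, sync);
        eglDestroySyncKHR(dpy, sync);

        return fence_fd;   /* handed to the next device along with the buffer */
    }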



I wonder a bit 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-01 Thread Miguel Angel Vico


On Fri, 1 Dec 2017 13:38:41 -0500
Rob Clark  wrote:

> On Fri, Dec 1, 2017 at 12:09 PM, Nicolai Hähnle  wrote:
> > On 01.12.2017 16:06, Rob Clark wrote:  
> >>
> >> On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle 
> >> wrote:  
> >>>
> >>> Hi,
> >>>
> >>> I've had a chance to look a bit more closely at the allocator prototype
> >>> repository now. There's a whole bunch of low-level API design feedback,
> >>> but
> >>> for now let's focus on the high-level stuff first.
> >>>
> >>> Going by the 4.5 major object types (as also seen on slide 5 of your
> >>> presentation [0]), assertions and usages make sense to me.
> >>>
> >>> Capabilities and capability sets should be cleaned up in my opinion, as
> >>> the
> >>> status quo is overly obfuscating things. What capability sets really
> >>> represent, as far as I understand them, is *memory layouts*, and so
> >>> that's
> >>> what they should be called.
> >>>
> >>> This conceptually simplifies `derive_capabilities` significantly without
> >>> any
> >>> loss of expressiveness as far as I can see. Given two lists of memory
> >>> layouts, we simply look for which memory layouts appear in both lists,
> >>> and
> >>> then merge their constraints and capabilities.
> >>>
> >>> Merging constraints looks good to me.
> >>>
> >>> Capabilities need some more thought. The prototype removes capabilities
> >>> when
> >>> merging layouts, but I'd argue that that is often undesirable. (In fact,
> >>> I
> >>> cannot think of capabilities which we'd always want to remove.)
> >>>
> >>> A typical example for this is compression (i.e. DCC in our case). For
> >>> rendering usage, we'd return something like:
> >>>
> >>> Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)
> >>>
> >>> For display usage, we might return (depending on hardware):
> >>>
> >>> Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)
> >>>
> >>> Merging these in the prototype would remove the DCC capability, even
> >>> though
> >>> it might well make sense to keep it there for rendering. Dealing with the
> >>> fact that display usage does not have this capability is precisely one of
> >>> the two things that transitions are about! The other thing that
> >>> transitions
> >>> are about is caches.
> >>>
> >>> I think this is kind of what Rob was saying in one of his mails.  
> >>
> >>
> >> Perhaps "layout" is a better name than "caps".. either way I think of
> >> both AMD/tiled and AMD/DCC as the same type of "thing".. the
> >> difference between AMD/tiled and AMD/DCC is that a transition can be
> >> provided for AMD/DCC.  Other than that they are both things describing
> >> the layout.  
> >
> >
> > The reason that a transition can be provided is that they aren't quite the
> > same thing, though. In a very real sense, AMD/DCC is a "child" property of
> > AMD/tiled: DCC is implemented as a meta surface whose memory layout depends
> > on the layout of the main surface.  
> 
> I suppose this is six-of-one, half-dozen of the other..
> 
> what you are calling a layout is what I'm calling a cap that just
> happens not to have an associated transition
> 
> > Although, if there are GPUs that can do an in-place "transition" between
> > different tiling layouts, then the distinction is perhaps really not as
> > clear-cut. I guess that would only apply to tiled renderers.  
> 
> I suppose the advantage of just calling both layout and caps the same
> thing, and just saying that a "cap" (or "layout" if you prefer that
> name) can optionally have one or more associated transitions, is that
> you can deal with cases where sometimes a tiled format might actually
> have an in-place transition ;-)
> 
> >  
> >> So lets say you have a setup where both display and GPU supported
> >> FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
> >> (FOO/cached).  But the GPU supported the following transitions:
> >>
> >>trans_a: FOO/CC -> null
> >>trans_b: FOO/cached -> null
> >>
> >> Then the sets for each device (in order of preference):
> >>
> >> GPU:
> >>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
> >>2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
> >>3: caps(FOO/tiled); constraints(alignment=32k)
> >>
> >> Display:
> >>1: caps(FOO/tiled); constraints(alignment=64k)
> >>
> >> Merged Result:
> >>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
> >>   transition(GPU->display: trans_a, trans_b; display->GPU: none)
> >>2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
> >>   transition(GPU->display: trans_a; display->GPU: none)
> >>3: caps(FOO/tiled); constraints(alignment=64k);
> >>   transition(GPU->display: none; display->GPU: none)  
> >
> >
> > We definitely don't want to expose a way of getting uncached rendering
> > surfaces for radeonsi. I mean, I think we are supposed to be able to program
> > our hardware so that the backend 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-01 Thread Rob Clark
On Fri, Dec 1, 2017 at 12:09 PM, Nicolai Hähnle  wrote:
> On 01.12.2017 16:06, Rob Clark wrote:
>>
>> On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle 
>> wrote:
>>>
>>> Hi,
>>>
>>> I've had a chance to look a bit more closely at the allocator prototype
>>> repository now. There's a whole bunch of low-level API design feedback,
>>> but
>>> for now let's focus on the high-level stuff first.
>>>
>>> Going by the 4.5 major object types (as also seen on slide 5 of your
>>> presentation [0]), assertions and usages make sense to me.
>>>
>>> Capabilities and capability sets should be cleaned up in my opinion, as
>>> the
>>> status quo is overly obfuscating things. What capability sets really
>>> represent, as far as I understand them, is *memory layouts*, and so
>>> that's
>>> what they should be called.
>>>
>>> This conceptually simplifies `derive_capabilities` significantly without
>>> any
>>> loss of expressiveness as far as I can see. Given two lists of memory
>>> layouts, we simply look for which memory layouts appear in both lists,
>>> and
>>> then merge their constraints and capabilities.
>>>
>>> Merging constraints looks good to me.
>>>
>>> Capabilities need some more thought. The prototype removes capabilities
>>> when
>>> merging layouts, but I'd argue that that is often undesirable. (In fact,
>>> I
>>> cannot think of capabilities which we'd always want to remove.)
>>>
>>> A typical example for this is compression (i.e. DCC in our case). For
>>> rendering usage, we'd return something like:
>>>
>>> Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)
>>>
>>> For display usage, we might return (depending on hardware):
>>>
>>> Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)
>>>
>>> Merging these in the prototype would remove the DCC capability, even
>>> though
>>> it might well make sense to keep it there for rendering. Dealing with the
>>> fact that display usage does not have this capability is precisely one of
>>> the two things that transitions are about! The other thing that
>>> transitions
>>> are about is caches.
>>>
>>> I think this is kind of what Rob was saying in one of his mails.
>>
>>
>> Perhaps "layout" is a better name than "caps".. either way I think of
>> both AMD/tiled and AMD/DCC as the same type of "thing".. the
>> difference between AMD/tiled and AMD/DCC is that a transition can be
>> provided for AMD/DCC.  Other than that they are both things describing
>> the layout.
>
>
> The reason that a transition can be provided is that they aren't quite the
> same thing, though. In a very real sense, AMD/DCC is a "child" property of
> AMD/tiled: DCC is implemented as a meta surface whose memory layout depends
> on the layout of the main surface.

I suppose this is six-of-one, half-dozen of the other..

what you are calling a layout is what I'm calling a cap that just
happens not to have an associated transition

> Although, if there are GPUs that can do an in-place "transition" between
> different tiling layouts, then the distinction is perhaps really not as
> clear-cut. I guess that would only apply to tiled renderers.

I suppose the advantage of just calling both layout and caps the same
thing, and just saying that a "cap" (or "layout" if you prefer that
name) can optionally have one or more associated transitions, is that
you can deal with cases where sometimes a tiled format might actually
have an in-place transition ;-)

>
>> So lets say you have a setup where both display and GPU supported
>> FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
>> (FOO/cached).  But the GPU supported the following transitions:
>>
>>trans_a: FOO/CC -> null
>>trans_b: FOO/cached -> null
>>
>> Then the sets for each device (in order of preference):
>>
>> GPU:
>>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
>>2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
>>3: caps(FOO/tiled); constraints(alignment=32k)
>>
>> Display:
>>1: caps(FOO/tiled); constraints(alignment=64k)
>>
>> Merged Result:
>>1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
>>   transition(GPU->display: trans_a, trans_b; display->GPU: none)
>>2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
>>   transition(GPU->display: trans_a; display->GPU: none)
>>3: caps(FOO/tiled); constraints(alignment=64k);
>>   transition(GPU->display: none; display->GPU: none)
>
>
> We definitely don't want to expose a way of getting uncached rendering
> surfaces for radeonsi. I mean, I think we are supposed to be able to program
> our hardware so that the backend bypasses all caches, but (a) nobody
> validates that and (b) it's basically suicide in terms of performance. Let's
> build fewer footguns :)

sure, this was just a hypothetical example.  But to take this case as
another example, if you didn't want to expose uncached rendering (or
cached w/ cache flushes after each draw), you 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-01 Thread Nicolai Hähnle

On 01.12.2017 18:09, Nicolai Hähnle wrote:
[snip]

As for the actual transition API, I accept that some metadata may be
required, and the metadata probably needs to depend on the memory 
layout,

which is often vendor-specific. But even linear layouts need some
transitions for caches. We probably need at least some generic 
"off-device

usage" bit.


I've started thinking of cached as a capability with a transition.. I
think that helps.  Maybe it needs to somehow be more specific (ie. if
you have two devices, each with its own cache and no coherency
between the two)


As I wrote above, I'd prefer not to think of "cached" as a capability at 
least for radeonsi.


From the desktop perspective, I would say let's ignore caches; the
drivers know which caches they need to flush to make data visible to
other devices on the system.


On the other hand, there are probably SoC cases where non-coherent 
caches are shared between some but not all devices, and in that case 
perhaps we do need to communicate this.


So perhaps we should have two kinds of "capabilities".

The first, like framebuffer compression, is a capability of the 
allocated memory layout (because the compression requires a meta 
surface), and devices that expose it may opportunistically use it.


The second, like caches, is a capability that the device/driver will use 
and you don't get a say in it, but other devices/drivers also don't need 
to be aware of them.


So then you could theoretically have a system that gives you:

GPU: FOO/tiled(layout-caps=FOO/cc, dev-caps=FOO/gpu-cache)
Display: FOO/tiled(layout-caps=FOO/cc)
Video:   FOO/tiled(dev-caps=FOO/vid-cache)
Camera:  FOO/tiled(dev-caps=FOO/vid-cache)
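Purely as an illustration of that split (none of these names exist in the
allocator prototype), the two kinds could be modeled as:

enum cap_scope {
    CAP_SCOPE_LAYOUT,   /* e.g. FOO/cc: part of the memory layout itself;  */
                        /* other devices may opportunistically use it      */
    CAP_SCOPE_DEVICE,   /* e.g. FOO/gpu-cache: used unilaterally by one    */
                        /* device/driver, invisible to everyone else       */
};

struct capability {
    const char    *vendor;   /* "FOO" */
    const char    *name;     /* "cc", "gpu-cache", "vid-cache", ... */
    enum cap_scope scope;
};

struct reported_layout {
    const char              *layout;      /* "FOO/tiled" */
    const struct capability *caps;
    unsigned                 num_caps;
};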

[snip]

FWIW, I think all that stuff about different caches quite likely 
over-complicates things. At the end of each "command submission" of 
whichever type of engine, the buffer must be in a state where the kernel 
is free to move it around for memory management purposes. This already 
puts a big constraint on the kind of (non-coherent) caches that can be 
supported anyway, so I wouldn't be surprised if we could get away with a 
*much* simpler approach.


Cheers,
Nicolai




Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-01 Thread Nicolai Hähnle

On 01.12.2017 16:06, Rob Clark wrote:

On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle  wrote:

Hi,

I've had a chance to look a bit more closely at the allocator prototype
repository now. There's a whole bunch of low-level API design feedback, but
for now let's focus on the high-level stuff first.

Going by the 4.5 major object types (as also seen on slide 5 of your
presentation [0]), assertions and usages make sense to me.

Capabilities and capability sets should be cleaned up in my opinion, as the
status quo is overly obfuscating things. What capability sets really
represent, as far as I understand them, is *memory layouts*, and so that's
what they should be called.

This conceptually simplifies `derive_capabilities` significantly without any
loss of expressiveness as far as I can see. Given two lists of memory
layouts, we simply look for which memory layouts appear in both lists, and
then merge their constraints and capabilities.

Merging constraints looks good to me.

Capabilities need some more thought. The prototype removes capabilities when
merging layouts, but I'd argue that that is often undesirable. (In fact, I
cannot think of capabilities which we'd always want to remove.)

A typical example for this is compression (i.e. DCC in our case). For
rendering usage, we'd return something like:

Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)

For display usage, we might return (depending on hardware):

Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)

Merging these in the prototype would remove the DCC capability, even though
it might well make sense to keep it there for rendering. Dealing with the
fact that display usage does not have this capability is precisely one of
the two things that transitions are about! The other thing that transitions
are about is caches.

I think this is kind of what Rob was saying in one of his mails.


Perhaps "layout" is a better name than "caps".. either way I think of
both AMD/tiled and AMD/DCC as the same type of "thing".. the
difference between AMD/tiled and AMD/DCC is that a transition can be
provided for AMD/DCC.  Other than that they are both things describing
the layout.


The reason that a transition can be provided is that they aren't quite 
the same thing, though. In a very real sense, AMD/DCC is a "child" 
property of AMD/tiled: DCC is implemented as a meta surface whose memory 
layout depends on the layout of the main surface.


Although, if there are GPUs that can do an in-place "transition" between 
different tiling layouts, then the distinction is perhaps really not as 
clear-cut. I guess that would only apply to tiled renderers.




So lets say you have a setup where both display and GPU supported
FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
(FOO/cached).  But the GPU supported the following transitions:

   trans_a: FOO/CC -> null
   trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
   1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
   2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
   3: caps(FOO/tiled); constraints(alignment=32k)

Display:
   1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
   1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
  transition(GPU->display: trans_a, trans_b; display->GPU: none)
   2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
  transition(GPU->display: trans_a; display->GPU: none)
   3: caps(FOO/tiled); constraints(alignment=64k);
  transition(GPU->display: none; display->GPU: none)


We definitely don't want to expose a way of getting uncached rendering 
surfaces for radeonsi. I mean, I think we are supposed to be able to 
program our hardware so that the backend bypasses all caches, but (a) 
nobody validates that and (b) it's basically suicide in terms of 
performance. Let's build fewer footguns :)


So at least for radeonsi, we wouldn't want to have an AMD/cached bit, 
but we'd still want to have a transition between the GPU and display 
precisely to flush caches.




Two interesting questions:

1. If we query for multiple usages on the same device, can we get a
capability which can only be used for a subset of those usages?


I think the original idea was, "no"..  perhaps that restriction could
be lifted if transitions were part of the result.  Or maybe you just
query independently the same device for multiple different usages, and
then merge that cap-set.

(Do we need to care about intra-device transitions?  Or can we just
let the driver care about that, same as it always has?)
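A sketch of that "query per usage, then merge" idea.  The signatures below
are invented for illustration; only derive_capabilities is a name from the
prototype, and its real signature may differ:

struct alloc_device;
struct usage;
struct cap_set;

struct cap_set *query_capabilities(struct alloc_device *dev,
                                   const struct usage *u);
struct cap_set *derive_capabilities(struct cap_set *a, struct cap_set *b);

/* Query the same device once per usage, then fold the results together,
 * rather than asking for all usages in one go. */
struct cap_set *caps_for_usages(struct alloc_device *dev,
                                const struct usage *usages, unsigned n)
{
    struct cap_set *merged = query_capabilities(dev, &usages[0]);

    for (unsigned i = 1; i < n; i++) {
        struct cap_set *next = query_capabilities(dev, &usages[i]);
        /* derive_capabilities() is the merge step discussed in this thread:
         * intersect layouts, merge constraints, and (depending on the open
         * question above) keep or drop caps only one set supports. */
        merged = derive_capabilities(merged, next);
    }
    return merged;
}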


2. What happens when we merge memory layouts with sets of capabilities where
neither is a subset of the other?


I think this is a case where no zero-copy sharing is possible, right?


Not necessarily. Let's say we have some industry-standard tiling layout 
foo, and vendors support their own proprietary framebuffer compression 
on top of 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-12-01 Thread Rob Clark
On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle  wrote:
> Hi,
>
> I've had a chance to look a bit more closely at the allocator prototype
> repository now. There's a whole bunch of low-level API design feedback, but
> for now let's focus on the high-level stuff first.
>
> Going by the 4.5 major object types (as also seen on slide 5 of your
> presentation [0]), assertions and usages make sense to me.
>
> Capabilities and capability sets should be cleaned up in my opinion, as the
> status quo is overly obfuscating things. What capability sets really
> represent, as far as I understand them, is *memory layouts*, and so that's
> what they should be called.
>
> This conceptually simplifies `derive_capabilities` significantly without any
> loss of expressiveness as far as I can see. Given two lists of memory
> layouts, we simply look for which memory layouts appear in both lists, and
> then merge their constraints and capabilities.
>
> Merging constraints looks good to me.
>
> Capabilities need some more thought. The prototype removes capabilities when
> merging layouts, but I'd argue that that is often undesirable. (In fact, I
> cannot think of capabilities which we'd always want to remove.)
>
> A typical example for this is compression (i.e. DCC in our case). For
> rendering usage, we'd return something like:
>
> Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)
>
> For display usage, we might return (depending on hardware):
>
> Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)
>
> Merging these in the prototype would remove the DCC capability, even though
> it might well make sense to keep it there for rendering. Dealing with the
> fact that display usage does not have this capability is precisely one of
> the two things that transitions are about! The other thing that transitions
> are about is caches.
>
> I think this is kind of what Rob was saying in one of his mails.

Perhaps "layout" is a better name than "caps".. either way I think of
both AMD/tiled and AMD/DCC as the same type of "thing".. the
difference between AMD/tiled and AMD/DCC is that a transition can be
provided for AMD/DCC.  Other than that they are both things describing
the layout.

So lets say you have a setup where both display and GPU supported
FOO/tiled, but only GPU supported compressed (FOO/CC) and cached
(FOO/cached).  But the GPU supported the following transitions:

  trans_a: FOO/CC -> null
  trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
  1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
  2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
  3: caps(FOO/tiled); constraints(alignment=32k)

Display:
  1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
  1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
 transition(GPU->display: trans_a, trans_b; display->GPU: none)
  2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
 transition(GPU->display: trans_a; display->GPU: none)
  3: caps(FOO/tiled); constraints(alignment=64k);
 transition(GPU->display: none; display->GPU: none)

> Two interesting questions:
>
> 1. If we query for multiple usages on the same device, can we get a
> capability which can only be used for a subset of those usages?

I think the original idea was, "no"..  perhaps that restriction could
be lifted if transitions were part of the result.  Or maybe you
just query independently the same device for multiple different
usages, and then merge that cap-set.

(Do we need to care about intra-device transitions?  Or can we just
let the driver care about that, same as it always has?)

> 2. What happens when we merge memory layouts with sets of capabilities where
> neither is a subset of the other?

I think this is a case where no zero-copy sharing is possible, right?

> As for the actual transition API, I accept that some metadata may be
> required, and the metadata probably needs to depend on the memory layout,
> which is often vendor-specific. But even linear layouts need some
> transitions for caches. We probably need at least some generic "off-device
> usage" bit.

I've started thinking of cached as a capability with a transition.. I
think that helps.  Maybe it needs to somehow be more specific (ie. if
you have two devices, each with its own cache and no coherency
between the two)

BR,
-R

>
> Cheers,
> Nicolai
>
> [0] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf
>
>
> On 21.11.2017 02:11, James Jones wrote:
>>
>> As many here know at this point, I've been working on solving issues
>> related to DMA-capable memory allocation for various devices for some time
>> now.  I'd like to take this opportunity to apologize for the way I handled
>> the EGL stream proposals.  I understand now that the development process
>> followed there was unacceptable to the community and likely offended many
>> great engineers.
>>
>> Moving forward, I attempted 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Nicolai Hähnle

Hi,

I've had a chance to look a bit more closely at the allocator prototype 
repository now. There's a whole bunch of low-level API design feedback, 
but for now let's focus on the high-level stuff first.


Going by the 4.5 major object types (as also seen on slide 5 of your 
presentation [0]), assertions and usages make sense to me.


Capabilities and capability sets should be cleaned up in my opinion, as 
the status quo is overly obfuscating things. What capability sets really 
represent, as far as I understand them, is *memory layouts*, and so 
that's what they should be called.


This conceptually simplifies `derive_capabilities` significantly without 
any loss of expressiveness as far as I can see. Given two lists of 
memory layouts, we simply look for which memory layouts appear in both 
lists, and then merge their constraints and capabilities.
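A sketch of that merge, with invented types: keep only the memory layouts
whose names appear in both lists, take the stricter constraint, and combine
capabilities (how exactly capabilities should combine is the open question
discussed further below; a union is shown here):

#include <string.h>

struct layout {
    const char *name;          /* e.g. "AMD/tiled" */
    unsigned    alignment;     /* constraint: required alignment */
    unsigned    caps;          /* bitmask of capabilities, e.g. DCC */
};

static unsigned merge_layouts(const struct layout *a, unsigned na,
                              const struct layout *b, unsigned nb,
                              struct layout *out)
{
    unsigned count = 0;

    for (unsigned i = 0; i < na; i++) {
        for (unsigned j = 0; j < nb; j++) {
            if (strcmp(a[i].name, b[j].name) != 0)
                continue;

            out[count].name      = a[i].name;
            /* constraints merge: the stricter (larger) alignment wins */
            out[count].alignment = a[i].alignment > b[j].alignment
                                 ? a[i].alignment : b[j].alignment;
            /* capabilities merge: union shown here */
            out[count].caps      = a[i].caps | b[j].caps;
            count++;
        }
    }
    return count;
}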


Merging constraints looks good to me.

Capabilities need some more thought. The prototype removes capabilities 
when merging layouts, but I'd argue that that is often undesirable. (In 
fact, I cannot think of capabilities which we'd always want to remove.)


A typical example for this is compression (i.e. DCC in our case). For 
rendering usage, we'd return something like:


Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)

For display usage, we might return (depending on hardware):

Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)

Merging these in the prototype would remove the DCC capability, even 
though it might well make sense to keep it there for rendering. Dealing 
with the fact that display usage does not have this capability is 
precisely one of the two things that transitions are about! The other 
thing that transitions are about is caches.


I think this is kind of what Rob was saying in one of his mails.

Two interesting questions:

1. If we query for multiple usages on the same device, can we get a 
capability which can only be used for a subset of those usages?


2. What happens when we merge memory layouts with sets of capabilities 
where neither is a subset of the other?


As for the actual transition API, I accept that some metadata may be 
required, and the metadata probably needs to depend on the memory 
layout, which is often vendor-specific. But even linear layouts need 
some transitions for caches. We probably need at least some generic 
"off-device usage" bit.


Cheers,
Nicolai

[0] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf

On 21.11.2017 02:11, James Jones wrote:
As many here know at this point, I've been working on solving issues 
related to DMA-capable memory allocation for various devices for some 
time now.  I'd like to take this opportunity to apologize for the way I 
handled the EGL stream proposals.  I understand now that the development 
process followed there was unacceptable to the community and likely 
offended many great engineers.


Moving forward, I attempted to reboot talks in a more constructive 
manner with the generic allocator library proposals & discussion forum 
at XDC 2016.  Some great design ideas came out of that, and I've since 
been prototyping some code to prove them out before bringing them back 
as official proposals.  Again, I understand some people are growing 
concerned that I've been doing this off on the side in a github project 
that has primarily NVIDIA contributors.  My goal was only to avoid 
wasting everyone's time with unproven ideas.  The intent was never to 
dump the prototype code as-is on the community and presume acceptance. 
It's just a public research project.


Now the prototyping is nearing completion, and I'd like to renew 
discussion on whether and how the new mechanisms can be integrated with 
the Linux graphics stack.


I'd be interested to know if more work is needed to demonstrate the 
usefulness of the new mechanisms, or whether people think they have 
value at this point.


After talking with people on the hallway track at XDC this year, I've 
heard several proposals for incorporating the new mechanisms:


-Include ideas from the generic allocator design into GBM.  This could 
take the form of designing a "GBM 2.0" API, or incrementally adding to 
the existing GBM API.


-Develop a library to replace GBM.  The allocator prototype code could 
be massaged into something production worthy to jump start this process.


-Develop a library that sits beside or on top of GBM, using GBM for 
low-level graphics buffer allocation, while supporting non-graphics 
kernel APIs directly.  The additional cross-device negotiation and 
sorting of capabilities would be handled in this slightly higher-level 
API before handing off to GBM and other APIs for actual allocation somehow.


-I have also heard some general comments that regardless of the 
relationship between GBM and the new allocator mechanisms, it might be 
time to move GBM out of Mesa so it can be developed as a stand-alone 
project.  I'd be interested what others think about that, 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Lyude Paul
On Thu, 2017-11-30 at 13:20 -0500, Rob Clark wrote:
> On Thu, Nov 30, 2017 at 12:59 AM, James Jones  wrote:
> > On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:
> > > 
> > > On Wed, 29 Nov 2017 16:28:15 -0500
> > > Rob Clark  wrote:
> > > > 
> > > > Do we need to define both in-place and copy transitions?  Ie. what if
> > > > GPU is still reading a tiled or compressed texture (ie. sampling from
> > > > previous frame for some reason), but we need to untile/uncompress for
> > > > display.. or maybe there are some other cases like that we should
> > > > think about..
> > > > 
> > > > Maybe you already have some thoughts about that?
> > > 
> > > 
> > > This is the next thing I'll be working on. I haven't given it much
> > > thought myself so far, but I think James might have had some insights.
> > > I'll read through some of his notes to double-check.
> > 
> > 
> > A couple of notes on usage transitions:
> > 
> > While chatting about transitions, a few assertions were made by others
> > that
> > I've come to accept, despite the fact that they reduce the generality of
> > the
> > allocator mechanisms:
> > 
> > -GPUs are the only things that actually need usage transitions as far as I
> > know thus far.  Other engines either share the GPU representations of
> > data,
> > or use more limited representations; the latter being the reason non-GPU
> > usage transitions are a useful thing.
> > 
> > -It's reasonable to assume that a GPU is required to perform a usage
> > transition.  This follows from the above postulate.  If only GPUs are
> > using
> > more advanced representations, you don't need any transitions unless you
> > have a GPU available.
> 
> This seems reasonable.  I can't think of any non-gpu related case
> where you would need a transition, other than perhaps cache flush/inv.
> 
> > From that, I derived the rough API proposal for transitions presented on
> > my
> > XDC 2017 slides.  Transition "metadata" is queried from the allocator
> > given
> > a pair of usages (which may refer to more than one device), but the
> > realization of the transition is left to existing GPU APIs.  I think I put
> > Vulkan-like pseudo-code in the slides, but the GL external objects
> > extensions (GL_EXT_memory_object and GL_EXT_semaphore) would work as well.
> 
> I haven't quite wrapped my head around how this would work in the
> cross-device case.. I mean from the API standpoint for the user, it
> seems straightforward enough.  Just not sure how to implement that and
> what the driver interface would look like.
> 
> I guess we need a capability-conversion (?).. I mean take for example
> the fb compression capability from your slide #12[1].  If we knew
> there was an available transition to go from "Dev2 FB compression" to
> "normal", then we could have allowed the "Dev2 FB compression" valid
> set?
> 
> [1] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf
> 
> > Regarding in-place Vs. copy: To me a transition is something that happens
> > in-place, at least semantically.  If you need to make copies, that's a
> > format conversion blit not a transition, and graphics APIs are already
> > capable of expressing that without any special transitions or help from
> > the
> > allocator.  However, I understand some chipsets perform transitions using
> > something that looks kind of like a blit using on-chip caches and
> > constrained usage semantics.  There's probably some work to do to see
> > whether those need to be accommodated as conversion blits or usage
> > transitions.
> 
> I guess part of what I was thinking of, is what happens if the
> producing device is still reading from the buffer.  For example,
> viddec -> gpu use case, where the video decoder is also still hanging
> on to the frame to use as a reference frame to decode future frames?
> 
> I guess if transition from devA -> devB can be done in parallel with
> devA still reading the buffer, it isn't a problem.  I guess that
> limits (non-blit) transitions to decompression and cache op's?  Maybe
> that is ok..
> 
> > For our hardware's purposes, transitions are just various levels of
> > decompression or compression reconfiguration and potentially cache
> > flushing/invalidation, so our transition metadata will just be some bits
> > signaling which compression operation is needed, if any.  That's the sort
> > of
> > operation I modeled the API around, so if things are much more exotic than
> > that for others, it will probably require some adjustments.
> > 
> 
> 
> [snip]
> 
> > 
> > Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing is one of my
> > primary goals.  However, it's a pretty heavy thing to prototype.  If
> > someone
> > has the time though, I think it would be a great experiment.  It would
> > help
> > flesh out the paltry list of usages, constraints, and capabilities in the
> > existing prototype codebase.  The kmscube example really should have added
> > at least a "render" usage, but I 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Lyude Paul
On Thu, 2017-11-30 at 14:20 -0500, Alex Deucher wrote:
> On Thu, Nov 30, 2017 at 2:10 PM, Nicolai Hähnle  wrote:
> > On 30.11.2017 19:52, Rob Clark wrote:
> > > 
> > > On Thu, Nov 30, 2017 at 4:21 AM, Nicolai Hähnle 
> > > wrote:
> > > > 
> > > > On 30.11.2017 01:09, Miguel Angel Vico wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > It seems to me that $new_thing should grow as a separate thing
> > > > > > > whether
> > > > > > > it ends up replacing GBM or GBM internals are somewhat rewritten
> > > > > > > on
> > > > > > > top
> > > > > > > of it. If I'm reading you both correctly, you agree with that,
> > > > > > > so in
> > > > > > > order to move forward, should we go ahead and create a project
> > > > > > > in
> > > > > > > fd.o?
> > > > > > > 
> > > > > > > Before filing the new project request though, we should find an
> > > > > > > appropriate name for $new_thing. Creativity isn't one of my
> > > > > > > strengths,
> > > > > > > but I'll go ahead and start the bikeshedding with "Generic
> > > > > > > Device
> > > > > > > Memory Allocator" or "Generic Device Memory Manager".
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > liballoc - Generic Device Memory Allocator ... seems reasonable to
> > > > > > me..
> > > > > 
> > > > > 
> > > > > 
> > > > > Cool. If there aren't better suggestions, we can go with that. We
> > > > > should also namespace all APIs and structures. Is 'galloc'
> > > > > distinctive
> > > > > enough to be used as namespace? Being an 'r' away from gralloc maybe
> > > > > it's a bit confusing?
> > > > 
> > > > 
> > > > 
> > > > libgalloc with a galloc prefix seems fine.
> > > > 
> > > 
> > > I keep reading "galloc" as "gralloc".. I suspect that will be
> > > confusing.  Maybe libgal/gal_.. or just liballoc/al_?
> > 
> > 
> > True, but liballoc is *very* generic.
> > 
> > libimagealloc?
> > libsurfacealloc?
> > contractions thereof?
> 
> libdevicealloc?

libhwalloc

> 
> Alex


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Alex Deucher
On Thu, Nov 30, 2017 at 2:10 PM, Nicolai Hähnle  wrote:
> On 30.11.2017 19:52, Rob Clark wrote:
>>
>> On Thu, Nov 30, 2017 at 4:21 AM, Nicolai Hähnle 
>> wrote:
>>>
>>> On 30.11.2017 01:09, Miguel Angel Vico wrote:
>>
>>
>> It seems to me that $new_thing should grow as a separate thing whether
>> it ends up replacing GBM or GBM internals are somewhat rewritten on
>> top
>> of it. If I'm reading you both correctly, you agree with that, so in
>> order to move forward, should we go ahead and create a project in
>> fd.o?
>>
>> Before filing the new project request though, we should find an
>> appropriate name for $new_thing. Creativity isn't one of my strengths,
>> but I'll go ahead and start the bikeshedding with "Generic Device
>> Memory Allocator" or "Generic Device Memory Manager".
>
>
>
> liballoc - Generic Device Memory Allocator ... seems reasonable to me..



 Cool. If there aren't better suggestions, we can go with that. We
 should also namespace all APIs and structures. Is 'galloc' distinctive
 enough to be used as namespace? Being an 'r' away from gralloc maybe
 it's a bit confusing?
>>>
>>>
>>>
>>> libgalloc with a galloc prefix seems fine.
>>>
>>
>> I keep reading "galloc" as "gralloc".. I suspect that will be
>> confusing.  Maybe libgal/gal_.. or just liballoc/al_?
>
>
> True, but liballoc is *very* generic.
>
> libimagealloc?
> libsurfacealloc?
> contractions thereof?

libdevicealloc?

Alex


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Nicolai Hähnle

On 30.11.2017 19:52, Rob Clark wrote:

On Thu, Nov 30, 2017 at 4:21 AM, Nicolai Hähnle  wrote:

On 30.11.2017 01:09, Miguel Angel Vico wrote:


It seems to me that $new_thing should grow as a separate thing whether
it ends up replacing GBM or GBM internals are somewhat rewritten on top
of it. If I'm reading you both correctly, you agree with that, so in
order to move forward, should we go ahead and create a project in fd.o?

Before filing the new project request though, we should find an
appropriate name for $new_thing. Creativity isn't one of my strengths,
but I'll go ahead and start the bikeshedding with "Generic Device
Memory Allocator" or "Generic Device Memory Manager".



liballoc - Generic Device Memory Allocator ... seems reasonable to me..



Cool. If there aren't better suggestions, we can go with that. We
should also namespace all APIs and structures. Is 'galloc' distinctive
enough to be used as namespace? Being an 'r' away from gralloc maybe
it's a bit confusing?



libgalloc with a galloc prefix seems fine.



I keep reading "galloc" as "gralloc".. I suspect that will be
confusing.  Maybe libgal/gal_.. or just liballoc/al_?


True, but liballoc is *very* generic.

libimagealloc?
libsurfacealloc?
contractions thereof?

Cheers,
Nicolai



BR,
-R








Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Rob Clark
On Thu, Nov 30, 2017 at 4:21 AM, Nicolai Hähnle  wrote:
> On 30.11.2017 01:09, Miguel Angel Vico wrote:

 It seems to me that $new_thing should grow as a separate thing whether
 it ends up replacing GBM or GBM internals are somewhat rewritten on top
 of it. If I'm reading you both correctly, you agree with that, so in
 order to move forward, should we go ahead and create a project in fd.o?

 Before filing the new project request though, we should find an
 appropriate name for $new_thing. Creativity isn't one of my strengths,
 but I'll go ahead and start the bikeshedding with "Generic Device
 Memory Allocator" or "Generic Device Memory Manager".
>>>
>>>
>>> liballoc - Generic Device Memory Allocator ... seems reasonable to me..
>>
>>
>> Cool. If there aren't better suggestions, we can go with that. We
>> should also namespace all APIs and structures. Is 'galloc' distinctive
>> enough to be used as namespace? Being an 'r' away from gralloc maybe
>> it's a bit confusing?
>
>
> libgalloc with a galloc prefix seems fine.
>

I keep reading "galloc" as "gralloc".. I suspect that will be
confusing.  Maybe libgal/gal_.. or just liballoc/al_?

BR,
-R

>


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Rob Clark
On Thu, Nov 30, 2017 at 1:28 AM, James Jones  wrote:
> On 11/29/2017 01:10 PM, Rob Clark wrote:
>>
>> On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand 
>> wrote:
>>>
>>> On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:


 On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
 wrote:
>
> I'm not quite some sure what I think about this.  I think I would like
> to
> see $new_thing at least replace the guts of GBM. Whether GBM becomes a
> wrapper around $new_thing or $new_thing implements the GBM API, I'm not
> sure.  What I don't think I want is to see GBM development continuing
> on
> it's own so we have two competing solutions.


 I don't really view them as competing.. there is *some* overlap, ie.
 allocating a buffer.. but even if you are using GBM w/out $new_thing
 you could allocate a buffer externally and import it.  I don't see
 $new_thing as that much different from GBM PoV.

 But things like surfaces (aka swap chains) seem a bit out of place
 when you are thinking about implementing $new_thing for non-gpu
 devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
 about a (for ex.) camera.  I kinda don't want to throw out the baby
 with the bathwater here.
>>>
>>>
>>>
>>> Agreed.  GBM is very EGLish and we don't want the new allocator to be
>>> that.
>>>

 *maybe* GBM could be partially implemented on top of $new_thing.  I
 don't quite see how that would work.  Possibly we could deprecate
 parts of GBM that are no longer needed?  idk..  Either way, I fully
 expect that GBM and mesa's implementation of $new_thing could perhaps
 sit on to of some of the same set of internal APIs.  The public
 interface can be decoupled from the internal implementation.
>>>
>>>
>>>
>>> Maybe I should restate things a bit.  My real point was that modifiers +
>>> $new_thing + Kernel blob should be a complete and more powerful
>>> replacement
>>> for GBM.  I don't know that we really can implement GBM on top of it
>>> because
>>> GBM has lots of wishy-washy concepts such as "cursor plane" which may not
>>> map well at least not without querying the kernel about specifc display
>>> planes.  In particular, I don't want someone to feel like they need to
>>> use
>>> $new_thing and GBM at the same time or together.  Ideally, I'd like them
>>> to
>>> never do that unless we decide gbm_bo is a useful abstraction for
>>> $new_thing.
>>>
>>
>> (just to repeat what I mentioned on irc)
>>
>> I think main thing is how do you create a swapchain/surface and know
>> which is current front buffer after SwapBuffers()..  that is the only
>> bits of GBM that seem like there would still be useful.  idk, maybe
>> there is some other idea.
>
>
> I don't view this as terribly useful except for legacy apps that need an EGL
> window surface and can't be updated to use new methods.  Wayland compositors
> certainly don't fall in that category.  I don't know that any GBM apps do.

kmscube doesn't count?  :-P

Hmm, I assumed weston and the other wayland compositors where still
using gbm to create EGL surfaces, but I confess to have not actually
looked at weston src code for quite a few years now.

Anyways, I think it is perfectly fine for GBM to stay as-is in it's
current form.  It can already import dma-buf fd's, and those can
certainly come from $new_thing.

So I guess we want an EGL extension to return the allocator device
instance for the GPU.  That also takes care of the non-bare-metal
case.

> Rather, I think the way forward for the classes of apps that need something
> like GBM or the generic allocator is more or less the path ChromeOS took
> with their graphics architecture: Render to individual buffers (using FBOs
> bound to imported buffers in GL) and manage buffer exchanges/blits manually.
>
> The useful abstraction surfaces provide isn't so much deciding which buffer
> is currently "front" and "back", but rather handling the transition/hand-off
> to the window system/display device/etc. in SwapBuffers(), and the whole
> idea of the allocator proposals is to make that something the application or
> at least some non-driver utility library handles explicitly based on where
> exactly the buffer is being handed off to.

Hmm, ok..  I guess the transition will need some hook into the driver.
For freedreno and vc4 (and I suspect this is not uncommon for tiler
GPUs), switching FBOs doesn't necessarily flush rendering to hw.
Maybe it would work out if you requested the sync fd file descriptor
from an EGL fence before passing things to next device, as that would
flush rendering.

I wonder a bit about perf tools and related things.. gallium HUD and
apitrace use SwapBuffers() as a frame marker..

> The one other useful information provided by EGL surfaces that I suspect
> only our hardware cares about is whether the app is potentially 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Rob Clark
On Thu, Nov 30, 2017 at 12:59 AM, James Jones  wrote:
> On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:
>>
>> On Wed, 29 Nov 2017 16:28:15 -0500
>> Rob Clark  wrote:
>>>
>>> Do we need to define both in-place and copy transitions?  Ie. what if
>>> GPU is still reading a tiled or compressed texture (ie. sampling from
>>> previous frame for some reason), but we need to untile/uncompress for
>>> display.. or maybe there are some other cases like that we should
>>> think about..
>>>
>>> Maybe you already have some thoughts about that?
>>
>>
>> This is the next thing I'll be working on. I haven't given it much
>> thought myself so far, but I think James might have had some insights.
>> I'll read through some of his notes to double-check.
>
>
> A couple of notes on usage transitions:
>
> While chatting about transitions, a few assertions were made by others that
> I've come to accept, despite the fact that they reduce the generality of the
> allocator mechanisms:
>
> -GPUs are the only things that actually need usage transitions as far as I
> know thus far.  Other engines either share the GPU representations of data,
> or use more limited representations; the latter being the reason non-GPU
> usage transitions are a useful thing.
>
> -It's reasonable to assume that a GPU is required to perform a usage
> transition.  This follows from the above postulate.  If only GPUs are using
> more advanced representations, you don't need any transitions unless you
> have a GPU available.

This seems reasonable.  I can't think of any non-gpu related case
where you would need a transition, other than perhaps cache flush/inv.

> From that, I derived the rough API proposal for transitions presented on my
> XDC 2017 slides.  Transition "metadata" is queried from the allocator given
> a pair of usages (which may refer to more than one device), but the
> realization of the transition is left to existing GPU APIs.  I think I put
> Vulkan-like pseudo-code in the slides, but the GL external objects
> extensions (GL_EXT_memory_object and GL_EXT_semaphore) would work as well.

I haven't quite wrapped my head around how this would work in the
cross-device case.. I mean from the API standpoint for the user, it
seems straightforward enough.  Just not sure how to implement that and
what the driver interface would look like.

I guess we need a capability-conversion (?).. I mean take for example
the fb compression capability from your slide #12[1].  If we knew
there was an available transition to go from "Dev2 FB compression" to
"normal", then we could have allowed the "Dev2 FB compression" valid
set?

[1] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf

> Regarding in-place Vs. copy: To me a transition is something that happens
> in-place, at least semantically.  If you need to make copies, that's a
> format conversion blit not a transition, and graphics APIs are already
> capable of expressing that without any special transitions or help from the
> allocator.  However, I understand some chipsets perform transitions using
> something that looks kind of like a blit using on-chip caches and
> constrained usage semantics.  There's probably some work to do to see
> whether those need to be accommodated as conversion blits or usage
> transitions.

I guess part of what I was thinking of, is what happens if the
producing device is still reading from the buffer.  For example,
viddec -> gpu use case, where the video decoder is also still hanging
on to the frame to use as a reference frame to decode future frames?

I guess if transition from devA -> devB can be done in parallel with
devA still reading the buffer, it isn't a problem.  I guess that
limits (non-blit) transitions to decompression and cache op's?  Maybe
that is ok..

> For our hardware's purposes, transitions are just various levels of
> decompression or compression reconfiguration and potentially cache
> flushing/invalidation, so our transition metadata will just be some bits
> signaling which compression operation is needed, if any.  That's the sort of
> operation I modeled the API around, so if things are much more exotic than
> that for others, it will probably require some adjustments.
>


[snip]

>
> Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing is one of my
> primary goals.  However, it's a pretty heavy thing to prototype.  If someone
> has the time though, I think it would be a great experiment.  It would help
> flesh out the paltry list of usages, constraints, and capabilities in the
> existing prototype codebase.  The kmscube example really should have added
> at least a "render" usage, but I got lazy and just re-used texture for now.
> That won't actually work on our HW in all cases, but it's good enough for
> kmscube.
>

btw, I did start looking at it.. I guess this gets a bit into the
other side of this thread (ie. where/if GBM fits in).  So far I don't
think mesa has EGL_EXT_device_base, but I'm guessing 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Nicolai Hähnle

On 30.11.2017 07:28, James Jones wrote:
This is all a really long-winded way of saying yeah I think it would be 
technically feasible to implement GBM on top of the generic allocator 
mechanisms, but I don't think that's a very interesting undertaking. 
It'd just be an ABI-compatibility thing for a bunch of open-source apps, 
which seems unnecessary in the long run since the apps can just be 
patched instead.  Maybe it's useful as a transition mechanism though.


However, if the generic allocator is going to be something separate from 
GBM, I think the idea of modernizing & adapting the existing GBM backend 
infrastructure in Mesa to serve as a backend for the allocator is a good 
idea.  Maybe it's easier to just let GBM sit on that same updated 
backend beside the allocator API.  For GBM, all the interesting stuff 
happens in the backend anyway.


That's precisely why I brought up the libgalloc <-> driver interface in 
another mail. If the libgalloc <-> driver interface uses the same 
extension mechanism that is in place for libgbm <-> driver today, just 
with different extensions, the transition can be made very seamless.


For example, I think we could let whatever "device handle" we use in 
that interface simply be an alias for __DRIscreen as far as drivers from 
Mesa are concerned. Other drivers (which won't implement the DRI_XXX 
extensions) won't have to concern themselves with that if they don't 
want to.
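A hypothetical sketch of what such an extension-style libgalloc <-> driver
interface could look like.  All names here are invented; the point is just
the named-extension lookup in the style of the libgbm/DRI mechanism, plus an
opaque device handle that a Mesa driver could simply alias to __DRIscreen:

typedef struct galloc_device galloc_device;   /* opaque to libgalloc */

struct galloc_extension {
    const char *name;        /* e.g. "GALLOC_DRIVER_ALLOCATE_V1" */
    int         version;
};

struct galloc_allocate_extension {
    struct galloc_extension base;

    /* report capability sets / memory layouts for a list of usages */
    int (*query_capabilities)(galloc_device *dev,
                              const void *usages, unsigned usage_count,
                              void *out_cap_sets);

    /* allocate according to an assertion + chosen capability set and
     * return the result as a dma-buf fd */
    int (*allocate)(galloc_device *dev,
                    const void *assertion, const void *cap_set,
                    int *out_dmabuf_fd);
};

/* Drivers would export a single entry point returning their extension list,
 * which libgalloc walks by name/version at load time. */
const struct galloc_extension *const *
galloc_driver_get_extensions(galloc_device *dev);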


Cheers,
Nicolai


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-30 Thread Nicolai Hähnle

On 30.11.2017 01:09, Miguel Angel Vico wrote:

It seems to me that $new_thing should grow as a separate thing whether
it ends up replacing GBM or GBM internals are somewhat rewritten on top
of it. If I'm reading you both correctly, you agree with that, so in
order to move forward, should we go ahead and create a project in fd.o?

Before filing the new project request though, we should find an
appropriate name for $new_thing. Creativity isn't one of my strengths,
but I'll go ahead and start the bikeshedding with "Generic Device
Memory Allocator" or "Generic Device Memory Manager".


liballoc - Generic Device Memory Allocator ... seems reasonable to me..


Cool. If there aren't better suggestions, we can go with that. We
should also namespace all APIs and structures. Is 'galloc' distinctive
enough to be used as namespace? Being an 'r' away from gralloc maybe
it's a bit confusing?


libgalloc with a galloc prefix seems fine.



I think it is reasonable to live on github until we figure out how
transitions work.. or in particular are there any thread restrictions
or interactions w/ gl context if transitions are done on the gpu or
anything like that?  Or can we just make it more vulkan like w/
explicit ctx ptr, and pass around fence fd's to synchronize everyone??
  I haven't thought about the transition part too much but I guess we
should have a reasonable idea for how that should work before we start
getting too many non-toy users, lest we find big API changes are
needed..


Seems fine, but I would like to get other people other than NVIDIANs
involved giving feedback on the design as we move forward with the
prototype.

Due to lack of a better list, is it okay to start sending patches to
mesa-dev? If that's a too broad audience, should I just CC specific
individuals that have somewhat contributed to the project?


Keeping it on mesa-dev seems like the best way to ensure the relevant 
people actually see it.


Cheers,
Nicolai


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread James Jones

On 11/29/2017 01:10 PM, Rob Clark wrote:

On Wed, Nov 29, 2017 at 12:33 PM, Jason Ekstrand  wrote:

On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark  wrote:


On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand 
wrote:

On November 24, 2017 09:29:43 Rob Clark  wrote:



On Mon, Nov 20, 2017 at 8:11 PM, James Jones 
wrote:


As many here know at this point, I've been working on solving issues
related
to DMA-capable memory allocation for various devices for some time
now.
I'd
like to take this opportunity to apologize for the way I handled the
EGL
stream proposals.  I understand now that the development process
followed
there was unacceptable to the community and likely offended many great
engineers.

Moving forward, I attempted to reboot talks in a more constructive
manner
with the generic allocator library proposals & discussion forum at XDC
2016.
Some great design ideas came out of that, and I've since been
prototyping
some code to prove them out before bringing them back as official
proposals.
Again, I understand some people are growing concerned that I've been
doing
this off on the side in a github project that has primarily NVIDIA
contributors.  My goal was only to avoid wasting everyone's time with
unproven ideas.  The intent was never to dump the prototype code as-is
on
the community and presume acceptance. It's just a public research
project.

Now the prototyping is nearing completion, and I'd like to renew
discussion
on whether and how the new mechanisms can be integrated with the Linux
graphics stack.

I'd be interested to know if more work is needed to demonstrate the
usefulness of the new mechanisms, or whether people think they have
value
at
this point.

After talking with people on the hallway track at XDC this year, I've
heard
several proposals for incorporating the new mechanisms:

-Include ideas from the generic allocator design into GBM.  This could
take
the form of designing a "GBM 2.0" API, or incrementally adding to the
existing GBM API.

-Develop a library to replace GBM.  The allocator prototype code could
be
massaged into something production worthy to jump start this process.

-Develop a library that sits beside or on top of GBM, using GBM for
low-level graphics buffer allocation, while supporting non-graphics
kernel
APIs directly.  The additional cross-device negotiation and sorting of
capabilities would be handled in this slightly higher-level API before
handing off to GBM and other APIs for actual allocation somehow.



tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
still the "winsys" for running on "bare metal" (ie. kms).  And we
don't want to saddle $new_thing with aspects of that, but rather have
it focus on being the thing that in multiple-"device"[1] scenarios
figures out what sort of buffer can be allocated by who for sharing.
Ie $new_thing should really not care about winsys level things like
cursors or surfaces.. only buffers.

The mesa implementation of $new_thing could sit on top of GBM,
although it could also just sit on top of the same internal APIs that
GBM sits on top of.  That is an implementation detail.  It could be
that GBM grows an API to return an instance of $new_thing for
use-cases that involve sharing a buffer with the GPU.  Or perhaps that
is exposed via some sort of EGL extension.  (We probably also need a
way to get an instance from libdrm (?) for display-only KMS drivers,
to cover cases like etnaviv sharing a buffer with a separate display
driver.)

[1] where "devices" could be multiple GPUs or multiple APIs for one or
more GPUs, but also includes non-GPU devices like camera, video
decoder, "image processor" (which may or may not be part of camera),
etc, etc



I'm not quite some sure what I think about this.  I think I would like
to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing on
it's own so we have two competing solutions.


I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it.  I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
when you are thinking about implementing $new_thing for non-gpu
devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
about a (for ex.) camera.  I kinda don't want to throw out the baby
with the bathwater here.



Agreed.  GBM is very EGLish and we don't want the new allocator to be that.



*maybe* GBM could be partially implemented on top of $new_thing.  I
don't quite see how that would work.  Possibly we could deprecate
parts of GBM that are no longer needed?  idk..  Either way, I fully
expect that GBM and mesa's 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread James Jones

On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:



On Wed, 29 Nov 2017 16:28:15 -0500
Rob Clark  wrote:


On Wed, Nov 29, 2017 at 2:41 PM, Miguel Angel Vico  wrote:

Many of you may already know, but James is going to be out for a few
weeks and I'll be taking over this in the meantime.


Sorry for the unfortunate timing.  I am indeed on paternity leave at the 
moment.  Some quick comments below.  I'll be trying to follow the 
discussion as time allows while I'm out.



[snip]

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread Jason Ekstrand
On Wed, Nov 29, 2017 at 4:19 AM, Nicolai Hähnle  wrote:

> On 25.11.2017 18:46, Jason Ekstrand wrote:
>
> >> I'm not quite sure what I think about this.  I think I would like to
> >> see $new_thing at least replace the guts of GBM.  Whether GBM becomes a
> >> wrapper around $new_thing or $new_thing implements the GBM API, I'm not
> >> sure.  What I don't think I want is to see GBM development continuing on
> >> its own so we have two competing solutions.
> >>
> >> I *think* I like the idea of having $new_thing implement GBM as a
> >> deprecated legacy API.  Whether that means we start by pulling GBM out into
> >> its own project or we start over, I don't know.  My feeling is that the
> >> current dri_interface is *not* what we want which is why starting with GBM
> >> makes me nervous.
>>
>
> Why not?
>
> The most basic part of the dri_interface is just a
> __driDriverGetExtensions_xxx function that returns an array of pointers to
> extension structures derived from __DRIextension.
>
> That is *perfectly fine*.
>

Fair enough.  I'm perfectly happy to re-use a well-tested API extension
mechanism.


> I completely agree if you limit your statement to saying that the current
> *set of extensions* that are exposed by this interface are full of X-isms,
> and it's a good idea to do a thorough house-cleaning in there. This can go
> all the way up to eventually phasing out the DRI_Core "extension" as far as
> I'm concerned.
>

That's more of what I was getting at.  In particular, I don't want the
design of $new_thing to be constrained by trying to cram it into the current
DRI extensions, nor do I want it to attempt to have exactly the same set of
functionality as the current DRI extensions (or GBM) support.


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-29 Thread Nicolai Hähnle

On 25.11.2017 18:46, Jason Ekstrand wrote:
I'm not quite sure what I think about this.  I think I would like
to see $new_thing at least replace the guts of GBM.  Whether GBM becomes
a wrapper around $new_thing or $new_thing implements the GBM API, I'm
not sure.  What I don't think I want is to see GBM development
continuing on its own so we have two competing solutions.


I *think* I like the idea of having $new_thing implement GBM as a
deprecated legacy API.  Whether that means we start by pulling GBM out
into its own project or we start over, I don't know.  My feeling is
that the current dri_interface is *not* what we want which is why
starting with GBM makes me nervous.


Why not?

The most basic part of the dri_interface is just a 
__driDriverGetExtensions_xxx function that returns an array of pointers 
to extension structures derived from __DRIextension.


That is *perfectly fine*.
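
A simplified C sketch of that pattern (the real definitions live in
Mesa's dri_interface.h; the "allocator" extension below is purely
illustrative, not an existing DRI extension):

/* Simplified sketch of the dri_interface extension pattern.  Only the
 * __DRIextension struct and __driDriverGetExtensions_<driver> naming
 * mirror the real interface; the derived extension is invented. */
#include <stddef.h>
#include <string.h>

typedef struct __DRIextensionRec {
    const char *name;
    int         version;
} __DRIextension;

/* A derived extension embeds __DRIextension as its first member, so a
 * loader can match on name/version and then cast to the full struct. */
struct illustrative_allocator_extension {
    __DRIextension base;
    int (*get_capabilities)(void *screen, void *caps, int count);
};

static int get_capabilities(void *screen, void *caps, int count)
{
    (void)screen; (void)caps; (void)count;
    return 0;   /* stub for the sketch */
}

static const struct illustrative_allocator_extension allocator_ext = {
    .base = { .name = "ILLUSTRATIVE_allocator", .version = 1 },
    .get_capabilities = get_capabilities,
};

/* What a driver's __driDriverGetExtensions_<driver>() hands back:
 * a NULL-terminated array of extension pointers. */
const __DRIextension *driver_extensions[] = {
    &allocator_ext.base,
    NULL,
};

/* Loader side: find an extension by name, then use the derived interface. */
const __DRIextension *find_extension(const __DRIextension **exts, const char *name)
{
    for (int i = 0; exts[i] != NULL; i++)
        if (strcmp(exts[i]->name, name) == 0)
            return exts[i];
    return NULL;
}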

I completely agree if you limit your statement to saying that the 
current *set of extensions* that are exposed by this interface are full 
of X-isms, and it's a good idea to do a thorough house-cleaning in 
there. This can go all the way up to eventually phasing out the DRI_Core 
"extension" as far as I'm concerned.


I know it's tempting to reinvent the world every couple of years, but 
it's just *better* to find an evolutionary path that makes sense.


Cheers,
Nicolai

--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-25 Thread Jason Ekstrand

On November 24, 2017 09:29:43 Rob Clark  wrote:


[snip]


I'm not quite sure what I think about this.  I think I would like to
see $new_thing at least replace the guts of GBM.  Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing on
its own so we have two competing solutions.


I *think* I like the idea of having $new_thing implement GBM as a
deprecated legacy API.  Whether that means we start by pulling GBM out into
its own project or we start over, I don't know.  My feeling is that the
current dri_interface is *not* what we want which is why starting with GBM
makes me nervous.
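
For illustration, a rough C sketch of what such a compat layer might
look like: a legacy-shaped bo-create entry point forwarding to a
hypothetical allocator underneath.  The alloc_* and gbm_compat_* names
are invented for the example and deliberately do not match the real
GBM entry points.

/* Hypothetical "GBM as a deprecated legacy API on top of $new_thing":
 * the wrapper keeps a GBM-like shape but forwards the allocation to an
 * allocator it sits on.  Everything here is invented for illustration. */
#include <stdint.h>
#include <stdlib.h>

struct alloc_device;                      /* hypothetical $new_thing handle */
struct alloc_buffer;                      /* hypothetical allocation result */

/* Hypothetical allocator call the compat layer would sit on. */
static struct alloc_buffer *alloc_buffer_create(struct alloc_device *dev,
                                                uint32_t width, uint32_t height,
                                                uint32_t format, uint32_t usage)
{
    (void)dev; (void)width; (void)height; (void)format; (void)usage;
    return NULL;                          /* stub for the sketch */
}

/* Legacy-shaped wrapper object and entry point. */
struct gbm_compat_bo {
    struct alloc_buffer *buffer;
    uint32_t width, height, format;
};

struct gbm_compat_bo *gbm_compat_bo_create(struct alloc_device *dev,
                                           uint32_t width, uint32_t height,
                                           uint32_t format, uint32_t usage)
{
    struct gbm_compat_bo *bo = calloc(1, sizeof(*bo));
    if (!bo)
        return NULL;
    bo->buffer = alloc_buffer_create(dev, width, height, format, usage);
    if (!bo->buffer) {
        free(bo);
        return NULL;
    }
    bo->width = width;
    bo->height = height;
    bo->format = format;
    return bo;
}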


I need to go read through your code before I can provide a stronger or more 
nuanced opinion.  That's not going to happen before the end of the year.



[snip]

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-24 Thread Jason Ekstrand

On November 24, 2017 09:45:07 Jason Ekstrand  wrote:


On November 23, 2017 09:00:05 Emil Velikov  wrote:


Hi James,

On 21 November 2017 at 01:11, James Jones  wrote:


-I have also heard some general comments that regardless of the relationship
between GBM and the new allocator mechanisms, it might be time to move GBM
out of Mesa so it can be developed as a stand-alone project.  I'd be
interested what others think about that, as it would be something worth
coordinating with any other new development based on or inside of GBM.


Having a GBM frontend is one thing I've been pondering as well.

Regardless of exact solution wrt the new allocator, having a clear
frontend/backend separation for GBM will be beneficial.
I'll be giving it a stab these days.


I'm not sure what you mean by that.  It currently has something that looks
like separation but it's a joke.  Unless we have a real reason to have
anything other than a dri_interface back-end, I'd rather we just stop
pretending and drop the extra layer of function pointer indirection entirely.


Gah!  I didn't read Rob's email before writing this.  It looks like there
is a use-case for this.  I'm still a bit skeptical about whether or not we
really want to extend what we have or if it would be better to start over
and just require that the new thing also support the current GBM ABI.



--Jason


Disclaimer: Mostly thinking out loud, so please take the following
with a grain of salt.

On the details wrt the new allocator project, I think that having a
new lean library would be a good idea.
One could borrow ideas from GBM, but by default no connection between
the two should be required.

That might make the initial hurdle of porting a bit harder, but it
will allow for a more efficient driver implementation.

HTH
Emil


Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-24 Thread Rob Clark
On Mon, Nov 20, 2017 at 8:11 PM, James Jones  wrote:
> As many here know at this point, I've been working on solving issues related
> to DMA-capable memory allocation for various devices for some time now.  I'd
> like to take this opportunity to apologize for the way I handled the EGL
> stream proposals.  I understand now that the development process followed
> there was unacceptable to the community and likely offended many great
> engineers.
>
> Moving forward, I attempted to reboot talks in a more constructive manner
> with the generic allocator library proposals & discussion forum at XDC 2016.
> Some great design ideas came out of that, and I've since been prototyping
> some code to prove them out before bringing them back as official proposals.
> Again, I understand some people are growing concerned that I've been doing
> this off on the side in a github project that has primarily NVIDIA
> contributors.  My goal was only to avoid wasting everyone's time with
> unproven ideas.  The intent was never to dump the prototype code as-is on
> the community and presume acceptance. It's just a public research project.
>
> Now the prototyping is nearing completion, and I'd like to renew discussion
> on whether and how the new mechanisms can be integrated with the Linux
> graphics stack.
>
> I'd be interested to know if more work is needed to demonstrate the
> usefulness of the new mechanisms, or whether people think they have value at
> this point.
>
> After talking with people on the hallway track at XDC this year, I've heard
> several proposals for incorporating the new mechanisms:
>
> -Include ideas from the generic allocator design into GBM.  This could take
> the form of designing a "GBM 2.0" API, or incrementally adding to the
> existing GBM API.
>
> -Develop a library to replace GBM.  The allocator prototype code could be
> massaged into something production worthy to jump start this process.
>
> -Develop a library that sits beside or on top of GBM, using GBM for
> low-level graphics buffer allocation, while supporting non-graphics kernel
> APIs directly.  The additional cross-device negotiation and sorting of
> capabilities would be handled in this slightly higher-level API before
> handing off to GBM and other APIs for actual allocation somehow.

tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
still the "winsys" for running on "bare metal" (ie. kms).  And we
don't want to saddle $new_thing with aspects of that, but rather have
it focus on being the thing that in multiple-"device"[1] scenarios
figures out what sort of buffer can be allocated by who for sharing.
Ie $new_thing should really not care about winsys level things like
cursors or surfaces.. only buffers.

The mesa implementation of $new_thing could sit on top of GBM,
although it could also just sit on top of the same internal APIs that
GBM sits on top of.  That is an implementation detail.  It could be
that GBM grows an API to return an instance of $new_thing for
use-cases that involve sharing a buffer with the GPU.  Or perhaps that
is exposed via some sort of EGL extension.  (We probably also need a
way to get an instance from libdrm (?) for display-only KMS drivers,
to cover cases like etnaviv sharing a buffer with a separate display
driver.)

[1] where "devices" could be multiple GPUs or multiple APIs for one or
more GPUs, but also includes non-GPU devices like camera, video
decoder, "image processor" (which may or may not be part of camera),
etc, etc


> -I have also heard some general comments that regardless of the relationship
> between GBM and the new allocator mechanisms, it might be time to move GBM
> out of Mesa so it can be developed as a stand-alone project.  I'd be
> interested what others think about that, as it would be something worth
> coordinating with any other new development based on or inside of GBM.

+1

We already have at least a couple different non-mesa implementations
of GBM (which afaict tend to lag behind mesa's GBM and cause
headaches).

The extracted part probably isn't much more than a header and shim.
But probably does need to grow some versioning for the backend to know
if, for example, gbm->bo_map() is supported.. at least it could
provide stubs that return an error, rather than having a link-time
failure if building something w/ $vendor's old gbm implementation.
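
For illustration, a small C sketch of that kind of versioned shim.
This is not the real gbm backend interface; the names and the version
constant are invented, and the point is only that the front-end checks
the backend's reported version and substitutes an error-returning stub
instead of failing at link time.

/* Hypothetical sketch of a versioned front-end/backend split for an
 * extracted GBM shim.  The front-end checks the backend's interface
 * version and falls back to an error stub for older backends. */
#include <errno.h>
#include <stddef.h>

#define SHIM_BACKEND_VERSION_BO_MAP 2   /* version that introduced bo_map */

struct shim_bo;

struct shim_backend {
    int   interface_version;
    void *(*bo_map)(struct shim_bo *bo);   /* may be absent in old backends */
};

/* Stub used when the loaded backend predates bo_map support. */
static void *bo_map_not_supported(struct shim_bo *bo)
{
    (void)bo;
    errno = ENOSYS;
    return NULL;
}

void *shim_bo_map(const struct shim_backend *be, struct shim_bo *bo)
{
    if (be->interface_version < SHIM_BACKEND_VERSION_BO_MAP || !be->bo_map)
        return bo_map_not_supported(bo);
    return be->bo_map(bo);
}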

> And of course I'm open to any other ideas for integration.  Beyond just
> where this code would live, there is much to debate about the mechanisms
> themselves and all the implementation details.  I was just hoping to kick
> things off with something high level to start.

My $0.02 is that the place where devel happens and the place to go for
releases could be different.  Either way, I would like to see the git
tree for tagged release versions live on fd.o and use the common release
process[2] for generating/uploading release tarballs that distros can
use.

[2] 

Re: [Mesa-dev] GBM and the Device Memory Allocator Proposals

2017-11-23 Thread Emil Velikov
Hi James,

On 21 November 2017 at 01:11, James Jones  wrote:

> -I have also heard some general comments that regardless of the relationship
> between GBM and the new allocator mechanisms, it might be time to move GBM
> out of Mesa so it can be developed as a stand-alone project.  I'd be
> interested what others think about that, as it would be something worth
> coordinating with any other new development based on or inside of GBM.
>
Having a GBM frontend is one thing I've been pondering as well.

Regardless of exact solution wrt the new allocator, having a clear
frontend/backend separation for GBM will be beneficial.
I'll be giving it a stab these days.

Disclaimer: Mostly thinking out loud, so please take the following
with a grain of salt.

On the details wrt the new allocator project, I think that having a
new lean library would be a good idea.
One could borrow ideas from GBM, but by default no connection between
the two should be required.

That might make the initial hurdle of porting a bit harder, but it
will allow for a more efficient driver implementation.

HTH
Emil