Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-28 Thread Nicolai Hähnle

Hi Daniel,

On 28.07.2017 12:46, Daniel Stone wrote:
> On 28 July 2017 at 10:24, Nicolai Hähnle wrote:
>> On 28.07.2017 09:44, Daniel Stone wrote:
>>> No, I don't think it is. Tiled layouts still have a stride: if you
>>> look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an
>>> auxiliary compression/fast-clear buffer), iMX/etnaviv
>>> tiled/supertiled, or VC4 T-tiled modifiers and how they're handled
>>> both for DRIImage and KMS interchange, they all specify a stride which
>>> is conceptually the same as linear, if you imagine linear to be 1x1
>>> tiled.
>>>
>>> Most tiling users accept any integer units of tiles (buffer width
>>> aligned to tile width), but not always. The NV12MT format used by
>>> Samsung media decoders (also shipped in some Qualcomm SoCs) is
>>> particularly special, IIRC requiring height to be aligned to a
>>> multiple of two tiles.
>>
>> Fair enough, but I think you need to distinguish between the chosen stride
>> and stride *requirements*. I do think it makes sense to consider the stride
>> requirement as part of the format/layout description, but more below.
>
> Right. Stride is a property of one buffer, stride requirements are a
> property of the users of that buffer (GPU, display control, media
> encode, etc). The requirements also depend on use, e.g. trying to do
> rotation through your scanout engine can change those requirements.

Right.

>>> It definitely seems attractive to kill two birds with one stone, but
>>> I'd really much rather not conflate format description/advertisement,
>>> and allocation restriction, into one enum. I'm still on the side of
>>> saying that this is a problem modifiers do not solve, deferring to the
>>> allocator we need anyway in order to determine things like placement
>>> and global optimality (e.g. rotated scanout placing further
>>> restrictions on allocation).
>>
>> Okay, the original issue here is that the allocator *cannot* determine the
>> alignment requirement in the use case that prompted this sub-thread.
>>
>> The use case is PRIME off-loading, where the rendering GPU supports linear
>> layouts with a 64 byte stride, while the display GPU requires a 256 byte
>> stride.
>>
>> The allocator *cannot* solve this issue, because the allocation happens on
>> the rendering GPU. We need to communicate somehow what the display GPU's
>> stride requirements are.
>>
>> How do you propose to do that?
>
> The allocator[0] in itself can't magically reach across processes to
> determine desired usage and resolve dependencies. But the entire
> design behind it was to be able to solve cross-device usage: between
> GPU and scanout, between both of those and media encode/decode, etc.
> Obviously it can't do that without help, so winsys will need to gain
> protocol in order to express those in terms the allocator will
> understand.
>
> The idea was to split information into positive capabilities and
> negative constraints. Modifier queries fall into the same boat as
> format queries: you're expressing an additional capability ('I can
> speak tiled'). Stride alignment, for me, falls into a negative
> constraint ('linear allocations must have stride aligned to 256
> bytes'). Similarly, placement constraints (VRAM possibly only
> accessible to SLI-type paired GPU vs. GTT vs. pure system RAM, etc)
> are in the same boat AFAICT. So this helps solve one side of the
> equation, but not the other.


I've been thinking about this some more, and I can see now that the 
changed modifier scheme that I originally proposed does not fit well 
into places where modifiers are used to express buffer properties (e.g. 
DRI3PixmapFromBuffers, DRI3BuffersFromPixmap).


But I see no proposal on how to fix the issue so far. You cannot fully 
separate capabilities from constraints. As is, we (AMD) cannot properly 
implement the proposed DRI3 v1.1: what would we return in 
DRI3GetSupportedModifiers?


The natural option is to return (at least) DRM_FORMAT_MOD_LINEAR, but 
that would be a lie, because we *don't* speak arbitrary linear formats.


I don't think this is difficult to fix in terms of protocol, although 
there's plenty of opportunity for bike-shedding :)


I see roughly two options:

1. Make the constraints per-modifier, and add a "constraints: 
ListOfCard32" (or 64) to the response to DRI3GetSupportedModifiers. We 
can then reserve some bits for global constraints (e.g. placement) and 
some bits on a per-modifier basis (e.g. stride alignment for linear). 
You could build constraints like DRM_CONSTRAINT_PLACEMENT_SYSTEM | 
DRM_CONSTRAINT_LINEAR_STRIDE_256B.


2. Make the constraints global, and add a DRI3GetConstraints protocol 
with the same signature as DRI3GetSupportedModifiers. We'd need vendor 
namespaces for the constraint defines, to support constraints that are 
specific to vendor-specific modifiers. You could have entries like 
DRM_CONSTRAINT_PLACEMENT(DRM_CONSTRAINT_PLACEMENT_SYSTEM) and, as a 
separate list entry, DRM_CONSTRAINT_LINEAR_STRIDE(256).
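
[Editorial sketch, not part of the original message.] The two options above could look roughly like this. The DRM_CONSTRAINT_* names come from the text, but every numeric value, the bit layout and the helper functions are assumptions made up for illustration; nothing here is an existing DRM or DRI3 definition.

```python
# Sketch of the two reply encodings discussed above. All values are hypothetical.

DRM_FORMAT_MOD_LINEAR = 0  # fourcc_mod_code(NONE, 0) in drm_fourcc.h

# --- Option 1: one constraints word per modifier -------------------------
# Assumed layout: low bits are global constraints, high bits are
# interpreted per modifier.
DRM_CONSTRAINT_PLACEMENT_SYSTEM = 1 << 0     # hypothetical global bit
DRM_CONSTRAINT_LINEAR_STRIDE_256B = 1 << 16  # hypothetical per-modifier bit

# Reply shape: parallel (modifier, constraints) pairs.
option1_reply = [
    (DRM_FORMAT_MOD_LINEAR,
     DRM_CONSTRAINT_PLACEMENT_SYSTEM | DRM_CONSTRAINT_LINEAR_STRIDE_256B),
]


def option1_linear_stride(reply):
    """Return the linear stride alignment in bytes advertised by option 1."""
    for modifier, constraints in reply:
        if (modifier == DRM_FORMAT_MOD_LINEAR
                and constraints & DRM_CONSTRAINT_LINEAR_STRIDE_256B):
            return 256
    return 1  # no constraint advertised


# --- Option 2: a separate, global list of constraint entries -------------
# Assumed layout, mirroring modifiers: vendor in bits 56-63, constraint
# kind in bits 48-55, payload in the low 48 bits.
VENDOR_NONE = 0
KIND_PLACEMENT = 1       # hypothetical
KIND_LINEAR_STRIDE = 2   # hypothetical
PAYLOAD_MASK = (1 << 48) - 1


def _constraint(vendor, kind, payload):
    return (vendor << 56) | (kind << 48) | (payload & PAYLOAD_MASK)


def DRM_CONSTRAINT_PLACEMENT(where):
    return _constraint(VENDOR_NONE, KIND_PLACEMENT, where)


def DRM_CONSTRAINT_LINEAR_STRIDE(nbytes):
    return _constraint(VENDOR_NONE, KIND_LINEAR_STRIDE, nbytes)


option2_reply = [
    DRM_CONSTRAINT_PLACEMENT(DRM_CONSTRAINT_PLACEMENT_SYSTEM),
    DRM_CONSTRAINT_LINEAR_STRIDE(256),
]


def option2_linear_stride(reply):
    """Return the linear stride alignment in bytes advertised by option 2."""
    for entry in reply:
        if (entry >> 48) & 0xFF == KIND_LINEAR_STRIDE:
            return entry & PAYLOAD_MASK
    return 1  # no constraint advertised
```

Under these assumptions both replies describe the same 256-byte requirement. The difference is where it lives: attached to DRM_FORMAT_MOD_LINEAR in option 1, or in a separate list that needs its own vendor namespace in option 2.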


Cheers,
Nicolai
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/m

Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-28 Thread Daniel Stone
Hi,

On 28 July 2017 at 10:24, Nicolai Hähnle  wrote:
> On 28.07.2017 09:44, Daniel Stone wrote:
>> No, I don't think it is. Tiled layouts still have a stride: if you
>> look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an
>> auxiliary compression/fast-clear buffer), iMX/etnaviv
>> tiled/supertiled, or VC4 T-tiled modifiers and how they're handled
>> both for DRIImage and KMS interchange, they all specify a stride which
>> is conceptually the same as linear, if you imagine linear to be 1x1
>> tiled.
>>
>> Most tiling users accept any integer units of tiles (buffer width
>> aligned to tile width), but not always. The NV12MT format used by
>> Samsung media decoders (also shipped in some Qualcomm SoCs) is
>> particularly special, IIRC requiring height to be aligned to a
>> multiple of two tiles.
>
> Fair enough, but I think you need to distinguish between the chosen stride
> and stride *requirements*. I do think it makes sense to consider the stride
> requirement as part of the format/layout description, but more below.

Right. Stride is a property of one buffer, stride requirements are a
property of the users of that buffer (GPU, display control, media
encode, etc). The requirements also depend on use, e.g. trying to do
rotation through your scanout engine can change those requirements.

>> It definitely seems attractive to kill two birds with one stone, but
>> I'd really much rather not conflate format description/advertisement,
>> and allocation restriction, into one enum. I'm still on the side of
>> saying that this is a problem modifiers do not solve, deferring to the
>> allocator we need anyway in order to determine things like placement
>> and global optimality (e.g. rotated scanout placing further
>> restrictions on allocation).
>
> Okay, the original issue here is that the allocator *cannot* determine the
> alignment requirement in the use case that prompted this sub-thread.
>
> The use case is PRIME off-loading, where the rendering GPU supports linear
> layouts with a 64 byte stride, while the display GPU requires a 256 byte
> stride.
>
> The allocator *cannot* solve this issue, because the allocation happens on
> the rendering GPU. We need to communicate somehow what the display GPU's
> stride requirements are.
>
> How do you propose to do that?

The allocator[0] in itself can't magically reach across processes to
determine desired usage and resolve dependencies. But the entire
design behind it was to be able to solve cross-device usage: between
GPU and scanout, between both of those and media encode/decode, etc.
Obviously it can't do that without help, so winsys will need to gain
protocol in order to express those in terms the allocator will
understand.

The idea was to split information into positive capabilities and
negative constraints. Modifier queries fall into the same boat as
format queries: you're expressing an additional capability ('I can
speak tiled'). Stride alignment, for me, falls into a negative
constraint ('linear allocations must have stride aligned to 256
bytes'). Similarly, placement constraints (VRAM possibly only
accessible to SLI-type paired GPU vs. GTT vs. pure system RAM, etc)
are in the same boat AFAICT. So this helps solve one side of the
equation, but not the other.
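
[Editorial sketch, not part of the original message.] One way to picture the capability/constraint split is an allocator that merges what two devices report. The DeviceInfo type, its field names and the modifier/placement strings below are hypothetical; the 64-byte and 256-byte figures are the ones quoted earlier in this thread.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical per-device description used only for this sketch."""
    modifiers: frozenset       # positive capabilities ("I can speak tiled")
    linear_stride_align: int   # negative constraint, in bytes
    placements: frozenset      # negative constraint: usable memory pools


def merge(a: DeviceInfo, b: DeviceInfo) -> DeviceInfo:
    """Combine two devices into what a shared buffer has to satisfy."""
    # Capabilities: only layouts both sides understand remain usable.
    modifiers = a.modifiers & b.modifiers
    # Stride constraint: the result must satisfy both alignments, so take
    # the least common multiple.
    stride = a.linear_stride_align * b.linear_stride_align // gcd(
        a.linear_stride_align, b.linear_stride_align)
    # Placement constraint: only pools both devices can reach remain.
    placements = a.placements & b.placements
    return DeviceInfo(modifiers, stride, placements)


renderer = DeviceInfo(frozenset({"LINEAR", "TILED"}), 64,
                      frozenset({"VRAM", "GTT", "SYSTEM"}))
display = DeviceInfo(frozenset({"LINEAR"}), 256,
                     frozenset({"GTT", "SYSTEM"}))
shared = merge(renderer, display)
```

With these made-up inputs the merged result allows only linear buffers with a 256-byte stride alignment placed in GTT or system memory. The modifier query alone supplies the first field; the other two are the part that still needs protocol.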

Cheers,
Daniel

[0]: Read as, 'the design we discussed for the allocator at XDC'.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-28 Thread Nicolai Hähnle

On 28.07.2017 09:44, Daniel Stone wrote:

Hi Nicolai,
Trying to tackle the stride subthread in one go ...

On 25 July 2017 at 09:28, Nicolai Hähnle  wrote:

On 22.07.2017 14:00, Daniel Stone wrote:

On 21 July 2017 at 18:32, Michel Dänzer  wrote:

We just ran into an issue which might mean that there's still something
missing in this v2 proposal:

The context is DRI3 PRIME render offloading of glxgears (not useful in
practice, but bear with me). The display GPU is Raven Ridge, which
requires
that the stride even of linear textures is a multiple of 256 bytes. The
renderer GPU is Polaris12, which still supports smaller alignment of the
stride. With the glxgears window of width 300, the renderer GPU driver
chooses a stride of 304 (* 4 / 256 = 4.75), whereas the display GPU would
require 320 (* 4 / 256 = 5). This cannot work.
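
[Editorial sketch, not part of the original message.] The arithmetic in the report can be reproduced as follows. The 4 bytes per pixel are implied by the "* 4" above; the 64-byte renderer alignment is the figure given later in this thread and is an assumption as far as this particular message goes.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (alignment > 0)."""
    return -(-value // alignment) * alignment


WIDTH_PX = 300        # glxgears window width from the report
BYTES_PER_PIXEL = 4   # 32bpp, implied by the "* 4" in the report

row_bytes = WIDTH_PX * BYTES_PER_PIXEL

# Renderer (Polaris12): assumed 64-byte stride alignment.
renderer_stride = align_up(row_bytes, 64)
# Display (Raven Ridge): 256-byte stride alignment.
display_stride = align_up(row_bytes, 256)

renderer_stride_px = renderer_stride // BYTES_PER_PIXEL
display_stride_px = display_stride // BYTES_PER_PIXEL

# The renderer's choice is only usable by the display GPU if it is also a
# multiple of the display GPU's alignment.
compatible = renderer_stride % 256 == 0
```

Here the renderer ends up with 1216 bytes (304 pixels) and the display GPU needs 1280 bytes (320 pixels), so `compatible` is false, which is the mismatch described above.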



The obvious answer is just to increase padding on external/winsys
surfaces? Increasing it for all allocations would probably be a
non-starter, but winsys surfaces are rare enough that you could
probably afford to take the hit, I guess.


I see two basic approaches to solve this:
[...]
Maybe there are other possible approaches I'm missing? Other comments?


I don't have any great solution off the top of my head, but I'd be
inclined to bundle stride in with placement. TTBOMK (from having
looked at radv), buffers shared cross-GPU also need to be allocated
from a separate externally-visible memory heap. And at the moment,
lacking placement information at allocation time (at least for EGL
allocations, via DRIImage), that happens via transparent migration at
import time I think. Placement restrictions would probably also
involve communicating base address alignment requirements.


Stride isn't really in the same category as placement and base address
alignment, though.

Placement and base address alignment requirements can apply to all possible
texture layouts, while the concept of stride is specific to linear layouts.


No, I don't think it is. Tiled layouts still have a stride: if you
look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an
auxiliary compression/fast-clear buffer), iMX/etnaviv
tiled/supertiled, or VC4 T-tiled modifiers and how they're handled
both for DRIImage and KMS interchange, they all specify a stride which
is conceptually the same as linear, if you imagine linear to be 1x1
tiled.

Most tiling users accept any integer units of tiles (buffer width
aligned to tile width), but not always. The NV12MT format used by
Samsung media decoders (also shipped in some Qualcomm SoCs) is
particularly special, IIRC requiring height to be aligned to a
multiple of two tiles.
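
[Editorial sketch, not part of the original message.] The "linear is 1x1 tiled" view can be written down as a single stride rule. The tile geometry used below (512 bytes wide, 8 rows high) is an illustrative assumption and is not claimed to match any particular modifier, NV12MT included.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (alignment > 0)."""
    return -(-value // alignment) * alignment


def stride_bytes(width_px: int, bytes_per_pixel: int,
                 tile_width_bytes: int) -> int:
    """Row pitch in bytes when rows must cover whole tiles."""
    return align_up(width_px * bytes_per_pixel, tile_width_bytes)


def aligned_height(height_rows: int, tile_height_rows: int,
                   tiles_multiple: int = 1) -> int:
    """Allocated height when it must cover a multiple of N whole tiles."""
    return align_up(height_rows, tile_height_rows * tiles_multiple)


# Linear as "1x1 tiled": the tile is a single pixel wide.
linear = stride_bytes(300, 4, tile_width_bytes=4)
# Hypothetical tiled layout with 512-byte-wide tiles.
tiled = stride_bytes(300, 4, tile_width_bytes=512)

# Height aligned to whole tiles versus to a multiple of two tiles.
h_one_tile = aligned_height(100, tile_height_rows=8)
h_two_tiles = aligned_height(100, tile_height_rows=8, tiles_multiple=2)
```

The same function yields the stride for both cases; only the tile width changes. The last two lines show how a "multiple of two tiles" rule is a stricter requirement than plain tile alignment.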


Fair enough, but I think you need to distinguish between the chosen 
stride and stride *requirements*. I do think it makes sense to consider 
the stride requirement as part of the format/layout description, but 
more below.




Given that, I'm fairly inclined to punt those until we have the grand
glorious allocator, rather than trying to add it to EGL/GBM
separately. The modifiers stuff was a fairly obvious augmentation -
EGL already had no-modifier format import but no query as to which
formats it would accept, and modifiers are a logical extension of
format - but adding the other restrictions is a bigger step forward.


Perhaps a third option would be to encode the required pitch_in_bytes
alignment as part of the modifier?

So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B when
the display GPU is a Raven Ridge.

More generally, we could say that fourcc_mod_code(NONE, k) means that the
pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if you
prefer, we could have a stride requirement in elements or pixels instead of
bytes.
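
[Editorial sketch, not part of the original message.] The proposed encoding could be sketched like this. fourcc_mod_code() is mirrored from the macro in drm_fourcc.h; the two helper functions and the value chosen for DRM_FORMAT_MOD_LINEAR_256B are hypothetical and only illustrate the proposal under discussion.

```python
DRM_FORMAT_MOD_VENDOR_NONE = 0


def fourcc_mod_code(vendor: int, val: int) -> int:
    """Python mirror of the drm_fourcc.h macro: vendor in the top 8 bits."""
    return (vendor << 56) | (val & 0x00FFFFFFFFFFFFFF)


DRM_FORMAT_MOD_LINEAR = fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_NONE, 0)


def linear_mod_with_pitch_alignment(k: int) -> int:
    """Hypothetical: linear modifier whose pitch_in_bytes is a multiple of 2**k."""
    if not 0 <= k <= 31:
        raise ValueError("k must be in 0..31")
    return fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_NONE, k)


def required_pitch_alignment(modifier: int) -> int:
    """Hypothetical decoder: byte alignment implied by such a modifier."""
    vendor = modifier >> 56
    k = modifier & 0x00FFFFFFFFFFFFFF
    if vendor != DRM_FORMAT_MOD_VENDOR_NONE or k > 31:
        raise ValueError("not a linear pitch-alignment modifier")
    return 1 << k


# 256 == 2**8, so the Raven Ridge case would be k = 8 under this scheme.
DRM_FORMAT_MOD_LINEAR_256B = linear_mod_with_pitch_alignment(8)
```

Note that k = 0 yields the existing DRM_FORMAT_MOD_LINEAR value, read as "any byte stride", so under this scheme the existing token would keep its number but gain an alignment meaning.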


I've been thinking this over for the past couple of days, and we could
make it work in clients. But we already have DRM_FORMAT_MOD_LINEAR
with a well-defined meaning, implemented in quite a few places. I
think special-casing linear to encode stride alignment is something
we'd regret in the long run, especially given that some hardware
_does_ have more strict alignment requirements for tiled modes than
just an integer number of tiles allocated.

AFAICT, AMD and NVIDIA are both going to use a fair bit of the
tiling enum space to encode tile size as well as layout. If allocation
alignment requirements (in both dimensions) needed to be added to
that, it's entirely likely that there wouldn't be enough space and
you'd need to put it somewhere else than the modifier, in which case
we've not even really solved the problem.


At least for AMD, the alignment requirements are de facto part of the 
tiling description, so I don't think it makes a difference.




It definitely seems attractive to kill two birds with one stone, but
I'd really much rather not conflate format description/advertisement,
and allocation restriction, into one enum. I'm still on the side of
saying that this is a problem modifiers do not solve, deferring to the
allocator we need anyway

Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-28 Thread Daniel Stone
Hi Nicolai,
Trying to tackle the stride subthread in one go ...

On 25 July 2017 at 09:28, Nicolai Hähnle  wrote:
> On 22.07.2017 14:00, Daniel Stone wrote:
>> On 21 July 2017 at 18:32, Michel Dänzer  wrote:
>>> We just ran into an issue which might mean that there's still something
>>> missing in this v2 proposal:
>>>
>>> The context is DRI3 PRIME render offloading of glxgears (not useful in
>>> practice, but bear with me). The display GPU is Raven Ridge, which
>>> requires
>>> that the stride even of linear textures is a multiple of 256 bytes. The
>>> renderer GPU is Polaris12, which still supports smaller alignment of the
>>> stride. With the glxgears window of width 300, the renderer GPU driver
>>> chooses a stride of 304 (* 4 / 256 = 4.75), whereas the display GPU would
>>> require 320 (* 4 / 256 = 5). This cannot work.
>>
>>
>> The obvious answer is just to increase padding on external/winsys
>> surfaces? Increasing it for all allocations would probably be a
>> non-starter, but winsys surfaces are rare enough that you could
>> probably afford to take the hit, I guess.
>>
>>> I see two basic approaches to solve this:
>>> [...]
>>> Maybe there are other possible approaches I'm missing? Other comments?
>>
>> I don't have any great solution off the top of my head, but I'd be
>> inclined to bundle stride in with placement. TTBOMK (from having
>> looked at radv), buffers shared cross-GPU also need to be allocated
>> from a separate externally-visible memory heap. And at the moment,
>> lacking placement information at allocation time (at least for EGL
>> allocations, via DRIImage), that happens via transparent migration at
>> import time I think. Placement restrictions would probably also
>> involve communicating base address alignment requirements.
>
> Stride isn't really in the same category as placement and base address
> alignment, though.
>
> Placement and base address alignment requirements can apply to all possible
> texture layouts, while the concept of stride is specific to linear layouts.

No, I don't think it is. Tiled layouts still have a stride: if you
look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an
auxiliary compression/fast-clear buffer), iMX/etnaviv
tiled/supertiled, or VC4 T-tiled modifiers and how they're handled
both for DRIImage and KMS interchange, they all specify a stride which
is conceptually the same as linear, if you imagine linear to be 1x1
tiled.

Most tiling users accept any integer units of tiles (buffer width
aligned to tile width), but not always. The NV12MT format used by
Samsung media decoders (also shipped in some Qualcomm SoCs) is
particularly special, IIRC requiring height to be aligned to a
multiple of two tiles.

>> Given that, I'm fairly inclined to punt those until we have the grand
>> glorious allocator, rather than trying to add it to EGL/GBM
>> separately. The modifiers stuff was a fairly obvious augmentation -
>> EGL already had no-modifier format import but no query as to which
>> formats it would accept, and modifiers are a logical extension of
>> format - but adding the other restrictions is a bigger step forward.
>
> Perhaps a third option would be to encode the required pitch_in_bytes
> alignment as part of the modifier?
>
> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B when
> the display GPU is a Raven Ridge.
>
> More generally, we could say that fourcc_mod_code(NONE, k) means that the
> pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if you
> prefer, we could have a stride requirement in elements or pixels instead of
> bytes.

I've been thinking this over for the past couple of days, and we could
make it work in clients. But we already have DRM_FORMAT_MOD_LINEAR
with a well-defined meaning, implemented in quite a few places. I
think special-casing linear to encode stride alignment is something
we'd regret in the long run, especially given that some hardware
_does_ have more strict alignment requirements for tiled modes than
just an integer number of tiles allocated.

AFAICT, AMD and NVIDIA are both going to use a fair bit of the
tiling enum space to encode tile size as well as layout. If allocation
alignment requirements (in both dimensions) needed to be added to
that, it's entirely likely that there wouldn't be enough space and
you'd need to put it somewhere else than the modifier, in which case
we've not even really solved the problem.

It definitely seems attractive to kill two birds with one stone, but
I'd really much rather not conflate format description/advertisement,
and allocation restriction, into one enum. I'm still on the side of
saying that this is a problem modifiers do not solve, deferring to the
allocator we need anyway in order to determine things like placement
and global optimality (e.g. rotated scanout placing further
restrictions on allocation).

Cheers,
Daniel

Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-27 Thread Michel Dänzer
On 27/07/17 04:08 PM, Nicolai Hähnle wrote:
> On 27.07.2017 03:14, Michel Dänzer wrote:
>> On 26/07/17 09:15 PM, Nicolai Hähnle wrote:
>>> On 26.07.2017 08:29, Michel Dänzer wrote:
>>>> On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
>>>>> On 22.07.2017 14:00, Daniel Stone wrote:
>>>>>>
>>>>>> Given that, I'm fairly inclined to punt those until we have the grand
>>>>>> glorious allocator, rather than trying to add it to EGL/GBM
>>>>>> separately. The modifiers stuff was a fairly obvious augmentation -
>>>>>> EGL already had no-modifier format import but no query as to which
>>>>>> formats it would accept, and modifiers are a logical extension of
>>>>>> format - but adding the other restrictions is a bigger step forward.
>>>>>
>>>>> Perhaps a third option would be to encode the required pitch_in_bytes
>>>>> alignment as part of the modifier?
>>>>>
>>>>> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
>>>>> when the display GPU is a Raven Ridge.
>>>>>
>>>>> More generally, we could say that fourcc_mod_code(NONE, k) means that
>>>>> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if
>>>>> you prefer, we could have a stride requirement in elements or pixels
>>>>> instead of bytes.
>>>>
>>>> Interesting idea. FWIW, I suspect we'd need to support specifying the
>>>> requirement in both bytes or pixels, since one or the other alone may
>>>> not be sufficient to describe the constraints of all hardware.
>>>
>>> From what I've seen, modifiers are always specified together with one
>>> specific format, so the bytes-per-pixel are always known, so I don't
>>> think we need both.
>>
>> The proposal adds two DRI3 extension requests for querying the list of
>> supported formats and modifiers, respectively. This suggests that the
>> supported formats and modifiers can be freely combined.
> 
> Which are these? I only saw DRI3GetSupportedModifiers, which takes both
> a window and a format argument.

You're right, I missed the format argument.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-27 Thread Nicolai Hähnle

On 27.07.2017 03:14, Michel Dänzer wrote:

On 26/07/17 09:15 PM, Nicolai Hähnle wrote:

On 26.07.2017 08:29, Michel Dänzer wrote:

On 25/07/17 05:28 PM, Nicolai Hähnle wrote:

On 22.07.2017 14:00, Daniel Stone wrote:


Given that, I'm fairly inclined to punt those until we have the grand
glorious allocator, rather than trying to add it to EGL/GBM
separately. The modifiers stuff was a fairly obvious augmentation -
EGL already had no-modifier format import but no query as to which
formats it would accept, and modifiers are a logical extension of
format - but adding the other restrictions is a bigger step forward.


Perhaps a third option would be to encode the required pitch_in_bytes
alignment as part of the modifier?

So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
when the display GPU is a Raven Ridge.

More generally, we could say that fourcc_mod_code(NONE, k) means that
the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if
you prefer, we could have a stride requirement in elements or pixels
instead of bytes.


Interesting idea. FWIW, I suspect we'd need to support specifying the
requirement in both bytes or pixels, since one or the other alone may
not be sufficient to describe the constraints of all hardware.


From what I've seen, modifiers are always specified together with one
specific format, so the bytes-per-pixel are always known, so I don't
think we need both.
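
[Editorial sketch, not part of the original message.] If the format, and therefore the bytes per pixel, is known, a byte alignment can be converted into the equivalent pixel alignment. The function name is made up for illustration.

```python
from math import gcd


def pixel_alignment(byte_alignment: int, bytes_per_pixel: int) -> int:
    """Smallest pixel count n such that n * bytes_per_pixel is a
    multiple of byte_alignment."""
    return byte_alignment // gcd(byte_alignment, bytes_per_pixel)


# 256-byte stride alignment expressed in pixels for a few pixel sizes.
px_32bpp = pixel_alignment(256, 4)
px_16bpp = pixel_alignment(256, 2)
px_24bpp = pixel_alignment(256, 3)
```

A 256-byte requirement corresponds to 64 pixels at 4 bytes per pixel and 128 pixels at 2 bytes per pixel. For a 3-byte pixel it becomes 256 pixels, which shows that the conversion depends on the format and only works once the format is fixed.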


The proposal adds two DRI3 extension requests for querying the list of
supported formats and modifiers, respectively. This suggests that the
supported formats and modifiers can be freely combined.


Which are these? I only saw DRI3GetSupportedModifiers, which takes both 
a window and a format argument.


Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-26 Thread Michel Dänzer
On 26/07/17 09:15 PM, Nicolai Hähnle wrote:
> On 26.07.2017 08:29, Michel Dänzer wrote:
>> On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
>>> On 22.07.2017 14:00, Daniel Stone wrote:

>>>> Given that, I'm fairly inclined to punt those until we have the grand
>>>> glorious allocator, rather than trying to add it to EGL/GBM
>>>> separately. The modifiers stuff was a fairly obvious augmentation -
>>>> EGL already had no-modifier format import but no query as to which
>>>> formats it would accept, and modifiers are a logical extension of
>>>> format - but adding the other restrictions is a bigger step forward.
>>>
>>> Perhaps a third option would be to encode the required pitch_in_bytes
>>> alignment as part of the modifier?
>>>
>>> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
>>> when the display GPU is a Raven Ridge.
>>>
>>> More generally, we could say that fourcc_mod_code(NONE, k) means that
>>> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if
>>> you prefer, we could have a stride requirement in elements or pixels
>>> instead of bytes.
>>
>> Interesting idea. FWIW, I suspect we'd need to support specifying the
>> requirement in both bytes or pixels, since one or the other alone may
>> not be sufficient to describe the constraints of all hardware.
> 
> From what I've seen, modifiers are always specified together with one
> specific format, so the bytes-per-pixel are always known, so I don't
> think we need both.

The proposal adds two DRI3 extension requests for querying the list of
supported formats and modifiers, respectively. This suggests that the
supported formats and modifiers can be freely combined.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-26 Thread Nicolai Hähnle

On 26.07.2017 19:42, Marek Olšák wrote:

On Wed, Jul 26, 2017 at 7:05 PM, Alex Deucher  wrote:

On Wed, Jul 26, 2017 at 8:15 AM, Nicolai Hähnle  wrote:

On 26.07.2017 08:29, Michel Dänzer wrote:


On 25/07/17 05:28 PM, Nicolai Hähnle wrote:


On 22.07.2017 14:00, Daniel Stone wrote:



I don't have any great solution off the top of my head, but I'd be
inclined to bundle stride in with placement. TTBOMK (from having
looked at radv), buffers shared cross-GPU also need to be allocated
from a separate externally-visible memory heap. And at the moment,
lacking placement information at allocation time (at least for EGL
allocations, via DRIImage), that happens via transparent migration at
import time I think. Placement restrictions would probably also
involve communicating base address alignment requirements.



Stride isn't really in the same category as placement and base address
alignment, though.

Placement and base address alignment requirements can apply to all
possible texture layouts, while the concept of stride is specific to
linear layouts.



Also, the starting address of shareable buffers is generally aligned to
at least the CPU page size anyway. Do we know of any cases requiring
higher alignment than that, and if so, which address space does the
requirement apply to?



Only tiling modes, as Marek mentioned. We don't do tiling shares across
different GPUs right now.

Maybe we can do it in the future with gfx9 GPUs. But then the alignment
requirements should be known on both sides based on the tiling mode anyway
-- if they even apply for non-VRAM textures.


We should be able to do some 1D tiling modes.  That doesn't have any
per sku alignment dependencies.


Yeah, I think 1D tiling for displayable 32bpp is compatible across all
radeon GPUs newer than R600.

All non-X non-VAR tiling modes on Radeon/GFX9 (Vega, Raven) are the
same on all GFX9 GPUs and might be the same on all future products.
The only catch is that X modes are better optimized for the memory
config, so non-X modes can be slower. I think the non-X modes might
also be compatible with Intel (the first 12 at least), so some
cross-vendor interface might be possible. All GFX9 tiling modes:


Right. It might be worth it to try to use some of these tiling modes to 
make PRIME a bit more efficient in some cases. Non-X modes may be 
non-optimal, but certainly they're better than linear :)


Cheers,
Nicolai




SW_LINEAR (256B pitch alignment)
SW_256B_S
SW_256B_D (compatible with older Radeons if bpp == 32)
SW_256B_R (compatible with older Radeons if bpp == 32)
SW_4KB_Z (Z = depth/stencil sample order)
SW_4KB_S (S = standard)
SW_4KB_D (D = displayable)
SW_4KB_R (R = displayable rotated)
SW_64KB_Z
SW_64KB_S
SW_64KB_D
SW_64KB_R
SW_VAR_Z (VAR = tile size depends on memory config)
SW_VAR_S
SW_VAR_D
SW_VAR_R
SW_64KB_Z_T
SW_64KB_S_T
SW_64KB_D_T
SW_64KB_R_T
SW_4KB_Z_X (X = optimized for memory config)
SW_4KB_S_X
SW_4KB_D_X
SW_4KB_R_X
SW_64KB_Z_X
SW_64KB_S_X
SW_64KB_D_X
SW_64KB_R_X
SW_VAR_Z_X
SW_VAR_S_X
SW_VAR_D_X
SW_VAR_R_X
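
[Editorial sketch, not part of the original message.] The statement that the non-X, non-VAR modes are the same on all GFX9 GPUs can be applied to the list above as a simple name-based filter. This works purely on the mode names as written in this message; it is an assumption that the names alone capture the property, and the helper name is made up.

```python
# The 32 mode names, in the order they are listed in the message.
ORDERS = "ZSDR"
GFX9_SWIZZLE_MODES = (
    ["SW_LINEAR", "SW_256B_S", "SW_256B_D", "SW_256B_R"]
    + [f"SW_{size}_{o}" for size in ("4KB", "64KB", "VAR") for o in ORDERS]
    + [f"SW_64KB_{o}_T" for o in ORDERS]
    + [f"SW_{size}_{o}_X" for size in ("4KB", "64KB", "VAR") for o in ORDERS]
)


def is_memory_config_independent(name: str) -> bool:
    """True for modes that are neither VAR nor X, going by the name only."""
    parts = name.split("_")
    return "VAR" not in parts and parts[-1] != "X"


portable_modes = [m for m in GFX9_SWIZZLE_MODES
                  if is_memory_config_independent(m)]
```

By this filter, `portable_modes` starts with the first 12 entries of the list (SW_LINEAR through SW_64KB_R), which are the ones mentioned above as possibly compatible with Intel, followed by the four SW_64KB_*_T modes.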

Marek




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-26 Thread Alex Deucher
On Wed, Jul 26, 2017 at 8:15 AM, Nicolai Hähnle  wrote:
> On 26.07.2017 08:29, Michel Dänzer wrote:
>>
>> On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
>>>
>>> On 22.07.2017 14:00, Daniel Stone wrote:


>>>> I don't have any great solution off the top of my head, but I'd be
>>>> inclined to bundle stride in with placement. TTBOMK (from having
>>>> looked at radv), buffers shared cross-GPU also need to be allocated
>>>> from a separate externally-visible memory heap. And at the moment,
>>>> lacking placement information at allocation time (at least for EGL
>>>> allocations, via DRIImage), that happens via transparent migration at
>>>> import time I think. Placement restrictions would probably also
>>>> involve communicating base address alignment requirements.
>>>
>>>
>>> Stride isn't really in the same category as placement and base address
>>> alignment, though.
>>>
>>> Placement and base address alignment requirements can apply to all
>>> possible texture layouts, while the concept of stride is specific to
>>> linear layouts.
>>
>>
>> Also, the starting address of shareable buffers is generally aligned to
>> at least the CPU page size anyway. Do we know of any cases requiring
>> higher alignment than that, and if so, which address space does the
>> requirement apply to?
>
>
> Only tiling modes, as Marek mentioned. We don't do tiling shares across
> different GPUs right now.
>
> Maybe we can do it in the future with gfx9 GPUs. But then the alignment
> requirements should be known on both sides based on the tiling mode anyway
> -- if they even apply for non-VRAM textures.

We should be able to do some 1D tiling modes.  That doesn't have any
per sku alignment dependencies.

Alex

>
>
>>>> Given that, I'm fairly inclined to punt those until we have the grand
>>>> glorious allocator, rather than trying to add it to EGL/GBM
>>>> separately. The modifiers stuff was a fairly obvious augmentation -
>>>> EGL already had no-modifier format import but no query as to which
>>>> formats it would accept, and modifiers are a logical extension of
>>>> format - but adding the other restrictions is a bigger step forward.
>>>
>>>
>>> Perhaps a third option would be to encode the required pitch_in_bytes
>>> alignment as part of the modifier?
>>>
>>> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
>>> when the display GPU is a Raven Ridge.
>>>
>>> More generally, we could say that fourcc_mod_code(NONE, k) means that
>>> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if
>>> you prefer, we could have a stride requirement in elements or pixels
>>> instead of bytes.
>>
>>
>> Interesting idea. FWIW, I suspect we'd need to support specifying the
>> requirement in both bytes or pixels, since one or the other alone may
>> not be sufficient to describe the constraints of all hardware.
>
>
> From what I've seen, modifiers are always specified together with one
> specific format, so the bytes-per-pixel are always known, so I don't think
> we need both. Specifying it in bytes is a bit more natural for our hardware,
> that's all.
>
> Cheers,
> Nicolai
> --
> Lerne, wie die Welt wirklich ist,
> Aber vergiss niemals, wie sie sein sollte.


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-26 Thread Nicolai Hähnle

On 26.07.2017 08:29, Michel Dänzer wrote:
> On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
>> On 22.07.2017 14:00, Daniel Stone wrote:
>>>
>>> I don't have any great solution off the top of my head, but I'd be
>>> inclined to bundle stride in with placement. TTBOMK (from having
>>> looked at radv), buffers shared cross-GPU also need to be allocated
>>> from a separate externally-visible memory heap. And at the moment,
>>> lacking placement information at allocation time (at least for EGL
>>> allocations, via DRIImage), that happens via transparent migration at
>>> import time I think. Placement restrictions would probably also
>>> involve communicating base address alignment requirements.
>>
>> Stride isn't really in the same category as placement and base address
>> alignment, though.
>>
>> Placement and base address alignment requirements can apply to all
>> possible texture layouts, while the concept of stride is specific to
>> linear layouts.
>
> Also, the starting address of shareable buffers is generally aligned to
> at least the CPU page size anyway. Do we know of any cases requiring
> higher alignment than that, and if so, which address space does the
> requirement apply to?

Only tiling modes, as Marek mentioned. We don't do tiling shares across
different GPUs right now.

Maybe we can do it in the future with gfx9 GPUs. But then the alignment
requirements should be known on both sides based on the tiling mode
anyway -- if they even apply for non-VRAM textures.

>>> Given that, I'm fairly inclined to punt those until we have the grand
>>> glorious allocator, rather than trying to add it to EGL/GBM
>>> separately. The modifiers stuff was a fairly obvious augmentation -
>>> EGL already had no-modifier format import but no query as to which
>>> formats it would accept, and modifiers are a logical extension of
>>> format - but adding the other restrictions is a bigger step forward.
>>
>> Perhaps a third option would be to encode the required pitch_in_bytes
>> alignment as part of the modifier?
>>
>> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
>> when the display GPU is a Raven Ridge.
>>
>> More generally, we could say that fourcc_mod_code(NONE, k) means that
>> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if
>> you prefer, we could have a stride requirement in elements or pixels
>> instead of bytes.
>
> Interesting idea. FWIW, I suspect we'd need to support specifying the
> requirement in both bytes or pixels, since one or the other alone may
> not be sufficient to describe the constraints of all hardware.

From what I've seen, modifiers are always specified together with one
specific format, so the bytes-per-pixel are always known, so I don't
think we need both. Specifying it in bytes is a bit more natural for our
hardware, that's all.


Cheers,
Nicolai
--
Learn how the world really is,
but never forget how it ought to be.


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-26 Thread Marek Olšák
On Wed, Jul 26, 2017 at 8:29 AM, Michel Dänzer  wrote:
> On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
>> On 22.07.2017 14:00, Daniel Stone wrote:
>>>
>>> I don't have any great solution off the top of my head, but I'd be
>>> inclined to bundle stride in with placement. TTBOMK (from having
>>> looked at radv), buffers shared cross-GPU also need to be allocated
>>> from a separate externally-visible memory heap. And at the moment,
>>> lacking placement information at allocation time (at least for EGL
>>> allocations, via DRIImage), that happens via transparent migration at
>>> import time I think. Placement restrictions would probably also
>>> involve communicating base address alignment requirements.
>>
>> Stride isn't really in the same category as placement and base address
>> alignment, though.
>>
>> Placement and base address alignment requirements can apply to all
>> possible texture layouts, while the concept of stride is specific to
>> linear layouts.
>
> Also, the starting address of shareable buffers is generally aligned to
> at least the CPU page size anyway. Do we know of any cases requiring
> higher alignment than that, and if so, which address space does the
> requirement apply to?

The highest base address alignment I know of is for 2D tiling on Fiji:
256KB.

Marek


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-25 Thread Michel Dänzer
On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
> On 22.07.2017 14:00, Daniel Stone wrote:
>>
>> I don't have any great solution off the top of my head, but I'd be
>> inclined to bundle stride in with placement. TTBOMK (from having
>> looked at radv), buffers shared cross-GPU also need to be allocated
>> from a separate externally-visible memory heap. And at the moment,
>> lacking placement information at allocation time (at least for EGL
>> allocations, via DRIImage), that happens via transparent migration at
>> import time I think. Placement restrictions would probably also
>> involve communicating base address alignment requirements.
> 
> Stride isn't really in the same category as placement and base address
> alignment, though.
> 
> Placement and base address alignment requirements can apply to all
> possible texture layouts, while the concept of stride is specific to
> linear layouts.

Also, the starting address of shareable buffers is generally aligned to
at least the CPU page size anyway. Do we know of any cases requiring
higher alignment than that, and if so, which address space does the
requirement apply to?


>> Given that, I'm fairly inclined to punt those until we have the grand
>> glorious allocator, rather than trying to add it to EGL/GBM
>> separately. The modifiers stuff was a fairly obvious augmentation -
>> EGL already had no-modifier format import but no query as to which
>> formats it would accept, and modifiers are a logical extension of
>> format - but adding the other restrictions is a bigger step forward.
> 
> Perhaps a third option would be to encode the required pitch_in_bytes
> alignment as part of the modifier?
> 
> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
> when the display GPU is a Raven Ridge.
> 
> More generally, we could say that fourcc_mod_code(NONE, k) means that
> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if
> you prefer, we could have a stride requirement in elements or pixels
> instead of bytes.

Interesting idea. FWIW, I suspect we'd need to support specifying the
requirement in both bytes or pixels, since one or the other alone may
not be sufficient to describe the constraints of all hardware.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-25 Thread Nicolai Hähnle

On 22.07.2017 14:00, Daniel Stone wrote:
> On 21 July 2017 at 18:32, Michel Dänzer wrote:
>> On 20/07/17 01:08 PM, Daniel Stone wrote:
>>> DRI3 version 1.1 adds support for explicit format modifiers, including
>>> multi-planar buffers.
>>
>> Adding mesa-dev, Nicolai and Marek.
>>
>> We just ran into an issue which might mean that there's still something
>> missing in this v2 proposal:
>>
>> The context is DRI3 PRIME render offloading of glxgears (not useful in
>> practice, but bear with me). The display GPU is Raven Ridge, which requires
>> that the stride even of linear textures is a multiple of 256 bytes. The
>> renderer GPU is Polaris12, which still supports smaller alignment of the
>> stride. With the glxgears window of width 300, the renderer GPU driver
>> chooses a stride of 304 (* 4 / 256 = 4.75), whereas the display GPU would
>> require 320 (* 4 / 256 = 5). This cannot work.
>
> The obvious answer is just to increase padding on external/winsys
> surfaces? Increasing it for all allocations would probably be a
> non-starter, but winsys surfaces are rare enough that you could
> probably afford to take the hit, I guess.
>
>> I see two basic approaches to solve this:
>>
>> 1. A protocol request for the client to retrieve the display
>>    GPU constraints on the stride (and possibly other parameters) for a
>>    given format and modifier.
>
> + corresponding new EGL request and new GBM/KMS API :\
>
>> 2. A protocol request which allows the creation of a pixmap with
>>    given format and modifier. The renderer GPU driver needs to pass in
>>    the stride it would choose, then the display GPU driver can choose a
>>    stride satisfying the constraints on both sides.
>
> Heh, that sounds familiar - DRI2!
>
>> Maybe there are other possible approaches I'm missing? Other comments?
>
> I don't have any great solution off the top of my head, but I'd be
> inclined to bundle stride in with placement. TTBOMK (from having
> looked at radv), buffers shared cross-GPU also need to be allocated
> from a separate externally-visible memory heap. And at the moment,
> lacking placement information at allocation time (at least for EGL
> allocations, via DRIImage), that happens via transparent migration at
> import time I think. Placement restrictions would probably also
> involve communicating base address alignment requirements.

Stride isn't really in the same category as placement and base address
alignment, though.

Placement and base address alignment requirements can apply to all
possible texture layouts, while the concept of stride is specific to
linear layouts.

> Given that, I'm fairly inclined to punt those until we have the grand
> glorious allocator, rather than trying to add it to EGL/GBM
> separately. The modifiers stuff was a fairly obvious augmentation -
> EGL already had no-modifier format import but no query as to which
> formats it would accept, and modifiers are a logical extension of
> format - but adding the other restrictions is a bigger step forward.


Perhaps a third option would be to encode the required pitch_in_bytes 
alignment as part of the modifier?


So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B 
when the display GPU is a Raven Ridge.


More generally, we could say that fourcc_mod_code(NONE, k) means that 
the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if 
you prefer, we could have a stride requirement in elements or pixels 
instead of bytes.
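
To make the encoding idea concrete, here is a rough Python sketch of what it could look like. Only fourcc_mod_code() mirrors a real macro from drm_fourcc.h; the helper names, the k <= 31 range check, and the DRM_FORMAT_MOD_LINEAR_256B value are assumptions for illustration and not part of any existing header. Note that under this scheme k = 0 yields the same value as today's linear modifier, i.e. "no extra pitch alignment".

```python
DRM_FORMAT_MOD_VENDOR_NONE = 0
MOD_VALUE_MASK = 0x00FFFFFFFFFFFFFF  # low 56 bits carry the vendor-specific value


def fourcc_mod_code(vendor: int, val: int) -> int:
    """Python mirror of the fourcc_mod_code() macro in drm_fourcc.h."""
    return (vendor << 56) | (val & MOD_VALUE_MASK)


def linear_pitch_align_modifier(k: int) -> int:
    """Hypothetical: linear layout whose pitch_in_bytes is a multiple of 2**k."""
    if not 0 <= k <= 31:
        raise ValueError("k must be in the range 0..31")
    return fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_NONE, k)


def required_pitch_alignment(modifier: int) -> int:
    """Hypothetical decoder: pitch alignment in bytes for such a modifier."""
    vendor = modifier >> 56
    k = modifier & MOD_VALUE_MASK
    if vendor != DRM_FORMAT_MOD_VENDOR_NONE or k > 31:
        raise ValueError("not a linear pitch-alignment modifier")
    return 1 << k


def split_hi_lo(modifier: int) -> tuple:
    """Split a 64-bit modifier into the (hi, lo) CARD32 pair used on the wire."""
    return (modifier >> 32) & 0xFFFFFFFF, modifier & 0xFFFFFFFF


# Hypothetical token name taken from the proposal above: 256 = 2**8.
DRM_FORMAT_MOD_LINEAR_256B = linear_pitch_align_modifier(8)

print(required_pitch_alignment(DRM_FORMAT_MOD_LINEAR_256B))
print(split_hi_lo(DRM_FORMAT_MOD_LINEAR_256B))
```

A client that sees such a modifier in the DRI3GetSupportedModifiers reply could then round its pitch up to required_pitch_alignment() before allocating.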


Cheers,
Nicolai

> Cheers,
> Daniel

--
Learn how the world really is,
but never forget how it ought to be.


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-22 Thread Daniel Stone
Hi Michel,

On 21 July 2017 at 18:32, Michel Dänzer  wrote:
> On 20/07/17 01:08 PM, Daniel Stone wrote:
>> DRI3 version 1.1 adds support for explicit format modifiers, including
>> multi-planar buffers.
>
> Adding mesa-dev, Nicolai and Marek.
>
> We just ran into an issue which might mean that there's still something
> missing in this v2 proposal:
>
> The context is DRI3 PRIME render offloading of glxgears (not useful in
> practice, but bear with me). The display GPU is Raven Ridge, which requires
> that the stride even of linear textures is a multiple of 256 bytes. The
> renderer GPU is Polaris12, which still supports smaller alignment of the
> stride. With the glxgears window of width 300, the renderer GPU driver
> chooses a stride of 304 (* 4 / 256 = 4.75), whereas the display GPU would
> require 320 (* 4 / 256 = 5). This cannot work.

The obvious answer is just to increase padding on external/winsys
surfaces? Increasing it for all allocations would probably be a
non-starter, but winsys surfaces are rare enough that you could
probably afford to take the hit, I guess.

> I see two basic approaches to solve this:
>
> 1. A protocol request for the client to retrieve the display
>GPU constraints on the stride (and possibly other parameters) for a
>given format and modifier.

+ corresponding new EGL request and new GBM/KMS API :\

> 2. A protocol request which allows the creation of a pixmap with
>given format and modifier. The renderer GPU driver needs to pass in
>the stride it would choose, then the display GPU driver can choose a
>stride satisfying the constraints on both sides.

Heh, that sounds familiar - DRI2!

> Maybe there are other possible approaches I'm missing? Other comments?

I don't have any great solution off the top of my head, but I'd be
inclined to bundle stride in with placement. TTBOMK (from having
looked at radv), buffers shared cross-GPU also need to be allocated
from a separate externally-visible memory heap. And at the moment,
lacking placement information at allocation time (at least for EGL
allocations, via DRIImage), that happens via transparent migration at
import time I think. Placement restrictions would probably also
involve communicating base address alignment requirements.

Given that, I'm fairly inclined to punt those until we have the grand
glorious allocator, rather than trying to add it to EGL/GBM
separately. The modifiers stuff was a fairly obvious augmentation -
EGL already had no-modifier format import but no query as to which
formats it would accept, and modifiers are a logical extension of
format - but adding the other restrictions is a bigger step forward.

Cheers,
Daniel


Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1

2017-07-21 Thread Michel Dänzer

On 20/07/17 01:08 PM, Daniel Stone wrote:
> DRI3 version 1.1 adds support for explicit format modifiers, including
> multi-planar buffers.


Adding mesa-dev, Nicolai and Marek.

We just ran into an issue which might mean that there's still something 
missing in this v2 proposal:


The context is DRI3 PRIME render offloading of glxgears (not useful in 
practice, but bear with me). The display GPU is Raven Ridge, which 
requires that the stride even of linear textures is a multiple of 256 
bytes. The renderer GPU is Polaris12, which still supports smaller 
alignment of the stride. With the glxgears window of width 300, the 
renderer GPU driver chooses a stride of 304 (* 4 / 256 = 4.75), whereas 
the display GPU would require 320 (* 4 / 256 = 5). This cannot work.
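
The arithmetic above can be checked with a small Python sketch. The 4 bytes per pixel is implied by the "* 4"; the 64-byte renderer alignment is only an assumption that happens to reproduce the 304-pixel stride, and all names are illustrative.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (alignment > 0)."""
    return -(-value // alignment) * alignment


BYTES_PER_PIXEL = 4        # assumed: 32-bit pixels, implied by the "* 4" above
DISPLAY_PITCH_ALIGN = 256  # Raven Ridge: linear pitch must be a multiple of 256 bytes
RENDERER_PITCH_ALIGN = 64  # assumption: reproduces the 304-pixel stride on Polaris12


def stride_in_pixels(width: int, pitch_align: int) -> int:
    """Smallest stride (in pixels) whose byte pitch is a multiple of pitch_align."""
    return align_up(width * BYTES_PER_PIXEL, pitch_align) // BYTES_PER_PIXEL


def display_accepts(stride_pixels: int) -> bool:
    """True if the byte pitch satisfies the display GPU's alignment."""
    return (stride_pixels * BYTES_PER_PIXEL) % DISPLAY_PITCH_ALIGN == 0


width = 300
renderer_stride = stride_in_pixels(width, RENDERER_PITCH_ALIGN)
display_stride = stride_in_pixels(width, DISPLAY_PITCH_ALIGN)

print(renderer_stride, display_accepts(renderer_stride))  # 304 False
print(display_stride, display_accepts(display_stride))    # 320 True
```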



I see two basic approaches to solve this:

1. A protocol request for the client to retrieve the display
   GPU constraints on the stride (and possibly other parameters) for a
   given format and modifier.

2. A protocol request which allows the creation of a pixmap with
   given format and modifier. The renderer GPU driver needs to pass in
   the stride it would choose, then the display GPU driver can choose a
   stride satisfying the constraints on both sides.
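
As a rough illustration of approach 2, the display side could pick the smallest pitch that is at least what the renderer asked for and is a multiple of both alignments. This is only a sketch: negotiate_pitch() is a hypothetical helper, not a proposed protocol request, and the 64-byte renderer alignment is an assumption.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a // gcd(a, b) * b


def negotiate_pitch(requested_pitch: int, renderer_align: int, display_align: int) -> int:
    """Smallest pitch in bytes >= requested_pitch that satisfies both alignments."""
    both = lcm(renderer_align, display_align)
    return -(-requested_pitch // both) * both


# The renderer would choose 304 pixels * 4 bytes = 1216 bytes; the display
# needs multiples of 256 bytes, the renderer (assumed) multiples of 64 bytes.
print(negotiate_pitch(304 * 4, 64, 256))  # 1280 bytes, i.e. 320 pixels
```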

Maybe there are other possible approaches I'm missing? Other comments?


Relevant proposed new protocol requests quoted below for the benefit of 
mesa-dev readers:



@@ -199,6 +202,182 @@ The name of this extension is "DRI3"
associated with a direct rendering device that 'fence' can
work with, otherwise a Match error results.
  
+┌───
+DRI3GetSupportedFormats
+   window: WINDOW
+  ▶
+   num_formats: CARD32
+   formats: ListOfCARD32
+└───
+   Errors: Window, Match
+
+   For the Screen associated with 'window', return a list of
+   supported DRM FourCC formats, as defined in drm_fourcc.h,
+   supported as formats for DRI3 pixmap/buffer interchange.
+   The length of the list, in number of CARD32 elements,
+   is returned in 'num_formats'.
+
+┌───
+DRI3GetSupportedModifiers
+   window: WINDOW
+   format: CARD32
+  ▶
+   num_modifiers: CARD32
+   modifiers: ListOfCARD32
+└───
+   Errors: Window, Match
+
+   For the Screen associated with 'window', return a list of
+   supported DRM FourCC modifiers, as defined in drm_fourcc.h,
+   supported as formats for DRI3 pixmap/buffer interchange.
+   Each modifier is returned as a CARD32
+   containing the most significant 32 bits, followed by a
+   CARD32 containing the least significant 32 bits. The hi/lo
+   pattern repeats 'num_modifiers' times, thus there are
+   '2 * num_modifiers' CARD32 elements returned.
+
+┌───
+DRI3PixmapFromBuffers
+   pixmap: PIXMAP
+   drawable: DRAWABLE
+   num_buffers: CARD8
+   width, height: CARD16
+   stride0, offset0: CARD32
+   stride1, offset1: CARD32
+   stride2, offset2: CARD32
+   stride3, offset3: CARD32
+   format, modifier_hi, modifier_lo: CARD32
+   buffers: ListOfFD
+└───
+   Errors: Alloc, Drawable, IDChoice, Value, Match
+
+   Creates a pixmap for the direct rendering object associated
+   with 'buffers'. Changes to pixmap will be visible in that
+   direct rendered object and changes to the direct rendered
+   object will be visible in the pixmap.
+
+   In contrast to PixmapFromBuffer, multiple buffers may be
+   combined to specify a single logical source for pixel
+   sampling: 'num_buffers' may be set from 1 (single buffer,
+   akin to PixmapFromBuffer) to 4. This is the number of file
+   descriptors which will be sent with this request; one per
+   buffer.
+   
+   The exact configuration of the buffer is specified by 'format',
+   a DRM FourCC format token as defined in that project's
+   drm_fourcc.h header, in combination with the modifier.
+
+   Modifiers allow explicit specification of non-linear sources,
+   such as tiled or compressed buffers. 'modifier_hi' (the most
+   significant 32 bits of a 64-bit value) and 'modifier_lo' are
+   combined to produce a single DRM format modifier token, again
+   as defined in drm_fourcc.h. The combination of format and
+   modifier allows unambiguous declaration of the buffer layout
+   in a manner defined by the DRM tokens.
+
+   DRM_FORMAT_MOD_INVALID may be passed for 'modifier', in which
+   case the driver may make its own inference as to the exact
+   layout of the buffer(s).
+
+   'width' and 'height' describe the geometry (in pixels) of the
+   logical pixel-sample source.
+
+   'strideN' and 'offsetN' define the number of bytes per logical
+   scanline, and the distance in bytes from the beginning of the
+   buffer passed for that plane until the start of the sample
+   source for that plane, respectively for plane N. If the plane
+   is not used