Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
Hi Daniel, On 28.07.2017 12:46, Daniel Stone wrote: On 28 July 2017 at 10:24, Nicolai Hähnle wrote: On 28.07.2017 09:44, Daniel Stone wrote: No, I don't think it is. Tiled layouts still have a stride: if you look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an auxiliary compression/fast-clear buffer), iMX/etnaviv tiled/supertiled, or VC4 T-tiled modifiers and how they're handled both for DRIImage and KMS interchange, they all specify a stride which is conceptually the same as linear, if you imagine linear to be 1x1 tiled. Most tiling users accept any integer units of tiles (buffer width aligned to tile width), but not always. The NV12MT format used by Samsung media decoders (also shipped in some Qualcomm SoCs) is particularly special, IIRC requiring height to be aligned to a multiple of two tiles. Fair enough, but I think you need to distinguish between the chosen stride and stride *requirements*. I do think it makes sense to consider the stride requirement as part of the format/layout description, but more below. Right. Stride is a property of one buffer, stride requirements are a property of the users of that buffer (GPU, display control, media encode, etc). The requirements also depend on use, e.g. trying to do rotation through your scanout engine can change those requirements. Right. It definitely seems attractive to kill two birds with one stone, but I'd really much rather not conflate format description/advertisement, and allocation restriction, into one enum. I'm still on the side of saying that this is a problem modifiers do not solve, deferring to the allocator we need anyway in order to determine things like placement and global optimality (e.g. rotated scanout placing further restrictions on allocation). Okay, the original issue here is that the allocator *cannot* determine the alignment requirement in the use case that prompted this sub-thread. 
The use case is PRIME off-loading, where the rendering GPU supports linear layouts with a 64 byte stride, while the display GPU requires a 256 byte stride. The allocator *cannot* solve this issue, because the allocation happens on the rendering GPU. We need to communicate somehow what the display GPU's stride requirements are. How do you propose to do that? The allocator[0] in itself can't magically reach across processes to determine desired usage and resolve dependencies. But the entire design behind it was to be able to solve cross-device usage: between GPU and scanout, between both of those and media encode/decode, etc. Obviously it can't do that without help, so winsys will need to gain protocol in order to express those in terms the allocator will understand. The idea was to split information into positive capabilities and negative constraints. Modifier queries fall into the same boat as format queries: you're expressing an additional capability ('I can speak tiled'). Stride alignment, for me, falls into a negative constraint ('linear allocations must have stride aligned to 256 bytes'). Similarly, placement constraints (VRAM possibly only accessible to SLI-type paired GPU vs. GTT vs. pure system RAM, etc) are in the same boat AFAICT. So this helps solve one side of the equation, but not the other. I've been thinking about this some more, and I can see now that the changed modifier scheme that I originally proposed does not fit well into places where modifiers are used to express buffer properties (e.g. DRI3PixmapFromBuffers, DRI3BuffersFromPixmap). But I see no proposal on how to fix the issue so far. You cannot fully separate capabilities from constraints. As is, we (AMD) cannot properly implement the proposed DRI3 v1.1: what would we return in DRI3GetSupportedModifiers? The natural option is to return (at least) DRM_FORMAT_MOD_LINEAR, but that would be a lie, because we *don't* speak arbitrary linear formats. 
I don't think this is difficult to fix in terms of protocol, although there's plenty of opportunity for bike-shedding :)

I see roughly two options:

1. Make the constraints per-modifier, and add a "constraints: ListOfCard32" (or 64) to the response to DRI3GetSupportedModifiers. We can then reserve some bits for global constraints (e.g. placement) and some bits on a per-modifier basis (e.g. stride alignment for linear). You could build constraints like DRM_CONSTRAINT_PLACEMENT_SYSTEM | DRM_CONSTRAINT_LINEAR_STRIDE_256B.

2. Make the constraints global, and add a DRI3GetConstraints protocol with the same signature as DRI3GetSupportedModifiers. We'd need vendor namespaces for the constraint defines, to support constraints that are specific to vendor-specific modifiers. You could have entries like DRM_CONSTRAINT_PLACEMENT(DRM_CONSTRAINT_PLACEMENT_SYSTEM) and, as a separate list entry, DRM_CONSTRAINT_LINEAR_STRIDE(256).

Cheers,
Nicolai
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/m
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
Hi, On 28 July 2017 at 10:24, Nicolai Hähnle wrote: > On 28.07.2017 09:44, Daniel Stone wrote: >> No, I don't think it is. Tiled layouts still have a stride: if you >> look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an >> auxiliary compression/fast-clear buffer), iMX/etnaviv >> tiled/supertiled, or VC4 T-tiled modifiers and how they're handled >> both for DRIImage and KMS interchange, they all specify a stride which >> is conceptually the same as linear, if you imagine linear to be 1x1 >> tiled. >> >> Most tiling users accept any integer units of tiles (buffer width >> aligned to tile width), but not always. The NV12MT format used by >> Samsung media decoders (also shipped in some Qualcomm SoCs) is >> particularly special, IIRC requiring height to be aligned to a >> multiple of two tiles. > > Fair enough, but I think you need to distinguish between the chosen stride > and stride *requirements*. I do think it makes sense to consider the stride > requirement as part of the format/layout description, but more below. Right. Stride is a property of one buffer, stride requirements are a property of the users of that buffer (GPU, display control, media encode, etc). The requirements also depend on use, e.g. trying to do rotation through your scanout engine can change those requirements. >> It definitely seems attractive to kill two birds with one stone, but >> I'd really much rather not conflate format description/advertisement, >> and allocation restriction, into one enum. I'm still on the side of >> saying that this is a problem modifiers do not solve, deferring to the >> allocator we need anyway in order to determine things like placement >> and global optimality (e.g. rotated scanout placing further >> restrictions on allocation). > > Okay, the original issue here is that the allocator *cannot* determine the > alignment requirement in the use case that prompted this sub-thread. 
> > The use case is PRIME off-loading, where the rendering GPU supports linear > layouts with a 64 byte stride, while the display GPU requires a 256 byte > stride. > > The allocator *cannot* solve this issue, because the allocation happens on > the rendering GPU. We need to communicate somehow what the display GPU's > stride requirements are. > > How do you propose to do that? The allocator[0] in itself can't magically reach across processes to determine desired usage and resolve dependencies. But the entire design behind it was to be able to solve cross-device usage: between GPU and scanout, between both of those and media encode/decode, etc. Obviously it can't do that without help, so winsys will need to gain protocol in order to express those in terms the allocator will understand. The idea was to split information into positive capabilities and negative constraints. Modifier queries fall into the same boat as format queries: you're expressing an additional capability ('I can speak tiled'). Stride alignment, for me, falls into a negative constraint ('linear allocations must have stride aligned to 256 bytes'). Similarly, placement constraints (VRAM possibly only accessible to SLI-type paired GPU vs. GTT vs. pure system RAM, etc) are in the same boat AFAICT. So this helps solve one side of the equation, but not the other. Cheers, Daniel [0]: Read as, 'the design we discussed for the allocator at XDC'. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
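Daniel's point that linear can be read as "1x1 tiled" can be made concrete with a small sketch. The helper name `tiled_stride_bytes` and the tile widths used in the usage note are illustrative assumptions, not taken from any driver.

```c
#include <assert.h>
#include <stdint.h>

/* Stride in bytes when the row must cover a whole number of tiles.
 * For a linear layout, pass tile_width_bytes == cpp (a tile that is
 * one pixel wide), which degenerates to width_px * cpp. */
static uint32_t tiled_stride_bytes(uint32_t width_px, uint32_t cpp,
                                   uint32_t tile_width_bytes)
{
    uint32_t row_bytes, tiles;

    assert(cpp != 0 && tile_width_bytes != 0);
    row_bytes = width_px * cpp;
    /* Round up to an integer number of tiles per row. */
    tiles = (row_bytes + tile_width_bytes - 1) / tile_width_bytes;
    return tiles * tile_width_bytes;
}
```

For a 300 pixel wide, 4 bytes-per-pixel surface this gives 1200 bytes for the "1x1 tile" case, 1280 bytes for a 128-byte-wide tile and 1536 bytes for a 512-byte-wide tile. Hardware with stricter rules (such as the two-tile height alignment mentioned for NV12MT) would need more than this single rounding step.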
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On 28.07.2017 09:44, Daniel Stone wrote: Hi Nicolai, Trying to tackle the stride subthread in one go ... On 25 July 2017 at 09:28, Nicolai Hähnle wrote: On 22.07.2017 14:00, Daniel Stone wrote: On 21 July 2017 at 18:32, Michel Dänzer wrote: We just ran into an issue which might mean that there's still something missing in this v2 proposal: The context is DRI3 PRIME render offloading of glxgears (not useful in practice, but bear with me). The display GPU is Raven Ridge, which requires that the stride even of linear textures is a multiple of 256 bytes. The renderer GPU is Polaris12, which still supports smaller alignment of the stride. With the glxgears window of width 300, the renderer GPU driver chooses a stride of 304 (* 4 / 256 = 4.75), whereas the display GPU would require 320 (* 4 / 256 = 5). This cannot work. The obvious answer is just to increase padding on external/winsys surfaces? Increasing it for all allocations would probably be a non-starter, but winsys surfaces are rare enough that you could probably afford to take the hit, I guess. I see two basic approaches to solve this: [...] Maybe there are other possible approaches I'm missing? Other comments? I don't have any great solution off the top of my head, but I'd be inclined to bundle stride in with placement. TTBOMK (from having looked at radv), buffers shared cross-GPU also need to be allocated from a separate externally-visible memory heap. And at the moment, lacking placement information at allocation time (at least for EGL allocations, via DRIImage), that happens via transparent migration at import time I think. Placement restrictions would probably also involve communicating base address alignment requirements. Stride isn't really in the same category as placement and base address alignment, though. Placement and base address alignment requirements can apply to all possible texture layouts, while the concept of stride is specific to linear layouts. No, I don't think it is. 
Tiled layouts still have a stride: if you look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an auxiliary compression/fast-clear buffer), iMX/etnaviv tiled/supertiled, or VC4 T-tiled modifiers and how they're handled both for DRIImage and KMS interchange, they all specify a stride which is conceptually the same as linear, if you imagine linear to be 1x1 tiled. Most tiling users accept any integer units of tiles (buffer width aligned to tile width), but not always. The NV12MT format used by Samsung media decoders (also shipped in some Qualcomm SoCs) is particularly special, IIRC requiring height to be aligned to a multiple of two tiles. Fair enough, but I think you need to distinguish between the chosen stride and stride *requirements*. I do think it makes sense to consider the stride requirement as part of the format/layout description, but more below. Given that, I'm fairly inclined to punt those until we have the grand glorious allocator, rather than trying to add it to EGL/GBM separately. The modifiers stuff was a fairly obvious augmentation - EGL already had no-modifier format import but no query as to which formats it would accept, and modifiers are a logical extension of format - but adding the other restrictions is a bigger step forward. Perhaps a third option would be to encode the required pitch_in_bytes alignment as part of the modifier? So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B when the display GPU is a Raven Ridge. More generally, we could say that fourcc_mod_code(NONE, k) means that the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if you prefer, we could have a stride requirement in elements or pixels instead of bytes. I've been thinking this over for the past couple of days, and we could make it work in clients. But we already have DRM_FORMAT_MOD_LINEAR with a well-defined meaning, implemented in quite a few places. 
I think special-casing linear to encode stride alignment is something we'd regret in the long run, especially given that some hardware _does_ have more strict alignment requirements for tiled modes than just an integer number of tiles allocated. AFAICT, both AMD and NVIDIA are both going to use a fair bit of the tiling enum space to encode tile size as well as layout. If allocation alignment requirements (in both dimensions) needed to be added to that, it's entirely likely that there wouldn't be enough space and you'd need to put it somewhere else than the modifier, in which case we've not even really solved the problem. At least for AMD, the alignment requirements are de facto part of the tiling description, so I don't think it makes a difference. It definitely seems attractive to kill two birds with one stone, but I'd really much rather not conflate format description/advertisement, and allocation restriction, into one enum. I'm still on the side of saying that this is a problem modifiers do not solve, deferring to the allocator we need anyway
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
Hi Nicolai, Trying to tackle the stride subthread in one go ... On 25 July 2017 at 09:28, Nicolai Hähnle wrote: > On 22.07.2017 14:00, Daniel Stone wrote: >> On 21 July 2017 at 18:32, Michel Dänzer wrote: >>> We just ran into an issue which might mean that there's still something >>> missing in this v2 proposal: >>> >>> The context is DRI3 PRIME render offloading of glxgears (not useful in >>> practice, but bear with me). The display GPU is Raven Ridge, which >>> requires >>> that the stride even of linear textures is a multiple of 256 bytes. The >>> renderer GPU is Polaris12, which still supports smaller alignment of the >>> stride. With the glxgears window of width 300, the renderer GPU driver >>> chooses a stride of 304 (* 4 / 256 = 4.75), whereas the display GPU would >>> require 320 (* 4 / 256 = 5). This cannot work. >> >> >> The obvious answer is just to increase padding on external/winsys >> surfaces? Increasing it for all allocations would probably be a >> non-starter, but winsys surfaces are rare enough that you could >> probably afford to take the hit, I guess. >> >>> I see two basic approaches to solve this: >>> [...] >>> Maybe there are other possible approaches I'm missing? Other comments? >> >> I don't have any great solution off the top of my head, but I'd be >> inclined to bundle stride in with placement. TTBOMK (from having >> looked at radv), buffers shared cross-GPU also need to be allocated >> from a separate externally-visible memory heap. And at the moment, >> lacking placement information at allocation time (at least for EGL >> allocations, via DRIImage), that happens via transparent migration at >> import time I think. Placement restrictions would probably also >> involve communicating base address alignment requirements. > > Stride isn't really in the same category as placement and base address > alignment, though. 
> > Placement and base address alignment requirements can apply to all possible > texture layouts, while the concept of stride is specific to linear layouts. No, I don't think it is. Tiled layouts still have a stride: if you look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an auxiliary compression/fast-clear buffer), iMX/etnaviv tiled/supertiled, or VC4 T-tiled modifiers and how they're handled both for DRIImage and KMS interchange, they all specify a stride which is conceptually the same as linear, if you imagine linear to be 1x1 tiled. Most tiling users accept any integer units of tiles (buffer width aligned to tile width), but not always. The NV12MT format used by Samsung media decoders (also shipped in some Qualcomm SoCs) is particularly special, IIRC requiring height to be aligned to a multiple of two tiles. >> Given that, I'm fairly inclined to punt those until we have the grand >> glorious allocator, rather than trying to add it to EGL/GBM >> separately. The modifiers stuff was a fairly obvious augmentation - >> EGL already had no-modifier format import but no query as to which >> formats it would accept, and modifiers are a logical extension of >> format - but adding the other restrictions is a bigger step forward. > > Perhaps a third option would be to encode the required pitch_in_bytes > alignment as part of the modifier? > > So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B when > the display GPU is a Raven Ridge. > > More generally, we could say that fourcc_mod_code(NONE, k) means that the > pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if you > prefer, we could have a stride requirement in elements or pixels instead of > bytes. I've been thinking this over for the past couple of days, and we could make it work in clients. But we already have DRM_FORMAT_MOD_LINEAR with a well-defined meaning, implemented in quite a few places. 
I think special-casing linear to encode stride alignment is something we'd regret in the long run, especially given that some hardware _does_ have more strict alignment requirements for tiled modes than just an integer number of tiles allocated. AFAICT, both AMD and NVIDIA are both going to use a fair bit of the tiling enum space to encode tile size as well as layout. If allocation alignment requirements (in both dimensions) needed to be added to that, it's entirely likely that there wouldn't be enough space and you'd need to put it somewhere else than the modifier, in which case we've not even really solved the problem. It definitely seems attractive to kill two birds with one stone, but I'd really much rather not conflate format description/advertisement, and allocation restriction, into one enum. I'm still on the side of saying that this is a problem modifiers do not solve, deferring to the allocator we need anyway in order to determine things like placement and global optimality (e.g. rotated scanout placing further restrictions on allocation). Cheers, Daniel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.fr
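The glxgears numbers from Michel's report quoted in this message can be reproduced with a few lines of arithmetic. This is only a sketch: `align_up` and `linear_stride_bytes` are hypothetical helper names, and the 64-byte figure for the rendering GPU is taken from the later description of the use case in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Byte stride of a linear surface given a stride alignment in bytes. */
static uint32_t linear_stride_bytes(uint32_t width_px, uint32_t cpp,
                                    uint32_t stride_align_bytes)
{
    return align_up(width_px * cpp, stride_align_bytes);
}
```

For width 300 at 4 bytes per pixel, a 64-byte alignment yields 1216 bytes (304 pixels, 4.75 units of 256 bytes), while a 256-byte alignment yields 1280 bytes (320 pixels, exactly 5 units). The renderer's choice is therefore valid for itself but not importable by the display GPU.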
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On 27/07/17 04:08 PM, Nicolai Hähnle wrote:
> On 27.07.2017 03:14, Michel Dänzer wrote:
>> On 26/07/17 09:15 PM, Nicolai Hähnle wrote:
>>> On 26.07.2017 08:29, Michel Dänzer wrote:
>>>> On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
>>>>> On 22.07.2017 14:00, Daniel Stone wrote:
>>>>>>
>>>>>> Given that, I'm fairly inclined to punt those until we have the grand
>>>>>> glorious allocator, rather than trying to add it to EGL/GBM
>>>>>> separately. The modifiers stuff was a fairly obvious augmentation -
>>>>>> EGL already had no-modifier format import but no query as to which
>>>>>> formats it would accept, and modifiers are a logical extension of
>>>>>> format - but adding the other restrictions is a bigger step forward.
>>>>>
>>>>> Perhaps a third option would be to encode the required pitch_in_bytes
>>>>> alignment as part of the modifier?
>>>>>
>>>>> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
>>>>> when the display GPU is a Raven Ridge.
>>>>>
>>>>> More generally, we could say that fourcc_mod_code(NONE, k) means that
>>>>> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31.
>>>>> Or if you prefer, we could have a stride requirement in elements or
>>>>> pixels instead of bytes.
>>>>
>>>> Interesting idea. FWIW, I suspect we'd need to support specifying the
>>>> requirement in both bytes or pixels, since one or the other alone may
>>>> not be sufficient to describe the constraints of all hardware.
>>>
>>> From what I've seen, modifiers are always specified together with one
>>> specific format, so the bytes-per-pixel are always known, so I don't
>>> think we need both.
>>
>> The proposal adds two DRI3 extension requests for querying the list of
>> supported formats and modifiers, respectively. This suggests that the
>> supported formats and modifiers can be freely combined.
>
> Which are these? I only saw DRI3GetSupportedModifiers, which takes both
> a window and a format argument.

You're right, I missed the format argument.
-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On 27.07.2017 03:14, Michel Dänzer wrote: On 26/07/17 09:15 PM, Nicolai Hähnle wrote: On 26.07.2017 08:29, Michel Dänzer wrote: On 25/07/17 05:28 PM, Nicolai Hähnle wrote: On 22.07.2017 14:00, Daniel Stone wrote: Given that, I'm fairly inclined to punt those until we have the grand glorious allocator, rather than trying to add it to EGL/GBM separately. The modifiers stuff was a fairly obvious augmentation - EGL already had no-modifier format import but no query as to which formats it would accept, and modifiers are a logical extension of format - but adding the other restrictions is a bigger step forward. Perhaps a third option would be to encode the required pitch_in_bytes alignment as part of the modifier? So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B when the display GPU is a Raven Ridge. More generally, we could say that fourcc_mod_code(NONE, k) means that the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if you prefer, we could have a stride requirement in elements or pixels instead of bytes. Interesting idea. FWIW, I suspect we'd need to support specifying the requirement in both bytes or pixels, since one or the other alone may not be sufficient to describe the constraints of all hardware. From what I've seen, modifiers are always specified together with one specific format, so the bytes-per-pixel are always known, so I don't think we need both. The proposal adds two DRI3 extension requests for querying the list of supported formats and modifiers, respectively. This suggests that the supported formats and modifiers can be freely combined. Which are these? I only saw DRI3GetSupportedModifiers, which takes both a window and a format argument. Cheers, Nicolai -- Lerne, wie die Welt wirklich ist, Aber vergiss niemals, wie sie sein sollte. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
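The fourcc_mod_code(NONE, k) idea discussed in this sub-thread could be sketched as below. The three macros are local copies that mirror the layout used in drm_fourcc.h (vendor in the top 8 bits, value in the low 56) so that the block builds without libdrm headers; the two helper functions are hypothetical. Note that this encoding was only a proposal in the thread, and Daniel argues against it elsewhere because DRM_FORMAT_MOD_LINEAR already has a fixed meaning.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring drm_fourcc.h. */
#define DRM_FORMAT_MOD_VENDOR_NONE 0
#define fourcc_mod_code(vendor, val) \
    ((((uint64_t)DRM_FORMAT_MOD_VENDOR_##vendor) << 56) | \
     ((val) & 0x00ffffffffffffffULL))
#define DRM_FORMAT_MOD_LINEAR fourcc_mod_code(NONE, 0)

/* Proposed (not adopted) encoding: value k means pitch % 2**k == 0. */
static uint64_t linear_mod_with_pitch_align_log2(unsigned k)
{
    assert(k <= 31);
    return fourcc_mod_code(NONE, (uint64_t)k);
}

/* Decode the pitch alignment in bytes from such a modifier. */
static uint32_t pitch_align_from_modifier(uint64_t modifier)
{
    uint64_t k = modifier & 0x00ffffffffffffffULL;

    assert((modifier >> 56) == DRM_FORMAT_MOD_VENDOR_NONE);
    assert(k <= 31);
    return (uint32_t)1 << k;
}
```

Under this scheme k = 0 coincides with today's DRM_FORMAT_MOD_LINEAR (any pitch), and k = 8 would play the role of the DRM_FORMAT_MOD_LINEAR_256B name used in the mail.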
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On 26/07/17 09:15 PM, Nicolai Hähnle wrote: > On 26.07.2017 08:29, Michel Dänzer wrote: >> On 25/07/17 05:28 PM, Nicolai Hähnle wrote: >>> On 22.07.2017 14:00, Daniel Stone wrote: Given that, I'm fairly inclined to punt those until we have the grand glorious allocator, rather than trying to add it to EGL/GBM separately. The modifiers stuff was a fairly obvious augmentation - EGL already had no-modifier format import but no query as to which formats it would accept, and modifiers are a logical extension of format - but adding the other restrictions is a bigger step forward. >>> >>> Perhaps a third option would be to encode the required pitch_in_bytes >>> alignment as part of the modifier? >>> >>> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B >>> when the display GPU is a Raven Ridge. >>> >>> More generally, we could say that fourcc_mod_code(NONE, k) means that >>> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if >>> you prefer, we could have a stride requirement in elements or pixels >>> instead of bytes. >> >> Interesting idea. FWIW, I suspect we'd need to support specifying the >> requirement in both bytes or pixels, since one or the other alone may >> not be sufficient to describe the constraints of all hardware. > > From what I've seen, modifiers are always specified together with one > specific format, so the bytes-per-pixel are always known, so I don't > think we need both. The proposal adds two DRI3 extension requests for querying the list of supported formats and modifiers, respectively. This suggests that the supported formats and modifiers can be freely combined. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On 26.07.2017 19:42, Marek Olšák wrote: On Wed, Jul 26, 2017 at 7:05 PM, Alex Deucher wrote: On Wed, Jul 26, 2017 at 8:15 AM, Nicolai Hähnle wrote: On 26.07.2017 08:29, Michel Dänzer wrote: On 25/07/17 05:28 PM, Nicolai Hähnle wrote: On 22.07.2017 14:00, Daniel Stone wrote: I don't have any great solution off the top of my head, but I'd be inclined to bundle stride in with placement. TTBOMK (from having looked at radv), buffers shared cross-GPU also need to be allocated from a separate externally-visible memory heap. And at the moment, lacking placement information at allocation time (at least for EGL allocations, via DRIImage), that happens via transparent migration at import time I think. Placement restrictions would probably also involve communicating base address alignment requirements. Stride isn't really in the same category as placement and base address alignment, though. Placement and base address alignment requirements can apply to all possible texture layouts, while the concept of stride is specific to linear layouts. Also, the starting address of shareable buffers is generally aligned to at least the CPU page size anyway. Do we know of any cases requiring higher alignment than that, and if so, which address space does the requirement apply to? Only tiling modes, as Marek mentioned. We don't do tiling shares across different GPUs right now. Maybe we can do it in the future with gfx9 GPUs. But then the alignment requirements should be known on both sides based on the tiling mode anyway -- if they even apply for non-VRAM textures. We should be able to do some 1D tiling modes. That doesn't have any per sku alignment dependencies. Yeah, I think 1D tiling for displayable 32bpp is compatible across all radeon GPUs newer than R600. All non-X non-VAR tiling modes on Radeon/GFX9 (Vega, Raven) are the same on all GFX9 GPUs and might be the same on all future products. 
The only catch is that X modes are better optimized for the memory config, so non-X modes can be slower. I think the non-X modes might also be compatible with Intel (the first 12 at least), so some cross-vendor interface might be possible.

All GFX9 tiling modes:

SW_LINEAR (256B pitch alignment)
SW_256B_S
SW_256B_D (compatible with older Radeons if bpp == 32)
SW_256B_R (compatible with older Radeons if bpp == 32)
SW_4KB_Z (Z = depth/stencil sample order)
SW_4KB_S (S = standard)
SW_4KB_D (D = displayable)
SW_4KB_R (R = displayable rotated)
SW_64KB_Z
SW_64KB_S
SW_64KB_D
SW_64KB_R
SW_VAR_Z (VAR = tile size depends on memory config)
SW_VAR_S
SW_VAR_D
SW_VAR_R
SW_64KB_Z_T
SW_64KB_S_T
SW_64KB_D_T
SW_64KB_R_T
SW_4KB_Z_X (X = optimized for memory config)
SW_4KB_S_X
SW_4KB_D_X
SW_4KB_R_X
SW_64KB_Z_X
SW_64KB_S_X
SW_64KB_D_X
SW_64KB_R_X
SW_VAR_Z_X
SW_VAR_S_X
SW_VAR_D_X
SW_VAR_R_X

Marek

Right. It might be worth it to try to use some of these tiling modes to make PRIME a bit more efficient in some cases. Non-X modes may be non-optimal, but certainly they're better than linear :)

Cheers,
Nicolai
-- 
Learn how the world really is, but never forget how it should be.
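Marek's rule of thumb (non-X, non-VAR modes are the same on all GFX9 GPUs) can be expressed as a name-based filter over the mode list. This is a sketch only: the function name is hypothetical and it works on the mode names as strings, since the numeric hardware values are not given in the thread.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the named GFX9 swizzle mode does not depend on the
 * memory configuration according to the rule in the mail: it is
 * neither a VAR mode (SW_VAR_*) nor an X mode (*_X). */
static int gfx9_mode_is_config_independent(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (strncmp(name, "SW_VAR_", 7) == 0)
        return 0; /* tile size depends on memory config */
    if (len >= 2 && strcmp(name + len - 2, "_X") == 0)
        return 0; /* optimized for memory config */
    return 1;
}
```

Such a filter would mark SW_64KB_D as a candidate for cross-GPU sharing while excluding SW_64KB_D_X and SW_VAR_S. Whether a candidate is actually usable on both devices would still have to be confirmed by the drivers involved.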
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On Wed, Jul 26, 2017 at 8:15 AM, Nicolai Hähnle wrote: > On 26.07.2017 08:29, Michel Dänzer wrote: >> >> On 25/07/17 05:28 PM, Nicolai Hähnle wrote: >>> >>> On 22.07.2017 14:00, Daniel Stone wrote: I don't have any great solution off the top of my head, but I'd be inclined to bundle stride in with placement. TTBOMK (from having looked at radv), buffers shared cross-GPU also need to be allocated from a separate externally-visible memory heap. And at the moment, lacking placement information at allocation time (at least for EGL allocations, via DRIImage), that happens via transparent migration at import time I think. Placement restrictions would probably also involve communicating base address alignment requirements. >>> >>> >>> Stride isn't really in the same category as placement and base address >>> alignment, though. >>> >>> Placement and base address alignment requirements can apply to all >>> possible texture layouts, while the concept of stride is specific to >>> linear layouts. >> >> >> Also, the starting address of shareable buffers is generally aligned to >> at least the CPU page size anyway. Do we know of any cases requiring >> higher alignment than that, and if so, which address space does the >> requirement apply to? > > > Only tiling modes, as Marek mentioned. We don't do tiling shares across > different GPUs right now. > > Maybe we can do it in the future with gfx9 GPUs. But then the alignment > requirements should be known on both sides based on the tiling mode anyway > -- if they even apply for non-VRAM textures. We should be able to do some 1D tiling modes. That doesn't have any per sku alignment dependencies. Alex > > Given that, I'm fairly inclined to punt those until we have the grand glorious allocator, rather than trying to add it to EGL/GBM separately. 
The modifiers stuff was a fairly obvious augmentation - EGL already had no-modifier format import but no query as to which formats it would accept, and modifiers are a logical extension of format - but adding the other restrictions is a bigger step forward. >>> >>> >>> Perhaps a third option would be to encode the required pitch_in_bytes >>> alignment as part of the modifier? >>> >>> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B >>> when the display GPU is a Raven Ridge. >>> >>> More generally, we could say that fourcc_mod_code(NONE, k) means that >>> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if >>> you prefer, we could have a stride requirement in elements or pixels >>> instead of bytes. >> >> >> Interesting idea. FWIW, I suspect we'd need to support specifying the >> requirement in both bytes or pixels, since one or the other alone may >> not be sufficient to describe the constraints of all hardware. > > > From what I've seen, modifiers are always specified together with one > specific format, so the bytes-per-pixel are always known, so I don't think > we need both. Specifying it in bytes is a bit more natural for our hardware, > that's all. > > Cheers, > Nicolai > -- > Lerne, wie die Welt wirklich ist, > Aber vergiss niemals, wie sie sein sollte. > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On 26.07.2017 08:29, Michel Dänzer wrote:
> On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
>> On 22.07.2017 14:00, Daniel Stone wrote:
>>> I don't have any great solution off the top of my head, but I'd be
>>> inclined to bundle stride in with placement. TTBOMK (from having
>>> looked at radv), buffers shared cross-GPU also need to be allocated
>>> from a separate externally-visible memory heap. And at the moment,
>>> lacking placement information at allocation time (at least for EGL
>>> allocations, via DRIImage), that happens via transparent migration at
>>> import time I think. Placement restrictions would probably also
>>> involve communicating base address alignment requirements.
>>
>> Stride isn't really in the same category as placement and base address
>> alignment, though.
>>
>> Placement and base address alignment requirements can apply to all
>> possible texture layouts, while the concept of stride is specific to
>> linear layouts.
>
> Also, the starting address of shareable buffers is generally aligned to
> at least the CPU page size anyway. Do we know of any cases requiring
> higher alignment than that, and if so, which address space does the
> requirement apply to?

Only tiling modes, as Marek mentioned. We don't do tiling shares across
different GPUs right now. Maybe we can do it in the future with gfx9
GPUs. But then the alignment requirements should be known on both sides
based on the tiling mode anyway -- if they even apply for non-VRAM
textures.

>>> Given that, I'm fairly inclined to punt those until we have the grand
>>> glorious allocator, rather than trying to add it to EGL/GBM
>>> separately. The modifiers stuff was a fairly obvious augmentation -
>>> EGL already had no-modifier format import but no query as to which
>>> formats it would accept, and modifiers are a logical extension of
>>> format - but adding the other restrictions is a bigger step forward.
>>
>> Perhaps a third option would be to encode the required pitch_in_bytes
>> alignment as part of the modifier?
>>
>> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
>> when the display GPU is a Raven Ridge.
>>
>> More generally, we could say that fourcc_mod_code(NONE, k) means that
>> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if
>> you prefer, we could have a stride requirement in elements or pixels
>> instead of bytes.
>
> Interesting idea. FWIW, I suspect we'd need to support specifying the
> requirement in both bytes or pixels, since one or the other alone may
> not be sufficient to describe the constraints of all hardware.

From what I've seen, modifiers are always specified together with one
specific format, so the bytes-per-pixel are always known, so I don't
think we need both. Specifying it in bytes is a bit more natural for our
hardware, that's all.

Cheers,
Nicolai
-- 
Learn how the world really is,
but never forget how it ought to be.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
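To illustrate the point that a known bytes-per-pixel value lets one unit be derived from the other, here is a minimal Python sketch. It assumes a single-plane, packed format in which every pixel occupies a whole number of bytes; the function names are hypothetical and belong to no existing API. It does not settle whether some hardware needs both kinds of constraint at once.

```python
import math


def pixel_alignment_from_bytes(byte_alignment: int, bytes_per_pixel: int) -> int:
    """Smallest pixel granularity that guarantees the byte alignment.

    A stride of n pixels is n * bytes_per_pixel bytes, which is a multiple of
    byte_alignment exactly when n is a multiple of
    byte_alignment / gcd(byte_alignment, bytes_per_pixel).
    """
    if byte_alignment <= 0 or bytes_per_pixel <= 0:
        raise ValueError("alignment and bytes_per_pixel must be positive")
    return byte_alignment // math.gcd(byte_alignment, bytes_per_pixel)


def byte_alignment_from_pixels(pixel_alignment: int, bytes_per_pixel: int) -> int:
    """Byte granularity implied by a stride requirement given in pixels."""
    if pixel_alignment <= 0 or bytes_per_pixel <= 0:
        raise ValueError("alignment and bytes_per_pixel must be positive")
    return pixel_alignment * bytes_per_pixel
```

With a 256-byte requirement, a 4-byte-per-pixel format needs strides in multiples of 64 pixels and a 2-byte-per-pixel format needs multiples of 128 pixels.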
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On Wed, Jul 26, 2017 at 8:29 AM, Michel Dänzer wrote:
> On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
>> On 22.07.2017 14:00, Daniel Stone wrote:
>>>
>>> I don't have any great solution off the top of my head, but I'd be
>>> inclined to bundle stride in with placement. TTBOMK (from having
>>> looked at radv), buffers shared cross-GPU also need to be allocated
>>> from a separate externally-visible memory heap. And at the moment,
>>> lacking placement information at allocation time (at least for EGL
>>> allocations, via DRIImage), that happens via transparent migration at
>>> import time I think. Placement restrictions would probably also
>>> involve communicating base address alignment requirements.
>>
>> Stride isn't really in the same category as placement and base address
>> alignment, though.
>>
>> Placement and base address alignment requirements can apply to all
>> possible texture layouts, while the concept of stride is specific to
>> linear layouts.
>
> Also, the starting address of shareable buffers is generally aligned to
> at least the CPU page size anyway. Do we know of any cases requiring
> higher alignment than that, and if so, which address space does the
> requirement apply to?

The highest base address alignment I know of is 2D tiling on Fiji =
256KB alignment.

Marek
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On 25/07/17 05:28 PM, Nicolai Hähnle wrote:
> On 22.07.2017 14:00, Daniel Stone wrote:
>>
>> I don't have any great solution off the top of my head, but I'd be
>> inclined to bundle stride in with placement. TTBOMK (from having
>> looked at radv), buffers shared cross-GPU also need to be allocated
>> from a separate externally-visible memory heap. And at the moment,
>> lacking placement information at allocation time (at least for EGL
>> allocations, via DRIImage), that happens via transparent migration at
>> import time I think. Placement restrictions would probably also
>> involve communicating base address alignment requirements.
>
> Stride isn't really in the same category as placement and base address
> alignment, though.
>
> Placement and base address alignment requirements can apply to all
> possible texture layouts, while the concept of stride is specific to
> linear layouts.

Also, the starting address of shareable buffers is generally aligned to
at least the CPU page size anyway. Do we know of any cases requiring
higher alignment than that, and if so, which address space does the
requirement apply to?

>> Given that, I'm fairly inclined to punt those until we have the grand
>> glorious allocator, rather than trying to add it to EGL/GBM
>> separately. The modifiers stuff was a fairly obvious augmentation -
>> EGL already had no-modifier format import but no query as to which
>> formats it would accept, and modifiers are a logical extension of
>> format - but adding the other restrictions is a bigger step forward.
>
> Perhaps a third option would be to encode the required pitch_in_bytes
> alignment as part of the modifier?
>
> So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
> when the display GPU is a Raven Ridge.
>
> More generally, we could say that fourcc_mod_code(NONE, k) means that
> the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if
> you prefer, we could have a stride requirement in elements or pixels
> instead of bytes.

Interesting idea. FWIW, I suspect we'd need to support specifying the
requirement in both bytes or pixels, since one or the other alone may
not be sufficient to describe the constraints of all hardware.

-- 
Earthling Michel Dänzer   | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On 22.07.2017 14:00, Daniel Stone wrote:
> On 21 July 2017 at 18:32, Michel Dänzer wrote:
>> On 20/07/17 01:08 PM, Daniel Stone wrote:
>>> DRI3 version 1.1 adds support for explicit format modifiers, including
>>> multi-planar buffers.
>>
>> Adding mesa-dev, Nicolai and Marek.
>>
>> We just ran into an issue which might mean that there's still something
>> missing in this v2 proposal:
>>
>> The context is DRI3 PRIME render offloading of glxgears (not useful in
>> practice, but bear with me). The display GPU is Raven Ridge, which requires
>> that the stride even of linear textures is a multiple of 256 bytes. The
>> renderer GPU is Polaris12, which still supports smaller alignment of the
>> stride. With the glxgears window of width 300, the renderer GPU driver
>> chooses a stride of 304 (* 4 / 256 = 4.75), whereas the display GPU would
>> require 320 (* 4 / 256 = 5). This cannot work.
>
> The obvious answer is just to increase padding on external/winsys
> surfaces? Increasing it for all allocations would probably be a
> non-starter, but winsys surfaces are rare enough that you could
> probably afford to take the hit, I guess.
>
>> I see two basic approaches to solve this:
>>
>> 1. A protocol request for the client to retrieve the display
>>    GPU constraints on the stride (and possibly other parameters) for a
>>    given format and modifier.
>
> + corresponding new EGL request and new GBM/KMS API :\
>
>> 2. A protocol request which allows the creation of a pixmap with
>>    given format and modifier. The renderer GPU driver needs to pass in
>>    the stride it would choose, then the display GPU driver can choose a
>>    stride satisfying the constraints on both sides.
>
> Heh, that sounds familiar - DRI2!
>
>> Maybe there are other possible approaches I'm missing? Other comments?
>
> I don't have any great solution off the top of my head, but I'd be
> inclined to bundle stride in with placement. TTBOMK (from having
> looked at radv), buffers shared cross-GPU also need to be allocated
> from a separate externally-visible memory heap. And at the moment,
> lacking placement information at allocation time (at least for EGL
> allocations, via DRIImage), that happens via transparent migration at
> import time I think. Placement restrictions would probably also
> involve communicating base address alignment requirements.

Stride isn't really in the same category as placement and base address
alignment, though.

Placement and base address alignment requirements can apply to all
possible texture layouts, while the concept of stride is specific to
linear layouts.

> Given that, I'm fairly inclined to punt those until we have the grand
> glorious allocator, rather than trying to add it to EGL/GBM
> separately. The modifiers stuff was a fairly obvious augmentation -
> EGL already had no-modifier format import but no query as to which
> formats it would accept, and modifiers are a logical extension of
> format - but adding the other restrictions is a bigger step forward.

Perhaps a third option would be to encode the required pitch_in_bytes
alignment as part of the modifier?

So DRI3GetSupportedModifiers would return DRM_FORMAT_MOD_LINEAR_256B
when the display GPU is a Raven Ridge.

More generally, we could say that fourcc_mod_code(NONE, k) means that
the pitch_in_bytes has to be a multiple of 2**k for e.g. k <= 31. Or if
you prefer, we could have a stride requirement in elements or pixels
instead of bytes.

Cheers,
Nicolai

> Cheers,
> Daniel

-- 
Learn how the world really is,
but never forget how it ought to be.
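The modifier-encoding idea above could look roughly like the following Python sketch. `fourcc_mod_code` here mirrors, as far as I understand it, the macro of the same name in drm_fourcc.h (vendor in the top 8 bits, value in the low 56 bits, with vendor NONE equal to 0). The helper functions and the reading of `k` as a log2 pitch alignment are hypothetical and taken only from the proposal in this message; nothing here is an existing DRM or DRI3 interface.

```python
DRM_FORMAT_MOD_VENDOR_NONE = 0        # vendor id used by drm_fourcc.h for "none"
_VALUE_MASK = 0x00FFFFFFFFFFFFFF      # low 56 bits carry the vendor-specific value
MAX_K = 31                            # upper bound suggested in the proposal


def fourcc_mod_code(vendor: int, val: int) -> int:
    """Pack a vendor id and a value into one 64-bit modifier."""
    return ((vendor & 0xFF) << 56) | (val & _VALUE_MASK)


def modifier_for_pitch_alignment(alignment_bytes: int) -> int:
    """Hypothetical: encode 'pitch must be a multiple of 2**k bytes' as (NONE, k)."""
    if alignment_bytes <= 0 or alignment_bytes & (alignment_bytes - 1):
        raise ValueError("alignment must be a positive power of two")
    k = alignment_bytes.bit_length() - 1
    if k > MAX_K:
        raise ValueError("alignment too large for this encoding")
    return fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_NONE, k)


def pitch_alignment_of(modifier: int) -> int:
    """Hypothetical: decode the byte alignment from a (NONE, k) modifier."""
    if modifier >> 56 != DRM_FORMAT_MOD_VENDOR_NONE:
        raise ValueError("not a vendor-NONE modifier")
    k = modifier & _VALUE_MASK
    if k > MAX_K:
        raise ValueError("value outside the proposed range")
    return 1 << k


def pitch_is_acceptable(modifier: int, pitch_in_bytes: int) -> bool:
    """True if the pitch satisfies the alignment encoded in the modifier."""
    return pitch_in_bytes % pitch_alignment_of(modifier) == 0
```

Under this reading, the suggested DRM_FORMAT_MOD_LINEAR_256B would correspond to k = 8, and k = 0 would coincide with a linear layout that has no pitch alignment requirement. A pitch of 1280 bytes passes the k = 8 check, while 1216 bytes does not.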
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
Hi Michel,

On 21 July 2017 at 18:32, Michel Dänzer wrote:
> On 20/07/17 01:08 PM, Daniel Stone wrote:
>> DRI3 version 1.1 adds support for explicit format modifiers, including
>> multi-planar buffers.
>
> Adding mesa-dev, Nicolai and Marek.
>
> We just ran into an issue which might mean that there's still something
> missing in this v2 proposal:
>
> The context is DRI3 PRIME render offloading of glxgears (not useful in
> practice, but bear with me). The display GPU is Raven Ridge, which requires
> that the stride even of linear textures is a multiple of 256 bytes. The
> renderer GPU is Polaris12, which still supports smaller alignment of the
> stride. With the glxgears window of width 300, the renderer GPU driver
> chooses a stride of 304 (* 4 / 256 = 4.75), whereas the display GPU would
> require 320 (* 4 / 256 = 5). This cannot work.

The obvious answer is just to increase padding on external/winsys
surfaces? Increasing it for all allocations would probably be a
non-starter, but winsys surfaces are rare enough that you could
probably afford to take the hit, I guess.

> I see two basic approaches to solve this:
>
> 1. A protocol request for the client to retrieve the display
>    GPU constraints on the stride (and possibly other parameters) for a
>    given format and modifier.

+ corresponding new EGL request and new GBM/KMS API :\

> 2. A protocol request which allows the creation of a pixmap with
>    given format and modifier. The renderer GPU driver needs to pass in
>    the stride it would choose, then the display GPU driver can choose a
>    stride satisfying the constraints on both sides.

Heh, that sounds familiar - DRI2!

> Maybe there are other possible approaches I'm missing? Other comments?

I don't have any great solution off the top of my head, but I'd be
inclined to bundle stride in with placement. TTBOMK (from having
looked at radv), buffers shared cross-GPU also need to be allocated
from a separate externally-visible memory heap. And at the moment,
lacking placement information at allocation time (at least for EGL
allocations, via DRIImage), that happens via transparent migration at
import time I think. Placement restrictions would probably also
involve communicating base address alignment requirements.

Given that, I'm fairly inclined to punt those until we have the grand
glorious allocator, rather than trying to add it to EGL/GBM
separately. The modifiers stuff was a fairly obvious augmentation -
EGL already had no-modifier format import but no query as to which
formats it would accept, and modifiers are a logical extension of
format - but adding the other restrictions is a bigger step forward.

Cheers,
Daniel
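As a rough illustration of the stride mismatch quoted above, the following Python sketch recomputes the numbers for a 300-pixel-wide window at 4 bytes per pixel. The 256-byte display alignment comes from the report. The 64-byte renderer alignment is an assumption chosen because it reproduces the reported 304-pixel stride. The function names are hypothetical.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (value + alignment - 1) // alignment * alignment


def pitch_in_bytes(width_px: int, bytes_per_pixel: int, pitch_alignment: int) -> int:
    """Smallest pitch that holds one row and satisfies the byte alignment."""
    return align_up(width_px * bytes_per_pixel, pitch_alignment)


WIDTH_PX = 300
BYTES_PER_PIXEL = 4

# Assumed 64-byte alignment on the renderer side: 1216 bytes, i.e. 304 pixels.
renderer_pitch = pitch_in_bytes(WIDTH_PX, BYTES_PER_PIXEL, 64)

# 256-byte alignment on the display side: 1280 bytes, i.e. 320 pixels.
display_pitch = pitch_in_bytes(WIDTH_PX, BYTES_PER_PIXEL, 256)

# The renderer's pitch is not a multiple of 256, so the display GPU cannot use it.
renderer_pitch_ok_for_display = renderer_pitch % 256 == 0
```

The renderer's 1216-byte pitch is 4.75 units of 256 bytes, so it fails the display requirement, whereas allocating with the stricter of the two alignments gives 1280 bytes, which satisfies both sides.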
Re: [Mesa-dev] [PATCH dri3proto v2] Add modifier/multi-plane requests, bump to v1.1
On 20/07/17 01:08 PM, Daniel Stone wrote:
> DRI3 version 1.1 adds support for explicit format modifiers, including
> multi-planar buffers.

Adding mesa-dev, Nicolai and Marek.

We just ran into an issue which might mean that there's still something
missing in this v2 proposal:

The context is DRI3 PRIME render offloading of glxgears (not useful in
practice, but bear with me). The display GPU is Raven Ridge, which requires
that the stride even of linear textures is a multiple of 256 bytes. The
renderer GPU is Polaris12, which still supports smaller alignment of the
stride. With the glxgears window of width 300, the renderer GPU driver
chooses a stride of 304 (* 4 / 256 = 4.75), whereas the display GPU would
require 320 (* 4 / 256 = 5). This cannot work.

I see two basic approaches to solve this:

1. A protocol request for the client to retrieve the display
   GPU constraints on the stride (and possibly other parameters) for a
   given format and modifier.

2. A protocol request which allows the creation of a pixmap with
   given format and modifier. The renderer GPU driver needs to pass in
   the stride it would choose, then the display GPU driver can choose a
   stride satisfying the constraints on both sides.

Maybe there are other possible approaches I'm missing? Other comments?

Relevant proposed new protocol requests quoted below for the benefit of
mesa-dev readers:

@@ -199,6 +202,182 @@ The name of this extension is "DRI3"
     associated with a direct rendering device that 'fence' can
     work with, otherwise a Match error results.
 
+┌───
+DRI3GetSupportedFormats
+    window: WINDOW
+  ▶
+    num_formats: CARD32
+    formats: ListOfCARD32
+└───
+    Errors: Window, Match
+
+    For the Screen associated with 'window', return a list of
+    supported DRM FourCC formats, as defined in drm_fourcc.h,
+    supported as formats for DRI3 pixmap/buffer interchange.
+    The length of the list, in number of CARD32 elements,
+    is returned in 'num_formats'.
+
+┌───
+DRI3GetSupportedModifiers
+    window: WINDOW
+    format: CARD32
+  ▶
+    num_modifiers: CARD32
+    modifiers: ListOfCARD32
+└───
+    Errors: Window, Match
+
+    For the Screen associated with 'window', return a list of
+    supported DRM FourCC modifiers, as defined in drm_fourcc.h,
+    supported as formats for DRI3 pixmap/buffer interchange.
+    Each modifier is returned as a CARD32
+    containing the most significant 32 bits, followed by a
+    CARD32 containing the least significant 32 bits. The hi/lo
+    pattern repeats 'num_modifiers' times, thus there are
+    '2 * num_modifiers' CARD32 elements returned.
+
+┌───
+DRI3PixmapFromBuffers
+    pixmap: PIXMAP
+    drawable: DRAWABLE
+    num_buffers: CARD8
+    width, height: CARD16
+    stride0, offset0: CARD32
+    stride1, offset1: CARD32
+    stride2, offset2: CARD32
+    stride3, offset3: CARD32
+    format, modifier_hi, modifier_lo: CARD32
+    buffers: ListOfFD
+└───
+    Errors: Alloc, Drawable, IDChoice, Value, Match
+
+    Creates a pixmap for the direct rendering object associated
+    with 'buffers'. Changes to pixmap will be visible in that
+    direct rendered object and changes to the direct rendered
+    object will be visible in the pixmap.
+
+    In contrast to PixmapFromBuffers, multiple buffers may be
+    combined to specify a single logical source for pixel
+    sampling: 'num_buffers' may be set from 1 (single buffer,
+    akin to PixmapFromBuffer) to 4. This is the number of file
+    descriptors which will be sent with this request; one per
+    buffer.
+
+    The exact configuration of the buffer is specified by 'format',
+    a DRM FourCC format token as defined in that project's
+    drm_fourcc.h header, in combination with the modifier.
+
+    Modifiers allow explicit specification of non-linear sources,
+    such as tiled or compressed buffers. 'modifier_hi' (the most
+    significant 32 bits of a 64-bit value) and 'modifier_lo' are
+    combined to produce a single DRM format modifier token, again
+    as defined in drm_fourcc.h. The combination of format and
+    modifier allows unambiguous declaration of the buffer layout
+    in a manner defined by the DRM tokens.
+
+    DRM_FORMAT_MOD_INVALID may be passed for 'modifier', in which
+    case the driver may make its own inference as to the exact
+    layout of the buffer(s).
+
+    'width' and 'height' describe the geometry (in pixels) of the
+    logical pixel-sample source.
+
+    'strideN' and 'offsetN' define the number of bytes per logical
+    scanline, and the distance in bytes from the beginning of the
+    buffer passed for that plane until the start of the sample
+    source for that plane, respectively for plane N. If the plane
+    is not used