Re: [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics
On 8/3/23 08:47, Daniel Stone wrote: Hi all, This is v2 to the linked patch series; thanks to everyone for reviewing the initial version. I've moved this out of a pure DRM scope and into the general userspace-API design section. Hopefully it helps others and answers a bunch of questions. Again, thanks for writing this up. I think it is great to have all this knowledge collected in one place. For the series: Reviewed-by: James Jones I think it'd be great to have input/links/reflections from other subsystems as well here. Agreed, though I'll reiterate my comment on the v1 series from a few years ago: I hope this can be merged relatively soon with additional documentation added in follow-up patches as needed. While you can always note more interactions, details, etc., everything here appears to be correct from my understanding and is strictly an improvement over the current lack of documentation. Thanks, -James Cheers, Daniel
Re: DMA-heap driver hints
On 1/24/23 15:14, T.J. Mercier wrote: On Mon, Jan 23, 2023 at 11:49 PM Christian König wrote: Am 24.01.23 um 04:56 schrieb James Jones: On 1/23/23 08:58, Laurent Pinchart wrote: Hi Christian, On Mon, Jan 23, 2023 at 05:29:18PM +0100, Christian König wrote: Am 23.01.23 um 14:55 schrieb Laurent Pinchart: Hi Christian, CC'ing James as I think this is related to his work on the unix device memory allocator ([1]). Thank you for including me. Sorry for not having you in initially. I wasn't aware of your previous work in this area. No worries. I've embarrassingly made no progress here since the last XDC talk, so I wouldn't expect everyone to know or remember. [1] https://lore.kernel.org/dri-devel/8b555674-1c5b-c791-4547-2ea7c16ae...@nvidia.com/ On Mon, Jan 23, 2023 at 01:37:54PM +0100, Christian König wrote: Hi guys, this is just an RFC! The last time we discussed the DMA-buf coherency problem [1] we concluded that DMA-heap first needs a better way to communicate to userspace which heap to use for a certain device. As far as I know userspace currently just hard codes that information which is certainly not desirable considering that we should have this inside the kernel as well. So what those two patches here do is to first add some dma_heap_create_device_link() and dma_heap_remove_device_link() function and then demonstrating the functionality with uvcvideo driver. The preferred DMA-heap is represented with a symlink in sysfs between the device and the virtual DMA-heap device node. I'll start with a few high-level comments/questions: - Instead of tying drivers to heaps, have you considered a system where a driver would expose constraints, and a heap would then be selected based on those constraints ? A tight coupling between heaps and drivers means downstream patches to drivers in order to use vendor-specific heaps, that sounds painful. I was wondering the same thing as well, but came to the conclusion that just the other way around is the less painful approach. 
From a kernel point of view, sure, it's simpler and thus less painful. From the point of view of solving the whole issue, I'm not sure :-) The problem is that there are so many driver-specific constraints that I don't even know where to start from. That's where I was hoping James would have some feedback for us, based on the work he did on the Unix device memory allocator. If that's not the case, we can brainstorm this from scratch. Simon Ser's and my presentation from XDC 2020 focused entirely on this. The idea was not to try to enumerate every constraint up front, but rather to develop an extensible mechanism that would be flexible enough to encapsulate many disparate types of constraints and perform set operations on them (merging sets was the only operation we tried to solve). Simon implemented a prototype header-only library to implement the mechanism: https://gitlab.freedesktop.org/emersion/drm-constraints The links to the presentation and talk are below, along with notes from the follow-up workshop. https://lpc.events/event/9/contributions/615/attachments/704/1301/XDC_2020__Allocation_Constraints.pdf https://www.youtube.com/watch?v=HZEClOP5TIk https://paste.sr.ht/~emersion/c43b30be08bab1882f1b107402074462bba3b64a Note one of the hard parts of this was figuring out how to express a device or heap within the constraint structs. One of the better ideas proposed back then was something like heap IDs, where dma heaps would each have one, We already have that. Each dma_heap has its own unique name. Cool. and devices could register their own heaps (or even just themselves?) with the heap subsystem and be assigned a locally-unique ID that userspace could pass around. I was more considering that we expose some kind of flag noting that a certain device needs its buffer allocated from that device to utilize all use cases. This sounds similar to what you're proposing. Perhaps a reasonable identifier is a device (major, minor) pair.
Such a constraint could be expressed as a symlink for easy visualization/discoverability from userspace, but might be easier to serialize over the wire as the (major, minor) pair. I'm not clear which direction is better to express this either: As a link from heap->device, or device->heap. A constraint-based system would also, I think, be easier to extend with additional constraints in the future. - I assume some drivers will be able to support multiple heaps. How do you envision this being implemented ? I don't really see a use case for this. One use case I know of here is same-vendor GPU local memory on different GPUs. NVIDIA GPUs have certain things they can only do on local memory, certain things they can do on all memory, and certain things they can only do on memory local to another NVIDIA GPU, especially when there exists an NVLink interface between the two. So they'd ideally express different constraints fo
Re: DMA-heap driver hints
On 1/23/23 08:58, Laurent Pinchart wrote: Hi Christian, On Mon, Jan 23, 2023 at 05:29:18PM +0100, Christian König wrote: Am 23.01.23 um 14:55 schrieb Laurent Pinchart: Hi Christian, CC'ing James as I think this is related to his work on the unix device memory allocator ([1]). Thank you for including me. [1] https://lore.kernel.org/dri-devel/8b555674-1c5b-c791-4547-2ea7c16ae...@nvidia.com/ On Mon, Jan 23, 2023 at 01:37:54PM +0100, Christian König wrote: Hi guys, this is just an RFC! The last time we discussed the DMA-buf coherency problem [1] we concluded that DMA-heap first needs a better way to communicate to userspace which heap to use for a certain device. As far as I know userspace currently just hard codes that information which is certainly not desirable considering that we should have this inside the kernel as well. So what those two patches here do is to first add some dma_heap_create_device_link() and dma_heap_remove_device_link() function and then demonstrating the functionality with uvcvideo driver. The preferred DMA-heap is represented with a symlink in sysfs between the device and the virtual DMA-heap device node. I'll start with a few high-level comments/questions: - Instead of tying drivers to heaps, have you considered a system where a driver would expose constraints, and a heap would then be selected based on those constraints ? A tight coupling between heaps and drivers means downstream patches to drivers in order to use vendor-specific heaps, that sounds painful. I was wondering the same thing as well, but came to the conclusion that just the other way around is the less painful approach. From a kernel point of view, sure, it's simpler and thus less painful. From the point of view of solving the whole issue, I'm not sure :-) The problem is that there are so many driver-specific constraints that I don't even know where to start from.
That's where I was hoping James would have some feedback for us, based on the work he did on the Unix device memory allocator. If that's not the case, we can brainstorm this from scratch. Simon Ser's and my presentation from XDC 2020 focused entirely on this. The idea was not to try to enumerate every constraint up front, but rather to develop an extensible mechanism that would be flexible enough to encapsulate many disparate types of constraints and perform set operations on them (merging sets was the only operation we tried to solve). Simon implemented a prototype header-only library to implement the mechanism: https://gitlab.freedesktop.org/emersion/drm-constraints The links to the presentation and talk are below, along with notes from the follow-up workshop. https://lpc.events/event/9/contributions/615/attachments/704/1301/XDC_2020__Allocation_Constraints.pdf https://www.youtube.com/watch?v=HZEClOP5TIk https://paste.sr.ht/~emersion/c43b30be08bab1882f1b107402074462bba3b64a Note one of the hard parts of this was figuring out how to express a device or heap within the constraint structs. One of the better ideas proposed back then was something like heap IDs, where dma heaps would each have one, and devices could register their own heaps (or even just themselves?) with the heap subsystem and be assigned a locally-unique ID that userspace could pass around. This sounds similar to what you're proposing. Perhaps a reasonable identifier is a device (major, minor) pair. Such a constraint could be expressed as a symlink for easy visualization/discoverability from userspace, but might be easier to serialize over the wire as the (major, minor) pair. I'm not clear which direction is better to express this either: As a link from heap->device, or device->heap. A constraint-based system would also, I think, be easier to extend with additional constraints in the future. - I assume some drivers will be able to support multiple heaps. How do you envision this being implemented ? 
I don't really see a use case for this. One use case I know of here is same-vendor GPU local memory on different GPUs. NVIDIA GPUs have certain things they can only do on local memory, certain things they can do on all memory, and certain things they can only do on memory local to another NVIDIA GPU, especially when there exists an NVLink interface between the two. So they'd ideally express different constraints for the heaps representing each of those. The same thing is often true of memory on remote devices that are at various points in a PCIe topology. We've had situations where we could only get enough bandwidth between two PCIe devices when they were less than some number of hops away on the PCI tree. We hard-coded logic to detect that in our userspace drivers, but we could instead expose it as a constraint on heaps that would express which devices can accomplish certain operations as pairs. Similarly to the last one, I would assume (But haven't yet run into in my personal experience)
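As a rough illustration of the "merging sets" operation mentioned above, here is a minimal sketch in C. It is not the drm-constraints API: the struct, its fields, and merge_constraints() are hypothetical names invented for this example, and it assumes alignments are powers of two, so that the larger alignment always satisfies the smaller one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constraint set; none of these names come from a real API. */
struct alloc_constraints {
	uint32_t pitch_align; /* row pitch must be a multiple of this (power of two) */
	uint32_t addr_align;  /* required base address alignment (power of two) */
	uint32_t max_pitch;   /* largest usable pitch in bytes, 0 means unlimited */
	bool contiguous;      /* true if physically contiguous memory is required */
};

static bool is_pow2(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

static uint32_t min_nonzero(uint32_t a, uint32_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Merge two constraint sets into one that satisfies both devices.
 * Returns false when no layout can satisfy both, in which case the
 * caller would have to fall back to copying between buffers.
 */
static bool merge_constraints(const struct alloc_constraints *a,
			      const struct alloc_constraints *b,
			      struct alloc_constraints *out)
{
	assert(is_pow2(a->pitch_align) && is_pow2(b->pitch_align));
	assert(is_pow2(a->addr_align) && is_pow2(b->addr_align));

	/* For powers of two, the larger alignment is a multiple of the smaller. */
	out->pitch_align = a->pitch_align > b->pitch_align ? a->pitch_align : b->pitch_align;
	out->addr_align = a->addr_align > b->addr_align ? a->addr_align : b->addr_align;
	out->max_pitch = min_nonzero(a->max_pitch, b->max_pitch);
	out->contiguous = a->contiguous || b->contiguous;

	/* The smallest legal pitch is pitch_align; it must fit under max_pitch. */
	if (out->max_pitch != 0 && out->pitch_align > out->max_pitch)
		return false;
	return true;
}
```

A real mechanism would need to be extensible to constraint types it does not know about in advance, which is exactly the hard part this fixed struct sidesteps.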
Re: [PATCH] doc: gpu: Add document describing buffer exchange
On 9/8/21 2:44 AM, Simon Ser wrote: stride I think what's clear is: - Per-plane property - In bytes - Offset between two consecutive rows How that applies to weird YUV formats is the tricky question… Btw. there was a fun argument whether the same modifier value could mean different things on different devices. There were also arguments that a certain modifier could reference additional implicit memory on the device - memory that can only be accessed by very specific devices. I think AMLOGIC_FBC_LAYOUT_SCATTER was one of those. A recent example of this is [1]. [1]: https://patchwork.freedesktop.org/patch/452461/ What was the resolution to that argument? It took some fiddling to get the NV format modifiers to be robust enough that they actually do differentiate "identical" layouts that actually mismatch between devices (E.g., some of our SoC GPUs interpret layouts differently than our discrete GPUs, so that's reflected in the format modifier-building macro and hence applications can properly deduce that they can *not* share images directly between these devices, but can share between two similar discrete GPUs), so I hope the modifier definition allows that. Cross-device sharing using tiled formats in machines with multiple similar NV GPUs was an important use case for modifiers on our side. Thanks, -James
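To make the stride definition above concrete (per plane, in bytes, offset between two consecutive rows), here is a minimal sketch for a linear NV12 buffer. The struct and function names are hypothetical, and the sketch assumes tightly packed rows with the chroma plane placed directly after the luma plane; real allocators often pad rows or planes, so this is an illustration rather than a rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one plane of a linear buffer. */
struct plane_layout {
	uint32_t stride; /* bytes between the starts of two consecutive rows */
	uint32_t rows;   /* number of rows stored in this plane */
	size_t offset;   /* byte offset of the plane from the buffer start */
};

/*
 * Fill in a tightly packed linear NV12 layout. Plane 0 holds one 8-bit luma
 * sample per pixel. Plane 1 holds interleaved 8-bit U/V pairs, one pair per
 * 2x2 pixel block. Width and height must be even. Returns the total size.
 */
static size_t nv12_linear_layout(uint32_t width, uint32_t height,
				 struct plane_layout planes[2])
{
	assert(width % 2 == 0 && height % 2 == 0);

	planes[0].stride = width;           /* 1 byte per luma sample */
	planes[0].rows = height;
	planes[0].offset = 0;

	planes[1].stride = (width / 2) * 2; /* width/2 pairs, 2 bytes per pair */
	planes[1].rows = height / 2;        /* half the vertical resolution */
	planes[1].offset = (size_t)planes[0].stride * planes[0].rows;

	return planes[1].offset + (size_t)planes[1].stride * planes[1].rows;
}
```

Note that the chroma plane ends up with the same stride in bytes as the luma plane even though it stores half as many samples per row, which is one reason stride has to be tracked per plane and in bytes rather than in pixels.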
Re: [PATCH] doc: gpu: Add document describing buffer exchange
On 9/6/21 5:28 AM, Simon Ser wrote: Since there's a lot of confusion around this, document both the rules and the best practice around negotiating, allocating, importing, and using buffers when crossing context/process/device/subsystem boundaries. This ties up all of dmabuf, formats and modifiers, and their usage. Signed-off-by: Daniel Stone Thanks a lot for this write-up! This looks very good to me, a few comments below. Agreed, it would be awesome if this were merged somewhere. IMHO, a lot of the non-trivial/typo suggestions below could be taken care of as follow-on patches, as the content here is better in than out, even if it could be clarified a bit. Further feedback inline: --- This is just a quick first draft, inspired by: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637 It's not complete or perfect, but I'm off to eat a roast then have a nice walk in the sun, so figured it'd be better to dash it off rather than let it rot on my hard drive. .../gpu/exchanging-pixel-buffers.rst | 285 ++ Documentation/gpu/index.rst | 1 + 2 files changed, 286 insertions(+) create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst new file mode 100644 index ..75c4de13d5c8 --- /dev/null +++ b/Documentation/gpu/exchanging-pixel-buffers.rst @@ -0,0 +1,285 @@ +.. Copyright 2021 Collabora Ltd. + + +Exchanging pixel buffers + + +As originally designed, the Linux graphics subsystem had extremely limited +support for sharing pixel-buffer allocations between processes, devices, and +subsystems. Modern systems require extensive integration between all three +classes; this document details how applications and kernel subsystems should +approach this sharing for two-dimensional image data. 
+ +It is written with reference to the DRM subsystem for GPU and display devices, +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace +support, however any other subsystems should also follow this design and advice. + + +Formats and modifiers +===================== + +Each buffer must have an underlying format. This format describes the data which +can be stored and loaded for each pixel. Although each subsystem has its own +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be RST uses double backticks for inline code blocks (applies to the whole document). +reused wherever possible, as they are the standard descriptions used for +interchange. Maybe mention that the canonical source of formats and modifiers can be found in include/uapi/drm/drm_fourcc.h. +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of +the translation between one or more pixels in memory, and the color data +contained within that memory. The number and type of color channels are Pekka uses the term "color value", which I find a bit better than repeating "data". +described: whether they are RGB or YUV, integer or floating-point, the size +of each channel and their locations within the pixel memory, and the +relationship between color planes. + +For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a +single 32-bit value in memory. Alpha, red, green, and blue, color channels are +available at 8-bit precision per channel, ordered respectively from most to +least significant bits in little-endian storage. As a more complex example, +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are +stored in separate memory planes, where the chroma plane is stored at half the +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2 +pixel grouping). + +Format modifiers describe a translation mechanism between these per-pixel memory +samples, and the actual memory storage for the buffer.
The most straightforward +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has +contiguous storage beginning at (0,0); each pixel's location in memory will be +`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange +format, and most convenient for CPU access. Hm, maybe in more simple terms we could explain that the pixels are stored sequentially row-by-row from the top-left corner to the bottom-right one? I wouldn't mention top-left. I'm not clear DRM_FORMAT_MOD_LINEAR excludes GL-style bottom-left-oriented images. Maybe we can drop the "base" from the formula and say that each pixel's location in memory will be at offset `y * stride + x * bpp`? Or maybe this is confusing with offset being mentioned below as an additional parameter? +Modern hardware employs much more sophisticated access mechanisms, typically +making use of tiled access and possibly also compression. For example, the
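The linear addressing rule quoted above can be sketched in a few lines of C. This is only an illustration: the function names are made up for the example, `bpp` is taken to mean bytes per pixel (as the formula requires), and the pixel store assumes the ARGB8888 channel order described earlier in the quoted patch, written byte by byte so it does not depend on host endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of pixel (x, y) in a linear single-plane buffer. */
static size_t linear_offset(uint32_t x, uint32_t y, uint32_t stride,
			    uint32_t bytes_per_pixel)
{
	/* The pixel must lie entirely within one row of `stride` bytes. */
	assert((uint64_t)x * bytes_per_pixel + bytes_per_pixel <= stride);
	return (size_t)y * stride + (size_t)x * bytes_per_pixel;
}

/*
 * Store one ARGB8888 pixel. The 32-bit value is laid out in little-endian
 * order, so blue lands in the lowest-addressed byte and alpha in the highest.
 */
static void put_argb8888(uint8_t *buf, uint32_t stride, uint32_t x, uint32_t y,
			 uint32_t argb)
{
	uint8_t *p = buf + linear_offset(x, y, stride, 4);

	p[0] = (uint8_t)(argb & 0xff);         /* blue */
	p[1] = (uint8_t)((argb >> 8) & 0xff);  /* green */
	p[2] = (uint8_t)((argb >> 16) & 0xff); /* red */
	p[3] = (uint8_t)(argb >> 24);          /* alpha */
}
```

Whether row 0 is the top or the bottom of the image is deliberately left out of the sketch, since that is the open question raised in the review comments.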
Re: [PATCH 1/3] drivers/nouveau/kms/nv50-: Reject format modifiers for cursor planes
Gah, yes, good catch.

Reviewed-by: James Jones

On 1/18/21 5:54 PM, Lyude Paul wrote:

Nvidia hardware doesn't actually support using tiling formats with the cursor plane, only linear is allowed. In the future, we should write a testcase for this.

Fixes: c586f30bf74c ("drm/nouveau/kms: Add format mod prop to base/ovly/nvdisp")
Cc: James Jones
Cc: Martin Peres
Cc: Jeremy Cline
Cc: Simon Ser
Cc: # v5.8+
Signed-off-by: Lyude Paul
---
 drivers/gpu/drm/nouveau/dispnv50/wndw.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index ce451242f79e..271de3a63f21 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -702,6 +702,11 @@ nv50_wndw_init(struct nv50_wndw *wndw)
 	nvif_notify_get(&wndw->notify);
 }

+static const u64 nv50_cursor_format_modifiers[] = {
+	DRM_FORMAT_MOD_LINEAR,
+	DRM_FORMAT_MOD_INVALID,
+};
+
 int
 nv50_wndw_new_(const struct nv50_wndw_func *func, struct drm_device *dev,
 	       enum drm_plane_type type, const char *name, int index,
@@ -713,6 +718,7 @@ nv50_wndw_new_(const struct nv50_wndw_func *func, struct drm_device *dev,
 	struct nvif_mmu *mmu = &drm->client.mmu;
 	struct nv50_disp *disp = nv50_disp(dev);
 	struct nv50_wndw *wndw;
+	const u64 *format_modifiers;
 	int nformat;
 	int ret;

@@ -728,10 +734,13 @@ nv50_wndw_new_(const struct nv50_wndw_func *func, struct drm_device *dev,

 	for (nformat = 0; format[nformat]; nformat++);

-	ret = drm_universal_plane_init(dev, &wndw->plane, heads, &nv50_wndw,
-				       format, nformat,
-				       nouveau_display(dev)->format_modifiers,
-				       type, "%s-%d", name, index);
+	if (type == DRM_PLANE_TYPE_CURSOR)
+		format_modifiers = nv50_cursor_format_modifiers;
+	else
+		format_modifiers = nouveau_display(dev)->format_modifiers;
+
+	ret = drm_universal_plane_init(dev, &wndw->plane, heads, &nv50_wndw, format, nformat,
+				       format_modifiers, type, "%s-%d", name, index);
 	if (ret) {
 		kfree(*pwndw);
 		*pwndw = NULL;
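The modifier tables in the patch above are terminated by DRM_FORMAT_MOD_INVALID rather than carrying an explicit length. As a minimal userspace sketch of how such a list can be walked, the following mirrors the cursor table and checks whether a given modifier is accepted. The two constants are local copies of the values in include/uapi/drm/drm_fourcc.h, and modifier_supported() is a hypothetical helper written for this example, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of two values from include/uapi/drm/drm_fourcc.h. */
#define MOD_LINEAR  UINT64_C(0)
#define MOD_INVALID UINT64_C(0x00ffffffffffffff)

/* Mirrors the cursor table in the patch: linear only, then the terminator. */
static const uint64_t cursor_modifiers[] = {
	MOD_LINEAR,
	MOD_INVALID,
};

/* Hypothetical helper: scan a MOD_INVALID-terminated list for a modifier. */
static bool modifier_supported(const uint64_t *list, uint64_t modifier)
{
	assert(list != NULL);

	for (; *list != MOD_INVALID; list++) {
		if (*list == modifier)
			return true;
	}
	return false;
}
```

With this table, any tiled modifier is rejected for the cursor plane while linear is accepted, which matches the behaviour the commit message describes.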
Re: [git pull] drm for 5.8-rc1
On 9/1/20 3:59 AM, Karol Herbst wrote: On Tue, Sep 1, 2020 at 9:13 AM Daniel Vetter wrote: On Tue, Aug 18, 2020 at 04:37:51PM +0200, Thierry Reding wrote: On Fri, Aug 14, 2020 at 07:25:17PM +0200, Daniel Vetter wrote: On Fri, Aug 14, 2020 at 7:17 PM Daniel Stone wrote: Hi, On Fri, 14 Aug 2020 at 17:22, Thierry Reding wrote: I suspect that the reason why this works in X but not in Wayland is because X passes the right usage flags, whereas Weston may not. But I'll have to investigate more in order to be sure. Weston allocates its own buffers for displaying the result of composition through GBM with USE_SCANOUT, which is definitely correct. Wayland clients (common to all compositors, in Mesa's src/egl/drivers/dri2/platform_wayland.c) allocate with USE_SHARED but _not_ USE_SCANOUT, which is correct in that they are guaranteed to be shared, but not guaranteed to be scanned out. The expectation is that non-scanout-compatible buffers would be rejected by gbm_bo_import if not drmModeAddFB2. One difference between Weston and all other compositors (GNOME Shell, KWin, Sway, etc) is that Weston uses KMS planes for composition when it can (i.e. when gbm_bo_import from dmabuf + drmModeAddFB2 from gbm_bo handle + atomic check succeed), but the other compositors only use the GPU. So if you have different assumptions about the layout of imported buffers between the GPU and KMS, that would explain a fair bit. Yeah non-modifiered multi-gpu (of any kind) is pretty much hopeless I think. I guess the only option is if the tegra mesa driver forces linear and an extra copy on everything that's USE_SHARED or USE_SCANOUT. I ended up trying this, but this fails for the X case, unfortunately, because there doesn't seem to be a good synchronization point at which the de-tiling blit could be done. Weston and kmscube end up calling a gallium driver's ->flush_resource() implementation, but that never happens for X and glamor. 
But after looking into this some more, I don't think that's even the problem that we're facing here. The root of the problem that causes the glxgears crash that Karol was originally reporting is because we end up allocating the glxgears pixmaps using the dri3 loader from Mesa. But the dri3 loader will unconditionally pass both __DRI_IMAGE_USE_SHARE and __DRI_IMAGE_USE_SCANOUT, irrespective of whether the buffer will end up being scanned out directly or whether it will be composited onto the root window. What exactly happens depends on whether I run glxgears in fullscreen mode or windowed mode. In windowed mode, the glxgears buffers will be composited onto the root window, so there's no need for the buffers to be scanout-capable. If I modify the dri3 loader to not pass those flags I can make this work just fine. When I run glxgears in fullscreen mode, the modesetting driver ends up wanting to display the glxgears buffer directly on screen, without compositing it onto the root window. This ends up working if I leave out the _USE_SHARE and _USE_SCANOUT flags, but I notice that the kernel then complains about being unable to create a framebuffer, which in turn is caused by the fact that those buffers are not exported (the Tegra Mesa driver only exports/imports buffers that are meant for scanout, under the assumption that those are the only ones that will ever need to be used by KMS) and therefore Tegra DRM doesn't have a valid handle for them. So I think an ideal solution would probably be for glxgears to somehow pass better usage information when allocating buffers, but I suspect that that's just not possible, or would be way too much work and require additional protocol at the DRI level, so it's not really a good option when all we want to fix is backwards-compatibility with pre-modifiers userspace. Given that glamor also doesn't have any synchronization points, I don't see how I can implement the de-tiling blit reliably. 
I was wondering if it shouldn't be possible to flush the framebuffer resource (and perform the blit) at presentation time, but I couldn't find a good entry point to do this. One other solution that occurred to me was to reintroduce an old IOCTL that we used to have in the Tegra DRM driver. That IOCTL was meant to attach tiling meta data to an imported buffer and was basically a simplified, driver-specific way of doing framebuffer modifiers. That's a very ugly solution, but it would allow us to be backwards-compatible with pre-modifiers userspace and even use an optimal path for rendering and scanning out. The only prerequisite would be that the driver IOCTL was implemented and that a recent enough Mesa was used to make use of it. I don't like this very much because framebuffer modifiers are a much more generic solution, but all of the other options above are pretty much just as ugly. One other idea that I haven't explored yet is to be a little more clever about the export/import dance that we do for buffers. Currently we export/import at allocation time, and that seems to
Re: [RFC] Experimental DMA-BUF Device Heaps
On 8/23/20 1:46 PM, Laurent Pinchart wrote: Hi James, On Sun, Aug 23, 2020 at 01:04:43PM -0700, James Jones wrote: On 8/20/20 1:15 AM, Ezequiel Garcia wrote: On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote: On 8/17/20 8:18 AM, Brian Starkey wrote: On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: This heap is basically a wrapper around DMA-API dma_alloc_attrs, which will allocate memory suitable for the given device. The implementation is mostly a port of the Contiguous Videobuf2 memory allocator (see videobuf2/videobuf2-dma-contig.c) over to the DMA-BUF Heap interface. The intention of this allocator is to provide applications with a more system-agnostic API: the only thing the application needs to know is which device to get the buffer for. Whether the buffer is backed by CMA, IOMMU or a DMA Pool is unknown to the application. I'm not really expecting this patch to be correct or even a good idea, but just submitting it to start a discussion on DMA-BUF heap discovery and negotiation. My initial reaction is that I thought dmabuf heaps are meant for use to allocate buffers for sharing across devices, which doesn't fit very well with having per-device heaps. For single-device allocations, would using the buffer allocation functionality of that device's native API be better in most cases? (Some other possibly relevant discussion at [1]) I can see that this can save some boilerplate for devices that want to expose private chunks of memory, but might it also lead to 100 aliases for the system's generic coherent memory pool? I wonder if a set of helpers to allow devices to expose whatever they want with minimal effort would be better. I'm rather interested on where this goes, as I was toying with using some sort of heap ID as a basis for a "device-local" constraint in the memory constraints proposals Simon and I will be discussing at XDC this year. 
It would be rather elegant if there was one type of heap ID used universally throughout the kernel that could provide a unique handle for the shared system memory heap(s), as well as accelerator-local heaps on fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could negotiate a location among themselves. This patch seems to be a step towards that in a way, but I agree it would be counterproductive if a bunch of devices that were using the same underlying system memory ended up each getting their own heap ID just because they used some SW framework that worked that way. Would appreciate it if you could send along a pointer to your BoF if it happens! Here is it: https://linuxplumbersconf.org/event/7/contributions/818/ It would be great to see you there and discuss this, given I was hoping we could talk about how to meet a userspace allocator library expectations as well. Thanks! I hadn't registered for LPC and it looks like it's sold out, but I'll try to watch the live stream. This is very interesting, in that it looks like we're both trying to solve roughly the same set of problems but approaching it from different angles. From what I gather, your approach is that a "heap" encompasses all the allocation constraints a device may have. The approach Simon Ser and I are tossing around so far is somewhat different, but may potentially leverage dma-buf heaps a bit as well. Our approach looks more like what I described at XDC a few years ago, where memory constraints for a given device's usage of an image are exposed up to applications, which can then somehow perform boolean intersection/union operations on them to arrive at a common set of constraints that describe something compatible with all the devices & usages desired (or fail to do so, and fall back to copying things around presumably). 
I believe this is more flexible than your initial proposal in that devices often support multiple usages (E.g., different formats, different proprietary layouts represented by format modifiers, etc.), and it avoids adding a combinatorial number of heaps to manage that. In my view, heaps are more like blobs of memory that can be allocated from in various different ways to satisfy constraints. I realize heaps mean something specific in the dma-buf heap design (specifically, something closer to an association between an "allocation mechanism" and "physical memory"), but I hope we don't have massive heap/allocator mechanism proliferation due to constraints alone. Perhaps some constraints, such as contiguous memory or device-local memory, are properly expressed as a specific heap, but consider the proliferation implied by even that simple pair of examples: How do you express contiguous device-local memory? Do you need to spawn two heaps on the underlying device-local memory, one for contiguous allocations and one for non-contiguous allocations? Seems excessive. Of course, our approach also has downsides and is still being worked on. For example, it works best in an id
Re: [RFC] Experimental DMA-BUF Device Heaps
On 8/20/20 1:15 AM, Ezequiel Garcia wrote: On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote: On 8/17/20 8:18 AM, Brian Starkey wrote: Hi Ezequiel, On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: This heap is basically a wrapper around DMA-API dma_alloc_attrs, which will allocate memory suitable for the given device. The implementation is mostly a port of the Contiguous Videobuf2 memory allocator (see videobuf2/videobuf2-dma-contig.c) over to the DMA-BUF Heap interface. The intention of this allocator is to provide applications with a more system-agnostic API: the only thing the application needs to know is which device to get the buffer for. Whether the buffer is backed by CMA, IOMMU or a DMA Pool is unknown to the application. I'm not really expecting this patch to be correct or even a good idea, but just submitting it to start a discussion on DMA-BUF heap discovery and negotiation. My initial reaction is that I thought dmabuf heaps are meant for use to allocate buffers for sharing across devices, which doesn't fit very well with having per-device heaps. For single-device allocations, would using the buffer allocation functionality of that device's native API be better in most cases? (Some other possibly relevant discussion at [1]) I can see that this can save some boilerplate for devices that want to expose private chunks of memory, but might it also lead to 100 aliases for the system's generic coherent memory pool? I wonder if a set of helpers to allow devices to expose whatever they want with minimal effort would be better. I'm rather interested on where this goes, as I was toying with using some sort of heap ID as a basis for a "device-local" constraint in the memory constraints proposals Simon and I will be discussing at XDC this year. 
It would be rather elegant if there was one type of heap ID used universally throughout the kernel that could provide a unique handle for the shared system memory heap(s), as well as accelerator-local heaps on fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could negotiate a location among themselves. This patch seems to be a step towards that in a way, but I agree it would be counterproductive if a bunch of devices that were using the same underlying system memory ended up each getting their own heap ID just because they used some SW framework that worked that way. Would appreciate it if you could send along a pointer to your BoF if it happens! Here is it: https://linuxplumbersconf.org/event/7/contributions/818/ It would be great to see you there and discuss this, given I was hoping we could talk about how to meet a userspace allocator library expectations as well. Thanks! I hadn't registered for LPC and it looks like it's sold out, but I'll try to watch the live stream. This is very interesting, in that it looks like we're both trying to solve roughly the same set of problems but approaching it from different angles. From what I gather, your approach is that a "heap" encompasses all the allocation constraints a device may have. The approach Simon Ser and I are tossing around so far is somewhat different, but may potentially leverage dma-buf heaps a bit as well. Our approach looks more like what I described at XDC a few years ago, where memory constraints for a given device's usage of an image are exposed up to applications, which can then somehow perform boolean intersection/union operations on them to arrive at a common set of constraints that describe something compatible with all the devices & usages desired (or fail to do so, and fall back to copying things around presumably). 
I believe this is more flexible than your initial proposal in that devices often support multiple usages (E.g., different formats, different proprietary layouts represented by format modifiers, etc.), and it avoids adding a combinatorial number of heaps to manage that. In my view, heaps are more like blobs of memory that can be allocated from in various different ways to satisfy constraints. I realize heaps mean something specific in the dma-buf heap design (specifically, something closer to an association between an "allocation mechanism" and "physical memory"), but I hope we don't have massive heap/allocator mechanism proliferation due to constraints alone. Perhaps some constraints, such as contiguous memory or device-local memory, are properly expressed as a specific heap, but consider the proliferation implied by even that simple pair of examples: How do you express contiguous device-local memory? Do you need to spawn two heaps on the underlying device-local memory, one for contiguous allocations and one for non-contiguous allocations? Seems excessive. Of course, our approach also has downsides and is still being worked on. For example, it works best in an ideal world where all the allocators available understand all the constraint
Re: [RFC] Experimental DMA-BUF Device Heaps
On 8/17/20 8:18 AM, Brian Starkey wrote: Hi Ezequiel, On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: This heap is basically a wrapper around DMA-API dma_alloc_attrs, which will allocate memory suitable for the given device. The implementation is mostly a port of the Contiguous Videobuf2 memory allocator (see videobuf2/videobuf2-dma-contig.c) over to the DMA-BUF Heap interface. The intention of this allocator is to provide applications with a more system-agnostic API: the only thing the application needs to know is which device to get the buffer for. Whether the buffer is backed by CMA, IOMMU or a DMA Pool is unknown to the application. I'm not really expecting this patch to be correct or even a good idea, but just submitting it to start a discussion on DMA-BUF heap discovery and negotiation. My initial reaction is that I thought dmabuf heaps are meant for use to allocate buffers for sharing across devices, which doesn't fit very well with having per-device heaps. For single-device allocations, would using the buffer allocation functionality of that device's native API be better in most cases? (Some other possibly relevant discussion at [1]) I can see that this can save some boilerplate for devices that want to expose private chunks of memory, but might it also lead to 100 aliases for the system's generic coherent memory pool? I wonder if a set of helpers to allow devices to expose whatever they want with minimal effort would be better. I'm rather interested on where this goes, as I was toying with using some sort of heap ID as a basis for a "device-local" constraint in the memory constraints proposals Simon and I will be discussing at XDC this year. It would be rather elegant if there was one type of heap ID used universally throughout the kernel that could provide a unique handle for the shared system memory heap(s), as well as accelerator-local heaps on fancy NICs, GPUs, NN accelerators, capture devices, etc. 
so apps could negotiate a location among themselves. This patch seems to be a step towards that in a way, but I agree it would be counterproductive if a bunch of devices that were using the same underlying system memory ended up each getting their own heap ID just because they used some SW framework that worked that way. Would appreciate it if you could send along a pointer to your BoF if it happens! Thanks, -James Cheers, -Brian 1. https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a40...@kapsi.fi/ Given Plumbers is just a couple weeks from now, I've submitted a BoF proposal to discuss this, as perhaps it would make sense to discuss this live? Not-signed-off-by: Ezequiel Garcia ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm for 5.8-rc1
I'll defer to Thierry, but I think that may be by design. Tegra format modifiers were added to get things like this working in the first place, right? It's not a regression, is it? Thanks, -James On 8/13/20 10:19 AM, Karol Herbst wrote: another thing: with gsettings set org.gnome.mutter experimental-features '["kms-modifiers"]' it all just works out of the box with wayland, but that won't be enabled for quite some time, so we need to figure out what is broken (less so with my patch) under wayland with gnome :) On Thu, Aug 13, 2020 at 5:39 PM Karol Herbst wrote: btw, I just noticed that wayland with gnome-shell is totally busted. With this MR it at least displays something, but without it doesn't work at all. On Thu, Aug 13, 2020 at 3:00 PM Karol Herbst wrote: At least for now I've created an MR to revert the change: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300 But it seems like there was probably a good reason why it got added? Happy to have better fixes, but that's the best we've got so far I think? Thierry, what do you think? On Wed, Aug 12, 2020 at 8:51 PM Karol Herbst wrote: in case you all were wondering, it works on xorg-server git because of this commit: https://gitlab.freedesktop.org/xorg/xserver/-/commit/9b8999411033c9473cd68e92e4690a91aecf5b95 On Wed, Aug 12, 2020 at 8:25 PM James Jones wrote: On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote: ...and in merging my code with Alyssa's new panfrost format modifier support, I see panfrost does the opposite of this and treats a format modifier list of only INVALID as "don't care". I modeled the new nouveau behavior on the Iris driver. Now I'm not sure which is correct :-( and neither am I. Uh-oh. I modeled the panfrost code after v3d_resource_create_with_modifiers, which treats INVALID as "don't care". I can confirm the panfrost code works (in the sense that it's functional on the machines I've tested), but I don't know if it is actually correct. 
I think it is, since otherwise you end up using linear in places it's unnecessary, but I'm not sure where this is spec'd.

It would depend on whether an app actually calls the function this way, and whether that app was tested I suppose. If I'm interpreting the Iris code correctly and it doesn't break anything, then I'm assuming both implementations are equally valid in that nothing exercises this path, but it would be good to have the intended behavior documented somewhere so we can try to work towards consistency in case someone tries it in the future. My nouveau change runs afoul of assumptions in the tegra driver, but that's easy enough to fix in lockstep if desired.

Also, heads up: I'll ping you on my format modifier cleanup MR once I've pushed the latest version. The panfrost modifier usage was harder to merge into the refactoring than most, so it'll be good to have your review and, if you have time, some testing. I think I landed on an elegant solution, but open to suggestions.

Thanks,
-James
Re: [git pull] drm for 5.8-rc1
On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote:

...and in merging my code with Alyssa's new panfrost format modifier support, I see panfrost does the opposite of this and treats a format modifier list of only INVALID as "don't care". I modeled the new nouveau behavior on the Iris driver. Now I'm not sure which is correct :-(

and neither am I. Uh-oh. I modeled the panfrost code after v3d_resource_create_with_modifiers, which treats INVALID as "don't care". I can confirm the panfrost code works (in the sense that it's functional on the machines I've tested), but I don't know if it is actually correct. I think it is, since otherwise you end up using linear in places it's unnecessary, but I'm not sure where this is spec'd.

It would depend on whether an app actually calls the function this way, and whether that app was tested I suppose. If I'm interpreting the Iris code correctly and it doesn't break anything, then I'm assuming both implementations are equally valid in that nothing exercises this path, but it would be good to have the intended behavior documented somewhere so we can try to work towards consistency in case someone tries it in the future. My nouveau change runs afoul of assumptions in the tegra driver, but that's easy enough to fix in lockstep if desired.

Also, heads up: I'll ping you on my format modifier cleanup MR once I've pushed the latest version. The panfrost modifier usage was harder to merge into the refactoring than most, so it'll be good to have your review and, if you have time, some testing. I think I landed on an elegant solution, but open to suggestions.

Thanks,
-James
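For anyone trying to follow the disagreement, the two readings of "a modifier list containing only INVALID" can be sketched roughly as below. This is not Mesa code: sketch_pick_modifier(), the policy enum and the driver_default parameter are hypothetical names made up for illustration, and SKETCH_MOD_INVALID is a local stand-in assumed to mirror the value of DRM_FORMAT_MOD_INVALID from drm_fourcc.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in, assumed to mirror DRM_FORMAT_MOD_INVALID. */
#define SKETCH_MOD_INVALID 0x00ffffffffffffffull

enum sketch_policy {
	SKETCH_STRICT,               /* only-INVALID list: allocation fails */
	SKETCH_INVALID_IS_DONT_CARE  /* only-INVALID list: driver picks */
};

/*
 * Hypothetical selection helper. Returns true and stores the chosen
 * modifier in *out, or returns false if the request cannot be met.
 */
static bool sketch_pick_modifier(enum sketch_policy policy,
				 const uint64_t *requested, size_t count,
				 const uint64_t *supported, size_t nsupported,
				 uint64_t driver_default, uint64_t *out)
{
	size_t i, j, n_invalid = 0;

	/* No list at all: the driver uses its internal layout selection. */
	if (requested == NULL || count == 0) {
		*out = driver_default;
		return true;
	}

	/* Prefer the first supported modifier the caller also listed. */
	for (j = 0; j < nsupported; j++) {
		for (i = 0; i < count; i++) {
			if (requested[i] == supported[j]) {
				*out = supported[j];
				return true;
			}
		}
	}

	for (i = 0; i < count; i++) {
		if (requested[i] == SKETCH_MOD_INVALID)
			n_invalid++;
	}

	/* The lenient reading treats an all-INVALID list as "don't care". */
	if (policy == SKETCH_INVALID_IS_DONT_CARE && n_invalid == count) {
		*out = driver_default;
		return true;
	}

	/* The strict reading: nothing in the list is supported, so fail. */
	return false;
}
```

Under the strict policy an only-INVALID list behaves like any other list of unsupported modifiers, which is why a caller wanting fallback behavior has to pass no list at all.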
Re: [git pull] drm for 5.8-rc1
On 8/12/20 10:10 AM, Karol Herbst wrote: On Wed, Aug 12, 2020 at 7:03 PM James Jones wrote: On 8/12/20 5:37 AM, Ilia Mirkin wrote: On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst wrote: On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst wrote: On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst wrote: On Wed, Aug 12, 2020 at 2:19 AM James Jones wrote: Sorry for the slow reply here as well. I've been in the process of rebasing and reworking the userspace patches. I'm not clear my changes will address the Jetson Nano issue, but if you'd like to try them, the latest userspace changes are available here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 And the tegra-drm kernel patches are here: https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajo...@nvidia.com/ Those + the kernel changes addressed in this thread are everything I had outstanding. I don't know if that's caused by your changes or not, but now the assert I hit is a different one pointing out that nvc0_miptree_select_best_modifier fails in a certain case and returns MOD_INVALID... anyway, it seems like with your patches applied it's now way easier to debug and figure out what's going wrong, so maybe I can figure it out now :) collected some information which might help to track it down. src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} inside tegra_screen_resource_create modifier says DRM_FORMAT_MOD_INVALID as template->bind is 1 and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, so the call just returns NULL leading to the assert. Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. 
So I dug a bit deeper and here is what trips it up: when the context gets made current, the normal framebuffer validation and render buffer allocation is done, but we end up inside tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set in template->bind. Now the tegra driver forces the DRM_FORMAT_MOD_LINEAR modifier and calls into resource_create_with_modifiers. If it didn't do that, nouveau would allocate a tiled buffer; with that, it's linear, and we at some point end up with an assert about a depth_stencil buffer being there even though it shouldn't. If I always use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things just work. That's kind of the cause I pinpointed the issue down to. But I have no idea what's supposed to happen and what the actual bug is.

Yeah, the bug with tegra has always been "trying to render to linear color + tiled depth", which the hardware plain doesn't support. (And linear depth isn't a thing.) Question is whether what it's doing is necessary. PIPE_BIND_SCANOUT (/linear) requirements are needed for DRI2 to work (well, maybe not in theory, but at least in practice the nouveau ddx expects linear buffers). However tegra operates on a more DRI3-like basis, so with "client" allocations, tiled should work OK as long as there's something in tegra to copy it to linear when necessary?

I can confirm the above: Our hardware can't render to linear depth buffers, nor can it mix linear color buffers with block linear depth buffers.

I think there's a misunderstanding on expected behavior of resource_create_with_modifiers() here too: tegra_screen_resource_create() is passing DRM_FORMAT_MOD_INVALID as the only modifier in non-scanout cases. Previously, I believe nouveau may have treated that as "no modifiers specified.
Fall back to internal layout selection logic", but in my patches I "fixed" it to match other drivers' behavior, in that allocation will fail if that is the only modifier in the list, since it is equivalent to passing in a list containing only unsupported modifiers. To get fallback behavior, tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers, count), or just call resource_create() on the underlying screen instead. ...and in merging my code with Alyssa's new panfrost format modifier support, I see panfrost does the opposite of this and treats a format modifier list of only INVALID as "don't care". I modeled the new nouveau behavior on the Iris driver. Now I'm not sure which is correct :-( Thanks, -James Beyond that, I can only offer my thoughts based on analysis of the code referenced here so far: While I've learned from the origins of this thread applications/things external to Mesa in general shouldn't be querying format modifiers of buffers created without format modifiers, tegra is a Mesa internal component that already has some intimate knowledge of how the nouveau driver it sits on top of works. Nouveau will always be able to const
Re: [git pull] drm for 5.8-rc1
On 8/12/20 5:37 AM, Ilia Mirkin wrote: On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst wrote: On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst wrote: On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst wrote: On Wed, Aug 12, 2020 at 2:19 AM James Jones wrote: Sorry for the slow reply here as well. I've been in the process of rebasing and reworking the userspace patches. I'm not clear my changes will address the Jetson Nano issue, but if you'd like to try them, the latest userspace changes are available here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 And the tegra-drm kernel patches are here: https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajo...@nvidia.com/ Those + the kernel changes addressed in this thread are everything I had outstanding. I don't know if that's caused by your changes or not, but now the assert I hit is a different one pointing out that nvc0_miptree_select_best_modifier fails in a certain case and returns MOD_INVALID... anyway, it seems like with your patches applied it's now way easier to debug and figure out what's going wrong, so maybe I can figure it out now :) collected some information which might help to track it down. src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf) templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0 = 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target = PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples = 0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0} inside tegra_screen_resource_create modifier says DRM_FORMAT_MOD_INVALID as template->bind is 1 and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID, so the call just returns NULL leading to the assert. Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears. 
So I dug a bit deeper and here is what trips it up: when the context gets made current, the normal framebuffer validation and render buffer allocation is done, but we end up inside tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set in template->bind. Now the tegra driver forces the DRM_FORMAT_MOD_LINEAR modifier and calls into resource_create_with_modifiers. If it didn't do that, nouveau would allocate a tiled buffer; with that, it's linear, and we at some point end up with an assert about a depth_stencil buffer being there even though it shouldn't. If I always use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things just work. That's kind of the cause I pinpointed the issue down to. But I have no idea what's supposed to happen and what the actual bug is.

Yeah, the bug with tegra has always been "trying to render to linear color + tiled depth", which the hardware plain doesn't support. (And linear depth isn't a thing.) Question is whether what it's doing is necessary. PIPE_BIND_SCANOUT (/linear) requirements are needed for DRI2 to work (well, maybe not in theory, but at least in practice the nouveau ddx expects linear buffers). However tegra operates on a more DRI3-like basis, so with "client" allocations, tiled should work OK as long as there's something in tegra to copy it to linear when necessary?

I can confirm the above: Our hardware can't render to linear depth buffers, nor can it mix linear color buffers with block linear depth buffers.

I think there's a misunderstanding on expected behavior of resource_create_with_modifiers() here too: tegra_screen_resource_create() is passing DRM_FORMAT_MOD_INVALID as the only modifier in non-scanout cases. Previously, I believe nouveau may have treated that as "no modifiers specified.
Fall back to internal layout selection logic", but in my patches I "fixed" it to match other drivers' behavior, in that allocation will fail if that is the only modifier in the list, since it is equivalent to passing in a list containing only unsupported modifiers. To get fallback behavior, tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers, count), or just call resource_create() on the underlying screen instead. Beyond that, I can only offer my thoughts based on analysis of the code referenced here so far: While I've learned from the origins of this thread applications/things external to Mesa in general shouldn't be querying format modifiers of buffers created without format modifiers, tegra is a Mesa internal component that already has some intimate knowledge of how the nouveau driver it sits on top of works. Nouveau will always be able to construct and return a valid format modifier for unorm single sampled color buffers (and hopefully, anything that can scan out going forward), both before and after my patches I believe, regardless of how they were allocated. After my patches, it should even work for things that can't scan out in theory. Hence, looking at this without knowledge of what motivated the original changes, it s
Re: [git pull] drm for 5.8-rc1
Sorry for the slow reply here as well. I've been in the process of rebasing and reworking the userspace patches. I'm not clear my changes will address the Jetson Nano issue, but if you'd like to try them, the latest userspace changes are available here:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724

And the tegra-drm kernel patches are here:

https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajo...@nvidia.com/

Those + the kernel changes addressed in this thread are everything I had outstanding.

Thanks,
-James

On 8/4/20 1:58 AM, Karol Herbst wrote:

Hi James, I don't know if you knew, but on the Jetson nano we had the issue for quite some time, that GLX/EGL through mesa on X was broken due to some fix in mesa related to modifiers. And I was wondering if the overall state just caused the issue we saw here and wanted to know what branches/patches I needed for the various projects to see if the work you have been doing since the last upstream nouveau regression would be of any help here? Mind pointing me towards everything I'd need to check that? I'd really like to fix this, but didn't have the time to investigate what the core problem here was, but I think it's very likely that a fixed/improved modifier support could actually fix it as well. Alternately I'd like to move to kmsro in mesa as this fixes it as well, but that could just be by coincidence and would break other devices..

Thanks

On Tue, Jul 14, 2020 at 4:32 PM James Jones wrote:

Still testing. I'll get a Sign-off version out this week unless I find a problem.

Thanks,
-James

On 7/12/20 6:37 PM, Dave Airlie wrote:

How are we going with a fix for this regression I can commit?

Dave.
[PATCH v4] drm/nouveau: Accept 'legacy' format modifiers
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Existing Mesa drivers are still aware of only these older format modifiers which do not differentiate between different variations of the block linear layout. When the format modifier support flag was flipped in the nouveau kernel driver, the X.org modesetting driver began attempting to use its format modifier-enabled framebuffer path. Because the set of format modifiers advertised by the kernel prior to this change do not intersect with the set of format modifiers advertised by Mesa, allocating GBM buffers using format modifiers fails and the modesetting driver falls back to non-modifier allocation. However, it still later queries the modifier of the GBM buffer when creating its DRM-KMS framebuffer object, receives the old-format modifier from Mesa, and attempts to create a framebuffer with it. Since the kernel is still not aware of these formats, this fails.

Userspace should not be attempting to query format modifiers of GBM buffers allocated with a non-format-modifier-aware allocation path, but to avoid breaking existing userspace behavior, this change accepts the old-style format modifiers when creating framebuffers and applying them to planes by translating them to the equivalent new-style modifier. To accomplish this, some layout parameters must be assumed to match properties of the device targeted by the relevant ioctls.

To avoid perpetuating misuse of the old-style modifiers, this change does not advertise support for them. Doing so would imply compatibility between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5

Reported-by: Kirill A. Shutemov
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..07373bbc2acf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,6 +191,7 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 		   uint32_t *tile_mode,
 		   uint8_t *kind)
 {
+	struct nouveau_display *disp = nouveau_display(drm->dev);
 	BUG_ON(!tile_mode || !kind);
 
 	if (modifier == DRM_FORMAT_MOD_LINEAR) {
@@ -202,6 +203,12 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 		 * Extract the block height and kind from the corresponding
 		 * modifier fields. See drm_fourcc.h for details.
 		 */
+
+		if ((modifier & (0xffull << 12)) == 0ull) {
+			/* Legacy modifier. Translate to this dev's 'kind.' */
+			modifier |= disp->format_modifiers[0] & (0xffull << 12);
+		}
+
 		*tile_mode = (uint32_t)(modifier & 0xF);
 		*kind = (uint8_t)((modifier >> 12) & 0xFF);
 
@@ -227,6 +234,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
 	}
 }
 
+static const u64 legacy_modifiers[] = {
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+	DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
 			    uint64_t modifier,
@@ -247,8 +264,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 	     (disp->format_modifiers[mod] != modifier);
 	     mod++);
 
-	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-		return -EINVAL;
+	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+		for (mod = 0;
+		     (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+		     (legacy_modifiers[mod] != modifier);
+		     mod++);
+		if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+			return -EINVAL;
+	}
 
 	nouveau_decode_mod(drm, modifier, tile_mode, kind);
 
--
2.17.1
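As a reading aid for the decode hunk in the patch above, the following standalone sketch mimics the v4 logic outside the kernel. It is an illustration only: sketch_decode_mod() and the SKETCH_* macros are hypothetical stand-ins, dev_modifier0 plays the role of disp->format_modifiers[0], and setting kind to 0 in the linear branch is an assumption, since that part of the function is not visible in the diff. Any chipset-specific post-processing the real function may do is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 12..19 carry the 'kind' field, per the mask used in the patch. */
#define SKETCH_KIND_MASK  (0xffull << 12)
/* Stand-in for DRM_FORMAT_MOD_LINEAR, which is 0. */
#define SKETCH_MOD_LINEAR 0ull

/*
 * Hypothetical userspace rendition of the v4 decode order:
 * linear is handled first, then legacy modifiers (kind field == 0)
 * borrow the kind of the device's first advertised modifier.
 */
static void sketch_decode_mod(uint64_t modifier, uint64_t dev_modifier0,
			      uint32_t *tile_mode, uint8_t *kind)
{
	if (modifier == SKETCH_MOD_LINEAR) {
		*tile_mode = 0;
		*kind = 0; /* assumption: not shown in the diff context */
		return;
	}

	if ((modifier & SKETCH_KIND_MASK) == 0ull) {
		/* Legacy modifier: translate to this device's kind. */
		modifier |= dev_modifier0 & SKETCH_KIND_MASK;
	}

	/* Low 4 bits: block height; bits 12..19: kind. */
	*tile_mode = (uint32_t)(modifier & 0xF);
	*kind = (uint8_t)((modifier >> 12) & 0xFF);
}
```

With a device whose first advertised modifier carries kind 0xfe, a legacy modifier with block height 4 would decode to tile_mode 4 and kind 0xfe, while a new-style modifier keeps whatever kind it already encodes.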
Re: [PATCH v3] drm/nouveau: Accept 'legacy' format modifiers
On 7/30/20 3:19 PM, Kirill A. Shutemov wrote: On Thu, Jul 30, 2020 at 10:26:17AM -0700, James Jones wrote: Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Existing Mesa drivers are still aware of only these older format modifiers which do not differentiate between different variations of the block linear layout. When the format modifier support flag was flipped in the nouveau kernel driver, the X.org modesetting driver began attempting to use its format modifier-enabled framebuffer path. Because the set of format modifiers advertised by the kernel prior to this change do not intersect with the set of format modifiers advertised by Mesa, allocating GBM buffers using format modifiers fails and the modesetting driver falls back to non-modifier allocation. However, it still later queries the modifier of the GBM buffer when creating its DRM-KMS framebuffer object, receives the old-format modifier from Mesa, and attempts to create a framebuffer with it. Since the kernel is still not aware of these formats, this fails. Userspace should not be attempting to query format modifiers of GBM buffers allocated with a non- format-modifier-aware allocation path, but to avoid breaking existing userspace behavior, this change accepts the old-style format modifiers when creating framebuffers and applying them to planes by translating them to the equivalent new-style modifier. To accomplish this, some layout parameters must be assumed to match properties of the device targeted by the relevant ioctls. To avoid perpetuating misuse of the old-style modifiers, this change does not advertise support for them. Doing so would imply compatibility between devices with incompatible memory layouts. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, kmscube hacked to use linear mod, and sway 1.5 Reported-by: Kirill A. 
Shutemov
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 		   uint32_t *tile_mode,
 		   uint8_t *kind)
 {
+	struct nouveau_display *disp = nouveau_display(drm->dev);
 	BUG_ON(!tile_mode || !kind);
 
+	if ((modifier & (0xffull << 12)) == 0ull) {
+		/* Legacy modifier. Translate to this device's 'kind.' */
+		modifier |= disp->format_modifiers[0] & (0xffull << 12);
+	}
+
 	if (modifier == DRM_FORMAT_MOD_LINEAR) {
 		/* tile_mode will not be used in this case */
 		*tile_mode = 0;

Em. I thought Ben's suggestion was to move it under != MOD_LINEAR. I don't see it here.

Yes, it looks like I forgot to commit before generating the patch. v4 sent.

Thanks,
-James
Re: [Nouveau] [PATCH v2] drm/nouveau: Accept 'legacy' format modifiers
On 7/29/20 7:47 AM, Kirill A. Shutemov wrote: On Wed, Jul 29, 2020 at 01:40:13PM +1000, Ben Skeggs wrote: On Wed, 29 Jul 2020 at 12:48, Dave Airlie wrote: On Tue, 28 Jul 2020 at 04:51, James Jones wrote: On 7/23/20 9:06 PM, Ben Skeggs wrote: On Sat, 18 Jul 2020 at 13:34, James Jones wrote: Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Existing Mesa drivers are still aware of only these older format modifiers which do not differentiate between different variations of the block linear layout. When the format modifier support flag was flipped in the nouveau kernel driver, the X.org modesetting driver began attempting to use its format modifier-enabled framebuffer path. Because the set of format modifiers advertised by the kernel prior to this change do not intersect with the set of format modifiers advertised by Mesa, allocating GBM buffers using format modifiers fails and the modesetting driver falls back to non-modifier allocation. However, it still later queries the modifier of the GBM buffer when creating its DRM-KMS framebuffer object, receives the old-format modifier from Mesa, and attempts to create a framebuffer with it. Since the kernel is still not aware of these formats, this fails. Userspace should not be attempting to query format modifiers of GBM buffers allocated with a non- format-modifier-aware allocation path, but to avoid breaking existing userspace behavior, this change accepts the old-style format modifiers when creating framebuffers and applying them to planes by translating them to the equivalent new-style modifier. To accomplish this, some layout parameters must be assumed to match properties of the device targeted by the relevant ioctls. To avoid perpetuating misuse of the old-style modifiers, this change does not advertise support for them. Doing so would imply compatibility between devices with incompatible memory layouts. 
Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5

Reported-by: Kirill A. Shutemov
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 		   uint32_t *tile_mode,
 		   uint8_t *kind)
 {
+	struct nouveau_display *disp = nouveau_display(drm->dev);
 	BUG_ON(!tile_mode || !kind);
 
+	if ((modifier & (0xffull << 12)) == 0ull) {
+		/* Legacy modifier. Translate to this device's 'kind.' */
+		modifier |= disp->format_modifiers[0] & (0xffull << 12);
+	}

I believe this should be moved into the != MOD_LINEAR case.

Yes, of course, thanks. I need to re-evaluate my testing yet again to make sure I hit that case too. Preparing a v3...

Going to need something here in the next day, two max. Linus may wait for another week, but it's not guaranteed.

I tested a whole bunch of GPUs before sending nouveau's -next tree, and with the change I suggested to this patch + the other stuff I sent through -fixes already, things seemed to be in OK shape.

JFYI, the adjusted (moved into != MOD_LINEAR case) patch works fine for me on top of drm-fixes-2020-07-29.

Sorry again for the delays (life is terrible lately), but the signed-off version with Ben's suggestion went out this morning, and I specifically tested linear modifiers in addition to retesting all the other test cases mentioned in the patch.

Thanks,
-James
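The reason the placement of the legacy check matters can be seen in a few lines: DRM_FORMAT_MOD_LINEAR is 0, so its 'kind' field is also 0 and it looks exactly like a legacy modifier to the mask test. The sketch below is only an illustration with hypothetical names (sketch_fixup_first(), sketch_linear_first(), the SKETCH_* macros), and it assumes the device's first advertised modifier has a non-zero kind.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_KIND_MASK  (0xffull << 12)
#define SKETCH_MOD_LINEAR 0ull /* stand-in for DRM_FORMAT_MOD_LINEAR */

/* v2 ordering: the legacy fix-up runs before the linear test. */
static uint64_t sketch_fixup_first(uint64_t modifier, uint64_t dev_modifier0)
{
	if ((modifier & SKETCH_KIND_MASK) == 0ull)
		modifier |= dev_modifier0 & SKETCH_KIND_MASK;
	return modifier;
}

/* Suggested ordering: linear is recognized before any fix-up. */
static uint64_t sketch_linear_first(uint64_t modifier, uint64_t dev_modifier0)
{
	if (modifier == SKETCH_MOD_LINEAR)
		return modifier;
	if ((modifier & SKETCH_KIND_MASK) == 0ull)
		modifier |= dev_modifier0 & SKETCH_KIND_MASK;
	return modifier;
}
```

With the fix-up first, a linear request picks up the device's kind bits and no longer compares equal to the linear modifier, so it would be decoded as block linear; with the linear test first, it passes through untouched and genuine legacy modifiers are translated the same way as before.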
[PATCH v3] drm/nouveau: Accept 'legacy' format modifiers
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to
handle broken userspace Xorg modesetting and Mesa drivers. Existing Mesa
drivers are still aware of only these older format modifiers, which do
not differentiate between different variations of the block linear
layout. When the format modifier support flag was flipped in the nouveau
kernel driver, the X.org modesetting driver began attempting to use its
format modifier-enabled framebuffer path. Because the set of format
modifiers advertised by the kernel prior to this change does not
intersect with the set of format modifiers advertised by Mesa,
allocating GBM buffers using format modifiers fails and the modesetting
driver falls back to non-modifier allocation. However, it still later
queries the modifier of the GBM buffer when creating its DRM-KMS
framebuffer object, receives the old-format modifier from Mesa, and
attempts to create a framebuffer with it. Since the kernel is still not
aware of these formats, this fails.

Userspace should not be attempting to query format modifiers of GBM
buffers allocated with a non-format-modifier-aware allocation path, but
to avoid breaking existing userspace behavior, this change accepts the
old-style format modifiers when creating framebuffers and applying them
to planes by translating them to the equivalent new-style modifier. To
accomplish this, some layout parameters must be assumed to match
properties of the device targeted by the relevant ioctls.

To avoid perpetuating misuse of the old-style modifiers, this change
does not advertise support for them. Doing so would imply compatibility
between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland
desktops from Ubuntu 18.04, kmscube hacked to use linear mod, and
sway 1.5

Reported-by: Kirill A. Shutemov
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 		   uint32_t *tile_mode,
 		   uint8_t *kind)
 {
+	struct nouveau_display *disp = nouveau_display(drm->dev);
 	BUG_ON(!tile_mode || !kind);

+	if ((modifier & (0xffull << 12)) == 0ull) {
+		/* Legacy modifier. Translate to this device's 'kind.' */
+		modifier |= disp->format_modifiers[0] & (0xffull << 12);
+	}
+
 	if (modifier == DRM_FORMAT_MOD_LINEAR) {
 		/* tile_mode will not be used in this case */
 		*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
 	}
 }

+static const u64 legacy_modifiers[] = {
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+	DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
 			    uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 	     (disp->format_modifiers[mod] != modifier);
 	     mod++);

-	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-		return -EINVAL;
+	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+		for (mod = 0;
+		     (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+		     (legacy_modifiers[mod] != modifier);
+		     mod++);
+		if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+			return -EINVAL;
+	}

 	nouveau_decode_mod(drm, modifier, tile_mode, kind);

--
2.17.1
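The two-stage lookup in the last hunk of the patch (search the advertised list first, then fall back to the legacy list, and only then fail) can be read in isolation as the sketch below. It is a minimal standalone illustration, not the nouveau code: `sketch_list_contains()`, `sketch_validate_modifier()` and `SKETCH_MOD_INVALID` are hypothetical names, and the sentinel value is only a stand-in for DRM_FORMAT_MOD_INVALID.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DRM_FORMAT_MOD_INVALID; terminates each modifier list. */
#define SKETCH_MOD_INVALID 0x00ffffffffffffffULL

/* Hypothetical helper: linear search of a sentinel-terminated list. */
static bool sketch_list_contains(const uint64_t *list, uint64_t modifier)
{
	size_t i;

	for (i = 0; list[i] != SKETCH_MOD_INVALID; i++) {
		if (list[i] == modifier)
			return true;
	}
	return false;
}

/*
 * Accept a modifier if it is either advertised by the device or one of
 * the legacy values. The legacy list is consulted only as a fallback,
 * so it never has to be advertised to userspace.
 */
static int sketch_validate_modifier(const uint64_t *advertised,
				    const uint64_t *legacy,
				    uint64_t modifier)
{
	if (sketch_list_contains(advertised, modifier))
		return 0;
	if (sketch_list_contains(legacy, modifier))
		return 0;
	return -EINVAL;
}
```

Keeping the legacy list separate from the advertised one is what lets the driver accept the old values without implying to userspace that they are portable between devices.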
Re: [Nouveau] [PATCH v2] drm/nouveau: Accept 'legacy' format modifiers
On 7/23/20 9:06 PM, Ben Skeggs wrote: On Sat, 18 Jul 2020 at 13:34, James Jones wrote: Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Existing Mesa drivers are still aware of only these older format modifiers which do not differentiate between different variations of the block linear layout. When the format modifier support flag was flipped in the nouveau kernel driver, the X.org modesetting driver began attempting to use its format modifier-enabled framebuffer path. Because the set of format modifiers advertised by the kernel prior to this change do not intersect with the set of format modifiers advertised by Mesa, allocating GBM buffers using format modifiers fails and the modesetting driver falls back to non-modifier allocation. However, it still later queries the modifier of the GBM buffer when creating its DRM-KMS framebuffer object, receives the old-format modifier from Mesa, and attempts to create a framebuffer with it. Since the kernel is still not aware of these formats, this fails. Userspace should not be attempting to query format modifiers of GBM buffers allocated with a non- format-modifier-aware allocation path, but to avoid breaking existing userspace behavior, this change accepts the old-style format modifiers when creating framebuffers and applying them to planes by translating them to the equivalent new-style modifier. To accomplish this, some layout parameters must be assumed to match properties of the device targeted by the relevant ioctls. To avoid perpetuating misuse of the old-style modifiers, this change does not advertise support for them. Doing so would imply compatibility between devices with incompatible memory layouts. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5 Reported-by: Kirill A. 
Shutemov Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers") Link: https://lkml.org/lkml/2020/6/30/1251 Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/nouveau_display.c | 26 +-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 496c4621cc78..31543086254b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm, uint32_t *tile_mode, uint8_t *kind) { + struct nouveau_display *disp = nouveau_display(drm->dev); BUG_ON(!tile_mode || !kind); + if ((modifier & (0xffull << 12)) == 0ull) { + /* Legacy modifier. Translate to this device's 'kind.' */ + modifier |= disp->format_modifiers[0] & (0xffull << 12); + } I believe this should be moved into the != MOD_LINEAR case. Yes, of course, thanks. I need to re-evaluate my testing yet again to make sure I hit that case too. Preparing a v3... 
Thanks,
-James

>> +
>>  	if (modifier == DRM_FORMAT_MOD_LINEAR) {
>>  		/* tile_mode will not be used in this case */
>>  		*tile_mode = 0;
>> @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
>>  	}
>>  }
>>
>> +static const u64 legacy_modifiers[] = {
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
>> +	DRM_FORMAT_MOD_INVALID
>> +};
>> +
>>  static int
>>  nouveau_validate_decode_mod(struct nouveau_drm *drm,
>>  			    uint64_t modifier,
>> @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
>>  	     (disp->format_modifiers[mod] != modifier);
>>  	     mod++);
>>
>> -	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
>> -		return -EINVAL;
>> +	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
>> +		for (mod = 0;
>> +		     (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
>> +		     (legacy_modifiers[mod] != modifier);
>> +		     mod++);
>> +		if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
>> +			return -EINVAL;
>> +	}
>>
>>  	nouveau_decode_mod(drm, modifier, tile_mode, kind);
>>
>> --
>> 2.17.1
Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers
Did you just cherry-pick my change, or were you running the latest drm-next or drm-fixes code? There do appear to be various MM-related fixes that may be related to this in drm-fixes when I scroll down the log looking for nouveau stuff. Shot in the dark, but might be worth trying with Dave's tree if you weren't already. I was testing with drm-fixes-2020-07-17-1 from here: git://anongit.freedesktop.org/drm/drm Thanks, -James On 7/17/20 8:13 PM, James Jones wrote: This doesn't look related. -James On 7/17/20 2:30 PM, Kirill A. Shutemov wrote: On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote: Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5 Signed-off-by: James Jones I tried and it crashes. Not sure if it's related. [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:vblank_disable_fn] disabling vblank on crtc 0 [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_CPU_PREP [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF BUG: unable to handle page fault for address: 059c #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD 0 P4D 0 Oops: [#1] PREEMPT SMP PTI CPU: 13 PID: 3351 Comm: alacritty Tainted: G I 5.8.0-rc5-00191-g086f86c033f9 #53 Hardware name: Gigabyte Technology Co., Ltd. 
X299 AORUS Gaming 3 Pro/X299 AORUS Gaming 3 Pro-CF, BIOS F5d 11/28/2019 RIP: 0010:kmem_cache_alloc_trace (/home/kas/linux/torvalds/mm/slub.c:272 /home/kas/linux/torvalds/mm/slub.c:278 /home/kas/linux/torvalds/mm/slub.c:292 /home/kas/linux/torvalds/mm/slub.c:2791 /home/kas/linux/torvalds/mm/slub.c:2832 /home/kas/linux/torvalds/mm/slub.c:2849) Code: 8b 51 08 48 89 c8 65 48 03 05 d4 0e ca 70 48 8b 70 08 48 39 f2 75 e7 4c 8b 38 4d 85 ff 0f 84 8f 01 00 00 8b 45 20 48 8b 7d 00 <49> 8b 1c 07 40 f6 c7 0f 0f 85 95 01 00 00 48 8d 8a 80 00 00 00 4c All code 0: 8b 51 08 mov 0x8(%rcx),%edx 3: 48 89 c8 mov %rcx,%rax 6: 65 48 03 05 d4 0e ca add %gs:0x70ca0ed4(%rip),%rax # 0x70ca0ee2 d: 70 e: 48 8b 70 08 mov 0x8(%rax),%rsi 12: 48 39 f2 cmp %rsi,%rdx 15: 75 e7 jne 0xfffe 17: 4c 8b 38 mov (%rax),%r15 1a: 4d 85 ff test %r15,%r15 1d: 0f 84 8f 01 00 00 je 0x1b2 23: 8b 45 20 mov 0x20(%rbp),%eax 26: 48 8b 7d 00 mov 0x0(%rbp),%rdi 2a:* 49 8b 1c 07 mov (%r15,%rax,1),%rbx <-- trapping instruction 2e: 40 f6 c7 0f test $0xf,%dil 32: 0f 85 95 01 00 00 jne 0x1cd 38: 48 8d 8a 80 00 00 00 lea 0x80(%rdx),%rcx 3f: 4c rex.WR Code starting with the faulting instruction === 0: 49 8b 1c 07 mov (%r15,%rax,1),%rbx 4: 40 f6 c7 0f test $0xf,%dil 8: 0f 85 95 01 00 00 jne 0x1a3 e: 48 8d 8a 80 00 00 00 lea 0x80(%rdx),%rcx 15: 4c rex.WR RSP: 0018:a8a381bcfba0 EFLAGS: 00010206 RAX: 0030 RBX: 9c0b15e05e00 RCX: 0003fe50 RDX: fc8d RSI: fc8d RDI: 0003fe50 RBP: 9c0b18407840 R08: R09: 0001 R10: 9c0b06c28000 R11: 0001 R12: 0dc0 [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE R13: 0060 R14: 8fa35a47 R15: 056c FS: 7fbe7a8e3900() GS:9c0b1f88() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 059c CR3: 00103c7fe004 CR4: 003606e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: nouveau_fence_new (/home/kas/linux/torvalds/include/linux/slab.h:555 /home/kas/linux/torvalds/include/linux/slab.h:669 /home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_fence.c:423) [drm:drm_vblank_enable] enabling 
vblank on crtc 0, ret: 0 nouveau_gem_ioctl_pushbuf (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:852) [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_QUEUE_SEQUENCE ? nouveau_gem_ioctl_new (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680) ? drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793) ? nouveau_gem_ioctl_ne
Re: [PATCH] drm/nouveau: Accept 'legacy' format modifiers
On 7/17/20 12:47 PM, Daniel Vetter wrote:
> On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote:
>> Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to
>> handle broken userspace Xorg modesetting and Mesa drivers.
>>
>> Tested with Xorg 1.20 modesetting driver,
>> weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland
>> desktops from Ubuntu 18.04, and sway 1.5
>
> Just bikeshed, but maybe a few more words on what exactly is broken and
> how this works around it. Specifically why we only accept these, but
> don't advertise them.

Added quite a few words.

>> Signed-off-by: James Jones
>
> Needs Fixes: line here. Also nice to mention the bug reporter/link.

Done in v2.

>> ---
>>  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
>>  1 file changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
>> index 496c4621cc78..31543086254b 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
>> @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
>>  		   uint32_t *tile_mode,
>>  		   uint8_t *kind)
>>  {
>> +	struct nouveau_display *disp = nouveau_display(drm->dev);
>>  	BUG_ON(!tile_mode || !kind);
>>
>> +	if ((modifier & (0xffull << 12)) == 0ull) {
>> +		/* Legacy modifier. Translate to this device's 'kind.' */
>> +		modifier |= disp->format_modifiers[0] & (0xffull << 12);
>> +	}
>
> Hm I tried to understand what this magic does by looking at drm_fourcc.h,
> but the drm_fourcc_canonicalize_nvidia_format_mod() in there implements
> something else. Is that function wrong, or should we use it here instead?
> Or is there something else going on entirely?

This may be slightly clearer with the expanded change description:

Canonicalize assumes the old modifiers are only used by certain Tegra
revisions, because the Mesa patches were supposed to land and obliterate
all uses beyond that. That assumption means it can assume the specific
page kind (0xfe) used by the display-engine-compatible layout on those
specific devices. There is no way to generally canonicalize a legacy
modifier without referencing a specific device type, as is indirectly
done here.

This code does a limited device-specific canonicalization: It
substitutes the display-appropriate page kind used by this specific
device, ensuring we derive the correct page kind later in the function.
I iterated on the best way to accomplish this a few times, and this was
the least-invasive thing I came up with, but it does require a pretty
thorough understanding of the NVIDIA modifier macros.

Thanks for the quick review.

-James

> Cheers, Daniel
>
>> +
>>  	if (modifier == DRM_FORMAT_MOD_LINEAR) {
>>  		/* tile_mode will not be used in this case */
>>  		*tile_mode = 0;
>> @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
>>  	}
>>  }
>>
>> +static const u64 legacy_modifiers[] = {
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
>> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
>> +	DRM_FORMAT_MOD_INVALID
>> +};
>> +
>>  static int
>>  nouveau_validate_decode_mod(struct nouveau_drm *drm,
>>  			    uint64_t modifier,
>> @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
>>  	     (disp->format_modifiers[mod] != modifier);
>>  	     mod++);
>>
>> -	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
>> -		return -EINVAL;
>> +	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
>> +		for (mod = 0;
>> +		     (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
>> +		     (legacy_modifiers[mod] != modifier);
>> +		     mod++);
>> +		if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
>> +			return -EINVAL;
>> +	}
>>
>>  	nouveau_decode_mod(drm, modifier, tile_mode, kind);
>>
>> --
>> 2.17.1
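The device-specific canonicalization described in this reply can be sketched as a small standalone function. This is only an illustration under stated assumptions: the mask `(0xffull << 12)` is taken from the patch, `sketch_fill_legacy_kind()` and the `SKETCH_*` names are hypothetical, and the early return for the linear modifier reflects the review comment elsewhere in the thread that the legacy check belongs in the non-linear case.

```c
#include <stdint.h>

#define SKETCH_MOD_LINEAR 0ull            /* DRM_FORMAT_MOD_LINEAR is 0 */
#define SKETCH_KIND_MASK  (0xffull << 12) /* page-kind mask from the patch */

/*
 * Hypothetical helper, not the kernel function: if 'modifier' carries no
 * page kind (a legacy modifier), borrow the kind bits from a modifier the
 * target device is known to support. Linear and already-complete
 * modifiers pass through unchanged.
 */
static uint64_t sketch_fill_legacy_kind(uint64_t modifier,
					uint64_t device_modifier)
{
	if (modifier == SKETCH_MOD_LINEAR)
		return modifier;

	if ((modifier & SKETCH_KIND_MASK) == 0)
		modifier |= device_modifier & SKETCH_KIND_MASK;

	return modifier;
}
```

Because the borrowed kind comes from the device handling the ioctl, the same legacy value can translate to different new-style modifiers on different hardware, which is why a generic, device-independent canonicalization is not possible.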
[PATCH v2] drm/nouveau: Accept 'legacy' format modifiers
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to
handle broken userspace Xorg modesetting and Mesa drivers. Existing Mesa
drivers are still aware of only these older format modifiers, which do
not differentiate between different variations of the block linear
layout. When the format modifier support flag was flipped in the nouveau
kernel driver, the X.org modesetting driver began attempting to use its
format modifier-enabled framebuffer path. Because the set of format
modifiers advertised by the kernel prior to this change does not
intersect with the set of format modifiers advertised by Mesa,
allocating GBM buffers using format modifiers fails and the modesetting
driver falls back to non-modifier allocation. However, it still later
queries the modifier of the GBM buffer when creating its DRM-KMS
framebuffer object, receives the old-format modifier from Mesa, and
attempts to create a framebuffer with it. Since the kernel is still not
aware of these formats, this fails.

Userspace should not be attempting to query format modifiers of GBM
buffers allocated with a non-format-modifier-aware allocation path, but
to avoid breaking existing userspace behavior, this change accepts the
old-style format modifiers when creating framebuffers and applying them
to planes by translating them to the equivalent new-style modifier. To
accomplish this, some layout parameters must be assumed to match
properties of the device targeted by the relevant ioctls.

To avoid perpetuating misuse of the old-style modifiers, this change
does not advertise support for them. Doing so would imply compatibility
between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland
desktops from Ubuntu 18.04, and sway 1.5

Reported-by: Kirill A. Shutemov
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 		   uint32_t *tile_mode,
 		   uint8_t *kind)
 {
+	struct nouveau_display *disp = nouveau_display(drm->dev);
 	BUG_ON(!tile_mode || !kind);

+	if ((modifier & (0xffull << 12)) == 0ull) {
+		/* Legacy modifier. Translate to this device's 'kind.' */
+		modifier |= disp->format_modifiers[0] & (0xffull << 12);
+	}
+
 	if (modifier == DRM_FORMAT_MOD_LINEAR) {
 		/* tile_mode will not be used in this case */
 		*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
 	}
 }

+static const u64 legacy_modifiers[] = {
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+	DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
 			    uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 	     (disp->format_modifiers[mod] != modifier);
 	     mod++);

-	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-		return -EINVAL;
+	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+		for (mod = 0;
+		     (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+		     (legacy_modifiers[mod] != modifier);
+		     mod++);
+		if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+			return -EINVAL;
+	}

 	nouveau_decode_mod(drm, modifier, tile_mode, kind);

--
2.17.1
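The failure described in the commit message starts from two modifier lists with nothing in common: the one the kernel advertises and the one Mesa knows about. A minimal sketch of that intersection step follows; `sketch_intersect_modifiers()` is a hypothetical helper for illustration and does not correspond to any GBM or libdrm function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy into 'out' every entry of 'a' that also
 * appears in 'b' and return how many were copied. 'out' must have room
 * for 'na' entries. An allocator that finds zero common modifiers has
 * nothing it can legally pick, so a modifier-aware allocation fails.
 */
static size_t sketch_intersect_modifiers(const uint64_t *a, size_t na,
					 const uint64_t *b, size_t nb,
					 uint64_t *out)
{
	size_t i, j, n = 0;

	for (i = 0; i < na; i++) {
		for (j = 0; j < nb; j++) {
			if (a[i] == b[j]) {
				out[n++] = a[i];
				break;
			}
		}
	}
	return n;
}
```

With the kernel list containing only new-style values and the Mesa list containing only old-style ones, the result is empty, which is the point at which the modesetting driver falls back to non-modifier allocation.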
Re: [PATCH] drm/nouveau: Accept 'legacy' format modifiers
This doesn't look related. -James On 7/17/20 2:30 PM, Kirill A. Shutemov wrote: On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote: Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to handle broken userspace Xorg modesetting and Mesa drivers. Tested with Xorg 1.20 modesetting driver, weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland desktops from Ubuntu 18.04, and sway 1.5 Signed-off-by: James Jones I tried and it crashes. Not sure if it's related. [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:vblank_disable_fn] disabling vblank on crtc 0 [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_CPU_PREP [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF [drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF BUG: unable to handle page fault for address: 059c #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD 0 P4D 0 Oops: [#1] PREEMPT SMP PTI CPU: 13 PID: 3351 Comm: alacritty Tainted: G I 5.8.0-rc5-00191-g086f86c033f9 #53 Hardware name: Gigabyte Technology Co., Ltd. 
X299 AORUS Gaming 3 Pro/X299 AORUS Gaming 3 Pro-CF, BIOS F5d 11/28/2019 RIP: 0010:kmem_cache_alloc_trace (/home/kas/linux/torvalds/mm/slub.c:272 /home/kas/linux/torvalds/mm/slub.c:278 /home/kas/linux/torvalds/mm/slub.c:292 /home/kas/linux/torvalds/mm/slub.c:2791 /home/kas/linux/torvalds/mm/slub.c:2832 /home/kas/linux/torvalds/mm/slub.c:2849) Code: 8b 51 08 48 89 c8 65 48 03 05 d4 0e ca 70 48 8b 70 08 48 39 f2 75 e7 4c 8b 38 4d 85 ff 0f 84 8f 01 00 00 8b 45 20 48 8b 7d 00 <49> 8b 1c 07 40 f6 c7 0f 0f 85 95 01 00 00 48 8d 8a 80 00 00 00 4c All code 0: 8b 51 08mov0x8(%rcx),%edx 3: 48 89 c8mov%rcx,%rax 6: 65 48 03 05 d4 0e caadd%gs:0x70ca0ed4(%rip),%rax# 0x70ca0ee2 d: 70 e: 48 8b 70 08 mov0x8(%rax),%rsi 12: 48 39 f2cmp%rsi,%rdx 15: 75 e7 jne0xfffe 17: 4c 8b 38mov(%rax),%r15 1a: 4d 85 fftest %r15,%r15 1d: 0f 84 8f 01 00 00 je 0x1b2 23: 8b 45 20mov0x20(%rbp),%eax 26: 48 8b 7d 00 mov0x0(%rbp),%rdi 2a:* 49 8b 1c 07 mov(%r15,%rax,1),%rbx <-- trapping instruction 2e: 40 f6 c7 0f test $0xf,%dil 32: 0f 85 95 01 00 00 jne0x1cd 38: 48 8d 8a 80 00 00 00lea0x80(%rdx),%rcx 3f: 4c rex.WR Code starting with the faulting instruction === 0: 49 8b 1c 07 mov(%r15,%rax,1),%rbx 4: 40 f6 c7 0f test $0xf,%dil 8: 0f 85 95 01 00 00 jne0x1a3 e: 48 8d 8a 80 00 00 00lea0x80(%rdx),%rcx 15: 4c rex.WR RSP: 0018:a8a381bcfba0 EFLAGS: 00010206 RAX: 0030 RBX: 9c0b15e05e00 RCX: 0003fe50 RDX: fc8d RSI: fc8d RDI: 0003fe50 RBP: 9c0b18407840 R08: R09: 0001 R10: 9c0b06c28000 R11: 0001 R12: 0dc0 [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE R13: 0060 R14: 8fa35a47 R15: 056c FS: 7fbe7a8e3900() GS:9c0b1f88() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 059c CR3: 00103c7fe004 CR4: 003606e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: nouveau_fence_new (/home/kas/linux/torvalds/include/linux/slab.h:555 /home/kas/linux/torvalds/include/linux/slab.h:669 /home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_fence.c:423) [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 
nouveau_gem_ioctl_pushbuf (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:852) [drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_QUEUE_SEQUENCE ? nouveau_gem_ioctl_new (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680) ? drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793) ? nouveau_gem_ioctl_new (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680) drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793) drm_ioctl (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:888) ? nouveau_gem_ioctl_new (/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680) ? _raw_spin_unlock_irqrestore (/home/kas/linux/torvalds/arch/x86/include/asm/irqflags.h:41 /home/kas/linux/torvalds/arch/x86/include/asm/irqflags.h:84 /home/kas/linux/torvalds/include/linux/spinlock_api_smp.h:160 /home/kas/linux/torvalds/k
Re: [PATCH] drm/nouveau: Accept 'legacy' format modifiers
This should resolve the inability to start X with the new NV format
modifier support in nouveau.

FYI, I'm offline next week, but I'll check in tonight in case there are
any review comments. When I'm back, I'll get the associated userspace
fixes cleaned up and out to the appropriate lists.

Thanks,
-James

On 7/17/20 11:57 AM, James Jones wrote:
> Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to
> handle broken userspace Xorg modesetting and Mesa drivers.
>
> Tested with Xorg 1.20 modesetting driver,
> weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland
> desktops from Ubuntu 18.04, and sway 1.5
>
> Signed-off-by: James Jones
> ---
>  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
> index 496c4621cc78..31543086254b 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
> @@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
>  		   uint32_t *tile_mode,
>  		   uint8_t *kind)
>  {
> +	struct nouveau_display *disp = nouveau_display(drm->dev);
>  	BUG_ON(!tile_mode || !kind);
>
> +	if ((modifier & (0xffull << 12)) == 0ull) {
> +		/* Legacy modifier. Translate to this device's 'kind.' */
> +		modifier |= disp->format_modifiers[0] & (0xffull << 12);
> +	}
> +
>  	if (modifier == DRM_FORMAT_MOD_LINEAR) {
>  		/* tile_mode will not be used in this case */
>  		*tile_mode = 0;
> @@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
>  	}
>  }
>
> +static const u64 legacy_modifiers[] = {
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
> +	DRM_FORMAT_MOD_INVALID
> +};
> +
>  static int
>  nouveau_validate_decode_mod(struct nouveau_drm *drm,
>  			    uint64_t modifier,
> @@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
>  	     (disp->format_modifiers[mod] != modifier);
>  	     mod++);
>
> -	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
> -		return -EINVAL;
> +	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
> +		for (mod = 0;
> +		     (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
> +		     (legacy_modifiers[mod] != modifier);
> +		     mod++);
> +		if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
> +			return -EINVAL;
> +	}
>
>  	nouveau_decode_mod(drm, modifier, tile_mode, kind);
[PATCH] drm/nouveau: Accept 'legacy' format modifiers
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK() family of modifiers to
handle broken userspace Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc, gnome & KDE wayland
desktops from Ubuntu 18.04, and sway 1.5

Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 		   uint32_t *tile_mode,
 		   uint8_t *kind)
 {
+	struct nouveau_display *disp = nouveau_display(drm->dev);
 	BUG_ON(!tile_mode || !kind);

+	if ((modifier & (0xffull << 12)) == 0ull) {
+		/* Legacy modifier. Translate to this device's 'kind.' */
+		modifier |= disp->format_modifiers[0] & (0xffull << 12);
+	}
+
 	if (modifier == DRM_FORMAT_MOD_LINEAR) {
 		/* tile_mode will not be used in this case */
 		*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
 	}
 }

+static const u64 legacy_modifiers[] = {
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+	DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
 			    uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 	     (disp->format_modifiers[mod] != modifier);
 	     mod++);

-	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-		return -EINVAL;
+	if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+		for (mod = 0;
+		     (legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+		     (legacy_modifiers[mod] != modifier);
+		     mod++);
+		if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+			return -EINVAL;
+	}

 	nouveau_decode_mod(drm, modifier, tile_mode, kind);

--
2.17.1
Re: [git pull] drm for 5.8-rc1
Still testing. I'll get a Sign-off version out this week unless I find
a problem.

Thanks,
-James

On 7/12/20 6:37 PM, Dave Airlie wrote:
> How are we going with a fix for this regression I can commit?
>
> Dave.
Re: [git pull] drm for 5.8-rc1
On 7/2/20 2:14 PM, James Jones wrote: On 7/2/20 1:22 AM, Daniel Stone wrote: Hi, On Wed, 1 Jul 2020 at 20:45, James Jones wrote: OK, I think I see what's going on. In the Xorg modesetting driver, the logic is basically: if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); } else { drmModeAddFB(...); } I read this thread expecting to explain the correct behaviour we implement in Weston and how modesetting needs to be fixed, but ... that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'. Yes, the hazards of reporting findings before verifying. I now see modesetting does query the DRM-KMS modifiers and attempt to allocate with them if it found any. However, I still see a lot of ways things can go wrong, but I'm not going to share my speculation again until I've actually verified it, which is taking a frustratingly long time. The modesetting driver is not my friend right now. OK, several hours of dumb build+config mistakes later, I was actually able to reproduce the failure and walk through things. There is a trivial fix for the issues in the X modesetting driver, working off Daniel Stone's claim that gbm_bo_get_modifier() should only be called when the allocation was made with gbm_bo_create_with_modifiers(). modeset doesn't respect that requirement now in the case that the atomic modesetting path is disabled, which is always the case currently because that path is broken. Respecting that requirement is a half-liner and allows X to start properly. 
If I force modeset to use the atomic path, X still fails to start with the above fix, validating the second theory I'd had: -Current Mesa nouveau code basically ignores the modifier list passed in unless it is a single modifier requesting linear layout, and goes about allocating whatever layout it sees fit, and succeeds the allocation despite being passed a list of modifiers it knows nothing about. Not great, fixed in my pending patches, obviously doesn't help existing deployed userspace. -Current Mesa nouveau code, when asked what modifier it used for the above allocation, returns one of the "legacy" modifiers nouveau DRM-KMS knows nothing about. -When the modeset driver tries to create an FB for that BO with the returned modifier, the nouveau kernel driver of course refuses. I think it's probably worth fixing the modesetting driver for the reasons Daniel Vetter mentioned. Then if I get my Mesa patches in before a new modesetting driver with working Atomic support is released, there'll be no need for long-term workarounds in the kernel. Down to the real question of what to do in the kernel to support current userspace code: I still think the best fix is to accept the old modifiers but not advertise them. However, Daniel Stone and others, if you think this will actually break userspace in other ways (Could you describe in a bit more detail or point me to test cases if so?), I suppose the only option would be to advertise & accept the old modifiers for now, and I suppose at a config option at some point to phase the old ones out, eventually drop them entirely. This would be unfortunate, because as I mentioned, it could sometimes result in situations where apps think they can share a buffer between two devices but will get garbled data in practice. I've included an initial version of the kernel patch inline below. Needs more testing, but I wanted to share it in case anyone has feedback on the idea, wants to see the general workflow, or wants to help test. 
There's no attempt to verify the DRM-KMS device supports the modifier, but then, why would there be? GBM presumably chose a supported modifier at buffer creation time, and we don't know which plane the FB is going to be used with yet. GBM doesn't actually ask the kernel which modifiers it supports here either though. Right, it doesn't ask, because userspace tells it which modifiers to use. The correct behaviour is to take the list from the KMS `IN_FORMATS` property and then pass that to `gbm_(bo|surface)_create_with_modifiers`; GBM must then select from that list and only that list. If that call does not succeed and Xorg falls back to `gbm_surface_create`, then it must not call `gbm_bo_get_modifier` - so that would be a modesetting bug. If that call does succeed and `gbm_bo_get_modifier` subsequently reports a modifier which was not in the list, that's a Mesa driver bug. It just goes into Mesa via DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa just hard-codes the modifiers in its driver backends since its thinking in terms of a device's 3D engine, not display. In theory, Mesa's DRI drivers could query KMS for supported modifiers if allocating from GBM using the non-modifiers path and the SCANOUT flag is set (perhaps some drivers
Re: [git pull] drm for 5.8-rc1
On 7/2/20 1:22 AM, Daniel Stone wrote: Hi, On Wed, 1 Jul 2020 at 20:45, James Jones wrote: OK, I think I see what's going on. In the Xorg modesetting driver, the logic is basically: if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); } else { drmModeAddFB(...); } I read this thread expecting to explain the correct behaviour we implement in Weston and how modesetting needs to be fixed, but ... that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'. Yes, the hazards of reporting findings before verifying. I now see modesetting does query the DRM-KMS modifiers and attempt to allocate with them if it found any. However, I still see a lot of ways things can go wrong, but I'm not going to share my speculation again until I've actually verified it, which is taking a frustratingly long time. The modesetting driver is not my friend right now. There's no attempt to verify the DRM-KMS device supports the modifier, but then, why would there be? GBM presumably chose a supported modifier at buffer creation time, and we don't know which plane the FB is going to be used with yet. GBM doesn't actually ask the kernel which modifiers it supports here either though. Right, it doesn't ask, because userspace tells it which modifiers to use. The correct behaviour is to take the list from the KMS `IN_FORMATS` property and then pass that to `gbm_(bo|surface)_create_with_modifiers`; GBM must then select from that list and only that list. If that call does not succeed and Xorg falls back to `gbm_surface_create`, then it must not call `gbm_bo_get_modifier` - so that would be a modesetting bug. If that call does succeed and `gbm_bo_get_modifier` subsequently reports a modifier which was not in the list, that's a Mesa driver bug. It just goes into Mesa via DRI and reports the modifier (unpatched) Mesa chose on its own. 
Mesa just hard-codes the modifiers in its driver backends since its thinking in terms of a device's 3D engine, not display. In theory, Mesa's DRI drivers could query KMS for supported modifiers if allocating from GBM using the non-modifiers path and the SCANOUT flag is set (perhaps some drivers do this or its equivalent? Haven't checked.), but that seems pretty gnarly and doesn't fix the modifier-based GBM allocation path AFAIK. Bit of a mess. Two options for GBM users: * call gbm_*_create_with_modifiers, it succeeds, call gbm_bo_get_modifier, pass modifier into AddFB * call gbm_*_create (without modifiers), it succeeds, do not call gbm_bo_get_modifier, do not pass a modifier into AddFB Anything else is a bug in the user. Note that falling back from 1 to 2 is fine: if `gbm_*_create_with_modifiers()` fails, you can fall back to the non-modifier path, provided you don't later try to get a modifier back out. For a quick userspace fix that could probably be pushed out everywhere (Only affects Xorg server 1.20+ AFAIK), just retrying drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on failure should be sufficient. This would break other drivers. I think this could be done in a way that wouldn't, though it wouldn't be quite as simple. Let's see what the true root cause is first though. Still need to verify as I'm having trouble wrangling my Xorg build at the moment and I'm pressed for time. A more complete fix would be quite involved, as modesetting isn't really properly plumbed to validate GBM's modifiers against KMS planes, and it doesn't seem like GBM/Mesa/DRI should be responsible for this as noted above given the general modifier workflow/design. Most importantly, options I've considered for fixing from the kernel side: -Accept "legacy" modifiers in nouveau in addition to the new modifiers, though avoid reporting them to userspace as supported to avoid further proliferation. This is pretty straightforward. 
I'll need to modify both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set plane validation logic (nv50_plane_format_mod_supported), but it should end up just being a few lines of code. I do think that they should also be reported to userspace if they are accepted. Other users can and do look at the modifier list to see if the buffer is acceptable for a given plane, so the consistency is good here. Of course, in Mesa you would want to prioritise the new modifiers over the legacy ones, and not allocate or return the legacy ones unless that was all you were asked for. This would involve tracking the used modifier explicitly through Mesa, rather than throwing it away at alloc time and then later divining it from the tiling mode. Reporting them as supported is equivalent to reporting support for a memory layout the chips don't actually support (It corresponds to a valid layout on Tegra chips, but not on discrete NV chips). This is what the new modifier
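The "two options for GBM users" rule quoted above can be modelled as a small decision function. This is only a sketch under stated assumptions: every sketch_* name is hypothetical and invented for illustration, and nothing here calls into libgbm or libdrm. It just records which allocation path produced a buffer, which is what decides whether gbm_bo_get_modifier() may be queried and a modifier passed to AddFB2 at all. The extra list check corresponds to the "Mesa driver bug" case, where the allocator reports a modifier that was not in the list it was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how the buffer was allocated. */
enum sketch_alloc_path {
	SKETCH_ALLOC_WITH_MODIFIERS,	/* gbm_*_create_with_modifiers() succeeded */
	SKETCH_ALLOC_LEGACY,		/* fell back to gbm_*_create() */
};

/* Hypothetical: what the display server should do with the buffer. */
enum sketch_addfb {
	SKETCH_ADDFB_PLAIN,		/* no modifier is queried or passed */
	SKETCH_ADDFB_WITH_MODIFIER,	/* pass the reported modifier to AddFB2 */
	SKETCH_ADDFB_ALLOCATOR_BUG,	/* reported modifier was never requested */
};

struct sketch_bo {
	enum sketch_alloc_path path;
	uint64_t modifier;	/* only meaningful on the modifier path */
};

static bool sketch_modifier_in_list(uint64_t modifier,
				    const uint64_t *list, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (list[i] == modifier)
			return true;
	return false;
}

/*
 * kms_mods is the list taken from the plane's IN_FORMATS property and
 * handed to the allocator. A buffer from the legacy path never has its
 * modifier consulted, whatever value the field happens to hold.
 */
static enum sketch_addfb sketch_choose_addfb(const struct sketch_bo *bo,
					     const uint64_t *kms_mods,
					     size_t count)
{
	assert(bo);

	if (bo->path != SKETCH_ALLOC_WITH_MODIFIERS)
		return SKETCH_ADDFB_PLAIN;
	if (!sketch_modifier_in_list(bo->modifier, kms_mods, count))
		return SKETCH_ADDFB_ALLOCATOR_BUG;
	return SKETCH_ADDFB_WITH_MODIFIER;
}
```

Falling back from the first option to the second maps to constructing the buffer record with SKETCH_ALLOC_LEGACY, after which the modifier field is simply ignored.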
Re: [git pull] drm for 5.8-rc1
OK, I think I see what's going on. In the Xorg modesetting driver, the logic is basically: if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) { drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm)); } else { drmModeAddFB(...); } There's no attempt to verify the DRM-KMS device supports the modifier, but then, why would there be? GBM presumably chose a supported modifier at buffer creation time, and we don't know which plane the FB is going to be used with yet. GBM doesn't actually ask the kernel which modifiers it supports here either though. It just goes into Mesa via DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa just hard-codes the modifiers in its driver backends since its thinking in terms of a device's 3D engine, not display. In theory, Mesa's DRI drivers could query KMS for supported modifiers if allocating from GBM using the non-modifiers path and the SCANOUT flag is set (perhaps some drivers do this or its equivalent? Haven't checked.), but that seems pretty gnarly and doesn't fix the modifier-based GBM allocation path AFAIK. Bit of a mess. For a quick userspace fix that could probably be pushed out everywhere (Only affects Xorg server 1.20+ AFAIK), just retrying drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on failure should be sufficient. Still need to verify as I'm having trouble wrangling my Xorg build at the moment and I'm pressed for time. A more complete fix would be quite involved, as modesetting isn't really properly plumbed to validate GBM's modifiers against KMS planes, and it doesn't seem like GBM/Mesa/DRI should be responsible for this as noted above given the general modifier workflow/design. Most importantly, options I've considered for fixing from the kernel side: -Accept "legacy" modifiers in nouveau in addition to the new modifiers, though avoid reporting them to userspace as supported to avoid further proliferation. This is pretty straightforward. 
I'll need to modify both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set plane validation logic (nv50_plane_format_mod_supported), but it should end up just being a few lines of code. -Don't validate modifiers in AddFB. This doesn't really gain anything because it just pushes the failure down to mode set time, so it's not that useful, so I don't plan on pursuing this. As noted, need to run just now, but I should have a kernel patch to test out either tonight or tomorrow. If anyone's curious, the reason my testing missed this was I did most of my verification of "old" code against the Xorg 1.19 build included with my distro. I did hack up a Xorg 1.20-ish build to test as well that would have included this path, but I must not have properly configured it with GBM modifier support somehow. I was pretty focused on just testing the forcibly-disabled atomic path in the modesetting driver in this build, so I didn't look too closely at things beyond that. Thanks, -James On 7/1/20 12:59 AM, Kirill A. Shutemov wrote: On Wed, Jul 01, 2020 at 10:57:19AM +0300, Kirill A. Shutemov wrote: On Tue, Jun 30, 2020 at 09:40:19PM -0700, James Jones wrote: This implies something is trying to use one of the old DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without first checking whether it is supported by the kernel. I had tried to force an Xorg+Mesa stack without my userspace patches to hit this error when testing, but must have missed some permutation. If the stalled Mesa patches go in, this would stop happening of course, but those were held up for a long time in review, and are now waiting on me to make some modifications. Are you using the modesetting driver in X? If so, with glamor I presume? Yes and yes. I attached Xorg.log. Attached now. 
___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
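The first kernel-side option in the message above (accept the "legacy" modifiers, but do not report them as supported) amounts to keeping two lists and consulting them at different points. The following is a rough sketch of that idea only, not the nouveau patch: the sketch_* names, the struct, and the list contents are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device policy holding two separate modifier lists. */
struct sketch_mod_policy {
	const uint64_t *advertised;	/* reported to userspace as supported */
	size_t n_advertised;
	const uint64_t *legacy;		/* tolerated for old userspace, never reported */
	size_t n_legacy;
};

static bool sketch_list_has(const uint64_t *list, size_t n, uint64_t mod)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == mod)
			return true;
	return false;
}

/* What userspace would be told is supported. */
static bool sketch_mod_is_advertised(const struct sketch_mod_policy *p,
				     uint64_t mod)
{
	assert(p);
	return sketch_list_has(p->advertised, p->n_advertised, mod);
}

/* What a framebuffer-creation-time check would let through. */
static bool sketch_mod_is_accepted(const struct sketch_mod_policy *p,
				   uint64_t mod)
{
	assert(p);
	return sketch_list_has(p->advertised, p->n_advertised, mod) ||
	       sketch_list_has(p->legacy, p->n_legacy, mod);
}
```

The asymmetry is the point of the design: already-deployed userspace that hands back a legacy modifier keeps working, while nothing new can discover the legacy values by querying the supported list.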
Re: [git pull] drm for 5.8-rc1
On 7/1/20 10:04 AM, Karol Herbst wrote: On Wed, Jul 1, 2020 at 6:01 PM Daniel Vetter wrote: On Wed, Jul 1, 2020 at 5:51 PM James Jones wrote: On 7/1/20 4:24 AM, Karol Herbst wrote: On Wed, Jul 1, 2020 at 6:45 AM James Jones wrote: This implies something is trying to use one of the old DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without first checking whether it is supported by the kernel. I had tried to force an Xorg+Mesa stack without my userspace patches to hit this error when testing, but must have missed some permutation. If the stalled Mesa patches go in, this would stop happening of course, but those were held up for a long time in review, and are now waiting on me to make some modifications. that's completely irrelevant. If a kernel change breaks userspace, it's a kernel bug. Agreed it is unacceptable to break userspace, but I don't think it's irrelevant. Perhaps the musings on pending userspace patches are. My intent here was to point out it appears at first glance that something isn't behaving as expected in userspace, so fixing this would likely require some sort of work-around for broken userspace rather than straight-forward fixing of a bug in the kernel logic. My intent was not to shift blame to something besides my code & testing for the regression, though I certainly see how it could be interpreted that way. Regardless, I'm looking in to it. I assume the MR you were talking about is https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 ? Correct. I am also aware of the tegra driver being broken on my jetson nano and I am now curious if this MR could fix this bug as well... 
and sorry for the harsh reply, I was just a annoyed by the fact that "everything modifier related is just breaking things", first tegra and that nobody is looking into fixing it and then apparently the userspace code being quite broken as well :/ Anyway, yeah I trust you guys on figuring out the keeping "broken" userspace happy from a kernel side and maybe I can help out with reviewing the mesa bits. I am just wondering if it could help with the tegra situation giving me more reasons to look into it as this would solve other issues I should be working on :) Not sure if you're claiming this, but if there's Tegra breakage attributable to this patch series, I'd love to hear more details there as well. The Tegra patches did have backwards-compat code to handle the old modifiers, since Tegra was the only working use case I could find for them within the kernel itself. However, the Tegra kernel patches are independent (and haven't even been reviewed yet to my knowledge), so Tegra shouldn't be affected at all given it uses TegraDRM rather than Nouveau's modesetting driver. If there are just general existing issues with modifier support on Tegra, let's take that to a smaller venue. I probably won't be as much help there, but I can at least try to help get some eyes on it. Thanks, -James If we do need to have a kernel workaround I'm happy to help out, I've done a bunch of these and occasionally it's good to get rather creative :-) Ideally we'd also push a minimal fix in userspace to all stable branches and make sure distros upgrade (might need releases if some distro is stuck on old horrors), so that we don't have to keep the hack in place for 10+ years or so. Definitely if the hack amounts to disabling modifiers on nouveau, that would be kinda sad. -Daniel Thanks, -James Are you using the modesetting driver in X? If so, with glamor I presume? What version of Mesa? Any distro patches? 
Any non-default xorg.conf options that would affect modesetting, your X driver if it isn't modesetting, or glamor? Thanks, -James On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: James Jones (4): ... drm/nouveau/kms: Support NVIDIA format modifiers This commit is the first one that breaks Xorg startup for my setup: GTX 1080 + Dell UP2414Q (4K DP MST monitor). I believe this is the crucial part of dmesg (full dmesg is attached): [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x314 [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 Any suggestions? -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [git pull] drm for 5.8-rc1
On 7/1/20 4:24 AM, Karol Herbst wrote: On Wed, Jul 1, 2020 at 6:45 AM James Jones wrote: This implies something is trying to use one of the old DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without first checking whether it is supported by the kernel. I had tried to force an Xorg+Mesa stack without my userspace patches to hit this error when testing, but must have missed some permutation. If the stalled Mesa patches go in, this would stop happening of course, but those were held up for a long time in review, and are now waiting on me to make some modifications. that's completely irrelevant. If a kernel change breaks userspace, it's a kernel bug. Agreed it is unacceptable to break userspace, but I don't think it's irrelevant. Perhaps the musings on pending userspace patches are. My intent here was to point out it appears at first glance that something isn't behaving as expected in userspace, so fixing this would likely require some sort of work-around for broken userspace rather than straightforward fixing of a bug in the kernel logic. My intent was not to shift blame to something besides my code & testing for the regression, though I certainly see how it could be interpreted that way. Regardless, I'm looking into it. Thanks, -James Are you using the modesetting driver in X? If so, with glamor I presume? What version of Mesa? Any distro patches? Any non-default xorg.conf options that would affect modesetting, your X driver if it isn't modesetting, or glamor? Thanks, -James On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: James Jones (4): ... drm/nouveau/kms: Support NVIDIA format modifiers This commit is the first one that breaks Xorg startup for my setup: GTX 1080 + Dell UP2414Q (4K DP MST monitor).
I believe this is the crucial part of dmesg (full dmesg is attached): [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x314 [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 Any suggestions?
Re: [git pull] drm for 5.8-rc1
This implies something is trying to use one of the old DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without first checking whether it is supported by the kernel. I had tried to force an Xorg+Mesa stack without my userspace patches to hit this error when testing, but must have missed some permutation. If the stalled Mesa patches go in, this would stop happening of course, but those were held up for a long time in review, and are now waiting on me to make some modifications. Are you using the modesetting driver in X? If so, with glamor I presume? What version of Mesa? Any distro patches? Any non-default xorg.conf options that would affect modesetting, your X driver if it isn't modesetting, or glamor? Thanks, -James On 6/30/20 4:08 PM, Kirill A. Shutemov wrote: On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote: James Jones (4): ... drm/nouveau/kms: Support NVIDIA format modifiers This commit is the first one that breaks Xorg startup for my setup: GTX 1080 + Dell UP2414Q (4K DP MST monitor). I believe this is the crucial part of dmesg (full dmesg is attached): [ 29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x314 [ 29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer [ 29.997145] [drm:drm_ioctl] pid=3393, ret = -22 Any suggestions?
Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer
On 2/10/20 3:35 PM, Ben Skeggs wrote: On Tue, 11 Feb 2020 at 09:17, James Jones wrote: On 2/10/20 12:25 AM, Thomas Zimmermann wrote: Hi On 10.02.20 at 09:20, Ben Skeggs wrote: On Sat, 8 Feb 2020 at 07:10, James Jones wrote: I've sent out a v4 version of the format modifier patches which avoid caching values in the nouveau_framebuffer struct. It will have a few trivial conflicts with your series, but should make them structurally compatible. I'm fine with either v3 or v4 of my series personally, but if these cleanup patches are taken, only v4 will work. I've taken Thomas' cleanup patches in my tree, and will take James' also once they've been fixed up to work on top of the cleanup. Thanks! After applying this series locally, I'm hitting a NULL deref loading the nouveau module with fbconsole caused by patch 3/4. I've sent out a trivial fix for review separately. Please have a look, and Ben, feel free to squash it with Thomas's original patch if you prefer. Oops. Squashed! James, are you happy for me to take the drm_fourcc.h patch that's on dri-devel through my tree for the next merge window too? Yes, that would be great. I couldn't find a public version of your tree with Thomas's patches applied, but I pulled them in locally and rebased my series on top of that as v5, resolving all the remaining trivial conflicts. Apologies for all the patch spam this generated. I've pulled in your patches now too. Awesome. Thanks! -James Thank you! Ben. Thanks, -James Ben. Thanks, -James On 2/6/20 8:45 AM, James Jones wrote: Yes, that's certainly viable. If that's the general preference in direction, I'll rework the patches to do so. Thanks, -James On 2/6/20 7:49 AM, Thomas Zimmermann wrote: Hi James On 06.02.20 at 16:17, James Jones wrote: Note I'm adding some fields to nouveau_framebuffer in the series "drm/nouveau: Support NVIDIA format modifiers." I sent out v3 of that yesterday.
It would probably still be possible to avoid them by re-extracting the relevant data from the format modifier on the fly when needed, but it is simpler and likely less error-prone with the wrapper struct. Thanks for the note. I just took a look at your patchset. I think struct nouveau_framebuffer should not store tile_mode and kind. AFAICT there are only two trivial places where these values are used and they can be extracted from the framebuffer at any time. I'd suggest to expand nouveau_decode_mod() to take a drm_framebuffer and return the correct values. Kind of what you do in nouveau_framebuffer_new() near line 330. Thoughts? Best regards Thomas [1] https://patchwork.freedesktop.org/series/70786/#rev3 Thanks, -James On 2/6/20 2:19 AM, Thomas Zimmermann wrote: After its cleanup, struct nouveau_framebuffer is only a wrapper around struct drm_framebuffer. Use the latter directly. Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 26 +++ drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++-- drivers/gpu/drm/nouveau/nouveau_display.h | 12 +-- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 14 ++-- 4 files changed, 28 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index ba1399965a1c..4a67a656e007 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma *ctxdma) } static struct nv50_wndw_ctxdma * -nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) +nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb) { -struct nouveau_drm *drm = nouveau_drm(fb->base.dev); +struct nouveau_drm *drm = nouveau_drm(fb->dev); struct nv50_wndw_ctxdma *ctxdma; -struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]); +struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); const u8kind = nvbo->kind; const u32 handle = 0xfb00 | kind; 
struct { @@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, struct nv50_wndw_atom *asyw, struct nv50_head_atom *asyh) { -struct nouveau_framebuffer *fb = nouveau_framebuffer(asyw->state.fb); +struct drm_framebuffer *fb = asyw->state.fb; struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev); -struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]); +struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); int ret; NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name); -if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { -asyw->image.w = fb->base.width; -asyw->image.h = fb->base.height; +if (fb != armw-&g
Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer
On 2/10/20 12:25 AM, Thomas Zimmermann wrote: Hi On 10.02.20 at 09:20, Ben Skeggs wrote: On Sat, 8 Feb 2020 at 07:10, James Jones wrote: I've sent out a v4 version of the format modifier patches which avoid caching values in the nouveau_framebuffer struct. It will have a few trivial conflicts with your series, but should make them structurally compatible. I'm fine with either v3 or v4 of my series personally, but if these cleanup patches are taken, only v4 will work. I've taken Thomas' cleanup patches in my tree, and will take James' also once they've been fixed up to work on top of the cleanup. Thanks! After applying this series locally, I'm hitting a NULL deref loading the nouveau module with fbconsole caused by patch 3/4. I've sent out a trivial fix for review separately. Please have a look, and Ben, feel free to squash it with Thomas's original patch if you prefer. James, are you happy for me to take the drm_fourcc.h patch that's on dri-devel through my tree for the next merge window too? Yes, that would be great. I couldn't find a public version of your tree with Thomas's patches applied, but I pulled them in locally and rebased my series on top of that as v5, resolving all the remaining trivial conflicts. Apologies for all the patch spam this generated. Thanks, -James Ben. Thanks, -James On 2/6/20 8:45 AM, James Jones wrote: Yes, that's certainly viable. If that's the general preference in direction, I'll rework the patches to do so. Thanks, -James On 2/6/20 7:49 AM, Thomas Zimmermann wrote: Hi James On 06.02.20 at 16:17, James Jones wrote: Note I'm adding some fields to nouveau_framebuffer in the series "drm/nouveau: Support NVIDIA format modifiers." I sent out v3 of that yesterday. It would probably still be possible to avoid them by re-extracting the relevant data from the format modifier on the fly when needed, but it is simpler and likely less error-prone with the wrapper struct. Thanks for the note. I just took a look at your patchset.
I think struct nouveau_framebuffer should not store tile_mode and kind. AFAICT there are only two trivial places where these values are used and they can be extracted from the framebuffer at any time. I'd suggest to expand nouveau_decode_mod() to take a drm_framebuffer and return the correct values. Kind of what you do in nouveau_framebuffer_new() near line 330. Thoughts? Best regards Thomas [1] https://patchwork.freedesktop.org/series/70786/#rev3 Thanks, -James On 2/6/20 2:19 AM, Thomas Zimmermann wrote: After its cleanup, struct nouveau_framebuffer is only a wrapper around struct drm_framebuffer. Use the latter directly. Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 26 +++ drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++-- drivers/gpu/drm/nouveau/nouveau_display.h | 12 +-- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 14 ++-- 4 files changed, 28 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index ba1399965a1c..4a67a656e007 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma *ctxdma) } static struct nv50_wndw_ctxdma * -nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) +nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb) { -struct nouveau_drm *drm = nouveau_drm(fb->base.dev); +struct nouveau_drm *drm = nouveau_drm(fb->dev); struct nv50_wndw_ctxdma *ctxdma; -struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]); +struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); const u8kind = nvbo->kind; const u32 handle = 0xfb00 | kind; struct { @@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, struct nv50_wndw_atom *asyw, struct nv50_head_atom *asyh) { -struct nouveau_framebuffer *fb = nouveau_framebuffer(asyw->state.fb); +struct drm_framebuffer *fb = 
asyw->state.fb; struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev); -struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]); +struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); int ret; NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name); -if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { -asyw->image.w = fb->base.width; -asyw->image.h = fb->base.height; +if (fb != armw->state.fb || !armw->visible || modeset) { +asyw->image.w = fb->width; +asyw->image.h = fb->height; asyw->image.kind = nvbo->kind; ret = nv50_wndw_ato
[PATCH v5 3/3] drm/nouveau: Support NVIDIA format modifiers
Allow setting the block layout of a nouveau FB object using DRM format modifiers. When specified, the format modifier block layout and kind override the GEM buffer's implicit layout and kind. The specified format modifier is validated against the list of modifiers supported by the target display hardware. v2: Used Tesla family instead of NV50 chipset compare v4: Do not cache kind, tile_mode in nouveau_framebuffer v5: Resolved against nouveau_framebuffer cleanup Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 20 +++-- drivers/gpu/drm/nouveau/nouveau_display.c | 89 ++- drivers/gpu/drm/nouveau/nouveau_display.h | 4 + 3 files changed, 104 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index 8d6ef70602e1..6821195d65b7 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -44,9 +44,9 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb) { struct nouveau_drm *drm = nouveau_drm(fb->dev); struct nv50_wndw_ctxdma *ctxdma; - struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); - const u8 kind = nvbo->kind; - const u32 handle = 0xfb00 | kind; + u32 handle; + u32 unused; + u8 kind; struct { struct nv_dma_v0 base; union { @@ -58,6 +58,9 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb) u32 argc = sizeof(args.base); int ret; + nouveau_framebuffer_get_layout(fb, &unused, &kind); + handle = 0xfb00 | kind; + list_for_each_entry(ctxdma, &wndw->ctxdma.list, head) { if (ctxdma->object.handle == handle) return ctxdma; @@ -238,15 +241,18 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, { struct drm_framebuffer *fb = asyw->state.fb; struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev); - struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); + uint8_t kind; + uint32_t tile_mode; int ret; NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name); if (fb != armw->state.fb || !armw->visible 
|| modeset) { + nouveau_framebuffer_get_layout(fb, &tile_mode, &kind); + asyw->image.w = fb->width; asyw->image.h = fb->height; - asyw->image.kind = nvbo->kind; + asyw->image.kind = kind; ret = nv50_wndw_atomic_check_acquire_rgb(asyw); if (ret) { @@ -258,9 +264,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->image.kind) { asyw->image.layout = 0; if (drm->client.device.info.chipset >= 0xc0) - asyw->image.blockh = nvbo->mode >> 4; + asyw->image.blockh = tile_mode >> 4; else - asyw->image.blockh = nvbo->mode; + asyw->image.blockh = tile_mode; asyw->image.blocks[0] = fb->pitches[0] / 64; asyw->image.pitch[0] = 0; } else { diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 3048a43a8d36..616c9e486efb 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -203,6 +203,76 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = drm_gem_fb_create_handle, }; +static void +nouveau_decode_mod(struct nouveau_drm *drm, + uint64_t modifier, + uint32_t *tile_mode, + uint8_t *kind) +{ + BUG_ON(!tile_mode || !kind); + + if (modifier == DRM_FORMAT_MOD_LINEAR) { + /* tile_mode will not be used in this case */ + *tile_mode = 0; + *kind = 0; + } else { + /* +* Extract the block height and kind from the corresponding +* modifier fields. See drm_fourcc.h for details. +*/ + *tile_mode = (uint32_t)(modifier & 0xF); + *kind = (uint8_t)((modifier >> 12) & 0xFF); + + if (drm->client.device.info.chipset >= 0xc0) + *tile_mode <<= 4; + } +} + +void +nouveau_framebuffer_get_layout(struct drm_framebuffer *fb, + uint32_t *tile_mode, + uint8_t *kind) +{ + if (fb->flags & DRM_MODE_FB_MODIFIERS) { + struct nouveau_drm *drm = nouveau_drm(fb->dev); + +
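For readers who want to try the bit extraction from nouveau_decode_mod() outside the kernel, here is a userspace sketch. It is a hypothetical stand-in, not the patch itself: the sketch_* names are invented, the chipset is passed as a plain integer instead of being read from struct nouveau_drm, BUG_ON() is replaced with assert(), and the linear modifier is redefined locally as 0 (the value DRM_FORMAT_MOD_LINEAR has in drm_fourcc.h) so it builds without kernel headers. The field positions are the ones shown in the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for DRM_FORMAT_MOD_LINEAR, which is 0 in drm_fourcc.h. */
#define SKETCH_MOD_LINEAR 0ULL

/*
 * Userspace model of the decode step: the low 4 bits of the modifier
 * carry the block height, and bits 12..19 carry the kind. On chipsets
 * >= 0xc0 the block height is stored shifted left by 4 in tile_mode.
 */
static void sketch_decode_mod(uint32_t chipset, uint64_t modifier,
			      uint32_t *tile_mode, uint8_t *kind)
{
	assert(tile_mode && kind);

	if (modifier == SKETCH_MOD_LINEAR) {
		/* tile_mode is not used for linear buffers */
		*tile_mode = 0;
		*kind = 0;
		return;
	}

	*tile_mode = (uint32_t)(modifier & 0xF);
	*kind = (uint8_t)((modifier >> 12) & 0xFF);

	if (chipset >= 0xc0)
		*tile_mode <<= 4;
}
```

The shift on newer chipsets is why the consumer in wndw.c applies tile_mode >> 4 on the same chipset >= 0xc0 condition before writing image.blockh.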
[PATCH v5 2/3] drm/nouveau: Check framebuffer size against bo
Make sure framebuffer dimensions and tiling parameters will not result
in accesses beyond the end of the GEM buffer they are bound to.

v3: Return EINVAL when creating FB against BO with unsupported tiling
v5: Resolved against nouveau_framebuffer cleanup

Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 98 +++
 1 file changed, 98 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 94f7fd48e1cf..3048a43a8d36 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -203,6 +203,76 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = {
 	.create_handle = drm_gem_fb_create_handle,
 };

+static inline uint32_t
+nouveau_get_width_in_blocks(uint32_t stride)
+{
+	/* GOBs per block in the x direction is always one, and GOBs are
+	 * 64 bytes wide
+	 */
+	static const uint32_t log_block_width = 6;
+
+	return (stride + (1 << log_block_width) - 1) >> log_block_width;
+}
+
+static inline uint32_t
+nouveau_get_height_in_blocks(struct nouveau_drm *drm,
+			     uint32_t height,
+			     uint32_t log_block_height_in_gobs)
+{
+	uint32_t log_gob_height;
+	uint32_t log_block_height;
+
+	BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+	if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+		log_gob_height = 2;
+	else
+		log_gob_height = 3;
+
+	log_block_height = log_block_height_in_gobs + log_gob_height;
+
+	return (height + (1 << log_block_height) - 1) >> log_block_height;
+}
+
+static int
+nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo,
+		      uint32_t offset, uint32_t stride, uint32_t h,
+		      uint32_t tile_mode)
+{
+	uint32_t gob_size, bw, bh;
+	uint64_t bl_size;
+
+	BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+	if (drm->client.device.info.chipset >= 0xc0) {
+		if (tile_mode & 0xF)
+			return -EINVAL;
+		tile_mode >>= 4;
+	}
+
+	if (tile_mode & 0xFFF0)
+		return -EINVAL;
+
+	if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+		gob_size = 256;
+	else
+		gob_size = 512;
+
+	bw = nouveau_get_width_in_blocks(stride);
+	bh = nouveau_get_height_in_blocks(drm, h, tile_mode);
+
+	bl_size = bw * bh * (1 << tile_mode) * gob_size;
+
+	DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u gob_size=%u bl_size=%llu size=%lu\n",
+		      offset, stride, h, tile_mode, bw, bh, gob_size, bl_size,
+		      nvbo->bo.mem.size);
+
+	if (bl_size + offset > nvbo->bo.mem.size)
+		return -ERANGE;
+
+	return 0;
+}
+
 int
 nouveau_framebuffer_new(struct drm_device *dev,
 			const struct drm_mode_fb_cmd2 *mode_cmd,
@@ -210,7 +280,10 @@ nouveau_framebuffer_new(struct drm_device *dev,
 			struct drm_framebuffer **pfb)
 {
 	struct nouveau_drm *drm = nouveau_drm(dev);
+	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
 	struct drm_framebuffer *fb;
+	const struct drm_format_info *info;
+	unsigned int width, height, i;
 	int ret;

 	/* YUV overlays have special requirements pre-NV50 */
@@ -233,6 +306,31 @@ nouveau_framebuffer_new(struct drm_device *dev,
 		return -EINVAL;
 	}

+	info = drm_get_format_info(dev, mode_cmd);
+
+	for (i = 0; i < info->num_planes; i++) {
+		width = drm_format_info_plane_width(info,
+						    mode_cmd->width,
+						    i);
+		height = drm_format_info_plane_height(info,
+						      mode_cmd->height,
+						      i);
+
+		if (nvbo->kind) {
+			ret = nouveau_check_bl_size(drm, nvbo,
+						    mode_cmd->offsets[i],
+						    mode_cmd->pitches[i],
+						    height, nvbo->mode);
+			if (ret)
+				return ret;
+		} else {
+			uint32_t size = mode_cmd->pitches[i] * height;
+
+			if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size)
+				return -ERANGE;
+		}
+	}
+
 	if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL)
[PATCH v5 0/3] drm/nouveau: Support NVIDIA format modifiers
This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available on the
Mesa-dev gitlab merge request 3724:

  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3724

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

v2: Used Tesla family instead of NV50 chipset compare to avoid treating
    oddly numbered NV4x-class chipsets as NV50+ GPUs. Other instances
    of compares with chipset number in the series were audited, deemed
    safe, and left as-is for consistency with existing code.

v3:
 -Rebased on nouveau linux-5.6 @ 137c4ba7163ad9d5696b9fde78b1c0898a9c115b
 -Noted corresponding Mesa patches are production-worthy now
 -Better validate bo tile_mode when checking framebuffer size.

v4: Do not cache kind, tile_mode in nouveau_framebuffer

v5: Resolved against nouveau_framebuffer cleanup

James Jones (3):
  drm/nouveau: Add format mod prop to base/ovly/nvdisp
  drm/nouveau: Check framebuffer size against bo
  drm/nouveau: Support NVIDIA format modifiers

 drivers/gpu/drm/nouveau/dispnv50/base507c.c |   7 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c     |  59 +++
 drivers/gpu/drm/nouveau/dispnv50/disp.h     |   4 +
 drivers/gpu/drm/nouveau/dispnv50/wndw.c     |  47 -
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c |  17 ++
 drivers/gpu/drm/nouveau/nouveau_display.c   | 183
 drivers/gpu/drm/nouveau/nouveau_display.h   |   6 +
 7 files changed, 312 insertions(+), 11 deletions(-)

--
2.17.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
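To illustrate the v2 note on family versus chipset compares, here is a minimal sketch. The enum values and function names are hypothetical stand-ins, not nouveau's definitions, and 0x67 is used only as an illustrative NV4x-class chipset number that happens to be larger than 0x50.

```c
#include <assert.h>

/* Hypothetical family ordering; only the relative order matters here. */
enum sketch_family {
	SKETCH_FAMILY_CURIE = 5,	/* NV4x class */
	SKETCH_FAMILY_TESLA = 6,	/* NV50 class */
	SKETCH_FAMILY_FERMI = 7,
};

/* Chipset-number compare: misfires for NV4x parts numbered above 0x50. */
static int sketch_is_nv50_plus_by_chipset(unsigned int chipset)
{
	return chipset >= 0x50;
}

/* Family compare: independent of how individual chipsets are numbered. */
static int sketch_is_nv50_plus_by_family(enum sketch_family family)
{
	return family >= SKETCH_FAMILY_TESLA;
}
```

With these stand-ins, a Curie-family part numbered 0x67 passes the chipset test but is rejected by the family test, which is the misclassification the v2 change is described as avoiding.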
[PATCH v5 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp
Advertise support for the full list of format modifiers supported by
each class of NVIDIA desktop GPU display hardware. Stash the array of
modifiers in the nouveau_display struct for use when validating
userspace framebuffer creation requests, which will be supported in a
subsequent change.

Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/dispnv50/base507c.c |  7 +--
 drivers/gpu/drm/nouveau/dispnv50/disp.c     | 59 +
 drivers/gpu/drm/nouveau/dispnv50/disp.h     |  4 ++
 drivers/gpu/drm/nouveau/dispnv50/wndw.c     | 27 +-
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++
 drivers/gpu/drm/nouveau/nouveau_display.h   |  2 +
 6 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
index 00a85f1e1a4a..025b8f996a0a 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
@@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format,
 	struct nv50_disp_base_channel_dma_v0 args = {
 		.head = head,
 	};
-	struct nv50_disp *disp = nv50_disp(drm->dev);
+	struct nouveau_display *disp = nouveau_display(drm->dev);
+	struct nv50_disp *disp50 = nv50_disp(drm->dev);
 	struct nv50_wndw *wndw;
 	int ret;

@@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format,
 	if (*pwndw = wndw, ret)
 		return ret;

-	ret = nv50_dmac_create(&drm->client.device, &disp->disp->object,
+	ret = nv50_dmac_create(&drm->client.device, &disp->disp.object,
 			       &oclass, head, &args, sizeof(args),
-			       disp->sync->bo.offset, &wndw->wndw);
+			       disp50->sync->bo.offset, &wndw->wndw);
 	if (ret) {
 		NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret);
 		return ret;
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index a3dc2ba19fb2..f017d05072b8 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2481,6 +2481,15 @@ nv50_display_create(struct drm_device *dev)
 	if (ret)
 		goto out;

+	/* Assign the correct format modifiers */
+	if (disp->disp->object.oclass >= TU102_DISP)
+		nouveau_display(dev)->format_modifiers = wndwc57e_modifiers;
+	else
+	if (disp->disp->object.oclass >= GF110_DISP)
+		nouveau_display(dev)->format_modifiers = disp90xx_modifiers;
+	else
+		nouveau_display(dev)->format_modifiers = disp50xx_modifiers;
+
 	/* create crtc objects to represent the hw heads */
 	if (disp->disp->object.oclass >= GV100_DISP)
 		crtcs = nvif_rd32(&device->object, 0x610060) & 0xff;
@@ -2576,3 +2585,53 @@ nv50_display_create(struct drm_device *dev)
 	nv50_display_destroy(dev);
 	return ret;
 }
+
+/******************************************************************************
+ * Format modifiers
+ *****************************************************************************/
+
+/****************************************************************
+ * Log2(block height) ---------------------------------------+  *
+ * Page Kind ---------------------------------------------+  |  *
+ * Gob Height/Page Kind Generation -----------------+     |  |  *
+ * Sector layout --------------------------------+  |     |  |  *
+ * Compression -------------------------------+  |  |     |  |  */
+const u64 disp50xx_modifiers[] = { /*         |  |  |     |  |  */
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4),
+	DRM_FORMAT_MOD_NVIDIA_B
[PATCH] drm/nouveau: Fix NULL ptr access in nv50_wndw_prepare_fb()
This fixes a kernel oops when loading the nouveau module with fb
console enabled after the change:

  drm/nouveau: Remove field nvbo from struct nouveau_framebuffer

state->fb may be NULL in nv50_wndw_prepare_fb(), so defer initializing
nvbo from its obj[] array until after the NULL check.

Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/dispnv50/wndw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index 4a67a656e007..68c0dc2dc2d3 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -490,7 +490,7 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state)
 	struct nouveau_drm *drm = nouveau_drm(plane->dev);
 	struct nv50_wndw *wndw = nv50_wndw(plane);
 	struct nv50_wndw_atom *asyw = nv50_wndw_atom(state);
-	struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
+	struct nouveau_bo *nvbo;
 	struct nv50_head_atom *asyh;
 	struct nv50_wndw_ctxdma *ctxdma;
 	int ret;
@@ -499,6 +499,7 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state)
 	if (!asyw->state.fb)
 		return 0;

+	nvbo = nouveau_gem_object(fb->obj[0]);
 	ret = nouveau_bo_pin(nvbo, TTM_PL_FLAG_VRAM, true);
 	if (ret)
 		return ret;
--
2.17.1
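The pattern behind this fix, deferring the first dereference until after the NULL check, can be sketched in isolation. The types and the function below are hypothetical stand-ins for the plane state and framebuffer, not the driver's structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the framebuffer and plane state. */
struct sketch_fb { int obj0; };
struct sketch_state { struct sketch_fb *fb; };

/*
 * Returns 0 on success. *pinned is only written when a framebuffer is
 * present; a NULL fb means there is nothing to prepare.
 */
static int sketch_prepare_fb(const struct sketch_state *state, int *pinned)
{
	const struct sketch_fb *fb = state->fb;	/* copying the pointer is safe */
	int obj;				/* not initialized from fb yet */

	if (!fb)
		return 0;

	obj = fb->obj0;		/* first dereference, after the NULL check */
	*pinned = obj;
	return 0;
}
```

Initializing obj from fb->obj0 at its declaration would dereference fb before the check, which is the shape of the bug the patch describes.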
Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer
I've sent out a v4 version of the format modifier patches which avoid caching values in the nouveau_framebuffer struct. It will have a few trivial conflicts with your series, but should make them structurally compatible. I'm fine with either v3 or v4 of my series personally, but if these cleanup patches are taken, only v4 will work.

Thanks,
-James

On 2/6/20 8:45 AM, James Jones wrote:

Yes, that's certainly viable. If that's the general preference in direction, I'll rework those patches to do so.

Thanks,
-James

On 2/6/20 7:49 AM, Thomas Zimmermann wrote:

Hi James

On 06.02.20 at 16:17, James Jones wrote:

Note I'm adding some fields to nouveau_framebuffer in the series "drm/nouveau: Support NVIDIA format modifiers." I sent out v3 of that yesterday. It would probably still be possible to avoid them by re-extracting the relevant data from the format modifier on the fly when needed, but it is simpler and likely less error-prone with the wrapper struct.

Thanks for the note. I just took a look at your patchset. I think struct nouveau_framebuffer should not store tile_mode and kind. AFAICT there are only two trivial places where these values are used and they can be extracted from the framebuffer at any time. I'd suggest to expand nouveau_decode_mod() to take a drm_framebuffer and return the correct values. Kind of what you do in nouveau_framebuffer_new() near line 330. Thoughts?

Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/70786/#rev3

Thanks,
-James

On 2/6/20 2:19 AM, Thomas Zimmermann wrote:

After its cleanup, struct nouveau_framebuffer is only a wrapper around struct drm_framebuffer. Use the latter directly.
Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 26 +++ drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++-- drivers/gpu/drm/nouveau/nouveau_display.h | 12 +-- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 14 ++-- 4 files changed, 28 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index ba1399965a1c..4a67a656e007 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma *ctxdma) } static struct nv50_wndw_ctxdma * -nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) +nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb) { - struct nouveau_drm *drm = nouveau_drm(fb->base.dev); + struct nouveau_drm *drm = nouveau_drm(fb->dev); struct nv50_wndw_ctxdma *ctxdma; - struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]); + struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); const u8 kind = nvbo->kind; const u32 handle = 0xfb00 | kind; struct { @@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, struct nv50_wndw_atom *asyw, struct nv50_head_atom *asyh) { - struct nouveau_framebuffer *fb = nouveau_framebuffer(asyw->state.fb); + struct drm_framebuffer *fb = asyw->state.fb; struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev); - struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]); + struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); int ret; NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name); - if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { - asyw->image.w = fb->base.width; - asyw->image.h = fb->base.height; + if (fb != armw->state.fb || !armw->visible || modeset) { + asyw->image.w = fb->width; + asyw->image.h = fb->height; asyw->image.kind = nvbo->kind; ret = nv50_wndw_atomic_check_acquire_rgb(asyw); @@ -261,13 +261,13 @@ 
nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, asyw->image.blockh = nvbo->mode >> 4; else asyw->image.blockh = nvbo->mode; - asyw->image.blocks[0] = fb->base.pitches[0] / 64; + asyw->image.blocks[0] = fb->pitches[0] / 64; asyw->image.pitch[0] = 0; } else { asyw->image.layout = 1; asyw->image.blockh = 0; asyw->image.blocks[0] = 0; - asyw->image.pitch[0] = fb->base.pitches[0]; + asyw->image.pitch[0] = fb->pitches[0]; } if (!asyh->state.async_flip) @@ -486,16 +486,16 @@ nv50_wndw_cleanup_fb(struct drm_plane *plane, struct drm_plane_state *old_state) static int nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) { - struct nouveau_framebuffer
[PATCH v4 2/3] drm/nouveau: Check framebuffer size against bo
Make sure framebuffer dimensions and tiling parameters will not result
in accesses beyond the end of the GEM buffer they are bound to.

v3: Return EINVAL when creating FB against BO with unsupported tiling

Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 97 +++
 1 file changed, 97 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 53f9bceaf17a..4273d9387cda 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,76 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = {
 	.create_handle = nouveau_user_framebuffer_create_handle,
 };

+static inline uint32_t
+nouveau_get_width_in_blocks(uint32_t stride)
+{
+	/* GOBs per block in the x direction is always one, and GOBs are
+	 * 64 bytes wide
+	 */
+	static const uint32_t log_block_width = 6;
+
+	return (stride + (1 << log_block_width) - 1) >> log_block_width;
+}
+
+static inline uint32_t
+nouveau_get_height_in_blocks(struct nouveau_drm *drm,
+			     uint32_t height,
+			     uint32_t log_block_height_in_gobs)
+{
+	uint32_t log_gob_height;
+	uint32_t log_block_height;
+
+	BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+	if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+		log_gob_height = 2;
+	else
+		log_gob_height = 3;
+
+	log_block_height = log_block_height_in_gobs + log_gob_height;
+
+	return (height + (1 << log_block_height) - 1) >> log_block_height;
+}
+
+static int
+nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo,
+		      uint32_t offset, uint32_t stride, uint32_t h,
+		      uint32_t tile_mode)
+{
+	uint32_t gob_size, bw, bh;
+	uint64_t bl_size;
+
+	BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+	if (drm->client.device.info.chipset >= 0xc0) {
+		if (tile_mode & 0xF)
+			return -EINVAL;
+		tile_mode >>= 4;
+	}
+
+	if (tile_mode & 0xFFF0)
+		return -EINVAL;
+
+	if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+		gob_size = 256;
+	else
+		gob_size = 512;
+
+	bw = nouveau_get_width_in_blocks(stride);
+	bh = nouveau_get_height_in_blocks(drm, h, tile_mode);
+
+	bl_size = bw * bh * (1 << tile_mode) * gob_size;
+
+	DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u gob_size=%u bl_size=%llu size=%lu\n",
+		      offset, stride, h, tile_mode, bw, bh, gob_size, bl_size,
+		      nvbo->bo.mem.size);
+
+	if (bl_size + offset > nvbo->bo.mem.size)
+		return -ERANGE;
+
+	return 0;
+}
+
 int
 nouveau_framebuffer_new(struct drm_device *dev,
 			const struct drm_mode_fb_cmd2 *mode_cmd,
@@ -232,6 +302,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
 {
 	struct nouveau_drm *drm = nouveau_drm(dev);
 	struct nouveau_framebuffer *fb;
+	const struct drm_format_info *info;
+	unsigned int width, height, i;
 	int ret;

 	/* YUV overlays have special requirements pre-NV50 */
@@ -254,6 +326,31 @@ nouveau_framebuffer_new(struct drm_device *dev,
 		return -EINVAL;
 	}

+	info = drm_get_format_info(dev, mode_cmd);
+
+	for (i = 0; i < info->num_planes; i++) {
+		width = drm_format_info_plane_width(info,
+						    mode_cmd->width,
+						    i);
+		height = drm_format_info_plane_height(info,
+						      mode_cmd->height,
+						      i);
+
+		if (nvbo->kind) {
+			ret = nouveau_check_bl_size(drm, nvbo,
+						    mode_cmd->offsets[i],
+						    mode_cmd->pitches[i],
+						    height, nvbo->mode);
+			if (ret)
+				return ret;
+		} else {
+			uint32_t size = mode_cmd->pitches[i] * height;
+
+			if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size)
+				return -ERANGE;
+		}
+	}
+
 	if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL)))
 		return -ENOMEM;

--
2.17.1
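The size check in this patch boils down to rounding the surface up to whole blocks and multiplying out. Below is a minimal standalone sketch of that arithmetic, using the GOB dimensions stated in the patch (64 bytes wide; 4 rows and 256 bytes before Fermi, 8 rows and 512 bytes from Fermi on). The function names and the is_fermi_or_later flag are hypothetical; the flag stands in for the driver's family compare. Widening to 64 bits before multiplying is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division by a power of two: ceil(value / 2^log2_unit). */
static uint64_t sketch_div_round_up_pow2(uint64_t value, uint32_t log2_unit)
{
	return (value + ((uint64_t)1 << log2_unit) - 1) >> log2_unit;
}

/*
 * Bytes covered by a block-linear surface.
 * log2_block_height_gobs: log2 of the block height, counted in GOBs.
 * is_fermi_or_later: hypothetical flag replacing the family check.
 */
static uint64_t sketch_bl_size(uint32_t stride, uint32_t height,
			       uint32_t log2_block_height_gobs,
			       int is_fermi_or_later)
{
	const uint32_t log2_gob_width = 6;	/* GOBs are 64 bytes wide */
	const uint32_t log2_gob_height = is_fermi_or_later ? 3 : 2;
	const uint64_t gob_size = is_fermi_or_later ? 512 : 256;
	/* Blocks are one GOB wide, so width in blocks == width in GOBs. */
	uint64_t bw = sketch_div_round_up_pow2(stride, log2_gob_width);
	uint64_t bh = sketch_div_round_up_pow2(height,
			log2_block_height_gobs + log2_gob_height);

	return bw * bh * ((uint64_t)1 << log2_block_height_gobs) * gob_size;
}
```

For a 7680-byte stride and 1080 rows with 16-GOB-high blocks on Fermi or later, this gives 120 blocks across and 9 blocks down, i.e. the stride times the height padded to 1152 rows. A framebuffer is in bounds when this size plus the plane offset does not exceed the buffer size.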
[PATCH v4 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp
Advertise support for the full list of format modifiers supported by
each class of NVIDIA desktop GPU display hardware. Stash the array of
modifiers in the nouveau_display struct for use when validating
userspace framebuffer creation requests, which will be supported in a
subsequent change.

Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/dispnv50/base507c.c |  7 +--
 drivers/gpu/drm/nouveau/dispnv50/disp.c     | 59 +
 drivers/gpu/drm/nouveau/dispnv50/disp.h     |  4 ++
 drivers/gpu/drm/nouveau/dispnv50/wndw.c     | 27 +-
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++
 drivers/gpu/drm/nouveau/nouveau_display.h   |  2 +
 6 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
index 00a85f1e1a4a..025b8f996a0a 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
@@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format,
 	struct nv50_disp_base_channel_dma_v0 args = {
 		.head = head,
 	};
-	struct nv50_disp *disp = nv50_disp(drm->dev);
+	struct nouveau_display *disp = nouveau_display(drm->dev);
+	struct nv50_disp *disp50 = nv50_disp(drm->dev);
 	struct nv50_wndw *wndw;
 	int ret;

@@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format,
 	if (*pwndw = wndw, ret)
 		return ret;

-	ret = nv50_dmac_create(&drm->client.device, &disp->disp->object,
+	ret = nv50_dmac_create(&drm->client.device, &disp->disp.object,
 			       &oclass, head, &args, sizeof(args),
-			       disp->sync->bo.offset, &wndw->wndw);
+			       disp50->sync->bo.offset, &wndw->wndw);
 	if (ret) {
 		NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret);
 		return ret;
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index a3dc2ba19fb2..f017d05072b8 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2481,6 +2481,15 @@ nv50_display_create(struct drm_device *dev)
 	if (ret)
 		goto out;

+	/* Assign the correct format modifiers */
+	if (disp->disp->object.oclass >= TU102_DISP)
+		nouveau_display(dev)->format_modifiers = wndwc57e_modifiers;
+	else
+	if (disp->disp->object.oclass >= GF110_DISP)
+		nouveau_display(dev)->format_modifiers = disp90xx_modifiers;
+	else
+		nouveau_display(dev)->format_modifiers = disp50xx_modifiers;
+
 	/* create crtc objects to represent the hw heads */
 	if (disp->disp->object.oclass >= GV100_DISP)
 		crtcs = nvif_rd32(&device->object, 0x610060) & 0xff;
@@ -2576,3 +2585,53 @@ nv50_display_create(struct drm_device *dev)
 	nv50_display_destroy(dev);
 	return ret;
 }
+
+/******************************************************************************
+ * Format modifiers
+ *****************************************************************************/
+
+/****************************************************************
+ * Log2(block height) ---------------------------------------+  *
+ * Page Kind ---------------------------------------------+  |  *
+ * Gob Height/Page Kind Generation -----------------+     |  |  *
+ * Sector layout --------------------------------+  |     |  |  *
+ * Compression -------------------------------+  |  |     |  |  */
+const u64 disp50xx_modifiers[] = { /*         |  |  |     |  |  */
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3),
+	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4),
+	DRM_FORMAT_MOD_NVIDIA_B
[PATCH v4 0/3] drm/nouveau: Support NVIDIA format modifiers
This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available on the
Mesa-dev gitlab merge request 3724:

  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3724

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

v2: Used Tesla family instead of NV50 chipset compare to avoid treating
    oddly numbered NV4x-class chipsets as NV50+ GPUs. Other instances
    of compares with chipset number in the series were audited, deemed
    safe, and left as-is for consistency with existing code.

v3:
 -Rebased on nouveau linux-5.6 @ 137c4ba7163ad9d5696b9fde78b1c0898a9c115b
 -Noted corresponding Mesa patches are production-worthy now
 -Better validate bo tile_mode when checking framebuffer size.

v4: Do not cache kind, tile_mode in nouveau_framebuffer

James Jones (3):
  drm/nouveau: Add format mod prop to base/ovly/nvdisp
  drm/nouveau: Check framebuffer size against bo
  drm/nouveau: Support NVIDIA format modifiers

 drivers/gpu/drm/nouveau/dispnv50/base507c.c |   7 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c     |  59 +++
 drivers/gpu/drm/nouveau/dispnv50/disp.h     |   4 +
 drivers/gpu/drm/nouveau/dispnv50/wndw.c     |  45 -
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c |  17 ++
 drivers/gpu/drm/nouveau/nouveau_display.c   | 183
 drivers/gpu/drm/nouveau/nouveau_display.h   |   6 +
 7 files changed, 312 insertions(+), 9 deletions(-)

--
2.17.1
[PATCH v4 3/3] drm/nouveau: Support NVIDIA format modifiers
Allow setting the block layout of a nouveau FB object using DRM format
modifiers. When specified, the format modifier block layout and kind
override the GEM buffer's implicit layout and kind. The specified format
modifier is validated against the list of modifiers supported by the
target display hardware.

v2: Used Tesla family instead of NV50 chipset compare
v4: Do not cache kind, tile_mode in nouveau_framebuffer

Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/dispnv50/wndw.c   | 18 +++--
 drivers/gpu/drm/nouveau/nouveau_display.c | 90 ++-
 drivers/gpu/drm/nouveau/nouveau_display.h |  4 +
 3 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index a424ecfdf8e9..064e8825d451 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -43,8 +43,9 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb)
 {
 	struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
 	struct nv50_wndw_ctxdma *ctxdma;
-	const u8 kind = fb->nvbo->kind;
-	const u32 handle = 0xfb00 | kind;
+	u32 handle;
+	u32 unused;
+	u8 kind;
 	struct {
 		struct nv_dma_v0 base;
 		union {
@@ -56,6 +57,9 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb)
 	u32 argc = sizeof(args.base);
 	int ret;

+	nouveau_framebuffer_get_layout(&fb->base, &unused, &kind);
+	handle = 0xfb00 | kind;
+
 	list_for_each_entry(ctxdma, &wndw->ctxdma.list, head) {
 		if (ctxdma->object.handle == handle)
 			return ctxdma;
@@ -236,14 +240,18 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset,
 {
 	struct nouveau_framebuffer *fb = nouveau_framebuffer(asyw->state.fb);
 	struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev);
+	uint8_t kind;
+	uint32_t tile_mode;
 	int ret;

 	NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name);

 	if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) {
+		nouveau_framebuffer_get_layout(&fb->base, &tile_mode, &kind);
+
 		asyw->image.w = fb->base.width;
 		asyw->image.h = fb->base.height;
-		asyw->image.kind = fb->nvbo->kind;
+		asyw->image.kind = kind;

 		ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
 		if (ret) {
@@ -255,9 +263,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset,
 		if (asyw->image.kind) {
 			asyw->image.layout = 0;
 			if (drm->client.device.info.chipset >= 0xc0)
-				asyw->image.blockh = fb->nvbo->mode >> 4;
+				asyw->image.blockh = tile_mode >> 4;
 			else
-				asyw->image.blockh = fb->nvbo->mode;
+				asyw->image.blockh = tile_mode;
 			asyw->image.blocks[0] = fb->base.pitches[0] / 64;
 			asyw->image.pitch[0] = 0;
 		} else {
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 4273d9387cda..da8319182cf0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,76 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = {
 	.create_handle = nouveau_user_framebuffer_create_handle,
 };

+static void
+nouveau_decode_mod(struct nouveau_drm *drm,
+		   uint64_t modifier,
+		   uint32_t *tile_mode,
+		   uint8_t *kind)
+{
+	BUG_ON(!tile_mode || !kind);
+
+	if (modifier == DRM_FORMAT_MOD_LINEAR) {
+		/* tile_mode will not be used in this case */
+		*tile_mode = 0;
+		*kind = 0;
+	} else {
+		/*
+		 * Extract the block height and kind from the corresponding
+		 * modifier fields. See drm_fourcc.h for details.
+		 */
+		*tile_mode = (uint32_t)(modifier & 0xF);
+		*kind = (uint8_t)((modifier >> 12) & 0xFF);
+
+		if (drm->client.device.info.chipset >= 0xc0)
+			*tile_mode <<= 4;
+	}
+}
+
+void
+nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
+			       uint32_t *tile_mode,
+			       uint8_t *kind)
+{
+	if (fb->flags & DRM_MODE_FB_MODIFIERS) {
+		struct nouveau_drm *drm = nouveau_drm(fb->dev);
+
+		nouveau_decode_mod(drm, fb->modifier, tile_mode, kind);
+	} else {
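The modifier decoding in nouveau_decode_mod() can be tried outside the kernel with a small standalone sketch. It follows the field positions used in the patch (low 4 bits for log2 block height, bits 12 to 19 for kind). The struct, the function name, and the chipset_ge_c0 flag are hypothetical, and SKETCH_MOD_LINEAR stands in for DRM_FORMAT_MOD_LINEAR, which has the value 0.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for DRM_FORMAT_MOD_LINEAR (value 0). */
#define SKETCH_MOD_LINEAR 0ULL

struct sketch_layout {
	uint32_t tile_mode;
	uint8_t kind;
};

/*
 * chipset_ge_c0: hypothetical flag replacing the driver's
 * "chipset >= 0xc0" test, which selects the shifted tile_mode encoding.
 */
static struct sketch_layout sketch_decode_modifier(uint64_t modifier,
						   int chipset_ge_c0)
{
	struct sketch_layout layout = { 0, 0 };

	if (modifier == SKETCH_MOD_LINEAR)
		return layout;	/* pitch-linear: no block height, kind 0 */

	layout.tile_mode = (uint32_t)(modifier & 0xF);
	layout.kind = (uint8_t)((modifier >> 12) & 0xFF);

	if (chipset_ge_c0)
		layout.tile_mode <<= 4;

	return layout;
}
```

A modifier carrying kind 0xfe and log2 block height 4 would decode to tile_mode 0x40 when the flag is set and 0x4 otherwise, matching the two encodings that the display code later undoes with ">> 4".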
Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer
Yes, that's certainly viable. If that's the general preference in direction, I'll rework those patches to do so.

Thanks,
-James

On 2/6/20 7:49 AM, Thomas Zimmermann wrote:

Hi James

On 06.02.20 at 16:17, James Jones wrote:

Note I'm adding some fields to nouveau_framebuffer in the series "drm/nouveau: Support NVIDIA format modifiers." I sent out v3 of that yesterday. It would probably still be possible to avoid them by re-extracting the relevant data from the format modifier on the fly when needed, but it is simpler and likely less error-prone with the wrapper struct.

Thanks for the note. I just took a look at your patchset. I think struct nouveau_framebuffer should not store tile_mode and kind. AFAICT there are only two trivial places where these values are used and they can be extracted from the framebuffer at any time. I'd suggest to expand nouveau_decode_mod() to take a drm_framebuffer and return the correct values. Kind of what you do in nouveau_framebuffer_new() near line 330. Thoughts?

Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/70786/#rev3

Thanks,
-James

On 2/6/20 2:19 AM, Thomas Zimmermann wrote:

After its cleanup, struct nouveau_framebuffer is only a wrapper around struct drm_framebuffer. Use the latter directly.
Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 26 +++ drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++-- drivers/gpu/drm/nouveau/nouveau_display.h | 12 +-- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 14 ++-- 4 files changed, 28 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index ba1399965a1c..4a67a656e007 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma *ctxdma) } static struct nv50_wndw_ctxdma * -nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) +nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb) { - struct nouveau_drm *drm = nouveau_drm(fb->base.dev); + struct nouveau_drm *drm = nouveau_drm(fb->dev); struct nv50_wndw_ctxdma *ctxdma; - struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]); + struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); const u8 kind = nvbo->kind; const u32 handle = 0xfb00 | kind; struct { @@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, struct nv50_wndw_atom *asyw, struct nv50_head_atom *asyh) { - struct nouveau_framebuffer *fb = nouveau_framebuffer(asyw->state.fb); + struct drm_framebuffer *fb = asyw->state.fb; struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev); - struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]); + struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); int ret; NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name); - if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { - asyw->image.w = fb->base.width; - asyw->image.h = fb->base.height; + if (fb != armw->state.fb || !armw->visible || modeset) { + asyw->image.w = fb->width; + asyw->image.h = fb->height; asyw->image.kind = nvbo->kind; ret = nv50_wndw_atomic_check_acquire_rgb(asyw); @@ -261,13 +261,13 @@ 
nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, asyw->image.blockh = nvbo->mode >> 4; else asyw->image.blockh = nvbo->mode; - asyw->image.blocks[0] = fb->base.pitches[0] / 64; + asyw->image.blocks[0] = fb->pitches[0] / 64; asyw->image.pitch[0] = 0; } else { asyw->image.layout = 1; asyw->image.blockh = 0; asyw->image.blocks[0] = 0; - asyw->image.pitch[0] = fb->base.pitches[0]; + asyw->image.pitch[0] = fb->pitches[0]; } if (!asyh->state.async_flip) @@ -486,16 +486,16 @@ nv50_wndw_cleanup_fb(struct drm_plane *plane, struct drm_plane_state *old_state) static int nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) { - struct nouveau_framebuffer *fb = nouveau_framebuffer(state->fb); + struct drm_framebuffer *fb = state->fb; struct nouveau_drm *drm = nouveau_drm(plane->dev); struct nv50_wndw *wndw = nv50_wndw(plane); struct nv50_wndw_atom *asyw = nv50_wndw_atom(state); - struct nouveau_bo *nvbo = nouveau_gem_object(state->fb->obj[0]); + struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer
Note I'm adding some fields to nouveau_framebuffer in the series "drm/nouveau: Support NVIDIA format modifiers." I sent out v3 of that yesterday. It would probably still be possible to avoid them by re-extracting the relevant data from the format modifier on the fly when needed, but it is simpler and likely less error-prone with the wrapper struct. Thanks, -James On 2/6/20 2:19 AM, Thomas Zimmermann wrote: After its cleanup, struct nouveau_framebuffer is only a wrapper around struct drm_framebuffer. Use the latter directly. Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 26 +++ drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++-- drivers/gpu/drm/nouveau/nouveau_display.h | 12 +-- drivers/gpu/drm/nouveau/nouveau_fbcon.c | 14 ++-- 4 files changed, 28 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index ba1399965a1c..4a67a656e007 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma *ctxdma) } static struct nv50_wndw_ctxdma * -nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) +nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb) { - struct nouveau_drm *drm = nouveau_drm(fb->base.dev); + struct nouveau_drm *drm = nouveau_drm(fb->dev); struct nv50_wndw_ctxdma *ctxdma; - struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]); + struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); const u8 kind = nvbo->kind; const u32 handle = 0xfb00 | kind; struct { @@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, struct nv50_wndw_atom *asyw, struct nv50_head_atom *asyh) { - struct nouveau_framebuffer *fb = nouveau_framebuffer(asyw->state.fb); + struct drm_framebuffer *fb = asyw->state.fb; struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev); - struct nouveau_bo *nvbo = 
nouveau_gem_object(fb->base.obj[0]); + struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); int ret; NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name); - if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { - asyw->image.w = fb->base.width; - asyw->image.h = fb->base.height; + if (fb != armw->state.fb || !armw->visible || modeset) { + asyw->image.w = fb->width; + asyw->image.h = fb->height; asyw->image.kind = nvbo->kind; ret = nv50_wndw_atomic_check_acquire_rgb(asyw); @@ -261,13 +261,13 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, asyw->image.blockh = nvbo->mode >> 4; else asyw->image.blockh = nvbo->mode; - asyw->image.blocks[0] = fb->base.pitches[0] / 64; + asyw->image.blocks[0] = fb->pitches[0] / 64; asyw->image.pitch[0] = 0; } else { asyw->image.layout = 1; asyw->image.blockh = 0; asyw->image.blocks[0] = 0; - asyw->image.pitch[0] = fb->base.pitches[0]; + asyw->image.pitch[0] = fb->pitches[0]; } if (!asyh->state.async_flip) @@ -486,16 +486,16 @@ nv50_wndw_cleanup_fb(struct drm_plane *plane, struct drm_plane_state *old_state) static int nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) { - struct nouveau_framebuffer *fb = nouveau_framebuffer(state->fb); + struct drm_framebuffer *fb = state->fb; struct nouveau_drm *drm = nouveau_drm(plane->dev); struct nv50_wndw *wndw = nv50_wndw(plane); struct nv50_wndw_atom *asyw = nv50_wndw_atom(state); - struct nouveau_bo *nvbo = nouveau_gem_object(state->fb->obj[0]); + struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]); struct nv50_head_atom *asyh; struct nv50_wndw_ctxdma *ctxdma; int ret; - NV_ATOMIC(drm, "%s prepare: %p\n", plane->name, state->fb); + NV_ATOMIC(drm, "%s prepare: %p\n", plane->name, fb); if (!asyw->state.fb) return 0; diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index bbbff55eb5d5..94f7fd48e1cf 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ 
b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -207,10 +207,10 @@ int nouveau_framebuffer_new(struct drm_device *dev, const struct drm_mode_fb_cmd2 *mode_cmd, struct drm_gem_object *gem, -
Re: [Nouveau] [PATCH v2 0/3] drm/nouveau: Support NVIDIA format modifiers
On 1/6/20 3:27 PM, Ben Skeggs wrote: On Tue, 7 Jan 2020 at 05:17, James Jones wrote: On 1/5/20 5:30 PM, Ben Skeggs wrote: On Tue, 17 Dec 2019 at 10:44, James Jones wrote: This series modifies the NV5x+ nouveau display backends to advertise appropriate format modifiers on their display planes in atomic mode setting blobs. Corresponding modifications to Mesa/userspace are available here: https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work But those need a bit of cleanup before they're ready to submit. I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware using various formats and all the exposed format modifiers, plus some negative testing with invalid ones. NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block Linear DRM format mod" patch submitted to dri-devel. v2: Used Tesla family instead of NV50 chipset compare to avoid treating oddly numbered NV4x-class chipsets as NV50+ GPUs. Other instances of compares with chipset number in the series were audited, deemed safe, and left as-is for consistency with existing code. Hey James, These look OK to me, with the minor issue I mentioned on one of the patches dealt with. I'll hold off merging anything until I get the go-ahead that the modifier definitions are definitely set in stone / userspace is ready for inclusion. Thanks for having a look. I'll try to get the userspace changes finalized soon. I think from the NV side, we consider the modifier definition itself (the v3 version of the patch) final, so if there's any stand-alone feedback from yourself or other drm/nouveau developers on that layout, we'd be eager to hear it. I don't want it rushed in, but we do have several projects blocked on getting that approved & committed. 
I assume the sequencing should be: * Fix the minor issue you identified here/complete review of nouveau kernel patches * Complete review of the related TegraDRM new modifier support patch * Finalize and complete review of userspace/Mesa nouveau modifier support patches * Get drm_fourcc.h updates committed * Get these patches and TegraDRM patches committed * Integrate final drm_fourcc.h to Mesa patches and get Mesa patches committed Does that sound right to you? Seems very reasonable! Thanks. I needed to do more cleanup than I expected (a rewrite in the end), but the corresponding Mesa patches are out for review now, and I've sent out v3 of this patchset to address the remaining issue raised here. Thanks, -James Ben. Thanks, -James Thanks, Ben. James Jones (3): drm/nouveau: Add format mod prop to base/ovly/nvdisp drm/nouveau: Check framebuffer size against bo drm/nouveau: Support NVIDIA format modifiers drivers/gpu/drm/nouveau/dispnv50/base507c.c | 7 +- drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 drivers/gpu/drm/nouveau/dispnv50/disp.h | 4 + drivers/gpu/drm/nouveau/dispnv50/wndw.c | 35 - drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 +++ drivers/gpu/drm/nouveau/nouveau_display.c | 154 drivers/gpu/drm/nouveau/nouveau_display.h | 4 + 7 files changed, 272 insertions(+), 8 deletions(-) -- 2.17.1 ___ Nouveau mailing list nouv...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v3 3/3] drm/nouveau: Support NVIDIA format modifiers
Allow setting the block layout of a nouveau FB object using DRM format modifiers. When specified, the format modifier block layout and kind override the GEM buffer's implicit layout and kind. The specified format modifier is validated against the list of modifiers supported by the target display hardware. v2: Used Tesla family instead of NV50 chipset compare Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 8 +-- drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++- drivers/gpu/drm/nouveau/nouveau_display.h | 2 + 3 files changed, 69 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index a424ecfdf8e9..0047ba710da0 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) { struct nouveau_drm *drm = nouveau_drm(fb->base.dev); struct nv50_wndw_ctxdma *ctxdma; - const u8 kind = fb->nvbo->kind; + const u8 kind = fb->kind; const u32 handle = 0xfb00 | kind; struct { struct nv_dma_v0 base; @@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { asyw->image.w = fb->base.width; asyw->image.h = fb->base.height; - asyw->image.kind = fb->nvbo->kind; + asyw->image.kind = fb->kind; ret = nv50_wndw_atomic_check_acquire_rgb(asyw); if (ret) { @@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->image.kind) { asyw->image.layout = 0; if (drm->client.device.info.chipset >= 0xc0) - asyw->image.blockh = fb->nvbo->mode >> 4; + asyw->image.blockh = fb->tile_mode >> 4; else - asyw->image.blockh = fb->nvbo->mode; + asyw->image.blockh = fb->tile_mode; asyw->image.blocks[0] = fb->base.pitches[0] / 64; asyw->image.pitch[0] = 0; } else { diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c index 4273d9387cda..05bb077a9dd9 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = nouveau_user_framebuffer_create_handle, }; +static int +nouveau_decode_mod(struct nouveau_drm *drm, + uint64_t modifier, + uint32_t *tile_mode, + uint8_t *kind) +{ + struct nouveau_display *disp = nouveau_display(drm->dev); + int mod; + + BUG_ON(!tile_mode || !kind); + + if (drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA) { + return -EINVAL; + } + + BUG_ON(!disp->format_modifiers); + + for (mod = 0; + (disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && + (disp->format_modifiers[mod] != modifier); + mod++); + + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + + if (modifier == DRM_FORMAT_MOD_LINEAR) { + /* tile_mode will not be used in this case */ + *tile_mode = 0; + *kind = 0; + } else { + /* + * Extract the block height and kind from the corresponding + * modifier fields. See drm_fourcc.h for details. + */ + *tile_mode = (uint32_t)(modifier & 0xF); + *kind = (uint8_t)((modifier >> 12) & 0xFF); + + if (drm->client.device.info.chipset >= 0xc0) + *tile_mode <<= 4; + } + + return 0; +} + static inline uint32_t nouveau_get_width_in_blocks(uint32_t stride) { @@ -304,6 +348,8 @@ nouveau_framebuffer_new(struct drm_device *dev, struct nouveau_framebuffer *fb; const struct drm_format_info *info; unsigned int width, height, i; + uint32_t tile_mode; + uint8_t kind; int ret; /* YUV overlays have special requirements pre-NV50 */ @@ -326,6 +372,18 @@ nouveau_framebuffer_new(struct drm_device *dev, return -EINVAL; } + if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) { + if (nouveau_decode_mod(drm, mode_cmd->modifier[0], &tile_mode, +
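The decode step in the patch above can be sketched in isolation. This is a minimal userspace illustration, not the driver code: the names `decode_modifier`, `struct decoded_mod`, `MOD_LINEAR`, `MOD_INVALID`, and the `fermi_or_newer` flag are hypothetical stand-ins for `nouveau_decode_mod()`, the DRM constants, and the `chipset >= 0xc0` test. The bit positions (low 4 bits for block height, bits 12 to 19 for kind) are the ones used in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for DRM_FORMAT_MOD_LINEAR / DRM_FORMAT_MOD_INVALID (assumed values). */
#define MOD_LINEAR  0x0ULL
#define MOD_INVALID 0x00ffffffffffffffULL

/* Hypothetical result type; the patch writes through two out-pointers instead. */
struct decoded_mod {
	uint32_t tile_mode;
	uint8_t kind;
};

/*
 * Validate `modifier` against a MOD_INVALID-terminated list, then extract
 * the block height (low 4 bits) and page kind (bits 12..19), as the patch does.
 */
static int decode_modifier(const uint64_t *supported, uint64_t modifier,
			   int fermi_or_newer, struct decoded_mod *out)
{
	size_t i;

	/* Stop at the first match or at the terminator. */
	for (i = 0; supported[i] != MOD_INVALID && supported[i] != modifier; i++)
		;

	if (supported[i] == MOD_INVALID)
		return -EINVAL;

	if (modifier == MOD_LINEAR) {
		/* Linear surfaces carry no tiling information. */
		out->tile_mode = 0;
		out->kind = 0;
		return 0;
	}

	out->tile_mode = (uint32_t)(modifier & 0xF);
	out->kind = (uint8_t)((modifier >> 12) & 0xFF);

	/* Newer chips keep the block height in the upper nibble. */
	if (fermi_or_newer)
		out->tile_mode <<= 4;

	return 0;
}
```

With a synthetic modifier of `(0x7a << 12) | 4`, this yields kind 0x7a and tile_mode 4 on the older layout or 0x40 on the newer one; anything not in the list is rejected with -EINVAL before any field is read.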
[PATCH v3 0/3] drm/nouveau: Support NVIDIA format modifiers
This series modifies the NV5x+ nouveau display backends to advertise appropriate format modifiers on their display planes in atomic mode setting blobs. Corresponding modifications to Mesa/userspace are available on the Mesa-dev mailing list as the series: nouveau: Improved format modifier support I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware using various formats and all the exposed format modifiers, plus some negative testing with invalid ones. NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block Linear DRM format mod" patch submitted to dri-devel. v2: Used Tesla family instead of NV50 chipset compare to avoid treating oddly numbered NV4x-class chipsets as NV50+ GPUs. Other instances of compares with chipset number in the series were audited, deemed safe, and left as-is for consistency with existing code. v3: -Rebased on nouveau linux-5.6 @ 137c4ba7163ad9d5696b9fde78b1c0898a9c115b -Noted corresponding Mesa patches are production-worthy now -Better validate bo tile_mode when checking framebuffer size. James Jones (3): drm/nouveau: Add format mod prop to base/ovly/nvdisp drm/nouveau: Check framebuffer size against bo drm/nouveau: Support NVIDIA format modifiers drivers/gpu/drm/nouveau/dispnv50/base507c.c | 7 +- drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 drivers/gpu/drm/nouveau/dispnv50/disp.h | 4 + drivers/gpu/drm/nouveau/dispnv50/wndw.c | 35 - drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 +++ drivers/gpu/drm/nouveau/nouveau_display.c | 158 drivers/gpu/drm/nouveau/nouveau_display.h | 4 + 7 files changed, 276 insertions(+), 8 deletions(-) -- 2.17.1
[PATCH v3 2/3] drm/nouveau: Check framebuffer size against bo
Make sure framebuffer dimensions and tiling parameters will not result in accesses beyond the end of the GEM buffer they are bound to. v3: Return EINVAL when creating FB against BO with unsupported tiling Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/nouveau_display.c | 97 +++ 1 file changed, 97 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 53f9bceaf17a..4273d9387cda 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -224,6 +224,76 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = nouveau_user_framebuffer_create_handle, }; +static inline uint32_t +nouveau_get_width_in_blocks(uint32_t stride) +{ + /* GOBs per block in the x direction is always one, and GOBs are +* 64 bytes wide +*/ + static const uint32_t log_block_width = 6; + + return (stride + (1 << log_block_width) - 1) >> log_block_width; +} + +static inline uint32_t +nouveau_get_height_in_blocks(struct nouveau_drm *drm, +uint32_t height, +uint32_t log_block_height_in_gobs) +{ + uint32_t log_gob_height; + uint32_t log_block_height; + + BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA); + + if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI) + log_gob_height = 2; + else + log_gob_height = 3; + + log_block_height = log_block_height_in_gobs + log_gob_height; + + return (height + (1 << log_block_height) - 1) >> log_block_height; +} + +static int +nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo, + uint32_t offset, uint32_t stride, uint32_t h, + uint32_t tile_mode) +{ + uint32_t gob_size, bw, bh; + uint64_t bl_size; + + BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA); + + if (drm->client.device.info.chipset >= 0xc0) { + if (tile_mode & 0xF) + return -EINVAL; + tile_mode >>= 4; + } + + if (tile_mode & 0xFFF0) + return -EINVAL; + + if (drm->client.device.info.family < 
NV_DEVICE_INFO_V0_FERMI) + gob_size = 256; + else + gob_size = 512; + + bw = nouveau_get_width_in_blocks(stride); + bh = nouveau_get_height_in_blocks(drm, h, tile_mode); + + bl_size = bw * bh * (1 << tile_mode) * gob_size; + + DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u gob_size=%u bl_size=%llu size=%lu\n", + offset, stride, h, tile_mode, bw, bh, gob_size, bl_size, + nvbo->bo.mem.size); + + if (bl_size + offset > nvbo->bo.mem.size) + return -ERANGE; + + return 0; +} + int nouveau_framebuffer_new(struct drm_device *dev, const struct drm_mode_fb_cmd2 *mode_cmd, @@ -232,6 +302,8 @@ nouveau_framebuffer_new(struct drm_device *dev, { struct nouveau_drm *drm = nouveau_drm(dev); struct nouveau_framebuffer *fb; + const struct drm_format_info *info; + unsigned int width, height, i; int ret; /* YUV overlays have special requirements pre-NV50 */ @@ -254,6 +326,31 @@ nouveau_framebuffer_new(struct drm_device *dev, return -EINVAL; } + info = drm_get_format_info(dev, mode_cmd); + + for (i = 0; i < info->num_planes; i++) { + width = drm_format_info_plane_width(info, + mode_cmd->width, + i); + height = drm_format_info_plane_height(info, + mode_cmd->height, + i); + + if (nvbo->kind) { + ret = nouveau_check_bl_size(drm, nvbo, + mode_cmd->offsets[i], + mode_cmd->pitches[i], + height, nvbo->mode); + if (ret) + return ret; + } else { + uint32_t size = mode_cmd->pitches[i] * height; + + if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size) + return -ERANGE; + } + } + if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL))) return -ENOMEM; -- 2.17.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
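The size arithmetic in this patch is easier to follow with concrete numbers. The sketch below restates it as standalone C under stated assumptions: `block_linear_size` and `check_block_linear_fit` are hypothetical names, the `fermi_or_newer` flag replaces the family/chipset tests, and the intermediate values are widened to 64 bits before multiplying, which the patch itself does not do. It is an illustration of the calculation, not hardened against every extreme input.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes covered by one block-linear plane.
 * GOBs are 64 bytes wide; they are 4 rows tall (256 bytes) before Fermi
 * and 8 rows tall (512 bytes) from Fermi on, matching the patch.
 * log_block_height_in_gobs is assumed to be range-checked by the caller.
 */
static uint64_t block_linear_size(int fermi_or_newer, uint32_t stride,
				  uint32_t height,
				  uint32_t log_block_height_in_gobs)
{
	const uint32_t log_gob_width = 6;
	const uint32_t log_gob_height = fermi_or_newer ? 3 : 2;
	const uint64_t gob_size = fermi_or_newer ? 512 : 256;
	const uint32_t log_block_height =
		log_block_height_in_gobs + log_gob_height;

	/* Round the plane up to whole blocks in each direction. */
	uint64_t bw = ((uint64_t)stride + (1u << log_gob_width) - 1)
			>> log_gob_width;
	uint64_t bh = ((uint64_t)height + (1u << log_block_height) - 1)
			>> log_block_height;

	return bw * bh * (1ULL << log_block_height_in_gobs) * gob_size;
}

/* Reject out-of-range tiling with -EINVAL and oversized planes with -ERANGE. */
static int check_block_linear_fit(int fermi_or_newer, uint64_t bo_size,
				  uint32_t offset, uint32_t stride,
				  uint32_t height,
				  uint32_t log_block_height_in_gobs)
{
	if (log_block_height_in_gobs > 15)
		return -EINVAL;

	if (block_linear_size(fermi_or_newer, stride, height,
			      log_block_height_in_gobs) + offset > bo_size)
		return -ERANGE;

	return 0;
}
```

As a worked case, take a 1920x1080 surface at 4 bytes per pixel (stride 7680) with 16-GOB-tall blocks on Fermi or newer: the width is 7680 / 64 = 120 blocks, the height is ceil(1080 / 128) = 9 blocks, and the footprint is 120 * 9 * 16 * 512 = 8,847,360 bytes. That exceeds the 8,294,400 bytes a linear layout would need, so a buffer sized only for the linear layout fails the check with -ERANGE.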
[PATCH v3 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp
Advertise support for the full list of format modifiers supported by each class of NVIDIA desktop GPU display hardware. Stash the array of modifiers in the nouveau_display struct for use when validating userspace framebuffer creation requests, which will be supported in a subsequent change. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/dispnv50/base507c.c | 7 +-- drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 + drivers/gpu/drm/nouveau/dispnv50/disp.h | 4 ++ drivers/gpu/drm/nouveau/dispnv50/wndw.c | 27 +- drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++ drivers/gpu/drm/nouveau/nouveau_display.h | 2 + 6 files changed, 112 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c b/drivers/gpu/drm/nouveau/dispnv50/base507c.c index 00a85f1e1a4a..025b8f996a0a 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c +++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c @@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format, struct nv50_disp_base_channel_dma_v0 args = { .head = head, }; - struct nv50_disp *disp = nv50_disp(drm->dev); + struct nouveau_display *disp = nouveau_display(drm->dev); + struct nv50_disp *disp50 = nv50_disp(drm->dev); struct nv50_wndw *wndw; int ret; @@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format, if (*pwndw = wndw, ret) return ret; - ret = nv50_dmac_create(&drm->client.device, &disp->disp->object, + ret = nv50_dmac_create(&drm->client.device, &disp->disp.object, &oclass, head, &args, sizeof(args), - disp->sync->bo.offset, &wndw->wndw); + disp50->sync->bo.offset, &wndw->wndw); if (ret) { NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret); return ret; diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c index a3dc2ba19fb2..f017d05072b8 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -2481,6 +2481,15 @@ nv50_display_create(struct drm_device *dev) if (ret) goto out; + /* Assign the 
correct format modifiers */ + if (disp->disp->object.oclass >= TU102_DISP) + nouveau_display(dev)->format_modifiers = wndwc57e_modifiers; + else + if (disp->disp->object.oclass >= GF110_DISP) + nouveau_display(dev)->format_modifiers = disp90xx_modifiers; + else + nouveau_display(dev)->format_modifiers = disp50xx_modifiers; + /* create crtc objects to represent the hw heads */ if (disp->disp->object.oclass >= GV100_DISP) crtcs = nvif_rd32(>object, 0x610060) & 0xff; @@ -2576,3 +2585,53 @@ nv50_display_create(struct drm_device *dev) nv50_display_destroy(dev); return ret; } + +/** + * Format modifiers + */ + +/ + *Log2(block height) + * + *Page Kind --+ | * + *Gob Height/Page Kind Generation --+ | | * + * Sector layout ---+ | | | * + * Compression --+ | | | | */ +const u64 disp50xx_modifiers[] = { /* | | | | | */ + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4), + DRM_FORMAT_MOD_NVIDIA_B
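The table selection in this patch follows a simple pattern: pick one terminated modifier array per display generation, newest first, and let later code walk it up to the terminator. A minimal sketch of that pattern follows, assuming a hypothetical `enum disp_gen` in place of the real display class IDs and placeholder modifier values in place of the `DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D()` entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DRM_FORMAT_MOD_INVALID, used as the list terminator (assumed value). */
#define MOD_INVALID 0x00ffffffffffffffULL

/* Hypothetical generation ordering; the patch compares display class IDs instead. */
enum disp_gen { DISP_GEN_NV50, DISP_GEN_GF110, DISP_GEN_TU102 };

/* Placeholder tables: the values are illustrative, not real modifiers. */
static const uint64_t nv50_mods[]  = { 0x101ULL, 0x102ULL, 0x0ULL, MOD_INVALID };
static const uint64_t gf110_mods[] = { 0x201ULL, 0x0ULL, MOD_INVALID };
static const uint64_t tu102_mods[] = { 0x301ULL, 0x302ULL, 0x303ULL, 0x0ULL,
				       MOD_INVALID };

/* Choose the newest table the hardware qualifies for, as the patch does. */
static const uint64_t *pick_modifier_table(enum disp_gen gen)
{
	if (gen >= DISP_GEN_TU102)
		return tu102_mods;
	if (gen >= DISP_GEN_GF110)
		return gf110_mods;
	return nv50_mods;
}

/* Count entries up to, but not including, the terminator. */
static size_t count_modifiers(const uint64_t *mods)
{
	size_t n = 0;

	while (mods[n] != MOD_INVALID)
		n++;
	return n;
}
```

Keeping the arrays terminated rather than storing a separate length means the same pointer can be stashed once and later handed both to the plane property setup and to framebuffer validation.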
Re: [Nouveau] [PATCH v2 0/3] drm/nouveau: Support NVIDIA format modifiers
On 1/5/20 5:30 PM, Ben Skeggs wrote: On Tue, 17 Dec 2019 at 10:44, James Jones wrote: This series modifies the NV5x+ nouveau display backends to advertise appropriate format modifiers on their display planes in atomic mode setting blobs. Corresponding modifications to Mesa/userspace are available here: https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work But those need a bit of cleanup before they're ready to submit. I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware using various formats and all the exposed format modifiers, plus some negative testing with invalid ones. NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block Linear DRM format mod" patch submitted to dri-devel. v2: Used Tesla family instead of NV50 chipset compare to avoid treating oddly numbered NV4x-class chipsets as NV50+ GPUs. Other instances of compares with chipset number in the series were audited, deemed safe, and left as-is for consistency with existing code. Hey James, These look OK to me, with the minor issue I mentioned on one of the patches dealt with. I'll hold off merging anything until I get the go-ahead that the modifier definitions are definitely set in stone / userspace is ready for inclusion. Thanks for having a look. I'll try to get the userspace changes finalized soon. I think from the NV side, we consider the modifier definition itself (the v3 version of the patch) final, so if there's any stand-alone feedback from yourself or other drm/nouveau developers on that layout, we'd be eager to hear it. I don't want it rushed in, but we do have several projects blocked on getting that approved & committed. 
I assume the sequencing should be: * Fix the minor issue you identified here/complete review of nouveau kernel patches * Complete review of the related TegraDRM new modifier support patch * Finalize and complete review of userspace/Mesa nouveau modifier support patches * Get drm_fourcc.h updates committed * Get these patches and TegraDRM patches committed * Integrate final drm_fourcc.h to Mesa patches and get Mesa patches committed Does that sound right to you? Thanks, -James Thanks, Ben. James Jones (3): drm/nouveau: Add format mod prop to base/ovly/nvdisp drm/nouveau: Check framebuffer size against bo drm/nouveau: Support NVIDIA format modifiers drivers/gpu/drm/nouveau/dispnv50/base507c.c | 7 +- drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 drivers/gpu/drm/nouveau/dispnv50/disp.h | 4 + drivers/gpu/drm/nouveau/dispnv50/wndw.c | 35 - drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 +++ drivers/gpu/drm/nouveau/nouveau_display.c | 154 drivers/gpu/drm/nouveau/nouveau_display.h | 4 + 7 files changed, 272 insertions(+), 8 deletions(-) -- 2.17.1
Re: [Nouveau] [PATCH v2 2/3] drm/nouveau: Check framebuffer size against bo
On 1/5/20 5:25 PM, Ben Skeggs wrote: On Tue, 17 Dec 2019 at 10:45, James Jones wrote: Make sure framebuffer dimensions and tiling parameters will not result in accesses beyond the end of the GEM buffer they are bound to. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/nouveau_display.c | 93 +++ 1 file changed, 93 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 6f038511a03a..f1509392d7b7 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -224,6 +224,72 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = nouveau_user_framebuffer_create_handle, }; +static inline uint32_t +nouveau_get_width_in_blocks(uint32_t stride) +{ + /* GOBs per block in the x direction is always one, and GOBs are +* 64 bytes wide +*/ + static const uint32_t log_block_width = 6; + + return (stride + (1 << log_block_width) - 1) >> log_block_width; +} + +static inline uint32_t +nouveau_get_height_in_blocks(struct nouveau_drm *drm, +uint32_t height, +uint32_t log_block_height_in_gobs) +{ + uint32_t log_gob_height; + uint32_t log_block_height; + + BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA); + + if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI) + log_gob_height = 2; + else + log_gob_height = 3; + + log_block_height = log_block_height_in_gobs + log_gob_height; + + return (height + (1 << log_block_height) - 1) >> log_block_height; +} + +static int +nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo, + uint32_t offset, uint32_t stride, uint32_t h, + uint32_t tile_mode) +{ + uint32_t gob_size, bw, bh; + uint64_t bl_size; + + BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA); + + if (drm->client.device.info.chipset >= 0xc0) + tile_mode >>= 4; + + BUG_ON(tile_mode & 0xFFF0); As far as I can tell, tile_mode can be fed into this function unsanitised from userspace, so we 
probably want something different to a BUG_ON() here. Good catch. I had assumed nouveau_bo::mode was validated at creation time. I'll get this fixed up. Thanks, -James + + if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI) + gob_size = 256; + else + gob_size = 512; + + bw = nouveau_get_width_in_blocks(stride); + bh = nouveau_get_height_in_blocks(drm, h, tile_mode); + + bl_size = bw * bh * (1 << tile_mode) * gob_size; + + DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u gob_size=%u bl_size=%llu size=%lu\n", + offset, stride, h, tile_mode, bw, bh, gob_size, bl_size, + nvbo->bo.mem.size); + + if (bl_size + offset > nvbo->bo.mem.size) + return -ERANGE; + + return 0; +} + int nouveau_framebuffer_new(struct drm_device *dev, const struct drm_mode_fb_cmd2 *mode_cmd, @@ -232,6 +298,8 @@ nouveau_framebuffer_new(struct drm_device *dev, { struct nouveau_drm *drm = nouveau_drm(dev); struct nouveau_framebuffer *fb; + const struct drm_format_info *info; + unsigned int width, height, i; int ret; /* YUV overlays have special requirements pre-NV50 */ @@ -254,6 +322,31 @@ nouveau_framebuffer_new(struct drm_device *dev, return -EINVAL; } + info = drm_get_format_info(dev, mode_cmd); + + for (i = 0; i < info->num_planes; i++) { + width = drm_format_info_plane_width(info, + mode_cmd->width, + i); + height = drm_format_info_plane_height(info, + mode_cmd->height, + i); + + if (nvbo->kind) { + ret = nouveau_check_bl_size(drm, nvbo, + mode_cmd->offsets[i], + mode_cmd->pitches[i], + height, nvbo->mode); + if (ret) + return ret; + } else { + uint32_t size = mode_cmd->pitches[i] * height; + + if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size) + return -ERANGE; + } + } + if (!(fb = *pfb = kzalloc(si
[PATCH] drm/nouveau: Add correct turing page kinds
Turing introduced a new simplified page kind scheme, reducing the number of possible page kinds from 256 to 16. It also is the first NVIDIA GPU in which the highest possible page kind value is not reserved as an "invalid" page kind. To address this, the invalid page kind is made an explicit property of the MMU HAL, and a new table of page kinds is added to the tu102 MMU HAL. One hardware change not addressed here is that 0x00 is technically no longer a supported page kind, and pitch surfaces are instead intended to share the block-linear generic page kind 0x06. However, because that will be a rather invasive change to nouveau and 0x00 still works fine in practice on Turing hardware, addressing this new behavior is deferred. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/include/nvif/if0008.h| 2 +- drivers/gpu/drm/nouveau/include/nvif/mmu.h | 4 ++-- drivers/gpu/drm/nouveau/nvif/mmu.c | 1 + drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c | 3 ++- drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c | 3 ++- drivers/gpu/drm/nouveau/nvkm/subdev/mmu/nv50.c | 3 ++- drivers/gpu/drm/nouveau/nvkm/subdev/mmu/priv.h | 8 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/tu102.c | 16 +++- drivers/gpu/drm/nouveau/nvkm/subdev/mmu/ummu.c | 7 +-- .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c | 6 +++--- .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 6 +++--- .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c| 6 +++--- 12 files changed, 43 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/nouveau/include/nvif/if0008.h b/drivers/gpu/drm/nouveau/include/nvif/if0008.h index 8450127420f5..c21d09f04f1d 100644 --- a/drivers/gpu/drm/nouveau/include/nvif/if0008.h +++ b/drivers/gpu/drm/nouveau/include/nvif/if0008.h @@ -35,7 +35,7 @@ struct nvif_mmu_type_v0 { struct nvif_mmu_kind_v0 { __u8 version; - __u8 pad01[1]; + __u8 kind_inv; __u16 count; __u8 data[]; }; diff --git a/drivers/gpu/drm/nouveau/include/nvif/mmu.h b/drivers/gpu/drm/nouveau/include/nvif/mmu.h index 747ecf67e403..cec1e88a0a05 
100644 --- a/drivers/gpu/drm/nouveau/include/nvif/mmu.h +++ b/drivers/gpu/drm/nouveau/include/nvif/mmu.h @@ -7,6 +7,7 @@ struct nvif_mmu { u8 dmabits; u8 heap_nr; u8 type_nr; + u8 kind_inv; u16 kind_nr; s32 mem; @@ -36,9 +37,8 @@ void nvif_mmu_fini(struct nvif_mmu *); static inline bool nvif_mmu_kind_valid(struct nvif_mmu *mmu, u8 kind) { - const u8 invalid = mmu->kind_nr - 1; if (kind) { - if (kind >= mmu->kind_nr || mmu->kind[kind] == invalid) + if (kind >= mmu->kind_nr || mmu->kind[kind] == mmu->kind_inv) return false; } return true; diff --git a/drivers/gpu/drm/nouveau/nvif/mmu.c b/drivers/gpu/drm/nouveau/nvif/mmu.c index 5641bda2046d..47efc408efa6 100644 --- a/drivers/gpu/drm/nouveau/nvif/mmu.c +++ b/drivers/gpu/drm/nouveau/nvif/mmu.c @@ -121,6 +121,7 @@ nvif_mmu_init(struct nvif_object *parent, s32 oclass, struct nvif_mmu *mmu) kind, argc); if (ret == 0) memcpy(mmu->kind, kind->data, kind->count); + mmu->kind_inv = kind->kind_inv; kfree(kind); } diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c index 2d075246dc46..2cd5ec81c0d0 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c @@ -30,7 +30,7 @@ * The value 0xff represents an invalid storage type. 
*/ const u8 * -gf100_mmu_kind(struct nvkm_mmu *mmu, int *count) +gf100_mmu_kind(struct nvkm_mmu *mmu, int *count, u8 *invalid) { static const u8 kind[256] = { @@ -69,6 +69,7 @@ gf100_mmu_kind(struct nvkm_mmu *mmu, int *count) }; *count = ARRAY_SIZE(kind); + *invalid = 0xff; return kind; } diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c index dbf644ebac97..83990c83f9f8 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c @@ -27,7 +27,7 @@ #include const u8 * -gm200_mmu_kind(struct nvkm_mmu *mmu, int *count) +gm200_mmu_kind(struct nvkm_mmu *mmu, int *count, u8 *invalid) { static const u8 kind[256] = { @@ -65,6 +65,7 @@ gm200_mmu_kind(struct nvkm_mmu *mmu, int *count) 0xfe, 0xfe, 0xfe, 0xfe, 0xff, 0xfd, 0xfe, 0xff }; *count = ARRAY_SIZE(kind); + *invalid = 0xff; return kind; } diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/nv50.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/nv50.c index db3dfbbb2aa0..c0083ddda65a 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/nv50.c +++ b/driver
[PATCH] drm/nouveau: Fix ttm move init with multiple GPUs
The pointer used to walk the table of move ops and pick the right one for the current GPU was declared static, meaning its state was carried over between invocations of the function. This also made the function non-reentrant and thread-unsafe. Since the table is ordered such that newer GPU methods are listed first, the result of this was that initializing newer GPUs after older GPUs would result in no suitable ttm move acceleration operations being found, and ttm would fall back to CPU blits on the older GPUs.

This change declares the walking pointer separately from the table and makes it non-static to fix the logic.

Signed-off-by: James Jones
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index f8015e0318d7..1b62ccc57aef 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1162,7 +1162,7 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
 void
 nouveau_bo_move_init(struct nouveau_drm *drm)
 {
-	static const struct {
+	static const struct _method_table {
 		const char *name;
 		int engine;
 		s32 oclass;
@@ -1192,7 +1192,8 @@ nouveau_bo_move_init(struct nouveau_drm *drm)
 		{ "M2MF", 0, 0x0039, nv04_bo_move_m2mf, nv04_bo_move_init },
 		{},
 		{ "CRYPT", 0, 0x88b4, nv98_bo_move_exec, nv50_bo_move_init },
-	}, *mthd = _methods;
+	};
+	const struct _method_table *mthd = _methods;
 	const char *name = "CPU";
 	int ret;

--
2.17.1
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
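The failure mode described in this commit message can be reproduced outside the kernel. The following is a minimal sketch, not nouveau code: the `method` struct, the two `pick_*` helpers, and the 0xa0b5 class value are illustrative assumptions. Only the 0x0039 M2MF class is taken from the patch context.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the move-method table. */
struct method {
	const char *name;
	int oclass;
};

/* Newest method first, terminated by an entry with a NULL name. */
static const struct method methods[] = {
	{ "COPY", 0xa0b5 },	/* illustrative "newer GPU" class (assumption) */
	{ "M2MF", 0x0039 },	/* class value seen in the patch context */
	{ NULL, 0 },
};

/*
 * Buggy variant: the cursor is static, so it keeps its position from
 * one call to the next instead of restarting at the top of the table.
 */
static const char *pick_static_cursor(int oclass)
{
	static const struct method *m = methods;

	for (; m->name; m++)
		if (m->oclass == oclass)
			return m->name;
	return "CPU";
}

/* Fixed variant: the cursor is a plain local, reset on every call. */
static const char *pick_local_cursor(int oclass)
{
	const struct method *m;

	for (m = methods; m->name; m++)
		if (m->oclass == oclass)
			return m->name;
	return "CPU";
}
```

In this toy model, asking `pick_static_cursor()` for 0x0039 and then for 0xa0b5 yields "M2MF" followed by "CPU", because the second search starts at the M2MF entry and never revisits the top of the table. The same two calls to `pick_local_cursor()` yield "M2MF" and "COPY".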
[PATCH] drm/tegra: Use more descriptive format modifiers
Advertise and accept both the existing DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based format modifiers and the more descriptive DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D()-based format modifiers, preserving backwards compatibility with existing userspace drivers, but providing forwards compatibility with future userspace drivers that also make use of the more descriptive modifiers to enable differentiation between desktop and tegra, as well as compressed and non-compressed surfaces. This patch depends on the "[PATCH v3] drm: Generalized NV Block Linear DRM format mod" patch submitted to dri-devel. Signed-off-by: James Jones --- drivers/gpu/drm/tegra/dc.c | 10 ++ drivers/gpu/drm/tegra/fb.c | 14 +++--- drivers/gpu/drm/tegra/hub.c | 10 ++ 3 files changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index fbf57bc3cdab..a2cc687dc2d8 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -588,6 +588,16 @@ static const u32 tegra124_primary_formats[] = { static const u64 tegra124_modifiers[] = { DRM_FORMAT_MOD_LINEAR, + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5), + /* +* For backwards compatibility with older userspace that may have +* baked in usage of the less-descriptive modifiers +*/ DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c index e34325c83d28..d04e0b1c61ea 100644 --- a/drivers/gpu/drm/tegra/fb.c +++ b/drivers/gpu/drm/tegra/fb.c @@ -44,7 +44,7 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer, { uint64_t modifier = framebuffer->modifier; - switch 
(modifier) { + switch (drm_fourcc_canonicalize_nvidia_format_mod(modifier)) { case DRM_FORMAT_MOD_LINEAR: tiling->mode = TEGRA_BO_TILING_MODE_PITCH; tiling->value = 0; @@ -55,32 +55,32 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer, tiling->value = 0; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 0; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 1; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 2; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 3; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 4; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 5; break; diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c index 839b49c40e51..03c97b10b122 100644 --- a/drivers/gpu/drm/tegra/hub.c +++ b/drivers/gpu/drm/tegra/hub.c @@ -49,6 +49,16 @@ static const u32 tegra_shared_plane_formats[] = { static const u64 tegra_shared_plane_modifiers[] = { DRM_FORMAT_MOD_LINEAR, + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4), + 
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5), + /* +* For backwards compatibility with older userspace that may have +* baked in usage of the less-descriptive modifiers +*/ DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
Re: [Nouveau] [PATCH 3/3] drm/nouveau: Support NVIDIA format modifiers
On 12/12/19 6:51 PM, James Jones wrote: On 12/11/19 1:13 PM, Ilia Mirkin wrote: On Wed, Dec 11, 2019 at 4:04 PM James Jones wrote: Allow setting the block layout of a nouveau FB object using DRM format modifiers. When specified, the format modifier block layout and kind overrides the GEM buffer's implicit layout and kind. The specified format modifier is validated against he list of modifiers supported by the target display hardware. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 8 +-- drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++- drivers/gpu/drm/nouveau/nouveau_display.h | 2 + 3 files changed, 69 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index 70ad64cb2d34..06c1b18479c1 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) { struct nouveau_drm *drm = nouveau_drm(fb->base.dev); struct nv50_wndw_ctxdma *ctxdma; - const u8 kind = fb->nvbo->kind; + const u8 kind = fb->kind; const u32 handle = 0xfb00 | kind; struct { struct nv_dma_v0 base; @@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { asyw->image.w = fb->base.width; asyw->image.h = fb->base.height; - asyw->image.kind = fb->nvbo->kind; + asyw->image.kind = fb->kind; ret = nv50_wndw_atomic_check_acquire_rgb(asyw); if (ret) { @@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->image.kind) { asyw->image.layout = 0; if (drm->client.device.info.chipset >= 0xc0) - asyw->image.blockh = fb->nvbo->mode >> 4; + asyw->image.blockh = fb->tile_mode >> 4; else - asyw->image.blockh = fb->nvbo->mode; + asyw->image.blockh = fb->tile_mode; asyw->image.blocks[0] = fb->base.pitches[0] / 64; asyw->image.pitch[0] = 0; } else { diff 
--git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index f1509392d7b7..351b58410e1a 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = nouveau_user_framebuffer_create_handle, }; +static int +nouveau_decode_mod(struct nouveau_drm *drm, + uint64_t modifier, + uint32_t *tile_mode, + uint8_t *kind) +{ + struct nouveau_display *disp = nouveau_display(drm->dev); + int mod; + + BUG_ON(!tile_mode || !kind); + + if (drm->client.device.info.chipset < 0x50) { Not a full review, but you want to go off the family (chip_class iirc? something like that, should be obvious). Sadly 0x67/0x68 are higher than 0x50 numerically, but are logically part of the nv4x generation. Good catch. I'll get this fixed and send out an updated patchset. I fixed this one instance in the v2 series, and I didn't see any other potentially dangerous uses of chipset, so I left the others as-is, as they seemed to better match surrounding code or existing checks used for a given bit of functionality. Thanks, -James + return -EINVAL; + } + + BUG_ON(!disp->format_modifiers); + + for (mod = 0; + (disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && + (disp->format_modifiers[mod] != modifier); + mod++); + + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + + if (modifier == DRM_FORMAT_MOD_LINEAR) { + /* tile_mode will not be used in this case */ + *tile_mode = 0; + *kind = 0; + } else { + /* + * Extract the block height and kind from the corresponding + * modifier fields. See drm_fourcc.h for details. + */ + *tile_mode = (uint32_t)(modifier & 0xF); + *kind = (uint8_t)((modifier >> 12) & 0xFF); + + if (drm->client.device.info.chipset >= 0xc0) + *tile_mode <<= 4; + } + + return 0; +} + static inline u
[PATCH v2 2/3] drm/nouveau: Check framebuffer size against bo
Make sure framebuffer dimensions and tiling parameters will not result in accesses beyond the end of the GEM buffer they are bound to. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/nouveau_display.c | 93 +++ 1 file changed, 93 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 6f038511a03a..f1509392d7b7 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -224,6 +224,72 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = nouveau_user_framebuffer_create_handle, }; +static inline uint32_t +nouveau_get_width_in_blocks(uint32_t stride) +{ + /* GOBs per block in the x direction is always one, and GOBs are +* 64 bytes wide +*/ + static const uint32_t log_block_width = 6; + + return (stride + (1 << log_block_width) - 1) >> log_block_width; +} + +static inline uint32_t +nouveau_get_height_in_blocks(struct nouveau_drm *drm, +uint32_t height, +uint32_t log_block_height_in_gobs) +{ + uint32_t log_gob_height; + uint32_t log_block_height; + + BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA); + + if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI) + log_gob_height = 2; + else + log_gob_height = 3; + + log_block_height = log_block_height_in_gobs + log_gob_height; + + return (height + (1 << log_block_height) - 1) >> log_block_height; +} + +static int +nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo, + uint32_t offset, uint32_t stride, uint32_t h, + uint32_t tile_mode) +{ + uint32_t gob_size, bw, bh; + uint64_t bl_size; + + BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA); + + if (drm->client.device.info.chipset >= 0xc0) + tile_mode >>= 4; + + BUG_ON(tile_mode & 0xFFF0); + + if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI) + gob_size = 256; + else + gob_size = 512; + + bw = nouveau_get_width_in_blocks(stride); + bh = 
nouveau_get_height_in_blocks(drm, h, tile_mode); + + bl_size = bw * bh * (1 << tile_mode) * gob_size; + + DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u gob_size=%u bl_size=%llu size=%lu\n", + offset, stride, h, tile_mode, bw, bh, gob_size, bl_size, + nvbo->bo.mem.size); + + if (bl_size + offset > nvbo->bo.mem.size) + return -ERANGE; + + return 0; +} + int nouveau_framebuffer_new(struct drm_device *dev, const struct drm_mode_fb_cmd2 *mode_cmd, @@ -232,6 +298,8 @@ nouveau_framebuffer_new(struct drm_device *dev, { struct nouveau_drm *drm = nouveau_drm(dev); struct nouveau_framebuffer *fb; + const struct drm_format_info *info; + unsigned int width, height, i; int ret; /* YUV overlays have special requirements pre-NV50 */ @@ -254,6 +322,31 @@ nouveau_framebuffer_new(struct drm_device *dev, return -EINVAL; } + info = drm_get_format_info(dev, mode_cmd); + + for (i = 0; i < info->num_planes; i++) { + width = drm_format_info_plane_width(info, + mode_cmd->width, + i); + height = drm_format_info_plane_height(info, + mode_cmd->height, + i); + + if (nvbo->kind) { + ret = nouveau_check_bl_size(drm, nvbo, + mode_cmd->offsets[i], + mode_cmd->pitches[i], + height, nvbo->mode); + if (ret) + return ret; + } else { + uint32_t size = mode_cmd->pitches[i] * height; + + if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size) + return -ERANGE; + } + } + if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL))) return -ENOMEM; -- 2.17.1
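The block-linear size check in this patch can be followed with a small standalone calculation. This is a sketch under stated assumptions rather than the kernel code: the helper names and the `fermi_or_newer` flag are hypothetical stand-ins for the family checks in the patch, and the arithmetic is widened to 64 bits here, which the patch as posted does not do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round-up division by a power of two: ceil(v / 2^log2). */
static uint32_t div_round_up_pow2(uint32_t v, uint32_t log2)
{
	return (uint32_t)(((uint64_t)v + (1ull << log2) - 1) >> log2);
}

/*
 * Bytes needed by one block-linear plane.
 * GOBs are 64 bytes wide and there is one GOB per block in x.
 * GOB height is 4 rows (256-byte GOB) before Fermi and 8 rows
 * (512-byte GOB) from Fermi on, mirroring the constants in the patch.
 */
static uint64_t bl_size_bytes(bool fermi_or_newer, uint32_t stride,
			      uint32_t height,
			      uint32_t log_block_height_in_gobs)
{
	const uint32_t log_gob_width = 6;
	const uint32_t log_gob_height = fermi_or_newer ? 3 : 2;
	const uint64_t gob_size = fermi_or_newer ? 512 : 256;
	uint64_t bw = div_round_up_pow2(stride, log_gob_width);
	uint64_t bh = div_round_up_pow2(height,
					log_block_height_in_gobs +
					log_gob_height);

	return bw * bh * (1ull << log_block_height_in_gobs) * gob_size;
}

/* True when the plane, placed at 'offset', ends inside the buffer. */
static bool fb_plane_fits(uint64_t bo_size, uint32_t offset,
			  uint64_t plane_size)
{
	return plane_size + offset <= bo_size;
}
```

For a 1080-row plane with a 7680-byte pitch and 16-GOB-high blocks on Fermi or newer, `bl_size_bytes(true, 7680, 1080, 4)` is 8847360 bytes (120 blocks wide, 9 blocks high, 16 GOBs per block, 512 bytes per GOB). The same plane in pitch-linear layout needs only 7680 * 1080 = 8294400 bytes, so a buffer sized for the linear case would be rejected by the block-linear check.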
[PATCH v2 0/3] drm/nouveau: Support NVIDIA format modifiers
This series modifies the NV5x+ nouveau display backends to advertise appropriate format modifiers on their display planes in atomic mode setting blobs.

Corresponding modifications to Mesa/userspace are available here:

https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work

But those need a bit of cleanup before they're ready to submit.

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware using various formats and all the exposed format modifiers, plus some negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block Linear DRM format mod" patch submitted to dri-devel.

v2: Used Tesla family instead of NV50 chipset compare to avoid treating oddly numbered NV4x-class chipsets as NV50+ GPUs. Other instances of compares with chipset number in the series were audited, deemed safe, and left as-is for consistency with existing code.

James Jones (3):
  drm/nouveau: Add format mod prop to base/ovly/nvdisp
  drm/nouveau: Check framebuffer size against bo
  drm/nouveau: Support NVIDIA format modifiers

 drivers/gpu/drm/nouveau/dispnv50/base507c.c | 7 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 59
 drivers/gpu/drm/nouveau/dispnv50/disp.h | 4 +
 drivers/gpu/drm/nouveau/dispnv50/wndw.c | 35 -
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 +++
 drivers/gpu/drm/nouveau/nouveau_display.c | 154
 drivers/gpu/drm/nouveau/nouveau_display.h | 4 +
 7 files changed, 272 insertions(+), 8 deletions(-)

--
2.17.1
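The v2 note is easier to see with concrete numbers. Per the review of v1, chipsets 0x67 and 0x68 are numerically above 0x50 yet logically belong to the NV4x generation. The sketch below contrasts the two tests using a hypothetical `gpu_family` enum and `gpu_info` struct. These are illustrative assumptions, not nouveau's definitions.

```c
#include <stdbool.h>

/* Hypothetical families in generation order; not the kernel's enum. */
enum gpu_family { FAMILY_NV4X, FAMILY_TESLA, FAMILY_FERMI };

/* Hypothetical device description used only for this illustration. */
struct gpu_info {
	unsigned int chipset;
	enum gpu_family family;
};

/* v1-style test: compares the raw chipset number. */
static bool is_nv50_plus_by_chipset(const struct gpu_info *g)
{
	return g->chipset >= 0x50;
}

/* v2-style test: compares the logical family instead. */
static bool is_nv50_plus_by_family(const struct gpu_info *g)
{
	return g->family >= FAMILY_TESLA;
}
```

With a device described as `{ 0x67, FAMILY_NV4X }`, the chipset test wrongly reports an NV50+ GPU while the family test does not. For `{ 0x50, FAMILY_TESLA }` both tests agree.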
[PATCH v2 3/3] drm/nouveau: Support NVIDIA format modifiers
Allow setting the block layout of a nouveau FB object using DRM format modifiers. When specified, the format modifier block layout and kind override the GEM buffer's implicit layout and kind. The specified format modifier is validated against the list of modifiers supported by the target display hardware. v2: Used Tesla family instead of NV50 chipset compare Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 8 +-- drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++- drivers/gpu/drm/nouveau/nouveau_display.h | 2 + 3 files changed, 69 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index 70ad64cb2d34..06c1b18479c1 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) { struct nouveau_drm *drm = nouveau_drm(fb->base.dev); struct nv50_wndw_ctxdma *ctxdma; - const u8 kind = fb->nvbo->kind; + const u8 kind = fb->kind; const u32 handle = 0xfb00 | kind; struct { struct nv_dma_v0 base; @@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { asyw->image.w = fb->base.width; asyw->image.h = fb->base.height; - asyw->image.kind = fb->nvbo->kind; + asyw->image.kind = fb->kind; ret = nv50_wndw_atomic_check_acquire_rgb(asyw); if (ret) { @@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->image.kind) { asyw->image.layout = 0; if (drm->client.device.info.chipset >= 0xc0) - asyw->image.blockh = fb->nvbo->mode >> 4; + asyw->image.blockh = fb->tile_mode >> 4; else - asyw->image.blockh = fb->nvbo->mode; + asyw->image.blockh = fb->tile_mode; asyw->image.blocks[0] = fb->base.pitches[0] / 64; asyw->image.pitch[0] = 0; } else { diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c
b/drivers/gpu/drm/nouveau/nouveau_display.c index f1509392d7b7..50e055adebd4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = nouveau_user_framebuffer_create_handle, }; +static int +nouveau_decode_mod(struct nouveau_drm *drm, + uint64_t modifier, + uint32_t *tile_mode, + uint8_t *kind) +{ + struct nouveau_display *disp = nouveau_display(drm->dev); + int mod; + + BUG_ON(!tile_mode || !kind); + + if (drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA) { + return -EINVAL; + } + + BUG_ON(!disp->format_modifiers); + + for (mod = 0; +(disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && +(disp->format_modifiers[mod] != modifier); +mod++); + + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + + if (modifier == DRM_FORMAT_MOD_LINEAR) { + /* tile_mode will not be used in this case */ + *tile_mode = 0; + *kind = 0; + } else { + /* +* Extract the block height and kind from the corresponding +* modifier fields. See drm_fourcc.h for details. +*/ + *tile_mode = (uint32_t)(modifier & 0xF); + *kind = (uint8_t)((modifier >> 12) & 0xFF); + + if (drm->client.device.info.chipset >= 0xc0) + *tile_mode <<= 4; + } + + return 0; +} + static inline uint32_t nouveau_get_width_in_blocks(uint32_t stride) { @@ -300,6 +344,8 @@ nouveau_framebuffer_new(struct drm_device *dev, struct nouveau_framebuffer *fb; const struct drm_format_info *info; unsigned int width, height, i; + uint32_t tile_mode; + uint8_t kind; int ret; /* YUV overlays have special requirements pre-NV50 */ @@ -322,6 +368,18 @@ nouveau_framebuffer_new(struct drm_device *dev, return -EINVAL; } + if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) { + if (nouveau_decode_mod(drm, mode_cmd->modifier[0], _mode, +
[PATCH v2 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp
Advertise support for the full list of format modifiers supported by each class of NVIDIA desktop GPU display hardware. Stash the array of modifiers in the nouveau_display struct for use when validating userspace framebuffer creation requests, which will be supported in a subsequent change. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/dispnv50/base507c.c | 7 +-- drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 + drivers/gpu/drm/nouveau/dispnv50/disp.h | 4 ++ drivers/gpu/drm/nouveau/dispnv50/wndw.c | 27 +- drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++ drivers/gpu/drm/nouveau/nouveau_display.h | 2 + 6 files changed, 112 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c b/drivers/gpu/drm/nouveau/dispnv50/base507c.c index 00a85f1e1a4a..025b8f996a0a 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c +++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c @@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format, struct nv50_disp_base_channel_dma_v0 args = { .head = head, }; - struct nv50_disp *disp = nv50_disp(drm->dev); + struct nouveau_display *disp = nouveau_display(drm->dev); + struct nv50_disp *disp50 = nv50_disp(drm->dev); struct nv50_wndw *wndw; int ret; @@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format, if (*pwndw = wndw, ret) return ret; - ret = nv50_dmac_create(>client.device, >disp->object, + ret = nv50_dmac_create(>client.device, >disp.object, , head, , sizeof(args), - disp->sync->bo.offset, >wndw); + disp50->sync->bo.offset, >wndw); if (ret) { NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret); return ret; diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c index 064a69d161e3..0956367d27a2 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -2337,6 +2337,15 @@ nv50_display_create(struct drm_device *dev) if (ret) goto out; + /* Assign the
correct format modifiers */ + if (disp->disp->object.oclass >= TU102_DISP) + nouveau_display(dev)->format_modifiers = wndwc57e_modifiers; + else + if (disp->disp->object.oclass >= GF110_DISP) + nouveau_display(dev)->format_modifiers = disp90xx_modifiers; + else + nouveau_display(dev)->format_modifiers = disp50xx_modifiers; + /* create crtc objects to represent the hw heads */ if (disp->disp->object.oclass >= GV100_DISP) crtcs = nvif_rd32(>object, 0x610060) & 0xff; @@ -2404,3 +2413,53 @@ nv50_display_create(struct drm_device *dev) nv50_display_destroy(dev); return ret; } + +/** + * Format modifiers + */ + +/ + *Log2(block height) + * + *Page Kind --+ | * + *Gob Height/Page Kind Generation --+ | | * + * Sector layout ---+ | | | * + * Compression --+ | | | | */ +const u64 disp50xx_modifiers[] = { /* | | | | | */ + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4), + DRM_FORMAT_MOD_NVIDIA_B
Re: [Nouveau] [PATCH 3/3] drm/nouveau: Support NVIDIA format modifiers
On 12/11/19 1:13 PM, Ilia Mirkin wrote: On Wed, Dec 11, 2019 at 4:04 PM James Jones wrote: Allow setting the block layout of a nouveau FB object using DRM format modifiers. When specified, the format modifier block layout and kind overrides the GEM buffer's implicit layout and kind. The specified format modifier is validated against he list of modifiers supported by the target display hardware. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 8 +-- drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++- drivers/gpu/drm/nouveau/nouveau_display.h | 2 + 3 files changed, 69 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index 70ad64cb2d34..06c1b18479c1 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) { struct nouveau_drm *drm = nouveau_drm(fb->base.dev); struct nv50_wndw_ctxdma *ctxdma; - const u8kind = fb->nvbo->kind; + const u8kind = fb->kind; const u32 handle = 0xfb00 | kind; struct { struct nv_dma_v0 base; @@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { asyw->image.w = fb->base.width; asyw->image.h = fb->base.height; - asyw->image.kind = fb->nvbo->kind; + asyw->image.kind = fb->kind; ret = nv50_wndw_atomic_check_acquire_rgb(asyw); if (ret) { @@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->image.kind) { asyw->image.layout = 0; if (drm->client.device.info.chipset >= 0xc0) - asyw->image.blockh = fb->nvbo->mode >> 4; + asyw->image.blockh = fb->tile_mode >> 4; else - asyw->image.blockh = fb->nvbo->mode; + asyw->image.blockh = fb->tile_mode; asyw->image.blocks[0] = fb->base.pitches[0] / 64; asyw->image.pitch[0] = 0; } else { diff --git 
a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index f1509392d7b7..351b58410e1a 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = nouveau_user_framebuffer_create_handle, }; +static int +nouveau_decode_mod(struct nouveau_drm *drm, + uint64_t modifier, + uint32_t *tile_mode, + uint8_t *kind) +{ + struct nouveau_display *disp = nouveau_display(drm->dev); + int mod; + + BUG_ON(!tile_mode || !kind); + + if (drm->client.device.info.chipset < 0x50) { Not a full review, but you want to go off the family (chip_class iirc? something like that, should be obvious). Sadly 0x67/0x68 are higher than 0x50 numerically, but are logically part of the nv4x generation. Good catch. I'll get this fixed and send out an updated patchset. Thanks, -James + return -EINVAL; + } + + BUG_ON(!disp->format_modifiers); + + for (mod = 0; +(disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && +(disp->format_modifiers[mod] != modifier); +mod++); + + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + + if (modifier == DRM_FORMAT_MOD_LINEAR) { + /* tile_mode will not be used in this case */ + *tile_mode = 0; + *kind = 0; + } else { + /* +* Extract the block height and kind from the corresponding +* modifier fields. See drm_fourcc.h for details. +*/ + *tile_mode = (uint32_t)(modifier & 0xF); + *kind = (uint8_t)((modifier >> 12) & 0xFF); + + if (drm->client.device.info.chipset >= 0xc0) + *tile_mode <<= 4; + } + + return 0; +} + static inline uint32_t nouveau_get_width_in_blocks(uint32_t stride) { @@ -300,6 +344,8 @@ nouveau_framebuffer_new(struct drm_device *dev, struct nouveau_framebuffer *fb; const struct drm_format_info *info; unsigned int width, height, i; + uint32_t tile_mode; + uint8_t kind; int ret;
[PATCH 3/3] drm/nouveau: Support NVIDIA format modifiers
Allow setting the block layout of a nouveau FB object using DRM format modifiers. When specified, the format modifier block layout and kind override the GEM buffer's implicit layout and kind. The specified format modifier is validated against the list of modifiers supported by the target display hardware. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 8 +-- drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++- drivers/gpu/drm/nouveau/nouveau_display.h | 2 + 3 files changed, 69 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index 70ad64cb2d34..06c1b18479c1 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb) { struct nouveau_drm *drm = nouveau_drm(fb->base.dev); struct nv50_wndw_ctxdma *ctxdma; - const u8 kind = fb->nvbo->kind; + const u8 kind = fb->kind; const u32 handle = 0xfb00 | kind; struct { struct nv_dma_v0 base; @@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) { asyw->image.w = fb->base.width; asyw->image.h = fb->base.height; - asyw->image.kind = fb->nvbo->kind; + asyw->image.kind = fb->kind; ret = nv50_wndw_atomic_check_acquire_rgb(asyw); if (ret) { @@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool modeset, if (asyw->image.kind) { asyw->image.layout = 0; if (drm->client.device.info.chipset >= 0xc0) - asyw->image.blockh = fb->nvbo->mode >> 4; + asyw->image.blockh = fb->tile_mode >> 4; else - asyw->image.blockh = fb->nvbo->mode; + asyw->image.blockh = fb->tile_mode; asyw->image.blocks[0] = fb->base.pitches[0] / 64; asyw->image.pitch[0] = 0; } else { diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index f1509392d7b7..351b58410e1a 100644 ---
a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = nouveau_user_framebuffer_create_handle, }; +static int +nouveau_decode_mod(struct nouveau_drm *drm, + uint64_t modifier, + uint32_t *tile_mode, + uint8_t *kind) +{ + struct nouveau_display *disp = nouveau_display(drm->dev); + int mod; + + BUG_ON(!tile_mode || !kind); + + if (drm->client.device.info.chipset < 0x50) { + return -EINVAL; + } + + BUG_ON(!disp->format_modifiers); + + for (mod = 0; +(disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) && +(disp->format_modifiers[mod] != modifier); +mod++); + + if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) + return -EINVAL; + + if (modifier == DRM_FORMAT_MOD_LINEAR) { + /* tile_mode will not be used in this case */ + *tile_mode = 0; + *kind = 0; + } else { + /* +* Extract the block height and kind from the corresponding +* modifier fields. See drm_fourcc.h for details. +*/ + *tile_mode = (uint32_t)(modifier & 0xF); + *kind = (uint8_t)((modifier >> 12) & 0xFF); + + if (drm->client.device.info.chipset >= 0xc0) + *tile_mode <<= 4; + } + + return 0; +} + static inline uint32_t nouveau_get_width_in_blocks(uint32_t stride) { @@ -300,6 +344,8 @@ nouveau_framebuffer_new(struct drm_device *dev, struct nouveau_framebuffer *fb; const struct drm_format_info *info; unsigned int width, height, i; + uint32_t tile_mode; + uint8_t kind; int ret; /* YUV overlays have special requirements pre-NV50 */ @@ -322,6 +368,18 @@ nouveau_framebuffer_new(struct drm_device *dev, return -EINVAL; } + if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) { + if (nouveau_decode_mod(drm, mode_cmd->modifier[0], &tile_mode, + &kind)) { + DRM_DEBUG_KMS("Unsupported mo
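For anyone following along outside the kernel tree, the decode step in the patch above can be sketched as standalone C. This is only an illustration under stated assumptions, not the nouveau code: the sketch_* names and the sample modifier list are hypothetical, the two constants mirror DRM_FORMAT_MOD_LINEAR and DRM_FORMAT_MOD_INVALID from drm_fourcc.h, and the field positions (block height in bits 3:0, kind in bits 19:12, the shift by 4 for chipset >= 0xc0) are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors of DRM_FORMAT_MOD_LINEAR / DRM_FORMAT_MOD_INVALID (drm_fourcc.h). */
#define SKETCH_MOD_LINEAR  0x0ULL
#define SKETCH_MOD_INVALID 0x00ffffffffffffffULL

/* Hypothetical supported list, terminated by the INVALID sentinel. */
static const uint64_t sketch_supported[] = {
	0x03000000000FE014ULL, /* block linear, kind 0xfe, log2(block height) 4 */
	0x03000000000FE013ULL, /* block linear, kind 0xfe, log2(block height) 3 */
	SKETCH_MOD_LINEAR,
	SKETCH_MOD_INVALID,
};

/* Linear search of a sentinel-terminated list, as in the patch. */
static bool sketch_mod_in_list(const uint64_t *list, uint64_t modifier)
{
	size_t i;

	assert(list != NULL);
	for (i = 0; list[i] != SKETCH_MOD_INVALID; i++)
		if (list[i] == modifier)
			return true;
	return false;
}

/* Page kind lives in bits 19:12; linear buffers use kind 0. */
static uint8_t sketch_mod_kind(uint64_t modifier)
{
	if (modifier == SKETCH_MOD_LINEAR)
		return 0;
	return (uint8_t)((modifier >> 12) & 0xFF);
}

/*
 * log2(block height) lives in bits 3:0. The patch stores it shifted
 * left by 4 on chipset >= 0xc0, so the sketch does the same.
 */
static uint32_t sketch_mod_tile_mode(uint64_t modifier, uint32_t chipset)
{
	uint32_t tile_mode;

	if (modifier == SKETCH_MOD_LINEAR)
		return 0;
	tile_mode = (uint32_t)(modifier & 0xF);
	if (chipset >= 0xc0)
		tile_mode <<= 4;
	return tile_mode;
}
```

Splitting the lookup from the field extraction is only a convenience for the sketch; the patch does both in nouveau_decode_mod() and returns -EINVAL when the modifier is not in the list.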
[PATCH 2/3] drm/nouveau: Check framebuffer size against bo
Make sure framebuffer dimensions and tiling parameters will not result in accesses beyond the end of the GEM buffer they are bound to. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/nouveau_display.c | 93 +++ 1 file changed, 93 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 6f038511a03a..f1509392d7b7 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -224,6 +224,72 @@ static const struct drm_framebuffer_funcs nouveau_framebuffer_funcs = { .create_handle = nouveau_user_framebuffer_create_handle, }; +static inline uint32_t +nouveau_get_width_in_blocks(uint32_t stride) +{ + /* GOBs per block in the x direction is always one, and GOBs are +* 64 bytes wide +*/ + static const uint32_t log_block_width = 6; + + return (stride + (1 << log_block_width) - 1) >> log_block_width; +} + +static inline uint32_t +nouveau_get_height_in_blocks(struct nouveau_drm *drm, +uint32_t height, +uint32_t log_block_height_in_gobs) +{ + uint32_t log_gob_height; + uint32_t log_block_height; + + BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA); + + if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI) + log_gob_height = 2; + else + log_gob_height = 3; + + log_block_height = log_block_height_in_gobs + log_gob_height; + + return (height + (1 << log_block_height) - 1) >> log_block_height; +} + +static int +nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo, + uint32_t offset, uint32_t stride, uint32_t h, + uint32_t tile_mode) +{ + uint32_t gob_size, bw, bh; + uint64_t bl_size; + + BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA); + + if (drm->client.device.info.chipset >= 0xc0) + tile_mode >>= 4; + + BUG_ON(tile_mode & 0xFFF0); + + if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI) + gob_size = 256; + else + gob_size = 512; + + bw = nouveau_get_width_in_blocks(stride); + bh = 
nouveau_get_height_in_blocks(drm, h, tile_mode); + + bl_size = bw * bh * (1 << tile_mode) * gob_size; + + DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u gob_size=%u bl_size=%llu size=%lu\n", + offset, stride, h, tile_mode, bw, bh, gob_size, bl_size, + nvbo->bo.mem.size); + + if (bl_size + offset > nvbo->bo.mem.size) + return -ERANGE; + + return 0; +} + int nouveau_framebuffer_new(struct drm_device *dev, const struct drm_mode_fb_cmd2 *mode_cmd, @@ -232,6 +298,8 @@ nouveau_framebuffer_new(struct drm_device *dev, { struct nouveau_drm *drm = nouveau_drm(dev); struct nouveau_framebuffer *fb; + const struct drm_format_info *info; + unsigned int width, height, i; int ret; /* YUV overlays have special requirements pre-NV50 */ @@ -254,6 +322,31 @@ nouveau_framebuffer_new(struct drm_device *dev, return -EINVAL; } + info = drm_get_format_info(dev, mode_cmd); + + for (i = 0; i < info->num_planes; i++) { + width = drm_format_info_plane_width(info, + mode_cmd->width, + i); + height = drm_format_info_plane_height(info, + mode_cmd->height, + i); + + if (nvbo->kind) { + ret = nouveau_check_bl_size(drm, nvbo, + mode_cmd->offsets[i], + mode_cmd->pitches[i], + height, nvbo->mode); + if (ret) + return ret; + } else { + uint32_t size = mode_cmd->pitches[i] * height; + + if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size) + return -ERANGE; + } + } + if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL))) return -ENOMEM; -- 2.17.1 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
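The size check in this patch boils down to a little arithmetic, which can be sketched outside the kernel as follows. This is a rough illustration and not the driver code: the sketch_* names are hypothetical and a boolean stands in for the device family test. The GOB geometry (64 bytes wide; 4 rows and 256 bytes before Fermi, 8 rows and 512 bytes from Fermi on) comes from the diff above. The sketch does its multiplications in 64 bits and checks the offset by subtraction; both are choices made here, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GOBs are 64 bytes wide, and blocks are always one GOB wide. */
#define SKETCH_LOG_GOB_WIDTH 6

/* ceil(value / 2^log2_unit) */
static uint64_t sketch_div_round_up_pow2(uint64_t value, uint32_t log2_unit)
{
	return (value + (1ULL << log2_unit) - 1) >> log2_unit;
}

/*
 * Bytes occupied by one block-linear plane.
 *   stride           bytes per row
 *   height           rows in the plane
 *   log_block_height log2(GOBs per block in the y direction)
 *   fermi_or_later   8-row/512-byte GOBs instead of 4-row/256-byte GOBs
 */
static uint64_t sketch_bl_plane_size(uint32_t stride, uint32_t height,
				     uint32_t log_block_height,
				     bool fermi_or_later)
{
	const uint32_t log_gob_height = fermi_or_later ? 3 : 2;
	const uint64_t gob_size = fermi_or_later ? 512 : 256;
	uint64_t bw, bh;

	assert(log_block_height <= 0xF);

	bw = sketch_div_round_up_pow2(stride, SKETCH_LOG_GOB_WIDTH);
	bh = sketch_div_round_up_pow2(height,
				      log_block_height + log_gob_height);

	/* blocks across * blocks down * GOBs per block * bytes per GOB */
	return bw * bh * (1ULL << log_block_height) * gob_size;
}

/* True if a plane of plane_size bytes at offset fits in bo_size bytes. */
static bool sketch_plane_fits(uint64_t offset, uint64_t plane_size,
			      uint64_t bo_size)
{
	return offset <= bo_size && plane_size <= bo_size - offset;
}
```

As a worked case, a 4096-byte stride and 1080 rows with 16-GOB-high blocks on Fermi or later gives 64 blocks across and 9 blocks down (1080 rows rounded up to 1152), so the plane needs 4096 * 1152 = 4718592 bytes.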
[PATCH 0/3] drm/nouveau: Support NVIDIA format modifiers
This series modifies the NV5x+ nouveau display backends to advertise appropriate format modifiers on their display planes in atomic mode setting blobs. Corresponding modifications to Mesa/userspace are available here: https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work But those need a bit of cleanup before they're ready to submit. I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware using various formats and all the exposed format modifiers, plus some negative testing with invalid ones. NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block Linear DRM format mod" patch submitted to dri-devel. Signed-off-by: James Jones --- drivers/gpu/drm/tegra/dc.c | 10 ++ drivers/gpu/drm/tegra/fb.c | 14 +++--- drivers/gpu/drm/tegra/hub.c | 10 ++ 3 files changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index fbf57bc3cdab..a2cc687dc2d8 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -588,6 +588,16 @@ static const u32 tegra124_primary_formats[] = { static const u64 tegra124_modifiers[] = { DRM_FORMAT_MOD_LINEAR, + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5), + /* +* For backwards compatibility with older userspace that may have +* baked in usage of the less-descriptive modifiers +*/ DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c index e34325c83d28..d04e0b1c61ea 100644 --- a/drivers/gpu/drm/tegra/fb.c +++ b/drivers/gpu/drm/tegra/fb.c @@ -44,7 +44,7 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer, { 
uint64_t modifier = framebuffer->modifier; - switch (modifier) { + switch (drm_fourcc_canonicalize_nvidia_format_mod(modifier)) { case DRM_FORMAT_MOD_LINEAR: tiling->mode = TEGRA_BO_TILING_MODE_PITCH; tiling->value = 0; @@ -55,32 +55,32 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer, tiling->value = 0; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 0; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 1; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 2; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 3; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 4; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 5; break; diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c index 839b49c40e51..03c97b10b122 100644 --- a/drivers/gpu/drm/tegra/hub.c +++ b/drivers/gpu/drm/tegra/hub.c @@ -49,6 +49,16 @@ static const u32 tegra_shared_plane_formats[] = { static const u64 tegra_shared_plane_modifiers[] = { DRM_FORMAT_MOD_LINEAR, + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 
0, 0xfe, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5), + /* +* For backwards compatibility with older userspace that may have +* baked in usage of the less-descriptive modifiers +*/ DRM_FORMAT_MOD_NVIDIA_1
[PATCH 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp
Advertise support for the full list of format modifiers supported by each class of NVIDIA desktop GPU display hardware. Stash the array of modifiers in the nouveau_display struct for use when validating userspace framebuffer creation requests, which will be supported in a subsequent change. Signed-off-by: James Jones --- drivers/gpu/drm/nouveau/dispnv50/base507c.c | 7 +-- drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 + drivers/gpu/drm/nouveau/dispnv50/disp.h | 4 ++ drivers/gpu/drm/nouveau/dispnv50/wndw.c | 27 +- drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++ drivers/gpu/drm/nouveau/nouveau_display.h | 2 + 6 files changed, 112 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c b/drivers/gpu/drm/nouveau/dispnv50/base507c.c index 00a85f1e1a4a..025b8f996a0a 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c +++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c @@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format, struct nv50_disp_base_channel_dma_v0 args = { .head = head, }; - struct nv50_disp *disp = nv50_disp(drm->dev); + struct nouveau_display *disp = nouveau_display(drm->dev); + struct nv50_disp *disp50 = nv50_disp(drm->dev); struct nv50_wndw *wndw; int ret; @@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 *format, if (*pwndw = wndw, ret) return ret; - ret = nv50_dmac_create(>client.device, >disp->object, + ret = nv50_dmac_create(>client.device, >disp.object, , head, , sizeof(args), - disp->sync->bo.offset, >wndw); + disp50->sync->bo.offset, >wndw); if (ret) { NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret); return ret; diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c index 064a69d161e3..0956367d27a2 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -2337,6 +2337,15 @@ nv50_display_create(struct drm_device *dev) if (ret) goto out; + /* Assign the 
correct format modifiers */ + if (disp->disp->object.oclass >= TU102_DISP) + nouveau_display(dev)->format_modifiers = wndwc57e_modifiers; + else + if (disp->disp->object.oclass >= GF110_DISP) + nouveau_display(dev)->format_modifiers = disp90xx_modifiers; + else + nouveau_display(dev)->format_modifiers = disp50xx_modifiers; + /* create crtc objects to represent the hw heads */ if (disp->disp->object.oclass >= GV100_DISP) crtcs = nvif_rd32(>object, 0x610060) & 0xff; @@ -2404,3 +2413,53 @@ nv50_display_create(struct drm_device *dev) nv50_display_destroy(dev); return ret; } + +/** + * Format modifiers + */ + +/ + *Log2(block height) + * + *Page Kind --+ | * + *Gob Height/Page Kind Generation --+ | | * + * Sector layout ---+ | | | * + * Compression --+ | | | | */ +const u64 disp50xx_modifiers[] = { /* | | | | | */ + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4), + DRM_FORMAT_MOD_NVIDIA_B
Re: [PATCH 0/3] drm/nouveau: Support NVIDIA format modifiers
Please ignore the tegra diff on the bottom of this. I never fail to find a way to mess up git-send-email. -James On 12/11/19 12:59 PM, James Jones wrote: This series modifies the NV5x+ nouveau display backends to advertise appropriate format modifiers on their display planes in atomic mode setting blobs. Corresponding modifications to Mesa/userspace are available here: https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work But those need a bit of cleanup before they're ready to submit. I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware using various formats and all the exposed format modifiers, plus some negative testing with invalid ones. NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block Linear DRM format mod" patch submitted to dri-devel. Signed-off-by: James Jones --- drivers/gpu/drm/tegra/dc.c | 10 ++ drivers/gpu/drm/tegra/fb.c | 14 +++--- drivers/gpu/drm/tegra/hub.c | 10 ++ 3 files changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index fbf57bc3cdab..a2cc687dc2d8 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -588,6 +588,16 @@ static const u32 tegra124_primary_formats[] = { static const u64 tegra124_modifiers[] = { DRM_FORMAT_MOD_LINEAR, + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5), + /* +* For backwards compatibility with older userspace that may have +* baked in usage of the less-descriptive modifiers +*/ DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0), DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1), DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2), diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c index e34325c83d28..d04e0b1c61ea 
100644 --- a/drivers/gpu/drm/tegra/fb.c +++ b/drivers/gpu/drm/tegra/fb.c @@ -44,7 +44,7 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer, { uint64_t modifier = framebuffer->modifier; - switch (modifier) { + switch (drm_fourcc_canonicalize_nvidia_format_mod(modifier)) { case DRM_FORMAT_MOD_LINEAR: tiling->mode = TEGRA_BO_TILING_MODE_PITCH; tiling->value = 0; @@ -55,32 +55,32 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer, tiling->value = 0; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 0; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 1; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 2; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 3; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 4; break; - case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5): + case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5): tiling->mode = TEGRA_BO_TILING_MODE_BLOCK; tiling->value = 5; break; diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c index 839b49c40e51..03c97b10b122 100644 --- a/drivers/gpu/drm/tegra/hub.c +++ b/drivers/gpu/drm/tegra/hub.c @@ -49,6 +49,16 @@ static const u32 tegra_shared_plane_formats[] = { static const u64 tegra_shared_plane_modifiers[] = { DRM_FORMAT_MOD_LINEAR, + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1), + 
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4), + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5), + /* +* For backwards compatibility with older userspace that
[PATCH v3] drm: Generalized NV Block Linear DRM format mod
Builds upon the existing NVIDIA 16Bx2 block linear format modifiers by adding more "fields" to the existing parameterized DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier macro that allow fully defining a unique-across- all-NVIDIA-hardware bit layout using a minimal set of fields and values. The new modifier macro DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is effectively backwards compatible with the existing macro, introducing a superset of the previously definable format modifiers. Backwards compatibility has two quirks. First, the zero value for the "kind" field, which is implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK macro, must be special cased in drivers and assumed to map to the pre-Turing generic kind of 0xfe, since a kind of "zero" is reserved for linear buffer layouts on all GPUs. Second, it is assumed backwards compatibility is only needed when running on Tegra GPUs, and specifically Tegra GPUs prior to Xavier. This is based on two assertions: -Tegra GPUs prior to Xavier used a slightly different raw bit layout than desktop GPUs, making it impossible to directly share block linear buffers between the two. -Support for the existing block linear modifiers was incomplete, making them useful only for exporting buffers created by nouveau and importing them to Tegra DRM as framebuffers for scan out. There was no support for adding framebuffers using format modifiers in nouveau, nor importing dma-buf/PRIME GEM objects into nouveau userspace drivers with modifiers in Mesa. Hence it is assumed the prior modifiers were not intended for use on desktop GPUs, and as a corollary, were not intended to support sharing block linear buffers across two different NVIDIA GPUs. v2: - Added canonicalize helper function v3: - Added additional bit to compression field to support Tesla (NV5x,G8x,G9x,GT1xx,GT2xx) class chips. 
Signed-off-by: James Jones --- include/uapi/drm/drm_fourcc.h | 122 +++--- 1 file changed, 114 insertions(+), 8 deletions(-) diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index 3feeaa3f987a..4330d930bdbb 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -497,7 +497,113 @@ extern "C" { #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1) /* - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80, + * and Tegra GPUs starting with Tegra K1. + * + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies + * based on the architecture generation. GOBs themselves are then arranged in + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents + * a block depth or height of "4"). + * + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format + * in full detail. + * + * Macro + * Bits Param Description + * - - + * + * 3:0 h log2(height) of each block, in GOBs. Placed here for + * compatibility with the existing + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. + * + * 4:4 - Must be 1, to indicate block-linear layout. Necessary for + * compatibility with the existing + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. + * + * 8:5 - Reserved (To support 3D-surfaces with variable log2(depth) block + * size). Must be zero. + * + * Note there is no log2(width) parameter. Some portions of the + * hardware support a block width of two gobs, but it is impractical + * to use due to lack of support elsewhere, and has no known + * benefits. + * + * 11:9 - Reserved (To support 2D-array textures with variable array stride + * in blocks, specified via log2(tile width in blocks)). Must be + * zero. + * + * 19:12 k Page Kind. 
This value directly maps to a field in the page + * tables of all GPUs >= NV50. It affects the exact layout of bits + * in memory and can be derived from the tuple + * + * (format, GPU model, compression type, samples per pixel) + * + * Where compression type is defined below. If GPU model were + * implied by the format modifier, format, or memory buffer, page + * kind would not need to be included in the modifier itself, but + * since the modifier should define the layout of the associated + * memory buffer independent from any device or other context, it + * must be included here. + * + * 21:20 g GOB Height and Page Kind Generation. The height of a GOB changed + *
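To make the bit layout and the "kind 0 means 0xfe" quirk concrete, here is a small sketch in plain C. It is a sketch under assumptions rather than the uapi header: the sketch_* names are hypothetical, only the fields whose bit positions are quoted above are packed (h in 3:0, the block-linear flag in bit 4, k in 19:12, g in 21:20), and the vendor byte is assumed to sit in the top 8 bits with the value drm_fourcc.h uses for NVIDIA (0x03). The canonicalize function follows the commit message's description of the helper, not its actual body.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match DRM_FORMAT_MOD_VENDOR_NVIDIA in drm_fourcc.h. */
#define SKETCH_VENDOR_NVIDIA 0x03ULL

/*
 * Pack the documented subset of fields into a modifier value:
 *   bits  3:0  h  log2(block height in GOBs)
 *   bit   4       always 1 for block linear
 *   bits 19:12 k  page kind
 *   bits 21:20 g  GOB height / page kind generation
 * Sector layout and compression are left at zero in this sketch.
 */
static uint64_t sketch_pack_bl_mod(uint32_t g, uint32_t k, uint32_t h)
{
	uint64_t val = 0x10ULL |
		       ((uint64_t)h & 0xF) |
		       (((uint64_t)k & 0xFF) << 12) |
		       (((uint64_t)g & 0x3) << 20);

	assert(h <= 0xF && k <= 0xFF && g <= 0x3);

	return (SKETCH_VENDOR_NVIDIA << 56) | val;
}

/*
 * Map a legacy 16BX2-style modifier (block-linear flag set, kind field
 * zero) onto the equivalent modifier with the generic kind 0xfe.
 * Anything else is returned untouched.
 */
static uint64_t sketch_canonicalize_mod(uint64_t modifier)
{
	if (!(modifier & 0x10) || (modifier & (0xFFULL << 12)))
		return modifier;
	return modifier | (0xFEULL << 12);
}
```

With this, a legacy modifier carrying only the flag and a block height of 4 canonicalizes to the same value as packing (g=0, k=0xfe, h=4), which is what lets a driver switch on the canonical form and handle both spellings in one case label.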
Re: [PATCH] drm: Generalized NV Block Linear DRM format mod
On 10/15/19 8:42 AM, Daniel Vetter wrote: On Tue, Oct 15, 2019 at 5:14 PM James Jones wrote: On 10/15/19 7:19 AM, Daniel Vetter wrote: On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote: Builds upon the existing NVIDIA 16Bx2 block linear format modifiers by adding more "fields" to the existing parameterized DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier macro that allow fully defining a unique-across- all-NVIDIA-hardware bit layout using a minimal set of fields and values. The new modifier macro DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is effectively backwards compatible with the existing macro, introducing a superset of the previously definable format modifiers. Backwards compatibility has two quirks. First, the zero value for the "kind" field, which is implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK macro, must be special cased in drivers and assumed to map to the pre-Turing generic kind of 0xfe, since a kind of "zero" is reserved for linear buffer layouts on all GPUs. Second, it is assumed backwards compatibility is only needed when running on Tegra GPUs, and specifically Tegra GPUs prior to Xavier. This is based on two assertions: -Tegra GPUs prior to Xavier used a slightly different raw bit layout than desktop GPUs, making it impossible to directly share block linear buffers between the two. -Support for the existing block linear modifiers was incomplete, making them useful only for exporting buffers created by nouveau and importing them to Tegra DRM as framebuffers for scan out. There was no support for adding framebuffers using format modifiers in nouveau, nor importing dma-buf/PRIME GEM objects into nouveau userspace drivers with modifiers in Mesa. Hence it is assumed the prior modifiers were not intended for use on desktop GPUs, and as a corollary, were not intended to support sharing block linear buffers across two different NVIDIA GPUs. 
Signed-off-by: James Jones --- include/uapi/drm/drm_fourcc.h | 108 +++--- 1 file changed, 100 insertions(+), 8 deletions(-) diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index 3feeaa3f987a..cc9853d42a24 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -497,7 +497,99 @@ extern "C" { #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1) /* - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80, + * and Tegra GPUs starting with Tegra K1. + * + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies + * based on the architecture generation. GOBs themselves are then arranged in + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents + * a block depth or height of "4"). + * + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format + * in full detail. + * + * Macro + * Bits Param Description + * - - + * + * 3:0 h log2(height) of each block, in GOBs. Placed here for + * compatibility with the existing + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. + * + * 4:4 - Must be 1, to indicate block-linear layout. Necessary for + * compatibility with the existing + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. + * + * 8:5 - Reserved (To support 3D-surfaces with variable log2(depth) block + * size). Must be zero. + * + * Note there is no log2(width) parameter. Some portions of the + * hardware support a block width of two gobs, but it is impractical + * to use due to lack of support elsewhere, and has no known + * benefits. + * + * 11:9 - Reserved (To support 2D-array textures with variable array stride + * in blocks, specified via log2(tile width in blocks)). Must be + * zero. + * + * 19:12 k Page Kind. 
This value directly maps to a field in the page + * tables of all GPUs >= NV50. It affects the exact layout of bits + * in memory and can be derived from the tuple + * + * (format, GPU model, compression type, samples per pixel) + * + * Where compression type is defined below. If GPU model were + * implied by the format modifier, format, or memory buffer, page + * kind would not need to be included in the modifier itself, but + * since the modifier should define the layout of the associated + * memory buffer independent from any device or other context, it + * must be included here. + * + * To grandfat
[PATCH v2] drm: Generalized NV Block Linear DRM format mod
Builds upon the existing NVIDIA 16Bx2 block linear format modifiers by adding more "fields" to the existing parameterized DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier macro that allow fully defining a unique-across- all-NVIDIA-hardware bit layout using a minimal set of fields and values. The new modifier macro DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is effectively backwards compatible with the existing macro, introducing a superset of the previously definable format modifiers. Backwards compatibility has two quirks. First, the zero value for the "kind" field, which is implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK macro, must be special cased in drivers and assumed to map to the pre-Turing generic kind of 0xfe, since a kind of "zero" is reserved for linear buffer layouts on all GPUs. Second, it is assumed backwards compatibility is only needed when running on Tegra GPUs, and specifically Tegra GPUs prior to Xavier. This is based on two assertions: -Tegra GPUs prior to Xavier used a slightly different raw bit layout than desktop GPUs, making it impossible to directly share block linear buffers between the two. -Support for the existing block linear modifiers was incomplete, making them useful only for exporting buffers created by nouveau and importing them to Tegra DRM as framebuffers for scan out. There was no support for adding framebuffers using format modifiers in nouveau, nor importing dma-buf/PRIME GEM objects into nouveau userspace drivers with modifiers in Mesa. Hence it is assumed the prior modifiers were not intended for use on desktop GPUs, and as a corollary, were not intended to support sharing block linear buffers across two different NVIDIA GPUs. 
v2: - Added canonicalize helper function Signed-off-by: James Jones --- include/uapi/drm/drm_fourcc.h | 116 +++--- 1 file changed, 108 insertions(+), 8 deletions(-) diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index 3feeaa3f987a..56c8fe30caab 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -497,7 +497,107 @@ extern "C" { #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1) /* - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80, + * and Tegra GPUs starting with Tegra K1. + * + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies + * based on the architecture generation. GOBs themselves are then arranged in + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents + * a block depth or height of "4"). + * + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format + * in full detail. + * + * Macro + * Bits Param Description + * - - + * + * 3:0 h log2(height) of each block, in GOBs. Placed here for + * compatibility with the existing + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. + * + * 4:4 - Must be 1, to indicate block-linear layout. Necessary for + * compatibility with the existing + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers. + * + * 8:5 - Reserved (To support 3D-surfaces with variable log2(depth) block + * size). Must be zero. + * + * Note there is no log2(width) parameter. Some portions of the + * hardware support a block width of two gobs, but it is impractical + * to use due to lack of support elsewhere, and has no known + * benefits. + * + * 11:9 - Reserved (To support 2D-array textures with variable array stride + * in blocks, specified via log2(tile width in blocks)). Must be + * zero. 
+ * + * 19:12 k Page Kind. This value directly maps to a field in the page + * tables of all GPUs >= NV50. It affects the exact layout of bits + * in memory and can be derived from the tuple + * + * (format, GPU model, compression type, samples per pixel) + * + * Where compression type is defined below. If GPU model were + * implied by the format modifier, format, or memory buffer, page + * kind would not need to be included in the modifier itself, but + * since the modifier should define the layout of the associated + * memory buffer independent from any device or other context, it + * must be included here. + * + * 21:20 g GOB Height and Page Kind Generation. The height of a GOB changed + * starting with Fermi GPUs. Additionally, the mapping between page + * kind and bit layout has changed
Re: [PATCH] drm: Generalized NV Block Linear DRM format mod
On 10/15/19 7:19 AM, Daniel Vetter wrote: On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote: Builds upon the existing NVIDIA 16Bx2 block linear format modifiers by adding more "fields" to the existing parameterized DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier macro that allow fully defining a unique-across- all-NVIDIA-hardware bit layout using a minimal set of fields and values. The new modifier macro DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is effectively backwards compatible with the existing macro, introducing a superset of the previously definable format modifiers. Backwards compatibility has two quirks. First, the zero value for the "kind" field, which is implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK macro, must be special cased in drivers and assumed to map to the pre-Turing generic kind of 0xfe, since a kind of "zero" is reserved for linear buffer layouts on all GPUs. Second, it is assumed backwards compatibility is only needed when running on Tegra GPUs, and specifically Tegra GPUs prior to Xavier. This is based on two assertions: -Tegra GPUs prior to Xavier used a slightly different raw bit layout than desktop GPUs, making it impossible to directly share block linear buffers between the two. -Support for the existing block linear modifiers was incomplete, making them useful only for exporting buffers created by nouveau and importing them to Tegra DRM as framebuffers for scan out. There was no support for adding framebuffers using format modifiers in nouveau, nor importing dma-buf/PRIME GEM objects into nouveau userspace drivers with modifiers in Mesa. Hence it is assumed the prior modifiers were not intended for use on desktop GPUs, and as a corollary, were not intended to support sharing block linear buffers across two different NVIDIA GPUs. 
Signed-off-by: James Jones
---
 include/uapi/drm/drm_fourcc.h | 108 +++---
 1 file changed, 100 insertions(+), 8 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..cc9853d42a24 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -497,7 +497,99 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)

 /*
- * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
+ * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
+ * and Tegra GPUs starting with Tegra K1.
+ *
+ * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies
+ * based on the architecture generation. GOBs themselves are then arranged in
+ * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
+ * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
+ * a block depth or height of "4").
+ *
+ * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
+ * in full detail.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------
+ *
+ *  3:0  h     log2(height) of each block, in GOBs. Placed here for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  4:4  -     Must be 1, to indicate block-linear layout. Necessary for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
+ *             size). Must be zero.
+ *
+ *             Note there is no log2(width) parameter. Some portions of the
+ *             hardware support a block width of two gobs, but it is impractical
+ *             to use due to lack of support elsewhere, and has no known
+ *             benefits.
+ *
+ * 11:9  -     Reserved (To support 2D-array textures with variable array stride
+ *             in blocks, specified via log2(tile width in blocks)). Must be
+ *             zero.
+ *
+ * 19:12 k     Page Kind. This value directly maps to a field in the page
+ *             tables of all GPUs >= NV50. It affects the exact layout of bits
+ *             in memory and can be derived from the tuple
+ *
+ *               (format, GPU model, compression type, samples per pixel)
+ *
+ *             Where compression type is defined below. If GPU model were
+ *             implied by the format modifier, format, or memory buffer, page
+ *             kind would not need to be included in the modifier itself, but
+ *             since the modifier should define the layout of the associated
+ *             memory buffer independent from any device or other context, it
+ *             must be included here.
+ *
+ *             To grandfather in prior block linear format modifiers to this
+ *             layout, the page kind "0", w
[PATCH] drm: Generalized NV Block Linear DRM format mod
Beyond general review, I'm looking for feedback on a few things specifically here:

- Is the level of backwards compatibility described here sufficient? Technically I can make the user space drivers support the old modifiers too, but that would mean the layout they specify would morph based on the GPU they're being used on, and sharing buffers between two different NV GPUs, which would appear to be possible, would result in corruption on one side or the other.

- I used "magic" numbers for all the bit shifting. Would it be better to use __fourcc_XXX constants like the broadcom modifiers do? I wasn't sure which style was preferred. The nouveau code is full of magic numbers, but that's a bit lower level than this file.

If preferred, I can send this out as part of a patchset that adds support for the modifiers to nouveau and TegraDRM, but I have some things to clean up there before it's ready for proper review, and I didn't want to block review of the basic modifier layout on that work.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
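For readers weighing the "magic numbers vs. named constants" question, here is a minimal sketch of what the named-constant style could look like for the fields the patch comment documents (h in bits 3:0, the block-linear flag in bit 4, page kind in bits 19:12, generation in bits 21:20). Every NVBL_* and nvbl_* name is hypothetical and is not taken from the patch. The vendor ID position (top 8 bits, NVIDIA = 0x03) follows the existing fourcc_mod_code() convention in drm_fourcc.h. Fields the patch defines above bit 21 are not modelled here. The kind == 0 fallback to 0xfe mirrors the special case described in the commit message.

```c
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the patch comment. */
#define NVBL_VENDOR_SHIFT        56          /* vendor ID occupies the top 8 bits */
#define NVBL_VENDOR_NVIDIA       0x03ULL     /* DRM_FORMAT_MOD_VENDOR_NVIDIA */

#define NVBL_LOG2_HEIGHT_SHIFT   0           /* bits 3:0, log2(block height in GOBs) */
#define NVBL_LOG2_HEIGHT_MASK    0xfULL
#define NVBL_BLOCK_LINEAR_FLAG   (1ULL << 4) /* bit 4, must be 1 */
#define NVBL_KIND_SHIFT          12          /* bits 19:12, page kind */
#define NVBL_KIND_MASK           0xffULL
#define NVBL_GEN_SHIFT           20          /* bits 21:20, GOB height / kind generation */
#define NVBL_GEN_MASK            0x3ULL

#define NVBL_LEGACY_GENERIC_KIND 0xfeu       /* pre-Turing generic kind, per the commit message */

/* Pack the three modelled fields into a 64-bit modifier value. */
static inline uint64_t nvbl_pack(unsigned log2_height, unsigned kind, unsigned gen)
{
	return (NVBL_VENDOR_NVIDIA << NVBL_VENDOR_SHIFT) |
	       NVBL_BLOCK_LINEAR_FLAG |
	       (((uint64_t)log2_height & NVBL_LOG2_HEIGHT_MASK) << NVBL_LOG2_HEIGHT_SHIFT) |
	       (((uint64_t)kind & NVBL_KIND_MASK) << NVBL_KIND_SHIFT) |
	       (((uint64_t)gen & NVBL_GEN_MASK) << NVBL_GEN_SHIFT);
}

static inline unsigned nvbl_log2_height(uint64_t mod)
{
	return (unsigned)((mod >> NVBL_LOG2_HEIGHT_SHIFT) & NVBL_LOG2_HEIGHT_MASK);
}

static inline unsigned nvbl_gen(uint64_t mod)
{
	return (unsigned)((mod >> NVBL_GEN_SHIFT) & NVBL_GEN_MASK);
}

/*
 * Page kind as a driver would interpret it: a stored kind of 0 means the
 * modifier came from the older 16BX2 macro, so fall back to the generic kind.
 */
static inline unsigned nvbl_effective_kind(uint64_t mod)
{
	unsigned kind = (unsigned)((mod >> NVBL_KIND_SHIFT) & NVBL_KIND_MASK);

	return kind ? kind : NVBL_LEGACY_GENERIC_KIND;
}
```

The trade-off is the usual one: the named version is longer, but each shift and mask can be checked against the comment table in one place.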
[PATCH] drm: Generalized NV Block Linear DRM format mod
Builds upon the existing NVIDIA 16Bx2 block linear format modifiers by adding more "fields" to the existing parameterized DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier macro that allow fully defining a unique-across-all-NVIDIA-hardware bit layout using a minimal set of fields and values. The new modifier macro DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is effectively backwards compatible with the existing macro, introducing a superset of the previously definable format modifiers.

Backwards compatibility has two quirks. First, the zero value for the "kind" field, which is implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK macro, must be special cased in drivers and assumed to map to the pre-Turing generic kind of 0xfe, since a kind of "zero" is reserved for linear buffer layouts on all GPUs. Second, it is assumed backwards compatibility is only needed when running on Tegra GPUs, and specifically Tegra GPUs prior to Xavier. This is based on two assertions:

- Tegra GPUs prior to Xavier used a slightly different raw bit layout than desktop GPUs, making it impossible to directly share block linear buffers between the two.

- Support for the existing block linear modifiers was incomplete, making them useful only for exporting buffers created by nouveau and importing them to Tegra DRM as framebuffers for scan out. There was no support for adding framebuffers using format modifiers in nouveau, nor importing dma-buf/PRIME GEM objects into nouveau userspace drivers with modifiers in Mesa.

Hence it is assumed the prior modifiers were not intended for use on desktop GPUs, and as a corollary, were not intended to support sharing block linear buffers across two different NVIDIA GPUs.
Signed-off-by: James Jones
---
 include/uapi/drm/drm_fourcc.h | 108 +++---
 1 file changed, 100 insertions(+), 8 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..cc9853d42a24 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -497,7 +497,99 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)

 /*
- * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
+ * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
+ * and Tegra GPUs starting with Tegra K1.
+ *
+ * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies
+ * based on the architecture generation. GOBs themselves are then arranged in
+ * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
+ * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
+ * a block depth or height of "4").
+ *
+ * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
+ * in full detail.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------
+ *
+ *  3:0  h     log2(height) of each block, in GOBs. Placed here for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  4:4  -     Must be 1, to indicate block-linear layout. Necessary for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
+ *             size). Must be zero.
+ *
+ *             Note there is no log2(width) parameter. Some portions of the
+ *             hardware support a block width of two gobs, but it is impractical
+ *             to use due to lack of support elsewhere, and has no known
+ *             benefits.
+ *
+ * 11:9  -     Reserved (To support 2D-array textures with variable array stride
+ *             in blocks, specified via log2(tile width in blocks)). Must be
+ *             zero.
+ *
+ * 19:12 k     Page Kind. This value directly maps to a field in the page
+ *             tables of all GPUs >= NV50. It affects the exact layout of bits
+ *             in memory and can be derived from the tuple
+ *
+ *               (format, GPU model, compression type, samples per pixel)
+ *
+ *             Where compression type is defined below. If GPU model were
+ *             implied by the format modifier, format, or memory buffer, page
+ *             kind would not need to be included in the modifier itself, but
+ *             since the modifier should define the layout of the associated
+ *             memory buffer independent from any device or other context, it
+ *             must be included here.
+ *
+ *             To grandfather in prior block linear format modifiers to this
+ *             layout, the page kind "0", which corresponds to "pitch/linear"
+ *             and hence is unusable with block-li
Re: XDC allocator workshop and Wayland dmabuf hints
On 10/13/19 2:05 PM, Scott Anderson wrote: (Sorry to CCs for spam, I made an error in my first posting) Hi, There were certainly some interesting changes discussed at the allocator workshop during XDC this year, and I'd like to just summarise my thoughts on it and make sure everybody is on the same page. For those who don't know who I am or my stake in this, I'm the maintainer of the DRM and graphics code for the wlroots Wayland compositor library. I'm ascent12 on Github and Freenode. My understanding of the issue Nvidia was trying to solve was the in-place transition between different format modifiers. E.g. if a client is to be scanned out, the buffer would need to be transitioned to a non-compressed format that the display controller can work with, but if the client is to be composited, a compressed format would be used, saving on memory bandwidth. Hardware may have more efficient ways to transition between different formats, so it would be good if we can use these and not rely on having to perform a blit if we don't need to. The problem is more general than this, but that was just the example given. The original solution proposed in James' talk was to add functions to EGL/OpenGL/Vulkan and have the display server perform transitions where required. FWIW, I didn't intend to imply the display server should be the thing doing transitions. It is a possible implementation, but I assumed display servers would only do these transitions in fallback paths or as part of some in-between period before clients picked up on the need for them. 
Beyond the design goals you imply below, I wanted to note that it's more optimal to perform transitions in the client, and since transitions were intended to be persistent (paralleling Vulkan layout transitions), the compositor would need to transition back to the client's view of the image if the client hadn't picked up on the transition and agreed to handle it anyway, which would not be ideal and could cost additional perf in some cases.

Discussions during the workshop at the start tended toward having libliftoff handle all of this, but that would require libliftoff to have its own rendering context, which I think is bloating the purpose of the library. Also discussed was to have libliftoff ask the compositor to perform the transition if it thinks it was possible.

Another suggestion I made was to make use of Simon's dmabuf hints patch to the wp_linux_dmabuf protocol [1] and leave it up to the client's GPU driver to handle any transitions. This wasn't adequately represented in the lightning talk summarising the workshop, so I'll go over it here now, making sure everyone understands what it is and why I think it is the way we should go forward.

Right now, a Wayland compositor will advertise all of the format+modifier pairs that it supports, but currently does not provide any context for clients as to which one they should actually choose. It's basically up to chance whether a client is able to be scanned out, which is likely to lead to several suboptimal situations.

The dmabuf hints patch adds a way to suggest a better format to use, based on the current context. This is dynamic, and can be sent multiple times over the lifetime of a surface. The patch also adds a way for the compositor to tell the client which GPU it's using, which is useful for clients to know in multi GPU situations.

These hints are in various "tranches", which are just groups of format+modifier pairs of the same preference. The tranches are ordered from most optimal to least optimal.
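To make the tranche ordering concrete, here is a rough sketch of how a client might pick a format+modifier pair: walk the tranches from most to least preferred and take the first pair the client's driver also supports. The struct and function names are hypothetical and this is not the protocol's wire format; it only illustrates the "ordered groups of equal preference" idea described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One format+modifier pair (hypothetical representation). */
struct fmt_mod {
	uint32_t format;
	uint64_t modifier;
};

/* A tranche: a group of pairs that share the same preference level. */
struct tranche {
	const struct fmt_mod *pairs;
	size_t n_pairs;
};

static bool pair_in(const struct fmt_mod *set, size_t n, struct fmt_mod p)
{
	for (size_t i = 0; i < n; i++)
		if (set[i].format == p.format && set[i].modifier == p.modifier)
			return true;
	return false;
}

/*
 * Tranches are assumed to be sorted most-preferred first. Returns the first
 * pair the client also supports, or NULL if there is no overlap at all.
 */
static const struct fmt_mod *pick_pair(const struct tranche *tranches, size_t n_tranches,
				       const struct fmt_mod *supported, size_t n_supported)
{
	for (size_t t = 0; t < n_tranches; t++)
		for (size_t i = 0; i < tranches[t].n_pairs; i++)
			if (pair_in(supported, n_supported, tranches[t].pairs[i]))
				return &tranches[t].pairs[i];
	return NULL;
}
```

When the compositor sends updated hints, the client would simply rerun the selection against the new tranche list and reallocate if the answer changed.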
The most optimal tranche would imply direct scanout, while a less optimal tranche would imply compositing, but is not actually defined like that in the protocol. If a client becomes fullscreen, we would send the format+modifier pairs for the primary plane as the most optimal tranche. If a client is eligible to be scanned out on an overlay plane, we would send the format+modifier pairs for that plane. If a client is partially occluded or otherwise not possible to be scanned out, we'd just have the normal format+modifier pairs that we can use as a texture. Note that the compositor won't send format+modifier pairs which we cannot texture from, even if the plane advertises it's supported. We always need to be able to fall back to compositing. The hard part of figuring out which clients are "eligible" for being scanned out on an overlay plane could be handled by libliftoff (or something similar) and given back to the compositor to forward to clients. For libliftoff to make a properly informed decision, I think the atomic KMS API needs to be changed. We can only TEST_ONLY for valid buffers, testing the immediate configuration, but doesn't allow us to test for a configuration we WANT to go to. We need some sort of fake framebuffer not backed by any real memory, but will allow us to TEST_ONLY it. Without this, we may tell the client format+modifier pairs that we think will work for scanout, but don't due to whatever
Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
On 02/22/2018 01:16 PM, Alex Deucher wrote:
On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen wrote:
On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg wrote:
On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher wrote:
On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace wrote:
On Thu 21 Dec 2017, Daniel Vetter wrote:
On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <hoegsb...@google.com> wrote:
On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicom...@nvidia.com> wrote:
On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <hoegsb...@gmail.com> wrote:

I'd like to see concrete examples of actual display controllers supporting more format layouts than what can be specified with a 64 bit modifier.

The main problem is our tiling and other metadata parameters can't generally fit in a modifier, so we find passing a blob of metadata a more suitable mechanism.

I understand that you may have n knobs totalling more than 56 bits that configure your tiling/swizzling for color buffers. What I don't buy is that you need all those combinations when passing buffers around between codecs, cameras and display controllers. Even if you're sharing between the same 3D drivers in different processes, I expect just locking down, say, 64 different combinations (you can add more over time) and assigning each a modifier would be sufficient. I doubt you'd extract meaningful performance gains from going all the way to a blob.

I agree with Kristian above. In my opinion, choosing to encode in modifiers a precise description of every possible tiling/compression layout is not technically incorrect, but I believe it misses the point. The intention behind modifiers is not to exhaustively describe all possibilities. I summarized this opinion in VK_EXT_image_drm_format_modifier, where I wrote an "introduction to modifiers" section.
Here's an excerpt:

One goal of modifiers in the Linux ecosystem is to enumerate for each vendor a reasonably sized set of tiling formats that are appropriate for images shared across processes, APIs, and/or devices, where each participating component may possibly be from different vendors. A non-goal is to enumerate all tiling formats supported by all vendors. Some tiling formats used internally by vendors are inappropriate for sharing; no modifiers should be assigned to such tiling formats.

Where it gets tricky is how to select that subset? Our tiling modes are defined more by the asic specific constraints than the tiling mode itself. At a high level we have basically 3 tiling modes (out of 16 possible) that would be the minimum we'd want to expose for gfx6-8. gfx9 uses a completely new scheme.

1. Linear (per asic stride requirements, not usable by many hw blocks)
2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
3. 2D Thin (1D tiling constraints, plus pipe config (18 possible), tile split (7 possible), sample split (4 possible), num banks (4 possible), bank width (4 possible), bank height (4 possible), macro tile aspect (4 possible) all of which are asic config specific)

I guess we could do something like:

AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
etc.

We probably only need 40 bits to encode all of the tiling parameters so we could do family, plus tiling encoding that still seems unwieldy to deal with from an application
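As a back-of-the-envelope check on the parameterized alternative to one name per combination, the sketch below counts how many bits the 2D Thin knobs listed above would take if each were packed as its own field. The per-knob option counts come from the list in the message. The field-per-knob packing and the function names are assumptions made for illustration, not an actual AMD encoding.

```c
#include <stddef.h>

/* Smallest number of bits that can distinguish n values (n >= 1). */
static unsigned bits_for(unsigned n)
{
	unsigned bits = 0;

	while ((1u << bits) < n)
		bits++;
	return bits;
}

/*
 * Option counts for the gfx6-8 2D Thin knobs quoted above:
 * layout, pipe config, tile split, sample split, num banks,
 * bank width, bank height, macro tile aspect.
 */
static unsigned gfx6_2d_thin_bits(void)
{
	static const unsigned choices[] = { 5, 18, 7, 4, 4, 4, 4, 4 };
	unsigned total = 0;

	for (size_t i = 0; i < sizeof(choices) / sizeof(choices[0]); i++)
		total += bits_for(choices[i]);
	return total;
}
```

Under these assumptions the knobs need 3 + 5 + 3 + 5 * 2 = 21 bits, which sits comfortably inside both the roughly 40-bit estimate above and the 56 non-vendor bits of a modifier. The open question in the thread is usability for applications, not raw bit count.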
Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
On 12/28/2017 10:24 AM, Miguel Angel Vico wrote:

(Adding dri-devel back, and trying to respond to some comments from the different forks)

James Jones wrote:

Your worst case analysis above isn't far off from our HW, give or take some bits and axes here and there. We've started an internal discussion about how to lay out all the bits we need. It's hard to even enumerate them all without having a complete understanding of what capability sets are going to include, a fully-optimized implementation of the mechanism on our HW, and lots of test scenarios though.

(thanks James for most of the info below)

To elaborate a bit, if we want to share an allocation across GPUs for 3D rendering, it seems we would need 12 bits to express our swizzling/tiling memory layouts for fermi+. In addition to that, maxwell uses 3 more bits for this, and we need an extra bit to identify pre-fermi representations. We also need one bit to differentiate between Tegra and desktop, and another one to indicate whether the layout is otherwise linear.

Then things like whether compression is used (one more bit), and we can probably get by with 3 bits for the type of compression if we are creative. However, it'd be way easier to just track arch + page kind, which would be like 32 bits on its own.

Not clear if this is an NV-only term, so for those not familiar, page kind is very loosely the equivalent of a format modifier our HW uses internally in its memory management subsystem. The value mappings vary a bit for each HW generation.

Whether Z-culling and/or zero-bandwidth-clears are used may be another 3 bits. If device-local properties are included, we might need a couple more bits for caching. We may also need to express locality information, which may take at least another 2 or 3 bits. If we want to share array textures too, you also need to pass the array pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on its own.
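The tally above can be written down as a small budget check. This is only a sketch: the field names are shorthand for the items in the message, "a couple more bits for caching" is assumed to mean 2, "2 or 3" bits of locality is taken as 3, and the 56-bit payload is a 64-bit modifier minus its 8-bit vendor ID.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODIFIER_PAYLOAD_BITS 56 /* 64-bit modifier minus the 8-bit vendor ID */

struct hint_field {
	const char *name;
	unsigned bits;
};

/* Bit counts as quoted in the message; the last two are assumptions. */
static const struct hint_field nv_fields[] = {
	{ "fermi+ swizzling/tiling layout", 12 },
	{ "extra maxwell layout bits",       3 },
	{ "pre-fermi representation",        1 },
	{ "Tegra vs. desktop",               1 },
	{ "otherwise linear",                1 },
	{ "compression used",                1 },
	{ "compression type",                3 },
	{ "Z-cull / zero-bandwidth clears",  3 },
	{ "caching (assumed 2)",             2 },
	{ "locality (assumed 3)",            3 },
};

static unsigned total_bits(const struct hint_field *fields, size_t n)
{
	unsigned total = 0;

	for (size_t i = 0; i < n; i++)
		total += fields[i].bits;
	return total;
}

static bool fits_in_modifier(unsigned bits)
{
	return bits <= MODIFIER_PAYLOAD_BITS;
}
```

With these numbers the individually listed fields sum to 30 bits and fit, while adding a 64-bit array pitch does not, which is the crux of the concern about encoding everything in a modifier.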
So yes, as James mentioned, with some effort, we could technically fit our current allocation parameters in a modifier, but I'm still not convinced this is as future proof as it could be as our hardware grows in capabilities. Daniel Stone wrote: So I reflexively get a bit itchy when I see the kernel being used to transit magic blobs of data which are supplied by userspace, and only interpreted by different userspace. Having tiling formats hidden away means that we've had real-world bugs in AMD hardware, where we end up displaying garbage because we cannot generically reason about the buffer attributes. I'm a bit confused. Can't modifiers be specified by vendors and only interpreted by drivers? My understanding was that modifiers could actually be treated as opaque 64-bit data, in which case they would qualify as "magic blobs of data". Otherwise, it seems this wouldn't be scalable. What am I missing? Daniel Vetter wrote: I think in the interim figuring out how to expose kms capabilities better (and necessarily standardizing at least some of them which matter at the compositor level, like size limits of framebuffers) feels like the place to push the ecosystem forward. In some way Miguel's proposal looks a bit backwards, since it adds the pitch capabilities to addfb, but at addfb time you've allocated everything already, so way too late to fix things up. With modifiers we've added a very simple per-plane property to list which modifiers can be combined with which pixel formats. Tiny start, but obviously very far from all that we'll need. Not sure whether I might be misunderstanding your statement, but one of the allocator main features is negotiation of nearly optimal allocation parameters given a set of uses on different devices/engines by the capability merge operation. A client should have queried what every device/engine is capable of for the given uses, find the optimal set of capabilities, and use it for allocating a buffer. 
At the moment these parameters are given to KMS, they are expected to be good. If they aren't, the client didn't do things right. Rob Clark wrote: It does seem like, if possible, starting out with modifiers for now at the kernel interface would make life easier, vs trying to reinvent both kernel and userspace APIs at the same time. Userspace APIs are easier to change or throw away. Presumably by the time we get to the point of changing kernel uabi, we are already using, and pretty happy with, serialized liballoc data over the wire in userspace so it is only a matter of changing the kernel interface. I guess we can indeed start with modifiers for now, if that's what it takes to get the allocator mechanisms rolling. However, it seems to me that we won't be able to encode the same type of information included in capability sets with modifiers in all cases. For instance, if we end up encoding usage transition information in capability sets, how that would translate to modifiers? I assume display doesn't really care about a lot o
Re: [rfc repost] drm sync objects - a new beginning (make ickle happier?)
On 04/19/2017 05:07 AM, Christian König wrote: Am 13.04.2017 um 03:41 schrieb Dave Airlie: Okay I've taken Chris's suggestions to heart and reworked things around a sem_file to see how they might look. This means the drm_syncobj are currently only useful for semaphores, the flags field could be used in future to use it for other things, and we can reintroduce some of the API then if needed. This refactors sync_file first to add some basic rcu wrappers about the fence pointer, as this point never updates this should all be fine unlocked. It then creates the sem_file with a mutex, and uses that to track the semaphores with reduced fops and the replace and get APIs. Then it reworks the drm stuff on top, and fixes amdgpu bug with old_fence. Let's see if anyone prefers one approach over the other. Yeah, I clearly prefer keeping only one object type for synchronization in the kernel. As I wrote in the other mail the argument of using the sync file for semaphores was to be able to use it as in fence with the atomic mode setting as well. This may introduce incompatibilities in userspace though, as the response to Dave's original series' pointed out. For example, the Vulkan extensions that allow importing sync files expect them to behave as sync files currently do, not as these new objects do. Introducing the new behavior would invalidate language in those specifications, causing problems with the very use case I suspect these changes are trying to address. Those specs are not finalized, so it could be fixed, but I think that highlights the general concern. That a wait consumes a previous signal should be a specific behavior of the operation and not the property of the object. In other words I'm fine with using the sync_file in a 1:1 fashion with Vulkan, but for the atomic API we probably want 1:N to be able to flip a rendering result on multiple CRTCs at the same time. Agreed, this usage seems valuable too. 
Sem files still have a fence in them, and that doesn't seem like an implementation detail that needs to be hidden from userspace. Vulkan solved this very issue by letting applications directly extract the sync_file fd from a Vulkan semaphore so they could use it with native operations that specifically require a sync file, via the experimental external semaphore extensions. Perhaps there could be a sem file -> sync file conversion operation with semantics similar to a Vulkan semaphore -> sync file export operation? Note the Vulkan semantics for this are in churn, so it might be worth holding off a bit on adding that interface if this is the path you use, but it shouldn't need to block this series from my high-level read. Thanks, -James Regards, Christian. Dave. ___ amd-gfx mailing list amd-...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
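The distinction being argued in this thread, whether "a wait consumes the signal" belongs to the operation or to the object, can be shown with a toy userspace model. This is purely illustrative: the toy_* names are hypothetical, nothing here blocks or touches real kernel objects, and it is not the sync_file or drm_syncobj API.

```c
#include <stdbool.h>

/* Toy stand-in for a synchronization object holding one signal. */
struct toy_sync {
	bool signaled;
};

static void toy_signal(struct toy_sync *s)
{
	s->signaled = true;
}

/*
 * Fence-style check: observing the signal leaves it in place, so any
 * number of waiters (the 1:N case, e.g. several CRTCs) can all see it.
 */
static bool toy_wait_peek(const struct toy_sync *s)
{
	return s->signaled;
}

/*
 * Semaphore-style wait: a successful wait consumes the signal, so
 * signals and waits pair up 1:1.
 */
static bool toy_wait_consume(struct toy_sync *s)
{
	if (!s->signaled)
		return false;
	s->signaled = false;
	return true;
}
```

Both behaviours operate on the same object here, which is the shape Christian argues for: one object type, with consumption chosen per operation.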
Re: Static inline DRM functions calling into GPL-only code
On 04/11/2017 09:09 AM, Harry Wentland wrote:
On 2017-04-11 11:15 AM, James Jones wrote:
On 04/10/2017 11:20 PM, Daniel Vetter wrote:
On Tue, Apr 11, 2017 at 7:52 AM, Daniel Vetter <dan...@ffwll.ch> wrote:
On Tue, Apr 11, 2017 at 6:14 AM, Nikhil Mahale <nmah...@nvidia.com> wrote:

My name is Nikhil Mahale, and I work at NVIDIA in the Linux drivers team. I have been working on adding DRM KMS support to our driver.

The NVIDIA GPU driver package (364.12 and higher) provides a kernel module, nvidia-drm.ko, which is licensed as "MIT". This module registers a DRM driver with the DRM subsystem of the Linux kernel and advertises KMS capability on Linux kernel v4.1 or higher, with CONFIG_DRM and CONFIG_DRM_KMS_HELPER enabled.

We have been able to maintain compatibility between nvidia-drm.ko and Linux kernels from v2.6.9 to v4.10. Unfortunately with release candidates of v4.11:

* Commit 10383aea2f445bce9b2a2b308def08134b438c8e changed the kernel's kref implementation to use refcount_inc and refcount_dec_and_test.
* Commit 29dee3c03abce04cd527878ef5f9e5f91b7b83f4 made refcount_inc and refcount_dec_and_test EXPORT_SYMBOL_GPL.

DRM drivers call refcount_inc through static inline function callchains such as:

drm_crtc_commit_put() => kref_put() => refcount_dec_and_test()
drm_crtc_commit_get() => kref_get() => refcount_inc()
drm_atomic_state_put() => kref_put() => refcount_dec_and_test()
drm_atomic_state_get() => kref_get() => refcount_inc()
drm_gem_object_reference() => kref_get() => refcount_inc()

This causes nvidia-drm.ko to inadvertently pick up references to EXPORT_SYMBOL_GPL symbols. There is no interest in relaxing the export of refcount_inc, and changing the license of nvidia-drm.ko isn't viable right now. So, the remaining options we see are:

* Make these static inline DRM functions EXPORT_SYMBOL instead of inline.
* Make these static inline DRM functions not use kref.
* Make nvidia-drm.ko not use these static inline DRM functions.
None of those seem good, though the first might be least bad. Do any of those seem reasonable? * Open-source the nvidia kernel driver? tbh I'm not sure how much you can still make the case that your driver is fully an independent thing if you're adopting stuff like atomic modesetting. Might be better to make all the glue/remapping code from linux atomic to the shared cross-os code at least open As the original message stated, this code is already open (MIT license). Just out of curiosity, can I find this on any public repo or webpage? This is our usual Linux driver download landing page: https://www.nvidia.com/object/unix.html We don't break out the nvidia-drm source into a separate package like we do for some of our other open-source components, but it's included when you download the full driver. You can unpack it without installing, e.g: $ sh ~/Downloads/NVIDIA-Linux-x86_64-378.13.run -x Then it will be in ./NVIDIA-Linux-x86_64-378.13/kernel/nvidia-drm/ Feedback welcome. Thanks, -James If inlining is the issue it looks like this is not used by any upstream DRM driver (or DAL) directly but only from a bunch of atomic functions, none of which are inline. If this is an issue for NVidia would this also be an issue for any other MIT licensed code, such as drm_atomic_helper.c? Harry Thanks, -James ... And atomic is pretty much guaranteed to change all the time anyway, we're definitely not going to make a stable kabi for you folks, so you might want to do that for practical reasons anyway. Just my 2cents, personal opinion, not reflecting intel's, not legal advice, yadayada and all that :-) Apparently coffee didn't work yet, so let me retry the more serious part of my reply. I'd go with a shim that essentially remaps the linux atomic to whatever cross-os datastructures and semantics you have in the blob. 
That also has the benefit of insulating you a bit more from upstream changes in atomic (which will happen), and enthusiasts might get around to porting to new kernels before you do. Essentially pick the architecture of amd's DAL, then fully open the glue layer.

With my maintainer hat on I'm at least not inclined to add the "is this fair use or not" hacks on upstream's side, simply because sooner or later we'll break them and then we have the angry users, instead of nvidia. And that's the wrong place for bug reports for blobs :-)

-Daniel
Re: Static inline DRM functions calling into GPL-only code
On 04/10/2017 11:20 PM, Daniel Vetter wrote:
On Tue, Apr 11, 2017 at 7:52 AM, Daniel Vetter wrote:
On Tue, Apr 11, 2017 at 6:14 AM, Nikhil Mahale wrote:

My name is Nikhil Mahale, and I work at NVIDIA in the Linux drivers team. I have been working on adding DRM KMS support to our driver.

The NVIDIA GPU driver package (364.12 and higher) provides a kernel module, nvidia-drm.ko, which is licensed as "MIT". This module registers a DRM driver with the DRM subsystem of the Linux kernel and advertises KMS capability on Linux kernel v4.1 or higher, with CONFIG_DRM and CONFIG_DRM_KMS_HELPER enabled.

We have been able to maintain compatibility between nvidia-drm.ko and Linux kernels from v2.6.9 to v4.10. Unfortunately with release candidates of v4.11:

* Commit 10383aea2f445bce9b2a2b308def08134b438c8e changed the kernel's kref implementation to use refcount_inc and refcount_dec_and_test.
* Commit 29dee3c03abce04cd527878ef5f9e5f91b7b83f4 made refcount_inc and refcount_dec_and_test EXPORT_SYMBOL_GPL.

DRM drivers call refcount_inc through static inline function callchains such as:

drm_crtc_commit_put() => kref_put() => refcount_dec_and_test()
drm_crtc_commit_get() => kref_get() => refcount_inc()
drm_atomic_state_put() => kref_put() => refcount_dec_and_test()
drm_atomic_state_get() => kref_get() => refcount_inc()
drm_gem_object_reference() => kref_get() => refcount_inc()

This causes nvidia-drm.ko to inadvertently pick up references to EXPORT_SYMBOL_GPL symbols. There is no interest in relaxing the export of refcount_inc, and changing the license of nvidia-drm.ko isn't viable right now. So, the remaining options we see are:

* Make these static inline DRM functions EXPORT_SYMBOL instead of inline.
* Make these static inline DRM functions not use kref.
* Make nvidia-drm.ko not use these static inline DRM functions.

None of those seem good, though the first might be least bad. Do any of those seem reasonable?

* Open-source the nvidia kernel driver?
tbh I'm not sure how much you can still make the case that your driver is fully an independent thing if you're adopting stuff like atomic modesetting. Might be better to make all the glue/remapping code from linux atomic to the shared cross-os code at least open As the original message stated, this code is already open (MIT license). Thanks, -James ... And atomic is pretty much guaranteed to change all the time anyway, we're definitely not going to make a stable kabi for you folks, so you might want to do that for practical reasons anyway. Just my 2cents, personal opinion, not reflecting intel's, not legal advice, yadayada and all that :-) Apparently coffee didn't work yet, so let me retry the more serious part of my reply. I'd go with a shim that essentially remaps the linux atomic to whatever cross-os datastructures and semantics you have in the blob. That also has the benefit of insulating you a bit more from upstream changes in atomic (which will happen), and enthusiasts might get around to porting to new kernels before you do. Essentially pick the architecture of amd's DAL, then fully open the glue layer. With my maintainer hat on I'm at least not inclinced to add the "is this fair use or not" hacks on upstream's side, simply because sooner or later we'll break them and then we have the angry users, instead of nvidia. And that's the wrong place for bug reports for blobs :-) -Daniel ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
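[Editor's illustration] The callchain problem described in this thread can be sketched with a toy model. The code below is not how the kernel's module tooling works; it only assumes that a static inline function is compiled into its caller, so the caller ends up referencing whatever exported symbols the inline body calls. The function names come from the message above; the `Func` class, `external_refs`, `gpl_only_refs`, and the `V4_11` and `OPTION_1` tables are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    name: str
    calls: tuple = ()        # names of the functions this one calls
    inline: bool = False     # True: static inline, compiled into the caller
    gpl_only: bool = False   # True: exported with EXPORT_SYMBOL_GPL


def table(*funcs):
    """Index a set of Func records by name."""
    return {f.name: f for f in funcs}


def external_refs(entry, funcs):
    """Exported symbols a module references when it calls `entry`.

    Inline functions are expanded into the module, so their callees are
    followed; out-of-line functions stop the walk and become a reference.
    """
    refs, seen, stack = set(), set(), [entry]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        func = funcs[name]
        if func.inline:
            stack.extend(func.calls)
        else:
            refs.add(name)
    return refs


def gpl_only_refs(entry, funcs):
    """Subset of external_refs() that is exported GPL-only."""
    return sorted(n for n in external_refs(entry, funcs) if funcs[n].gpl_only)


# Situation described for the v4.11 release candidates.
V4_11 = table(
    Func("drm_crtc_commit_put", calls=("kref_put",), inline=True),
    Func("drm_gem_object_reference", calls=("kref_get",), inline=True),
    Func("kref_put", calls=("refcount_dec_and_test",), inline=True),
    Func("kref_get", calls=("refcount_inc",), inline=True),
    Func("refcount_dec_and_test", gpl_only=True),
    Func("refcount_inc", gpl_only=True),
)

# First option from the message: the DRM helper becomes an out-of-line,
# plain EXPORT_SYMBOL function instead of a static inline.
OPTION_1 = dict(V4_11)
OPTION_1["drm_crtc_commit_put"] = Func("drm_crtc_commit_put", calls=("kref_put",))

print(gpl_only_refs("drm_crtc_commit_put", V4_11))     # ['refcount_dec_and_test']
print(gpl_only_refs("drm_crtc_commit_put", OPTION_1))  # []
```

In this model the module's only direct call is to a DRM helper, yet the GPL-only symbol shows up in its reference set because every function in between is inline. Moving the helper out of line stops the expansion at the helper itself.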
Re: Vulkan WSI+VK_KHR_display for KMS/DRM?
On 04/10/2017 12:32 PM, Jason Ekstrand wrote:
On April 10, 2017 12:29:12 PM Chad Versace wrote:
On Tue 04 Apr 2017, Keith Packard wrote:
Jason Ekstrand writes:

> Interesting question. To my knowledge, no one has actually implemented the
> Vulkan WSI direct-to-display extensions. (I tried to prevent them from
> getting released with 1.0 but failed.) I believe the correct answer is to
> use the external memory dma-buf stuff that chad and I have been using and
> talk directly to KMS.

Sounds good, and minimizes the amount of code I have to write too :-)

I found an implementation. Nvidia's 2017-04-06 Linux driver release notes claim newly added support for VK_EXT_direct_mode_display, which is layered atop VK_KHR_display.

If it's useful to do so, we can always pull Keith's work into Mesa or even put it in a layer. Let's start with an implementation and figure out the Vulkan bits later. If there's something interesting in NVIDIA's extensions, we can let that guide the design of course.

http://www.nvidia.com/download/driverResults.aspx/117741/en-us
https://www.khronos.org/registry/vulkan/specs/1.0-extensions/html/vkspec.html#VK_EXT_direct_mode_display

> I see no good reason to have a large abstraction in
> the middle.

Other than 'it's a standard', neither do I.

Yup.

There's one good technical reason, at least on NVIDIA HW but I suspect others, and it's the same reason that spawned the EGLStream vs. raw DRM-KMS debate: dma-buf+KMS doesn't let you transition to the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout, so rendering to/texturing from the dma-buf images won't be as optimal as rendering to VK_KHR_display images. You could solve that (and I intend to) with the combination of Vulkan + the generic allocator stuff we started discussing at XDC last year, but it'll take more work. No, I haven't stopped working on that, I just haven't had much time for it lately. I'll have updates from my side there soon.

Besides that, the abstraction's primary purpose is the same as any abstraction: portability. Applications targeting it will work on platforms that don't have DRM-KMS. That's more useful if there's a DRM-KMS implementation too.

I fully expect that you could implement it via a Vulkan implicit layer as suggested here once the external memory and dma-buf stuff is complete, and there'd be nothing sub-optimal about that if you could properly transition the layouts. Nothing wrong with that implementation path. It also shouldn't be a lot of code to add a native DRM-KMS implementation in Mesa and then lift it to a layer later, or write it as a Vulkan layer now and add optimization once the generic allocator + Vulkan interactions are worked out. Clean interaction with DRM-KMS was one of the goals of the spec.

I know of two (maybe three, but I haven't confirmed the last) other shipping implementations besides ours BTW, so this isn't a de-facto NVIDIA-ism dressed up like a standard. I don't think the other implementations are currently publicly available.

Thanks,
-James
Unix Device Memory Allocation project
On 01/03/2017 04:06 PM, Marek Olšák wrote: > On Wed, Jan 4, 2017 at 12:43 AM, James Jones wrote: >> On 01/03/2017 03:38 PM, Marek Olšák wrote: >>> >>> On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter wrote: >>>> >>>> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák wrote: >>>>>>> >>>>>>> We've had per buffer metadata in Radeon since KMS, which I believe >>>>>>> first >>>>>>> appeared in 2009. It's 4 bytes large and is used to communicate tiling >>>>>>> flags between Mesa, DDX, and the kernel display code. It was a widely >>>>>>> accepted solution back then and Red Hat was the main developer. So >>>>>>> yeah, >>>>>>> pretty much all people except Intel were collaborating on "sneaking" >>>>>>> this >>>>>>> in in 2009. I think radeon driver developers deserve an apology for >>>>>>> that >>>>>>> language. >>>>>>> >>>>>>> Amdgpu extended that metadata to 8 bytes and it's used in the same way >>>>>>> as >>>>>>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes >>>>>>> for use >>>>>>> by userspace drivers only. The kernel driver isn't supposed to read it >>>>>>> or >>>>>>> parse it. The format is negotiated between userspace driver developers >>>>>>> for >>>>>>> sharing of more complex allocations than 2D displayable surfaces. >>>>>> >>>>>> >>>>>> Metadata needed for kms (what Christian also pointed out) is what >>>>>> everyone >>>>>> did (intel included) and I think that's perfectly reasonable. And I was >>>>>> aware of that radeon is doing that since the dawn of ages since >>>>>> forever. >>>>>> >>>>>> What I think is not really ok is opaque metadata blobs that the kernel >>>>>> never ever inspect, but just carries around. That essentially means >>>>>> you're >>>>>> reimplementing some bad form of IPC, and I dont think that's something >>>>>> the >>>>>> drm subsystem (or dma-buf) really should be doing. 
Because you still >>>>>> have >>>>>> that real protocol in userspace (dri2/3, wayland, whatever), but now >>>>>> with >>>>>> a side channel with no documented ordering and synchronization. It gets >>>>>> the job done for single-vendor buffer metadata transport, but as soon >>>>>> as >>>>>> there's more than one vendor, or as soon as you need to reallocate >>>>>> buffers >>>>>> dynamically because the usage changes it gets bad imo (and I've seen >>>>>> what >>>>> >>>>> >>>>> The metadata is immutable after allocation, so it's not a >>>>> communication channel. There is no synchronization or ordering needed >>>>> for immutable metadata. That implies that a shared buffer can't be >>>>> reused for an entirely different purpose. It can only be used as-is or >>>>> freed. >>>>> >>>>> For suballocated memory, the idea is to reallocate it as a separate >>>>> buffer on the first "handle" export, so that shared suballocated >>>>> buffers don't exist. >>>> >>>> >>>> Yeah, once it becomes mutable the fun starts imo. I didn't realize >>>> that you're treating it strictly immutable since at least the kernel >>>> ioctl has both set and get (and that's the thing I looked at). >>>> Immutable stuff shouldn't be any problem (except that of course it >>>> won't work cross-driver in any fashion) >>>> >>>>>> that looks like on android in various forms). And that consensus (at >>>>>> least >>>>>> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day >>>>>> meeting we've had over 5 years ago. Not sure we're gaining anything >>>>>> with a >>>>>> "who's older" competition. >>>>>> >>>>>> Anyways it's there and it's uabi so will never disappear. Just wanted >>>>>> to >>>>>> make sure it's clea
Unix Device Memory Allocation project
On 01/03/2017 03:38 PM, Marek Olšák wrote: > On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter wrote: >> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák wrote: > We've had per buffer metadata in Radeon since KMS, which I believe first > appeared in 2009. It's 4 bytes large and is used to communicate tiling > flags between Mesa, DDX, and the kernel display code. It was a widely > accepted solution back then and Red Hat was the main developer. So yeah, > pretty much all people except Intel were collaborating on "sneaking" this > in in 2009. I think radeon driver developers deserve an apology for that > language. > > Amdgpu extended that metadata to 8 bytes and it's used in the same way as > radeon. Additionally, amdgpu added opaque metadata having 256 bytes for > use > by userspace drivers only. The kernel driver isn't supposed to read it or > parse it. The format is negotiated between userspace driver developers for > sharing of more complex allocations than 2D displayable surfaces. Metadata needed for kms (what Christian also pointed out) is what everyone did (intel included) and I think that's perfectly reasonable. And I was aware of that radeon is doing that since the dawn of ages since forever. What I think is not really ok is opaque metadata blobs that the kernel never ever inspect, but just carries around. That essentially means you're reimplementing some bad form of IPC, and I dont think that's something the drm subsystem (or dma-buf) really should be doing. Because you still have that real protocol in userspace (dri2/3, wayland, whatever), but now with a side channel with no documented ordering and synchronization. It gets the job done for single-vendor buffer metadata transport, but as soon as there's more than one vendor, or as soon as you need to reallocate buffers dynamically because the usage changes it gets bad imo (and I've seen what >>> >>> The metadata is immutable after allocation, so it's not a >>> communication channel. 
There is no synchronization or ordering needed >>> for immutable metadata. That implies that a shared buffer can't be >>> reused for an entirely different purpose. It can only be used as-is or >>> freed. >>> >>> For suballocated memory, the idea is to reallocate it as a separate >>> buffer on the first "handle" export, so that shared suballocated >>> buffers don't exist. >> >> Yeah, once it becomes mutable the fun starts imo. I didn't realize >> that you're treating it strictly immutable since at least the kernel >> ioctl has both set and get (and that's the thing I looked at). >> Immutable stuff shouldn't be any problem (except that of course it >> won't work cross-driver in any fashion) >> that looks like on android in various forms). And that consensus (at least among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day meeting we've had over 5 years ago. Not sure we're gaining anything with a "who's older" competition. Anyways it's there and it's uabi so will never disappear. Just wanted to make sure it's clear that for dma-buf we've discussed this years ago, and decided it wasn't a great idea. And I think that's still correct. >>> >>> The arguments against blob metadata sound reasonable to me. I'm pretty >>> sceptic that window system protocols will make driver-specific >>> metadata blobs redundant anytime soon though. It seems the protocols >>> don't get much attention nowadays and there is no incentive to do >>> things differently in that area. At least that's how it appears to me, >>> but I'm not involved in that. >> >> Folks are working on protocols again, at least I think the plan is to >> make all that shared buffer allocation dance also work over >> compositor/client situation (would be a bit pointless without that). >> And agreed there'll always be driver-specific stuff which is opaque to >> everyone else, but I hope at least in the future that all gets >> shuffled around through protocol extensions. 
And not in the way every >> Android gfx stack seems to work, where everyone has their own >> vendor-private ipc-over-dma-buf thing. Wayland definitely got this >> right, both protocol versioning and being able to add any kind of >> new/vendor-private protocol endpoints to any wayland protocol. X is a >> lot more pain, but since it finally looks like the world is switching >> away from it we might get away with a simpler protocol there. At >> least all the tricky reallocation dances seem to matter a lot more on >> mobile/tablets/phones, and there Wayland starts to rule. > > I've been thinking about it, and it looks like we're gonna continue > using immutable per-BO metadata (buffer layout, tiling description, > compression flags). The reasons are that everything else is less > economical, and the current "modifier" work done in EGL/GBM is > insufficient for our hardware - we need
Unix Device Memory Allocation project
Thanks for the detailed writeup, and it was good to meet you at XDC. Below:

On 10/18/2016 04:40 PM, Marek Olšák wrote:
> Hi,
>
> The text below describes how open source AMDGPU buffer sharing works.
> I hope you'll find some useful bits in it.
>
>
> Producer = allocates a buffer (or texture), and exports its handle
> (DMABUF, etc.), and can use the buffer in various ways
>
> Consumer = imports the handle, and can use the buffer in various ways
>
>
> *** Producer-consumer interaction. ***
>
> 1) On handle export, the producer receives these flags:
>
> - READ, WRITE, READ+WRITE: Describe the expected usage in the consumer.
>    * The producer decides if it needs to disable compression based on
> those flags.
>
> - EXPLICIT_FLUSH flag: Meaning that the producer will explicitly
> receive a "flush_resource" call before the consumer starts using the
> buffer. This is a hint that the producer doesn't have to keep track of
> "when to do decompression" when sharing the buffer with the consumer.
>
>
> 2) Passing metadata (tiling, pixel ordering, format, layout) info
> between the producer and consumer:
>
> - All AMDGPU buffer/texture allocations have 256 bytes (64 dwords) of
> internal per-allocation metadata storage that lives in the kernel
> space. There are amdgpu-specific ioctls that can "set" and "get" the
> metadata. Any process that has a buffer handle can do that.
>    * The producer writes the metadata, the consumer reads it.
>
> - The producer-consumer interop API doesn't know about the metadata.
> All you need to pass around is a buffer handle. (KMS, DMABUF, etc.)
>    * There was a note during the talk that DMABUF doesn't have any
> metadata. Well, I just told you that it has, but it's private to
> amdgpu and possibly accessible to other kernel drivers too.

OK. I believe someone pointed this out during my talk or afterwards as well. Some drivers are using this method, but there seems to be some debate over whether this is the preferred general design. Others have told me this isn't the right mechanism to store this sort of metadata, but I'm not familiar with the specific counterarguments.

>    * We can build upon this idea. I think the worst thing to do would
> be to add metadata handling to driver-agnostic userspace APIs. Really,
> driver-agnostic APIs shouldn't know about that, because they can't
> understand all the hw-specific information encoded in the metadata.
> Also, when you want to change the metadata format, you only have to
> update the affected drivers, not userspace APIs.

How does this kernel-side metadata interact with userspace driver suballocation, or application-managed suballocation in APIs such as Vulkan?

Thanks,
-James

> 3) Internal AMDGPU metadata storage format
> - The header contains: Vendor ID, PCI ID, and version number.
> - The header is followed by PCI-ID-specific data. The PCI ID and the
> version number define the format.
> - If the consumer runs on a different device, it must read the header
> and parse the metadata based on that. It implies that the
> driver-specific consumer code needs to know about all potential
> producer devices.
>
>
> Bottom line: DMABUF handles alone are fully sufficient for sharing
> buffers/textures between devices and processes from the AMDGPU point
> of view.
>
> HW driver implementation: The driver doesn't know anything about the
> users of exported or imported buffers. It only acts based on the few
> flags described in section 1. So far that's all we've needed.
>
>
> *** Use cases ***
>
> 1) DRI (producer: application; consumer: X server)
> - The producer receives these flags: READ, EXPLICIT_FLUSH. The X
> server will treat the shared "texture" as read-only. EXPLICIT_FLUSH
> ensures the texture can be compressed, and "flush_resource" will be
> called as part of SwapBuffers and "glFlush: GL_FRONT".
> - The X server can run on a different device. In that case, the window
> system API passes the "LINEAR" flag to the driver during allocation.
> That's suboptimal and fixable.
>
>
> 2) OpenGL-OpenCL interop (OpenGL always exports handles, OpenCL always
> imports handles)
> - Possible flags: READ, WRITE, READ+WRITE
> - OpenCL doesn't give us any other flags, so we are stuck with those.
> - Inter-device sharing is possible if the consumer understands the
> producer's metadata and tiling layouts.
>
> (amdgpu actually stores 2 different metadata blocks per allocation,
> but the simpler one is too limited and has only 8 bytes)
>
> Marek
>
>
> On Wed, Oct 5, 2016 at 1:47 AM, James Jones wrote:
>> Hello everyone,
>>
>> As many are aware, we took up the issue of surface/memory a
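[Editor's illustration] The header-then-device-specific-data scheme from section 3 of the quoted message can be sketched as follows. This is not the real amdgpu metadata format. The message gives only the total size (256 bytes, 64 dwords) and the header fields (vendor ID, PCI ID, version number). The field widths, their order, the little-endian encoding, the PCI ID 0x1234, the single `tile_mode` payload field, and all function names are assumptions made up for this sketch. Only 0x1002, AMD's PCI vendor ID, is a real value.

```python
import struct

METADATA_SIZE = 256             # from the message: 64 dwords per allocation
HEADER = struct.Struct("<III")  # ASSUMED layout: vendor ID, PCI ID, version


def parse_example_payload(payload):
    """Hypothetical device-specific parser: one dword holding a tile mode."""
    (tile_mode,) = struct.unpack_from("<I", payload, 0)
    return {"tile_mode": tile_mode}


# The consumer has to know every producer it may meet, so it keeps a
# registry keyed by the header fields. The single entry is made up.
PARSERS = {(0x1002, 0x1234, 1): parse_example_payload}


def pack_metadata(vendor_id, pci_id, version, payload):
    """Producer side: header plus payload, zero-padded to the fixed size."""
    blob = HEADER.pack(vendor_id, pci_id, version) + payload
    if len(blob) > METADATA_SIZE:
        raise ValueError("metadata does not fit in the per-allocation storage")
    return blob.ljust(METADATA_SIZE, b"\0")


def parse_metadata(blob):
    """Consumer side: read the header, then dispatch on it."""
    if len(blob) != METADATA_SIZE:
        raise ValueError("unexpected metadata size")
    key = HEADER.unpack_from(blob, 0)
    parser = PARSERS.get(key)
    if parser is None:
        raise LookupError("no parser for producer %r" % (key,))
    return parser(blob[HEADER.size:])


blob = pack_metadata(0x1002, 0x1234, 1, struct.pack("<I", 7))
print(parse_metadata(blob))  # {'tile_mode': 7}
```

The `LookupError` branch is the cost named in the message: a consumer that meets an unknown producer device or version cannot interpret the rest of the blob.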
Unix Device Memory Allocation project
Hello everyone,

As many are aware, we took up the issue of surface/memory allocation at XDC this year. The outcome of that discussion was the beginnings of a design proposal for a library that would serve as a cross-device, cross-process surface allocator. In the past week I've started to condense some of my notes from that discussion down to code & a design document. I've posted the first pieces to a github repository here:

https://github.com/cubanismo/allocator

This isn't anything close to usable code yet. Just headers and docs, and incomplete ones at that. However, feel free to check it out if you're interested in discussing the design.

Thanks,
-James
[RFC] Explicit synchronization for Nouveau
On 9/29/14 8:42 AM, Jerome Glisse wrote: > On Mon, Sep 29, 2014 at 09:43:02AM +0200, Daniel Vetter wrote: >> On Fri, Sep 26, 2014 at 01:00:05PM +0300, Lauri Peltonen wrote: >>> >>> Hi guys, >>> >>> >>> I'd like to start a new thread about explicit fence synchronization. This >>> time >>> with a Nouveau twist. :-) >>> >>> First, let me define what I understand by implicit/explicit sync: >>> >>> Implicit synchronization >>> * Fences are attached to buffers >>> * Kernel manages fences automatically based on buffer read/write access >>> >>> Explicit synchronization >>> * Fences are passed around independently >>> * Kernel takes and emits fences to/from user space when submitting work >>> >>> Implicit synchronization is already implemented in open source drivers, and >>> works well for most use cases. I don't seek to change any of that. My >>> proposal aims at allowing some drm drivers to operate in explicit sync mode >>> to >>> get maximal performance, while still remaining fully compatible with the >>> implicit paradigm. >> >> Yeah, pretty much what we have in mind on the i915 side too. I didn't look >> too closely at your patches, so just a few high level comments on your rfc >> here. >> >>> I will try to explain why I think we should support the explicit model as >>> well. >>> >>> >>> 1. Bindless graphics >>> >>> Bindless graphics is a central concept when trying to reduce the OpenGL >>> driver >>> overhead. The idea is that the application can bind a large set of buffers >>> to >>> the working set up front using extensions such as GL_ARB_bindless_texture, >>> and >>> they remain resident until the application releases them (note that compute >>> APIs have typically similar semantics). These working sets can be huge, >>> hundreds or even thousands of buffers, so we would like to opt out from the >>> per-submit overhead of acquiring locks, waiting for fences, and storing >>> fences. 
>>> Automatically synchronizing these working sets in kernel will also prevent >>> parallelism between channels that are sharing the working set (in fact >>> sharing >>> just one buffer from the working set will cause the jobs of the two >>> channels to >>> be serialized). >>> >>> 2. Evolution of graphics APIs >>> >>> The graphics API evolution seems to be going to a direction where game >>> engine >>> and middleware vendors demand more control over work submission and >>> synchronization. We expect that this trend will continue, and more and more >>> synchronization decisions will be pushed to the API level. OpenGL and EGL >>> already provide good explicit command stream level synchronization >>> primitives: >>> glFenceSync and EGL_KHR_wait_sync. Their use is also encouraged - for >>> example >>> EGL_KHR_image_base spec clearly states that the application is responsible >>> for >>> synchronizing accesses to EGLImages. If the API that is exposed to >>> developers >>> gives the control over synchronization to the developer, then implicit waits >>> that are inserted by the kernel are unnecessary and unexpected, and can >>> severely hurt performance. It also makes it easy for the developer to write >>> code that happens to work on Linux because of implicit sync, but will fail >>> on >>> other platforms. >>> >>> 3. Suballocation >>> >>> Using user space suballocation can help reduce the overhead when a large >>> number >>> of small textures are used. Synchronizing suballocated surfaces implicitly >>> in >>> kernel doesn't make sense - many channels should be able to access the same >>> kernel-level buffer object simultaneously. >>> >>> 4. Buffer sharing complications >>> >>> This is not really an argument for explicit sync as such, but I'd like to >>> point >>> out that sharing buffers across SoC engines is often much more complex than >>> just exporting and importing a dma-buf and waiting for the dma-buf fences. 
>>> Sometimes we need to do color format or tiling layout conversion. >>> Sometimes, >>> at least on Tegra, we need to decompress buffers when we pass them from the >>> GPU >>> to an engine that doesn't support framebuffer compression. These things are >>> not uncommon, particularly when we have SoC's that combine licensed IP >>> blocks >>> from different vendors. My point is that user space is already heavily >>> involved when sharing buffers between drivers, and giving it some more >>> control >>> over synchronization is not adding that much complexity. >>> >>> >>> Because of the above arguments, I think it makes sense to let some user >>> space >>> drm drivers opt out from implicit synchronization, while allowing them to >>> still >>> remain fully compatible with the rest of the drm world that uses implicit >>> synchronization. In practice, this would require three things: >>> >>> (1) Support passing fences (that are not tied to buffer objects) between >>> kernel >>> and user space. >>> >>> (2) Stop automatically storing fences to the buffers that user space wants >>> to >>>