Re: [PATCH v2 0/2] doc: uapi: Document dma-buf interop design & semantics

2023-08-03 Thread James Jones

On 8/3/23 08:47, Daniel Stone wrote:

Hi all,
This is v2 to the linked patch series; thanks to everyone for reviewing
the initial version. I've moved this out of a pure DRM scope and into
the general userspace-API design section. Hopefully it helps others and
answers a bunch of questions.


Again, thanks for writing this up. I think it is great to have all this 
knowledge collected in one place.


For the series:

Reviewed-by: James Jones 


I think it'd be great to have input/links/reflections from other
subsystems as well here.


Agreed, though I'll reiterate my comment on the v1 series from a few 
years ago: I hope this can be merged relatively soon with additional 
documentation added in follow-up patches as needed. While you can always 
note more interactions, details, etc., everything here appears to be 
correct from my understanding and is strictly an improvement over the 
current lack of documentation.


Thanks,
-James


Cheers,
Daniel




Re: DMA-heap driver hints

2023-01-25 Thread James Jones

On 1/24/23 15:14, T.J. Mercier wrote:

On Mon, Jan 23, 2023 at 11:49 PM Christian König
 wrote:


Am 24.01.23 um 04:56 schrieb James Jones:

On 1/23/23 08:58, Laurent Pinchart wrote:

Hi Christian,

On Mon, Jan 23, 2023 at 05:29:18PM +0100, Christian König wrote:

Am 23.01.23 um 14:55 schrieb Laurent Pinchart:

Hi Christian,

CC'ing James as I think this is related to his work on the unix device
memory allocator ([1]).


Thank you for including me.


Sorry for not having you in initially. I wasn't aware of your previous
work in this area.


No worries. I've embarrassingly made no progress here since the last XDC 
talk, so I wouldn't expect everyone to know or remember.





[1]
https://lore.kernel.org/dri-devel/8b555674-1c5b-c791-4547-2ea7c16ae...@nvidia.com/

On Mon, Jan 23, 2023 at 01:37:54PM +0100, Christian König wrote:

Hi guys,

this is just an RFC! The last time we discussed the DMA-buf coherency
problem [1] we concluded that DMA-heap first needs a better way to
communicate to userspace which heap to use for a certain device.

As far as I know userspace currently just hard codes that information
which is certainly not desirable considering that we should have this
inside the kernel as well.

So what these two patches do is first add the
dma_heap_create_device_link() and dma_heap_remove_device_link()
functions and then demonstrate the functionality with the uvcvideo
driver.

The preferred DMA-heap is represented with a symlink in sysfs between
the device and the virtual DMA-heap device node.


I'll start with a few high-level comments/questions:

- Instead of tying drivers to heaps, have you considered a system where
  a driver would expose constraints, and a heap would then be selected
  based on those constraints ? A tight coupling between heaps and
  drivers means downstream patches to drivers in order to use
  vendor-specific heaps, that sounds painful.


I was wondering the same thing as well, but came to the conclusion that
just the other way around is the less painful approach.


  From a kernel point of view, sure, it's simpler and thus less painful.
  From the point of view of solving the whole issue, I'm not sure :-)


The problem is that there are so many driver-specific constraints that I
don't even know where to start.


That's where I was hoping James would have some feedback for us, based
on the work he did on the Unix device memory allocator. If that's not
the case, we can brainstorm this from scratch.


Simon Ser's and my presentation from XDC 2020 focused entirely on
this. The idea was not to try to enumerate every constraint up front,
but rather to develop an extensible mechanism that would be flexible
enough to encapsulate many disparate types of constraints and perform
set operations on them (merging sets was the only operation we tried
to solve). Simon implemented a prototype header-only library to
implement the mechanism:

https://gitlab.freedesktop.org/emersion/drm-constraints

The links to the presentation and talk are below, along with notes
from the follow-up workshop.

https://lpc.events/event/9/contributions/615/attachments/704/1301/XDC_2020__Allocation_Constraints.pdf

https://www.youtube.com/watch?v=HZEClOP5TIk
https://paste.sr.ht/~emersion/c43b30be08bab1882f1b107402074462bba3b64a

Note one of the hard parts of this was figuring out how to express a
device or heap within the constraint structs. One of the better ideas
proposed back then was something like heap IDs, where dma heaps would
each have one,


We already have that. Each dma_heap has its own unique name.


Cool.


and devices could register their own heaps (or even just themselves?)
with the heap subsystem and be assigned a locally-unique ID that
userspace could pass around.


I was more considering that we expose some kind of flag noting that a
certain device needs its buffer allocated from that device to utilize
all use cases.


This sounds similar to what you're proposing. Perhaps a reasonable
identifier is a device (major, minor) pair. Such a constraint could be
expressed as a symlink for easy visualization/discoverability from
userspace, but might be easier to serialize over the wire as the
(major, minor) pair. I'm not clear which direction is better to
express this either: As a link from heap->device, or device->heap.


 A constraint-based system would also, I think, be easier to extend
 with additional constraints in the future.

- I assume some drivers will be able to support multiple heaps. How do
 you envision this being implemented ?


I don't really see a use case for this.


One use case I know of here is same-vendor GPU local memory on
different GPUs. NVIDIA GPUs have certain things they can only do on
local memory, certain things they can do on all memory, and certain
things they can only do on memory local to another NVIDIA GPU,
especially when there exists an NVLink interface between the two. So
they'd ideally express different constraints fo

Re: DMA-heap driver hints

2023-01-23 Thread James Jones

On 1/23/23 08:58, Laurent Pinchart wrote:

Hi Christian,

On Mon, Jan 23, 2023 at 05:29:18PM +0100, Christian König wrote:

Am 23.01.23 um 14:55 schrieb Laurent Pinchart:

Hi Christian,

CC'ing James as I think this is related to his work on the unix device
memory allocator ([1]).


Thank you for including me.


[1] 
https://lore.kernel.org/dri-devel/8b555674-1c5b-c791-4547-2ea7c16ae...@nvidia.com/

On Mon, Jan 23, 2023 at 01:37:54PM +0100, Christian König wrote:

Hi guys,

this is just an RFC! The last time we discussed the DMA-buf coherency
problem [1] we concluded that DMA-heap first needs a better way to
communicate to userspace which heap to use for a certain device.

As far as I know userspace currently just hard codes that information
which is certainly not desirable considering that we should have this
inside the kernel as well.

So what these two patches do is first add the
dma_heap_create_device_link() and dma_heap_remove_device_link()
functions and then demonstrate the functionality with the uvcvideo
driver.

The preferred DMA-heap is represented with a symlink in sysfs between
the device and the virtual DMA-heap device node.
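For illustration only, here is roughly how userspace could consume such a link. This is a minimal sketch in C that resolves a symlink and returns the last component of its target, which for a heap link would be the heap name. The helper name link_target_name() is made up for this example, and the RFC text quoted here does not say what the link is called in sysfs, so any path passed in (say, a "dma_heap" entry under the device directory) is an assumption.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve the symlink at link_path and copy the final path component of
 * its target into name, e.g. "system" for a target ending in
 * ".../dma_heap/system".
 *
 * Returns 0 on success, -1 if the link cannot be read or the name does
 * not fit into name_len bytes (including the terminating NUL).
 */
int link_target_name(const char *link_path, char *name, size_t name_len)
{
	char target[PATH_MAX];
	const char *base;
	ssize_t n;

	assert(link_path && name && name_len > 0);

	n = readlink(link_path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';	/* readlink() does not NUL-terminate */

	/* Keep only what follows the last '/', if there is one. */
	base = strrchr(target, '/');
	base = base ? base + 1 : target;

	if (strlen(base) >= name_len)
		return -1;
	strcpy(name, base);
	return 0;
}
```

An application would then open the matching node under /dev/dma_heap/ and allocate from it, instead of hard-coding a heap name.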


I'll start with a few high-level comments/questions:

- Instead of tying drivers to heaps, have you considered a system where
a driver would expose constraints, and a heap would then be selected
based on those constraints ? A tight coupling between heaps and
drivers means downstream patches to drivers in order to use
vendor-specific heaps, that sounds painful.


I was wondering the same thing as well, but came to the conclusion that
just the other way around is the less painful approach.


 From a kernel point of view, sure, it's simpler and thus less painful.
 From the point of view of solving the whole issue, I'm not sure :-)


The problem is that there are so many driver-specific constraints that I
don't even know where to start.


That's where I was hoping James would have some feedback for us, based
on the work he did on the Unix device memory allocator. If that's not
the case, we can brainstorm this from scratch.


Simon Ser's and my presentation from XDC 2020 focused entirely on this. 
The idea was not to try to enumerate every constraint up front, but 
rather to develop an extensible mechanism that would be flexible enough 
to encapsulate many disparate types of constraints and perform set 
operations on them (merging sets was the only operation we tried to 
solve). Simon implemented a prototype header-only library to implement 
the mechanism:


https://gitlab.freedesktop.org/emersion/drm-constraints

The links to the presentation and talk are below, along with notes from 
the follow-up workshop.


https://lpc.events/event/9/contributions/615/attachments/704/1301/XDC_2020__Allocation_Constraints.pdf
https://www.youtube.com/watch?v=HZEClOP5TIk
https://paste.sr.ht/~emersion/c43b30be08bab1882f1b107402074462bba3b64a
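To make the "merge" operation a bit more concrete, here is a minimal sketch in C. It is not the drm-constraints API. The constraint kinds, the struct layout and the merge rules (least common multiple for alignments, minimum for an upper bound) are assumptions picked purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constraint kinds, invented for this example. */
enum constraint_kind {
	C_ADDR_ALIGN,	/* required base address alignment, bytes */
	C_PITCH_ALIGN,	/* required pitch alignment, bytes */
	C_MAX_PITCH,	/* largest pitch the device accepts, bytes */
	C_KIND_COUNT
};

struct constraint_set {
	bool     present[C_KIND_COUNT];
	uint64_t value[C_KIND_COUNT];
};

static uint64_t gcd64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/*
 * Merge two constraint sets into one that satisfies both devices.
 * Returns 0 on success, -1 if the merged set cannot be satisfied.
 */
int constraint_set_merge(const struct constraint_set *a,
			 const struct constraint_set *b,
			 struct constraint_set *out)
{
	struct constraint_set m = { { false }, { 0 } };
	int k;

	assert(a && b && out);

	for (k = 0; k < C_KIND_COUNT; k++) {
		uint64_t va = a->value[k], vb = b->value[k];

		if (a->present[k] && b->present[k]) {
			m.present[k] = true;
			if (k == C_MAX_PITCH)
				m.value[k] = va < vb ? va : vb;	/* tightest bound */
			else
				m.value[k] = va / gcd64(va, vb) * vb;	/* LCM */
		} else if (a->present[k] || b->present[k]) {
			/* Only one side cares, so take its value as-is. */
			m.present[k] = true;
			m.value[k] = a->present[k] ? va : vb;
		}
	}

	/* A pitch alignment above the maximum pitch can never be met. */
	if (m.present[C_PITCH_ALIGN] && m.present[C_MAX_PITCH] &&
	    m.value[C_PITCH_ALIGN] > m.value[C_MAX_PITCH])
		return -1;

	*out = m;
	return 0;
}
```

The point of the extensible design is that new kinds can be added without every participant understanding them up front. This fixed enum is the part a real mechanism would have to replace.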

Note one of the hard parts of this was figuring out how to express a 
device or heap within the constraint structs. One of the better ideas 
proposed back then was something like heap IDs, where dma heaps would 
each have one, and devices could register their own heaps (or even just 
themselves?) with the heap subsystem and be assigned a locally-unique ID 
that userspace could pass around. This sounds similar to what you're 
proposing. Perhaps a reasonable identifier is a device (major, minor) 
pair. Such a constraint could be expressed as a symlink for easy 
visualization/discoverability from userspace, but might be easier to 
serialize over the wire as the (major, minor) pair. I'm not clear which 
direction is better to express this either: As a link from heap->device, 
or device->heap.
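As a rough sketch of the "serialize over the wire" part, a (major, minor) pair could be packed into a fixed 8-byte, big-endian blob that is independent of the host's dev_t layout. The struct and function names below are hypothetical, and no existing protocol is implied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wire identifier for a device or heap. */
struct dev_id {
	uint32_t dev_major;
	uint32_t dev_minor;
};

/* Pack as two big-endian 32-bit fields: major first, then minor. */
void dev_id_pack(struct dev_id id, uint8_t out[8])
{
	int i;

	assert(out);
	for (i = 0; i < 4; i++) {
		out[i]     = (uint8_t)(id.dev_major >> (24 - 8 * i));
		out[4 + i] = (uint8_t)(id.dev_minor >> (24 - 8 * i));
	}
}

/* Inverse of dev_id_pack(). */
struct dev_id dev_id_unpack(const uint8_t in[8])
{
	struct dev_id id = { 0, 0 };
	int i;

	assert(in);
	for (i = 0; i < 4; i++) {
		id.dev_major = (id.dev_major << 8) | in[i];
		id.dev_minor = (id.dev_minor << 8) | in[4 + i];
	}
	return id;
}
```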



A constraint-based system would also, I think, be easier to extend
with additional constraints in the future.

- I assume some drivers will be able to support multiple heaps. How do
you envision this being implemented ?


I don't really see a use case for this.


One use case I know of here is same-vendor GPU local memory on different 
GPUs. NVIDIA GPUs have certain things they can only do on local memory, 
certain things they can do on all memory, and certain things they can 
only do on memory local to another NVIDIA GPU, especially when there 
exists an NVLink interface between the two. So they'd ideally express 
different constraints for the heaps representing each of those.


The same thing is often true of memory on remote devices that are at 
various points in a PCIe topology. We've had situations where we could 
only get enough bandwidth between two PCIe devices when they were less 
than some number of hops away on the PCI tree. We hard-coded logic to 
detect that in our userspace drivers, but we could instead expose it as 
a constraint on heaps that would express which devices can accomplish 
certain operations as pairs.


Similarly to the last one, I would assume (But haven't yet run into in 
my personal experience) 

Re: [PATCH] doc: gpu: Add document describing buffer exchange

2021-11-08 Thread James Jones

On 9/8/21 2:44 AM, Simon Ser wrote:

stride



I think what's clear is:

- Per-plane property
- In bytes
- Offset between two consecutive rows

How that applies to weird YUV formats is the tricky question…
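As one data point for the YUV case, here is a small sketch in C of the minimum per-plane stride for a linear NV12 buffer. It assumes that "row" for the chroma plane means a row of interleaved Cb/Cr samples, and that strides may need rounding up to a device alignment. The struct, the function name and the stride_align parameter are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

struct plane_layout {
	uint32_t stride;	/* bytes between two consecutive rows */
	uint32_t rows;		/* number of rows stored in this plane */
};

static uint32_t round_up(uint32_t v, uint32_t align)
{
	return (v + align - 1) / align * align;
}

/*
 * Minimum layout of a linear NV12 image.
 * Plane 0: one byte of Y per pixel.
 * Plane 1: one 2-byte Cb/Cr pair per 2x2 block of pixels.
 * Overflow for very large dimensions is not handled in this sketch.
 */
void nv12_min_layout(uint32_t width, uint32_t height, uint32_t stride_align,
		     struct plane_layout out[2])
{
	assert(width > 0 && height > 0 && stride_align > 0 && out);

	out[0].stride = round_up(width, stride_align);
	out[0].rows   = height;

	/* ceil(width / 2) chroma pairs per row, two bytes each */
	out[1].stride = round_up((width + 1) / 2 * 2, stride_align);
	out[1].rows   = (height + 1) / 2;
}
```

For even widths both planes end up with the same stride in bytes, even though the chroma plane holds half as many samples per row.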


Btw. there was a fun argument whether the same modifier value could
mean different things on different devices. There were also arguments
that a certain modifier could reference additional implicit memory on
the device - memory that can only be accessed by very specific devices.

I think AMLOGIC_FBC_LAYOUT_SCATTER was one of those.


A recent example of this is [1].

[1]: https://patchwork.freedesktop.org/patch/452461/


What was the resolution to that argument?  It took some fiddling to make 
the NV format modifiers robust enough that they differentiate 
"identical" layouts that actually mismatch between devices. For example, 
some of our SoC GPUs interpret layouts differently than our discrete 
GPUs. That is reflected in the format modifier-building macro, so 
applications can properly deduce that they can *not* share images 
directly between these devices, but can share between two similar 
discrete GPUs. I hope the modifier definition allows that. 
Cross-device sharing using tiled formats in machines with multiple 
similar NV GPUs was an important use case for modifiers on our side.


Thanks,
-James


Re: [PATCH] doc: gpu: Add document describing buffer exchange

2021-11-08 Thread James Jones

On 9/6/21 5:28 AM, Simon Ser wrote:

Since there's a lot of confusion around this, document both the rules
and the best practice around negotiating, allocating, importing, and
using buffers when crossing context/process/device/subsystem boundaries.

This ties up all of dmabuf, formats and modifiers, and their usage.

Signed-off-by: Daniel Stone 


Thanks a lot for this write-up! This looks very good to me, a few comments
below.


Agreed, it would be awesome if this were merged somewhere. IMHO, a lot 
of the non-trivial/typo suggestions below could be taken care of as 
follow-on patches, as the content here is better in than out, even if it 
could be clarified a bit.


Further feedback inline:


---

This is just a quick first draft, inspired by:
   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637

It's not complete or perfect, but I'm off to eat a roast then have a
nice walk in the sun, so figured it'd be better to dash it off rather
than let it rot on my hard drive.


  .../gpu/exchanging-pixel-buffers.rst  | 285 ++
  Documentation/gpu/index.rst   |   1 +
  2 files changed, 286 insertions(+)
  create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst

diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst
new file mode 100644
index 000000000000..75c4de13d5c8
--- /dev/null
+++ b/Documentation/gpu/exchanging-pixel-buffers.rst
@@ -0,0 +1,285 @@
+.. Copyright 2021 Collabora Ltd.
+
+========================
+Exchanging pixel buffers
+========================
+
+As originally designed, the Linux graphics subsystem had extremely limited
+support for sharing pixel-buffer allocations between processes, devices, and
+subsystems. Modern systems require extensive integration between all three
+classes; this document details how applications and kernel subsystems should
+approach this sharing for two-dimensional image data.
+
+It is written with reference to the DRM subsystem for GPU and display devices,
+V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
+support, however any other subsystems should also follow this design and advice.
+
+
+Formats and modifiers
+=====================
+
+Each buffer must have an underlying format. This format describes the data which
+can be stored and loaded for each pixel. Although each subsystem has its own
+format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be


RST uses double backticks for inline code blocks (applies to the whole 
document).


+reused wherever possible, as they are the standard descriptions used for
+interchange.


Maybe mention that the canonical source of formats and modifiers can be found
in include/uapi/drm/drm_fourcc.h.


+Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
+the translation between one or more pixels in memory, and the color data
+contained within that memory. The number and type of color channels are


Pekka uses the term "color value", which I find a bit better than repeating
"data".


+described: whether they are RGB or YUV, integer or floating-point, the size
+of each channel and their locations within the pixel memory, and the
+relationship between color planes.
+
+For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a
+single 32-bit value in memory. Alpha, red, green, and blue color channels are
+available at 8-bit precision per channel, ordered respectively from most to
+least significant bits in little-endian storage. As a more complex example,
+`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
+stored in separate memory planes, where the chroma plane is stored at half the
+resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2
+pixel grouping).
+
+Format modifiers describe a translation mechanism between these per-pixel memory
+samples, and the actual memory storage for the buffer. The most straightforward
+modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has
+contiguous storage beginning at (0,0); each pixel's location in memory will be
+`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
+format, and most convenient for CPU access.


Hm, maybe in more simple terms we could explain that the pixels are stored
sequentially row-by-row from the top-left corner to the bottom-right one?


I wouldn't mention top-left. I'm not clear DRM_FORMAT_MOD_LINEAR 
excludes GL-style bottom-left-oriented images.



Maybe we can drop the "base" from the formula and say that each pixel's
location in memory will be at offset `y * stride + x * bpp`? Or maybe this is
confusing with offset being mentioned below as an additional parameter?
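For reference, the linear formula under discussion can be sketched in C as below. It only works if "bpp" is read as bytes per pixel, which is what the cpp parameter means here. The function names are invented for the example, and the loader assumes a single-plane 32-bit format stored little-endian, as the ARGB example in the patch describes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte offset of pixel (x, y) from the start of a single-plane linear
 * buffer. stride is in bytes and may include padding after each row;
 * cpp is the number of bytes per pixel.
 */
uint64_t linear_offset(uint32_t x, uint32_t y, uint32_t stride, uint32_t cpp)
{
	/* The whole pixel must lie inside its row. */
	assert(cpp > 0 && (uint64_t)x * cpp + cpp <= stride);

	return (uint64_t)y * stride + (uint64_t)x * cpp;
}

/* Load one 32-bit little-endian pixel; byte 0 holds the low 8 bits. */
uint32_t argb8888_load(const uint8_t *base, uint32_t x, uint32_t y,
		       uint32_t stride)
{
	const uint8_t *p;

	assert(base);
	p = base + linear_offset(x, y, stride, 4);

	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

Nothing in this arithmetic says whether row 0 is the top or the bottom row of the image, which is the orientation question raised above.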


+Modern hardware employs much more sophisticated access mechanisms, typically
+making use of tiled access and possibly also compression. For example, the

Re: [PATCH 1/3] drivers/nouveau/kms/nv50-: Reject format modifiers for cursor planes

2021-01-19 Thread James Jones

Gah, yes, good catch.

Reviewed-by: James Jones 

On 1/18/21 5:54 PM, Lyude Paul wrote:

Nvidia hardware doesn't actually support using tiling formats with the
cursor plane, only linear is allowed. In the future, we should write a
testcase for this.

Fixes: c586f30bf74c ("drm/nouveau/kms: Add format mod prop to base/ovly/nvdisp")
Cc: James Jones 
Cc: Martin Peres 
Cc: Jeremy Cline 
Cc: Simon Ser 
Cc:  # v5.8+
Signed-off-by: Lyude Paul 
---
  drivers/gpu/drm/nouveau/dispnv50/wndw.c | 17 +++++++++++++----
  1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index ce451242f79e..271de3a63f21 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -702,6 +702,11 @@ nv50_wndw_init(struct nv50_wndw *wndw)
nvif_notify_get(&wndw->notify);
  }
  
+static const u64 nv50_cursor_format_modifiers[] = {
+   DRM_FORMAT_MOD_LINEAR,
+   DRM_FORMAT_MOD_INVALID,
+};
+
  int
  nv50_wndw_new_(const struct nv50_wndw_func *func, struct drm_device *dev,
   enum drm_plane_type type, const char *name, int index,
@@ -713,6 +718,7 @@ nv50_wndw_new_(const struct nv50_wndw_func *func, struct drm_device *dev,
struct nvif_mmu *mmu = &drm->client.mmu;
struct nv50_disp *disp = nv50_disp(dev);
struct nv50_wndw *wndw;
+   const u64 *format_modifiers;
int nformat;
int ret;
  
@@ -728,10 +734,13 @@ nv50_wndw_new_(const struct nv50_wndw_func *func, struct drm_device *dev,
  
  	for (nformat = 0; format[nformat]; nformat++);
  
-	ret = drm_universal_plane_init(dev, &wndw->plane, heads, &nv50_wndw,
-  format, nformat,
-  nouveau_display(dev)->format_modifiers,
-  type, "%s-%d", name, index);
+   if (type == DRM_PLANE_TYPE_CURSOR)
+   format_modifiers = nv50_cursor_format_modifiers;
+   else
+   format_modifiers = nouveau_display(dev)->format_modifiers;
+
+   ret = drm_universal_plane_init(dev, &wndw->plane, heads, &nv50_wndw, format, nformat,
+  format_modifiers, type, "%s-%d", name, index);
if (ret) {
kfree(*pwndw);
*pwndw = NULL;
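
The modifier array in the patch above is terminated by DRM_FORMAT_MOD_INVALID. A test case along the lines suggested in the commit message would boil down to a lookup like the following sketch in C. The EX_ constants mirror the values of DRM_FORMAT_MOD_LINEAR and DRM_FORMAT_MOD_INVALID from drm_fourcc.h but are redefined locally so the example stands alone. The function name is made up and is not a DRM helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the drm_fourcc.h values, for a standalone example. */
#define EX_MOD_LINEAR	0x0000000000000000ULL
#define EX_MOD_INVALID	0x00ffffffffffffffULL

/*
 * Return true if modifier appears in list, where list is terminated by
 * EX_MOD_INVALID in the same way as the array in the patch.
 */
bool modifier_in_list(const uint64_t *list, uint64_t modifier)
{
	assert(list);

	/* The terminator itself is never a usable modifier. */
	if (modifier == EX_MOD_INVALID)
		return false;

	for (; *list != EX_MOD_INVALID; list++) {
		if (*list == modifier)
			return true;
	}
	return false;
}
```

With a cursor list of { EX_MOD_LINEAR, EX_MOD_INVALID }, any tiled modifier is rejected and only linear passes.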




Re: [git pull] drm for 5.8-rc1

2020-09-01 Thread James Jones

On 9/1/20 3:59 AM, Karol Herbst wrote:

On Tue, Sep 1, 2020 at 9:13 AM Daniel Vetter  wrote:


On Tue, Aug 18, 2020 at 04:37:51PM +0200, Thierry Reding wrote:

On Fri, Aug 14, 2020 at 07:25:17PM +0200, Daniel Vetter wrote:

On Fri, Aug 14, 2020 at 7:17 PM Daniel Stone  wrote:


Hi,

On Fri, 14 Aug 2020 at 17:22, Thierry Reding  wrote:

I suspect that the reason why this works in X but not in Wayland is
because X passes the right usage flags, whereas Weston may not. But I'll
have to investigate more in order to be sure.


Weston allocates its own buffers for displaying the result of
composition through GBM with USE_SCANOUT, which is definitely correct.

Wayland clients (common to all compositors, in Mesa's
src/egl/drivers/dri2/platform_wayland.c) allocate with USE_SHARED but
_not_ USE_SCANOUT, which is correct in that they are guaranteed to be
shared, but not guaranteed to be scanned out. The expectation is that
non-scanout-compatible buffers would be rejected by gbm_bo_import if
not drmModeAddFB2.

One difference between Weston and all other compositors (GNOME Shell,
KWin, Sway, etc) is that Weston uses KMS planes for composition when
it can (i.e. when gbm_bo_import from dmabuf + drmModeAddFB2 from
gbm_bo handle + atomic check succeed), but the other compositors only
use the GPU. So if you have different assumptions about the layout of
imported buffers between the GPU and KMS, that would explain a fair
bit.


Yeah non-modifiered multi-gpu (of any kind) is pretty much hopeless I
think. I guess the only option is if the tegra mesa driver forces
linear and an extra copy on everything that's USE_SHARED or
USE_SCANOUT.


I ended up trying this, but this fails for the X case, unfortunately,
because there doesn't seem to be a good synchronization point at which
the de-tiling blit could be done. Weston and kmscube end up calling a
gallium driver's ->flush_resource() implementation, but that never
happens for X and glamor.

But after looking into this some more, I don't think that's even the
problem that we're facing here. The root cause of the glxgears crash
that Karol originally reported is that we end up allocating the
glxgears pixmaps using the dri3 loader from Mesa. But the
dri3 loader will unconditionally pass both __DRI_IMAGE_USE_SHARE and
__DRI_IMAGE_USE_SCANOUT, irrespective of whether the buffer will end up
being scanned out directly or whether it will be composited onto the
root window.

What exactly happens depends on whether I run glxgears in fullscreen
mode or windowed mode. In windowed mode, the glxgears buffers will be
composited onto the root window, so there's no need for the buffers to
be scanout-capable. If I modify the dri3 loader to not pass those flags
I can make this work just fine.

When I run glxgears in fullscreen mode, the modesetting driver ends up
wanting to display the glxgears buffer directly on screen, without
compositing it onto the root window. This ends up working if I leave out
the _USE_SHARE and _USE_SCANOUT flags, but I notice that the kernel then
complains about being unable to create a framebuffer, which in turn is
caused by the fact that those buffers are not exported (the Tegra Mesa
driver only exports/imports buffers that are meant for scanout, under
the assumption that those are the only ones that will ever need to be
used by KMS) and therefore Tegra DRM doesn't have a valid handle for
them.

So I think an ideal solution would probably be for glxgears to somehow
pass better usage information when allocating buffers, but I suspect
that that's just not possible, or would be way too much work and require
additional protocol at the DRI level, so it's not really a good option
when all we want to fix is backwards-compatibility with pre-modifiers
userspace.

Given that glamor also doesn't have any synchronization points, I don't
see how I can implement the de-tiling blit reliably. I was wondering if
it shouldn't be possible to flush the framebuffer resource (and perform
the blit) at presentation time, but I couldn't find a good entry point
to do this.

One other solution that occurred to me was to reintroduce an old IOCTL
that we used to have in the Tegra DRM driver. That IOCTL was meant to
attach tiling meta data to an imported buffer and was basically a
simplified, driver-specific way of doing framebuffer modifiers. That's
a very ugly solution, but it would allow us to be backwards-compatible
with pre-modifiers userspace and even use an optimal path for rendering
and scanning out. The only prerequisite would be that the driver IOCTL
was implemented and that a recent enough Mesa was used to make use of
it. I don't like this very much because framebuffer modifiers are a much
more generic solution, but all of the other options above are pretty
much just as ugly.

One other idea that I haven't explored yet is to be a little more clever
about the export/import dance that we do for buffers. Currently we
export/import at allocation time, and that seems to 

Re: [RFC] Experimental DMA-BUF Device Heaps

2020-08-23 Thread James Jones

On 8/23/20 1:46 PM, Laurent Pinchart wrote:

Hi James,

On Sun, Aug 23, 2020 at 01:04:43PM -0700, James Jones wrote:

On 8/20/20 1:15 AM, Ezequiel Garcia wrote:

On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote:

On 8/17/20 8:18 AM, Brian Starkey wrote:

On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote:

This heap is basically a wrapper around DMA-API dma_alloc_attrs,
which will allocate memory suitable for the given device.

The implementation is mostly a port of the Contiguous Videobuf2
memory allocator (see videobuf2/videobuf2-dma-contig.c)
over to the DMA-BUF Heap interface.

The intention of this allocator is to provide applications
with a more system-agnostic API: the only thing the application
needs to know is which device to get the buffer for.

Whether the buffer is backed by CMA, IOMMU or a DMA Pool
is unknown to the application.

I'm not really expecting this patch to be correct or even
a good idea, but just submitting it to start a discussion on DMA-BUF
heap discovery and negotiation.



My initial reaction is that I thought dmabuf heaps are meant for use
to allocate buffers for sharing across devices, which doesn't fit very
well with having per-device heaps.

For single-device allocations, would using the buffer allocation
functionality of that device's native API be better in most
cases? (Some other possibly relevant discussion at [1])

I can see that this can save some boilerplate for devices that want
to expose private chunks of memory, but might it also lead to 100
aliases for the system's generic coherent memory pool?

I wonder if a set of helpers to allow devices to expose whatever they
want with minimal effort would be better.


I'm rather interested in where this goes, as I was toying with using
some sort of heap ID as a basis for a "device-local" constraint in the
memory constraints proposals Simon and I will be discussing at XDC this
year.  It would be rather elegant if there was one type of heap ID used
universally throughout the kernel that could provide a unique handle for
the shared system memory heap(s), as well as accelerator-local heaps on
fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could
negotiate a location among themselves.  This patch seems to be a step
towards that in a way, but I agree it would be counterproductive if a
bunch of devices that were using the same underlying system memory ended
up each getting their own heap ID just because they used some SW
framework that worked that way.

Would appreciate it if you could send along a pointer to your BoF if it
happens!


Here it is:

https://linuxplumbersconf.org/event/7/contributions/818/

It would be great to see you there and discuss this,
given I was hoping we could talk about how to meet a
userspace allocator library's expectations as well.


Thanks!  I hadn't registered for LPC and it looks like it's sold out,
but I'll try to watch the live stream.

This is very interesting, in that it looks like we're both trying to
solve roughly the same set of problems but approaching it from different
angles.  From what I gather, your approach is that a "heap" encompasses
all the allocation constraints a device may have.

The approach Simon Ser and I are tossing around so far is somewhat
different, but may potentially leverage dma-buf heaps a bit as well.

Our approach looks more like what I described at XDC a few years ago,
where memory constraints for a given device's usage of an image are
exposed up to applications, which can then somehow perform boolean
intersection/union operations on them to arrive at a common set of
constraints that describe something compatible with all the devices &
usages desired (or fail to do so, and fall back to copying things around
presumably).  I believe this is more flexible than your initial proposal
in that devices often support multiple usages (E.g., different formats,
different proprietary layouts represented by format modifiers, etc.),
and it avoids adding a combinatorial number of heaps to manage that.

In my view, heaps are more like blobs of memory that can be allocated
from in various different ways to satisfy constraints.  I realize heaps
mean something specific in the dma-buf heap design (specifically,
something closer to an association between an "allocation mechanism" and
"physical memory"), but I hope we don't have massive heap/allocator
mechanism proliferation due to constraints alone.  Perhaps some
constraints, such as contiguous memory or device-local memory, are
properly expressed as a specific heap, but consider the proliferation
implied by even that simple pair of examples: How do you express
contiguous device-local memory?  Do you need to spawn two heaps on the
underlying device-local memory, one for contiguous allocations and one
for non-contiguous allocations?  Seems excessive.

Of course, our approach also has downsides and is still being worked on.
   For example, it works best in an id

Re: [RFC] Experimental DMA-BUF Device Heaps

2020-08-23 Thread James Jones

On 8/20/20 1:15 AM, Ezequiel Garcia wrote:

On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote:

On 8/17/20 8:18 AM, Brian Starkey wrote:

Hi Ezequiel,

On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote:

This heap is basically a wrapper around DMA-API dma_alloc_attrs,
which will allocate memory suitable for the given device.

The implementation is mostly a port of the Contiguous Videobuf2
memory allocator (see videobuf2/videobuf2-dma-contig.c)
over to the DMA-BUF Heap interface.

The intention of this allocator is to provide applications
with a more system-agnostic API: the only thing the application
needs to know is which device to get the buffer for.

Whether the buffer is backed by CMA, IOMMU or a DMA Pool
is unknown to the application.

I'm not really expecting this patch to be correct or even
a good idea, but just submitting it to start a discussion on DMA-BUF
heap discovery and negotiation.



My initial reaction is that I thought dmabuf heaps are meant for use
to allocate buffers for sharing across devices, which doesn't fit very
well with having per-device heaps.

For single-device allocations, would using the buffer allocation
functionality of that device's native API be better in most
cases? (Some other possibly relevant discussion at [1])

I can see that this can save some boilerplate for devices that want
to expose private chunks of memory, but might it also lead to 100
aliases for the system's generic coherent memory pool?

I wonder if a set of helpers to allow devices to expose whatever they
want with minimal effort would be better.


I'm rather interested in where this goes, as I was toying with using
some sort of heap ID as a basis for a "device-local" constraint in the
memory constraints proposals Simon and I will be discussing at XDC this
year.  It would be rather elegant if there was one type of heap ID used
universally throughout the kernel that could provide a unique handle for
the shared system memory heap(s), as well as accelerator-local heaps on
fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could
negotiate a location among themselves.  This patch seems to be a step
towards that in a way, but I agree it would be counterproductive if a
bunch of devices that were using the same underlying system memory ended
up each getting their own heap ID just because they used some SW
framework that worked that way.

Would appreciate it if you could send along a pointer to your BoF if it
happens!



Here it is:

https://linuxplumbersconf.org/event/7/contributions/818/

It would be great to see you there and discuss this,
given I was hoping we could talk about how to meet a
userspace allocator library's expectations as well.


Thanks!  I hadn't registered for LPC and it looks like it's sold out, 
but I'll try to watch the live stream.


This is very interesting, in that it looks like we're both trying to 
solve roughly the same set of problems but approaching it from different 
angles.  From what I gather, your approach is that a "heap" encompasses 
all the allocation constraints a device may have.


The approach Simon Ser and I are tossing around so far is somewhat 
different, but may potentially leverage dma-buf heaps a bit as well.


Our approach looks more like what I described at XDC a few years ago, 
where memory constraints for a given device's usage of an image are 
exposed up to applications, which can then somehow perform boolean 
intersection/union operations on them to arrive at a common set of 
constraints that describe something compatible with all the devices & 
usages desired (or fail to do so, and fall back to copying things around 
presumably).  I believe this is more flexible than your initial proposal 
in that devices often support multiple usages (E.g., different formats, 
different proprietary layouts represented by format modifiers, etc.), 
and it avoids adding a combinatorial number of heaps to manage that.
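
To make the intersection idea concrete, here is a minimal sketch in C. Nothing in it comes from an existing allocator or kernel API: the struct, its fields, and sketch_merge_constraints() are hypothetical names chosen for illustration, and all alignments are assumed to be powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device, per-usage constraint set (illustration only). */
struct sketch_constraints {
    uint32_t addr_align;   /* required base address alignment, power of two */
    uint32_t pitch_align;  /* required row pitch alignment, power of two */
    uint32_t max_pitch;    /* largest supported row pitch, in bytes */
    bool     contiguous;   /* true if physically contiguous memory is needed */
};

static uint32_t sketch_max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t sketch_min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/*
 * Combine two constraint sets into one that satisfies both devices.
 * Returns false when no layout can satisfy both, in which case the caller
 * would presumably fall back to copying between separate buffers.
 */
static bool sketch_merge_constraints(const struct sketch_constraints *a,
                                     const struct sketch_constraints *b,
                                     struct sketch_constraints *out)
{
    assert(a && b && out);

    /* For powers of two, the larger alignment satisfies the smaller one. */
    out->addr_align  = sketch_max_u32(a->addr_align, b->addr_align);
    out->pitch_align = sketch_max_u32(a->pitch_align, b->pitch_align);

    /* Upper limits intersect to the tighter of the two. */
    out->max_pitch = sketch_min_u32(a->max_pitch, b->max_pitch);

    /* A boolean requirement holds if either side demands it. */
    out->contiguous = a->contiguous || b->contiguous;

    /* The smallest legal pitch is pitch_align; it must fit under the cap. */
    return out->pitch_align <= out->max_pitch;
}
```

With one merged result per (device, usage) pair, formats and modifiers can stay as inputs to the merge instead of each combination becoming its own heap.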


In my view, heaps are more like blobs of memory that can be allocated 
from in various different ways to satisfy constraints.  I realize heaps 
mean something specific in the dma-buf heap design (specifically, 
something closer to an association between an "allocation mechanism" and 
"physical memory"), but I hope we don't have massive heap/allocator 
mechanism proliferation due to constraints alone.  Perhaps some 
constraints, such as contiguous memory or device-local memory, are 
properly expressed as a specific heap, but consider the proliferation 
implied by even that simple pair of examples: How do you express 
contiguous device-local memory?  Do you need to spawn two heaps on the 
underlying device-local memory, one for contiguous allocations and one 
for non-contiguous allocations?  Seems excessive.


Of course, our approach also has downsides and is still being worked on.
For example, it works best in an ideal world where all the allocators
available understand all the constraint

Re: [RFC] Experimental DMA-BUF Device Heaps

2020-08-17 Thread James Jones

On 8/17/20 8:18 AM, Brian Starkey wrote:

Hi Ezequiel,

On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote:

This heap is basically a wrapper around DMA-API dma_alloc_attrs,
which will allocate memory suitable for the given device.

The implementation is mostly a port of the Contiguous Videobuf2
memory allocator (see videobuf2/videobuf2-dma-contig.c)
over to the DMA-BUF Heap interface.

The intention of this allocator is to provide applications
with a more system-agnostic API: the only thing the application
needs to know is which device to get the buffer for.

Whether the buffer is backed by CMA, IOMMU or a DMA Pool
is unknown to the application.

I'm not really expecting this patch to be correct or even
a good idea, but just submitting it to start a discussion on DMA-BUF
heap discovery and negotiation.



My initial reaction is that I thought dmabuf heaps are meant for use
to allocate buffers for sharing across devices, which doesn't fit very
well with having per-device heaps.

For single-device allocations, would using the buffer allocation
functionality of that device's native API be better in most
cases? (Some other possibly relevant discussion at [1])

I can see that this can save some boilerplate for devices that want
to expose private chunks of memory, but might it also lead to 100
aliases for the system's generic coherent memory pool?

I wonder if a set of helpers to allow devices to expose whatever they
want with minimal effort would be better.


I'm rather interested in where this goes, as I was toying with using
some sort of heap ID as a basis for a "device-local" constraint in the 
memory constraints proposals Simon and I will be discussing at XDC this 
year.  It would be rather elegant if there was one type of heap ID used 
universally throughout the kernel that could provide a unique handle for 
the shared system memory heap(s), as well as accelerator-local heaps on 
fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could 
negotiate a location among themselves.  This patch seems to be a step 
towards that in a way, but I agree it would be counterproductive if a 
bunch of devices that were using the same underlying system memory ended 
up each getting their own heap ID just because they used some SW 
framework that worked that way.


Would appreciate it if you could send along a pointer to your BoF if it 
happens!


Thanks,
-James


Cheers,
-Brian

[1] https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a40...@kapsi.fi/


Given Plumbers is just a couple weeks from now, I've submitted
a BoF proposal to discuss this, as perhaps it would make
sense to discuss this live?

Not-signed-off-by: Ezequiel Garcia 

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [git pull] drm for 5.8-rc1

2020-08-13 Thread James Jones
I'll defer to Thierry, but I think that may be by design.  Tegra format 
modifiers were added to get things like this working in the first place, 
right?  It's not a regression, is it?


Thanks,
-James

On 8/13/20 10:19 AM, Karol Herbst wrote:

another thing: with gsettings set org.gnome.mutter
experimental-features '["kms-modifiers"]' it all just works out of the
box with wayland, but that won't be enabled for quite some time, so we
need to figure out what is broken (less so with my patch) under
wayland with gnome :)

On Thu, Aug 13, 2020 at 5:39 PM Karol Herbst  wrote:


btw, I just noticed that wayland with gnome-shell is totally busted.
With this MR it at least displays something, but without it doesn't
work at all.

On Thu, Aug 13, 2020 at 3:00 PM Karol Herbst  wrote:


At least for now I've created an MR to revert the change:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6300

But it seems like there was probably a good reason why it got added?
Happy to have better fixes, but that's the best we've got so far I
think?

Thierry, what do you think?

On Wed, Aug 12, 2020 at 8:51 PM Karol Herbst  wrote:


in case you all were wondering, it works on xorg-server git because of
this commit: 
https://gitlab.freedesktop.org/xorg/xserver/-/commit/9b8999411033c9473cd68e92e4690a91aecf5b95

On Wed, Aug 12, 2020 at 8:25 PM James Jones  wrote:


On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote:

...and in merging my code with Alyssa's new panfrost format modifier
support, I see panfrost does the opposite of this and treats a format
modifier list of only INVALID as "don't care".  I modeled the new nouveau
behavior on the Iris driver.  Now I'm not sure which is correct :-(


and neither am I. Uh-oh.

I modeled the panfrost code after v3d_resource_create_with_modifiers,
which treats INVALID as "don't care". I can confirm the panfrost code
works (in the sense that it's functional on the machines I've tested),
but I don't know if it is actually correct. I think it is, since
otherwise you end up using linear in places it's unnecessary, but I'm
not sure where this is spec'd.


It would depend on whether an app actually calls the function this way,
and whether that app was tested I suppose.  If I'm interpreting the Iris
code correctly and it doesn't break anything, then I'm assuming both
implementations are equally valid in that nothing exercises this path,
but it would be good to have the intended behavior documented somewhere
so we can try to work towards consistency in case someone tries it in the
future.

My nouveau change runs afoul of assumptions in the tegra driver, but
that's easy enough to fix in lockstep if desired.

Also, heads up: I'll ping you on my format modifier cleanup MR once I've
pushed the latest version.  The panfrost modifier usage was harder to
merge into the refactoring than most, so it'll be good to have your
review and if you have time, some testing.  I think I landed on an
elegant solution, but open to suggestions.

Thanks,
-James





Re: [git pull] drm for 5.8-rc1

2020-08-12 Thread James Jones

On 8/12/20 10:40 AM, Alyssa Rosenzweig wrote:

...and in merging my code with Alyssa's new panfrost format modifier
support, I see panfrost does the opposite of this and treats a format
modifier list of only INVALID as "don't care".  I modeled the new nouveau
behavior on the Iris driver.  Now I'm not sure which is correct :-(


and neither am I. Uh-oh.

I modeled the panfrost code after v3d_resource_create_with_modifiers,
which treats INVALID as "don't care". I can confirm the panfrost code
works (in the sense that it's functional on the machines I've tested),
but I don't know if it is actually correct. I think it is, since
otherwise you end up using linear in places it's unnecessary, but I'm
not sure where this is spec'd.


It would depend on whether an app actually calls the function this way, 
and whether that app was tested I suppose.  If I'm interpreting the Iris 
code correctly and it doesn't break anything, then I'm assuming both 
implementations are equally valid in that nothing exercises this path, 
but it would be good to have the intended behavior documented somewhere 
so we can try to work towards consistency in case someone tries it in the
future.


My nouveau change runs afoul of assumptions in the tegra driver, but 
that's easy enough to fix in lockstep if desired.


Also, heads up: I'll ping you on my format modifier cleanup MR once I've 
pushed the latest version.  The panfrost modifier usage was harder to 
merge into the refactoring than most, so it'll be good to have your 
review and if you have time, some testing.  I think I landed on an 
elegant solution, but open to suggestions.


Thanks,
-James


Re: [git pull] drm for 5.8-rc1

2020-08-12 Thread James Jones

On 8/12/20 10:10 AM, Karol Herbst wrote:

On Wed, Aug 12, 2020 at 7:03 PM James Jones  wrote:


On 8/12/20 5:37 AM, Ilia Mirkin wrote:

On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst  wrote:


On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst  wrote:


On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst  wrote:


On Wed, Aug 12, 2020 at 2:19 AM James Jones  wrote:


Sorry for the slow reply here as well.  I've been in the process of
rebasing and reworking the userspace patches.  I'm not clear my changes
will address the Jetson Nano issue, but if you'd like to try them, the
latest userspace changes are available here:

 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724

And the tegra-drm kernel patches are here:


https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajo...@nvidia.com/

Those + the kernel changes addressed in this thread are everything I had
outstanding.



I don't know if that's caused by your changes or not, but now the
assert I hit is a different one pointing out that
nvc0_miptree_select_best_modifier fails in a certain case and returns
MOD_INVALID... anyway, it seems like with your patches applied it's
now way easier to debug and figure out what's going wrong, so maybe I
can figure it out now :)



collected some information which might help to track it down.

src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf)

templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0
= 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target =
PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples =
0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0}

inside tegra_screen_resource_create modifier says
DRM_FORMAT_MOD_INVALID as template->bind is 1

and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID,
so the call just returns NULL leading to the assert.

Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears.



So I dug a bit deeper and here is what trips it up:

when the context gets made current, the normal framebuffer validation
and render buffer allocation is done, but we end up inside
tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set
in template->bind. Now the tegra driver forces the
DRM_FORMAT_MOD_LINEAR modifier and calls into
resource_create_with_modifiers.

If it didn't do that, nouveau would allocate a tiled buffer; with
that modifier forced it's linear, and at some point we end up with an assert about a
depth_stencil buffer being there even though it shouldn't. If I always
use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things
just work.

That's kind of the cause I pinpointed the issue down to. But I have no
idea what's supposed to happen and what the actual bug is.


Yeah, the bug with tegra has always been "trying to render to linear
color + tiled depth", which the hardware plain doesn't support. (And
linear depth isn't a thing.)

Question is whether what it's doing is necessary. PIPE_BIND_SCANOUT
(/linear) requirements are needed for DRI2 to work (well, maybe not in
theory, but at least in practice the nouveau ddx expects linear
buffers). However tegra operates on a more DRI3-like basis, so with
"client" allocations, tiled should work OK as long as there's
something in tegra to copy it to linear when necessary?


I can confirm the above: Our hardware can't render to linear depth
buffers, nor can it mix linear color buffers with block linear depth
buffers.

I think there's a misunderstanding on expected behavior of
resource_create_with_modifiers() here too:
tegra_screen_resource_create() is passing DRM_FORMAT_MOD_INVALID as the
only modifier in non-scanout cases.  Previously, I believe nouveau may
have treated that as "no modifiers specified.  Fall back to internal
layout selection logic", but in my patches I "fixed" it to match other
drivers' behavior, in that allocation will fail if that is the only
modifier in the list, since it is equivalent to passing in a list
containing only unsupported modifiers.  To get fallback behavior,
tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers,
count), or just call resource_create() on the underlying screen instead.


...and in merging my code with Alyssa's new panfrost format modifier 
support, I see panfrost does the opposite of this and treats a format 
modifier list of only INVALID as "don't care".  I modeled the new 
nouveau behavior on the Iris driver.  Now I'm not sure which is correct :-(


Thanks,
-James


Beyond that, I can only offer my thoughts based on analysis of the code
referenced here so far:

While I've learned from the origins of this thread applications/things
external to Mesa in general shouldn't be querying format modifiers of
buffers created without format modifiers, tegra is a Mesa internal
component that already has some intimate knowledge of how the nouveau
driver it sits on top of works.  Nouveau will always be able to
const

Re: [git pull] drm for 5.8-rc1

2020-08-12 Thread James Jones

On 8/12/20 5:37 AM, Ilia Mirkin wrote:

On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst  wrote:


On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst  wrote:


On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst  wrote:


On Wed, Aug 12, 2020 at 2:19 AM James Jones  wrote:


Sorry for the slow reply here as well.  I've been in the process of
rebasing and reworking the userspace patches.  I'm not clear my changes
will address the Jetson Nano issue, but if you'd like to try them, the
latest userspace changes are available here:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724

And the tegra-drm kernel patches are here:


https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajo...@nvidia.com/

Those + the kernel changes addressed in this thread are everything I had
outstanding.



I don't know if that's caused by your changes or not, but now the
assert I hit is a different one pointing out that
nvc0_miptree_select_best_modifier fails in a certain case and returns
MOD_INVALID... anyway, it seems like with your patches applied it's
now way easier to debug and figure out what's going wrong, so maybe I
can figure it out now :)



collected some information which might help to track it down.

src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf)

templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0
= 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target =
PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples =
0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0}

inside tegra_screen_resource_create modifier says
DRM_FORMAT_MOD_INVALID as template->bind is 1

and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID,
so the call just returns NULL leading to the assert.

Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears.



So I dug a bit deeper and here is what trips it up:

when the context gets made current, the normal framebuffer validation
and render buffer allocation is done, but we end up inside
tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set
in template->bind. Now the tegra driver forces the
DRM_FORMAT_MOD_LINEAR modifier and calls into
resource_create_with_modifiers.

If it didn't do that, nouveau would allocate a tiled buffer; with
that modifier forced it's linear, and at some point we end up with an assert about a
depth_stencil buffer being there even though it shouldn't. If I always
use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things
just work.

That's kind of the cause I pinpointed the issue down to. But I have no
idea what's supposed to happen and what the actual bug is.


Yeah, the bug with tegra has always been "trying to render to linear
color + tiled depth", which the hardware plain doesn't support. (And
linear depth isn't a thing.)

Question is whether what it's doing is necessary. PIPE_BIND_SCANOUT
(/linear) requirements are needed for DRI2 to work (well, maybe not in
theory, but at least in practice the nouveau ddx expects linear
buffers). However tegra operates on a more DRI3-like basis, so with
"client" allocations, tiled should work OK as long as there's
something in tegra to copy it to linear when necessary?


I can confirm the above: Our hardware can't render to linear depth 
buffers, nor can it mix linear color buffers with block linear depth 
buffers.


I think there's a misunderstanding on expected behavior of 
resource_create_with_modifiers() here too: 
tegra_screen_resource_create() is passing DRM_FORMAT_MOD_INVALID as the 
only modifier in non-scanout cases.  Previously, I believe nouveau may 
have treated that as "no modifiers specified.  Fall back to internal 
layout selection logic", but in my patches I "fixed" it to match other 
drivers' behavior, in that allocation will fail if that is the only 
modifier in the list, since it is equivalent to passing in a list 
containing only unsupported modifiers.  To get fallback behavior, 
tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers, 
count), or just call resource_create() on the underlying screen instead.
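
The two interpretations under discussion can be sketched side by side. This is a rough illustration only, not code from Mesa: sketch_pick_modifier(), the policy enum, and the fallback parameter are hypothetical, and the two SKETCH_MOD_* constants are assumed to match the values of DRM_FORMAT_MOD_INVALID and DRM_FORMAT_MOD_LINEAR in drm_fourcc.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MOD_INVALID 0x00ffffffffffffffull /* assumed DRM_FORMAT_MOD_INVALID */
#define SKETCH_MOD_LINEAR  0ull                  /* assumed DRM_FORMAT_MOD_LINEAR */

/* How a driver reads a modifier list that contains only INVALID. */
enum sketch_invalid_policy {
    SKETCH_INVALID_IS_UNSUPPORTED, /* strict reading: allocation fails */
    SKETCH_INVALID_IS_DONT_CARE,   /* lenient reading: driver picks a layout */
};

/*
 * Hypothetical modifier selection. 'fallback' stands for whatever layout the
 * driver's internal selection logic would choose without modifiers.
 * Returns true and sets *chosen on success, false if allocation should fail.
 */
static bool sketch_pick_modifier(const uint64_t *supported, size_t n_supported,
                                 uint64_t fallback,
                                 const uint64_t *mods, size_t count,
                                 enum sketch_invalid_policy policy,
                                 uint64_t *chosen)
{
    bool saw_invalid = false;

    assert(chosen);

    /* (NULL, 0): no modifiers specified, so use internal layout selection. */
    if (mods == NULL || count == 0) {
        *chosen = fallback;
        return true;
    }

    for (size_t i = 0; i < count; i++) {
        if (mods[i] == SKETCH_MOD_INVALID) {
            saw_invalid = true;
            continue;
        }
        for (size_t j = 0; j < n_supported; j++) {
            if (mods[i] == supported[j]) {
                *chosen = mods[i];
                return true;
            }
        }
    }

    /* Nothing matched; only the lenient policy treats INVALID as a wildcard. */
    if (saw_invalid && policy == SKETCH_INVALID_IS_DONT_CARE) {
        *chosen = fallback;
        return true;
    }
    return false;
}
```

Under the strict policy a caller that wants fallback behavior has to pass (NULL, 0); under the lenient policy a list holding only INVALID gets the same result.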


Beyond that, I can only offer my thoughts based on analysis of the code 
referenced here so far:


While I've learned from the origins of this thread applications/things 
external to Mesa in general shouldn't be querying format modifiers of 
buffers created without format modifiers, tegra is a Mesa internal 
component that already has some intimate knowledge of how the nouveau 
driver it sits on top of works.  Nouveau will always be able to 
construct and return a valid format modifier for unorm single sampled 
color buffers (and hopefully, anything that can scan out going forward), 
both before and after my patches I believe, regardless of how they were 
allocated.  After my patches, it should even work for things that can't 
scan out in theory.  Hence, looking at this without knowledge of what 
motivated the original changes, it s

Re: [git pull] drm for 5.8-rc1

2020-08-11 Thread James Jones
Sorry for the slow reply here as well.  I've been in the process of 
rebasing and reworking the userspace patches.  I'm not clear my changes 
will address the Jetson Nano issue, but if you'd like to try them, the 
latest userspace changes are available here:


  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724

And the tegra-drm kernel patches are here:


https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajo...@nvidia.com/

Those + the kernel changes addressed in this thread are everything I had 
outstanding.


Thanks,
-James

On 8/4/20 1:58 AM, Karol Herbst wrote:

Hi James,

I don't know if you knew, but on the Jetson nano we had the issue for
quite some time, that GLX/EGL through mesa on X was broken due to some
fix in mesa related to modifiers.

And I was wondering if the overall state just caused the issue we saw
here and wanted to know what branches/patches I needed for the various
projects to see if the work you have been doing since the last
upstream nouveau regression would be of any help here?

Mind pointing me towards everything I'd need to check that?

I'd really like to fix this, but didn't have the time to investigate
what the core problem here was, but I think it's very likely that a
fixed/improved modifier support could actually fix it as well.
Alternately I'd like to move to kmsro in mesa as this fixes it as
well, but that could just be by coincidence and would break other
devices..

Thanks

On Tue, Jul 14, 2020 at 4:32 PM James Jones  wrote:


Still testing.  I'll get a Sign-off version out this week unless I find
a problem.

Thanks,
-James

On 7/12/20 6:37 PM, Dave Airlie wrote:

How are we going with a fix for this regression I can commit?

Dave.




[PATCH v4] drm/nouveau: Accept 'legacy' format modifiers

2020-07-30 Thread James Jones
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers. Existing Mesa
drivers are still aware of only these older
format modifiers which do not differentiate
between different variations of the block linear
layout. When the format modifier support flag was
flipped in the nouveau kernel driver, the X.org
modesetting driver began attempting to use its
format modifier-enabled framebuffer path. Because
the set of format modifiers advertised by the
kernel prior to this change do not intersect with
the set of format modifiers advertised by Mesa,
allocating GBM buffers using format modifiers
fails and the modesetting driver falls back to
non-modifier allocation. However, it still later
queries the modifier of the GBM buffer when
creating its DRM-KMS framebuffer object, receives
the old-format modifier from Mesa, and attempts
to create a framebuffer with it. Since the kernel
is still not aware of these formats, this fails.

Userspace should not be attempting to query format
modifiers of GBM buffers allocated with a non-
format-modifier-aware allocation path, but to
avoid breaking existing userspace behavior, this
change accepts the old-style format modifiers when
creating framebuffers and applying them to planes
by translating them to the equivalent new-style
modifier. To accomplish this, some layout
parameters must be assumed to match properties of
the device targeted by the relevant ioctls. To
avoid perpetuating misuse of the old-style
modifiers, this change does not advertise support
for them. Doing so would imply compatibility
between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Reported-by: Kirill A. Shutemov 
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..07373bbc2acf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,6 +191,7 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
 {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
 
if (modifier == DRM_FORMAT_MOD_LINEAR) {
@@ -202,6 +203,12 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 * Extract the block height and kind from the corresponding
 * modifier fields.  See drm_fourcc.h for details.
 */
+
+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this dev's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }
+
*tile_mode = (uint32_t)(modifier & 0xF);
*kind = (uint8_t)((modifier >> 12) & 0xFF);
 
@@ -227,6 +234,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
 }
 
+static const u64 legacy_modifiers[] = {
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +264,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
 
-   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
 
nouveau_decode_mod(drm, modifier, tile_mode, kind);
 
-- 
2.17.1



Re: [PATCH v3] drm/nouveau: Accept 'legacy' format modifiers

2020-07-30 Thread James Jones

On 7/30/20 3:19 PM, Kirill A. Shutemov wrote:

On Thu, Jul 30, 2020 at 10:26:17AM -0700, James Jones wrote:

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers. Existing Mesa
drivers are still aware of only these older
format modifiers which do not differentiate
between different variations of the block linear
layout. When the format modifier support flag was
flipped in the nouveau kernel driver, the X.org
modesetting driver began attempting to use its
format modifier-enabled framebuffer path. Because
the set of format modifiers advertised by the
kernel prior to this change do not intersect with
the set of format modifiers advertised by Mesa,
allocating GBM buffers using format modifiers
fails and the modesetting driver falls back to
non-modifier allocation. However, it still later
queries the modifier of the GBM buffer when
creating its DRM-KMS framebuffer object, receives
the old-format modifier from Mesa, and attempts
to create a framebuffer with it. Since the kernel
is still not aware of these formats, this fails.

Userspace should not be attempting to query format
modifiers of GBM buffers allocated with a non-
format-modifier-aware allocation path, but to
avoid breaking existing userspace behavior, this
change accepts the old-style format modifiers when
creating framebuffers and applying them to planes
by translating them to the equivalent new-style
modifier. To accomplish this, some layout
parameters must be assumed to match properties of
the device targeted by the relevant ioctls. To
avoid perpetuating misuse of the old-style
modifiers, this change does not advertise support
for them. Doing so would imply compatibility
between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
kmscube hacked to use linear mod, and sway 1.5

Reported-by: Kirill A. Shutemov 
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones 
---
  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
  1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
  {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
  
+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }
+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;


Em. I thought Ben's suggestion was to move it under != MOD_LINEAR. I don't
see it here.


Yes, it looks like I forgot to commit before generating the patch.  v4 sent.
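
A minimal standalone sketch of why the placement matters, assuming DRM_FORMAT_MOD_LINEAR is 0 as in drm_fourcc.h: the linear modifier also has an all-zero 'kind' field, so running the legacy check first would OR the device's kind into it and it would no longer compare equal to linear. The function below is a hypothetical userspace stand-in for nouveau_decode_mod(); dev_mod0 plays the role of disp->format_modifiers[0], and clearing *kind in the linear case is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MOD_LINEAR 0ull            /* assumed DRM_FORMAT_MOD_LINEAR */
#define SKETCH_KIND_MASK  (0xffull << 12) /* 'kind' field, as masked in the patch */

/* Hypothetical stand-in for nouveau_decode_mod() with the v4 ordering. */
static void sketch_decode_mod(uint64_t dev_mod0, uint64_t modifier,
                              uint32_t *tile_mode, uint8_t *kind)
{
    assert(tile_mode && kind);

    /* Handle linear first: its kind bits are zero too, but it is not legacy. */
    if (modifier == SKETCH_MOD_LINEAR) {
        *tile_mode = 0; /* unused for linear */
        *kind = 0;      /* assumption made for this sketch */
        return;
    }

    /* Legacy block linear modifier: borrow this device's kind. */
    if ((modifier & SKETCH_KIND_MASK) == 0ull)
        modifier |= dev_mod0 & SKETCH_KIND_MASK;

    /* Extract block height and kind from the modifier fields. */
    *tile_mode = (uint32_t)(modifier & 0xF);
    *kind = (uint8_t)((modifier >> 12) & 0xFF);
}
```

With a device modifier carrying kind 0xfe, a legacy modifier with block height 4 decodes to tile_mode 4 and kind 0xfe, while a linear modifier still decodes as linear.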

Thanks,
-James



Re: [Nouveau] [PATCH v2] drm/nouveau: Accept 'legacy' format modifiers

2020-07-30 Thread James Jones

On 7/29/20 7:47 AM, Kirill A. Shutemov wrote:

On Wed, Jul 29, 2020 at 01:40:13PM +1000, Ben Skeggs wrote:

On Wed, 29 Jul 2020 at 12:48, Dave Airlie  wrote:


On Tue, 28 Jul 2020 at 04:51, James Jones  wrote:


On 7/23/20 9:06 PM, Ben Skeggs wrote:

On Sat, 18 Jul 2020 at 13:34, James Jones  wrote:


Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers. Existing Mesa
drivers are still aware of only these older
format modifiers which do not differentiate
between different variations of the block linear
layout. When the format modifier support flag was
flipped in the nouveau kernel driver, the X.org
modesetting driver began attempting to use its
format modifier-enabled framebuffer path. Because
the set of format modifiers advertised by the
kernel prior to this change do not intersect with
the set of format modifiers advertised by Mesa,
allocating GBM buffers using format modifiers
fails and the modesetting driver falls back to
non-modifier allocation. However, it still later
queries the modifier of the GBM buffer when
creating its DRM-KMS framebuffer object, receives
the old-format modifier from Mesa, and attempts
to create a framebuffer with it. Since the kernel
is still not aware of these formats, this fails.

Userspace should not be attempting to query format
modifiers of GBM buffers allocated with a non-
format-modifier-aware allocation path, but to
avoid breaking existing userspace behavior, this
change accepts the old-style format modifiers when
creating framebuffers and applying them to planes
by translating them to the equivalent new-style
modifier. To accomplish this, some layout
parameters must be assumed to match properties of
the device targeted by the relevant ioctls. To
avoid perpetuating misuse of the old-style
modifiers, this change does not advertise support
for them. Doing so would imply compatibility
between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Reported-by: Kirill A. Shutemov 
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones 
---
   drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
   1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
 uint32_t *tile_mode,
 uint8_t *kind)
   {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
  BUG_ON(!tile_mode || !kind);

+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }

I believe this should be moved into the != MOD_LINEAR case.


Yes, of course, thanks.  I need to re-evaluate my testing yet again to
make sure I hit that case too.  Preparing a v3...


Going to need something here in the next day, two max.

Linus may wait for another week, but it's not guaranteed.

I tested a whole bunch of GPUs before sending nouveau's -next tree,
and with the change I suggested to this patch + the other stuff I sent
through -fixes already, things seemed to be in OK shape.


JFYI, the adjusted (moved into != MOD_LINEAR case) patch works fine for me
on top of drm-fixes-2020-07-29.


Sorry again for the delays (life is terrible lately), but the signed-off 
version with Ben's suggestion went out this morning, and I specifically 
tested linear modifiers in addition to retesting all the other test 
cases mentioned in the patch.


Thanks,
-James
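
To illustrate why the placement Ben asked for matters, here is a standalone sketch (not the code from the follow-up patch). DRM_FORMAT_MOD_LINEAR is 0, so its 'kind' bits are also zero and the legacy check matches it. If the translation ran before the linear test, the device's kind would be ORed into the linear modifier and the later comparison against DRM_FORMAT_MOD_LINEAR would no longer succeed. The names sketch_decode_mod, SKETCH_MOD_LINEAR, SKETCH_KIND_MASK and device_mod0 (standing in for disp->format_modifiers[0]) are hypothetical, and zeroing *kind in the linear branch is an assumption about code outside the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; the real definitions live in drm_fourcc.h. */
#define SKETCH_MOD_LINEAR 0ull             /* DRM_FORMAT_MOD_LINEAR is 0 */
#define SKETCH_KIND_MASK  (0xffull << 12)  /* page-kind field used in the patch */

/*
 * Hypothetical standalone decode step. device_mod0 plays the role of
 * disp->format_modifiers[0] in the patch.
 */
static void sketch_decode_mod(uint64_t device_mod0, uint64_t modifier,
                              uint32_t *tile_mode, uint8_t *kind)
{
    assert(tile_mode && kind);  /* mirrors the BUG_ON() in the patch */

    if (modifier == SKETCH_MOD_LINEAR) {
        /* Linear is tested first, so it is never mistaken for legacy. */
        *tile_mode = 0;
        *kind = 0;              /* assumption: not shown in the quoted hunk */
        return;
    }

    if ((modifier & SKETCH_KIND_MASK) == 0) {
        /* Legacy modifier: borrow this device's page kind. */
        modifier |= device_mod0 & SKETCH_KIND_MASK;
    }

    /* Block height sits in the low four bits, kind in bits 12..19. */
    *tile_mode = (uint32_t)(modifier & 0xf);
    *kind = (uint8_t)((modifier >> 12) & 0xff);
}
```

Calling it with modifier 0 yields tile_mode 0 and kind 0, while a legacy block-linear value picks up the kind of the device modifier passed in.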


[PATCH v3] drm/nouveau: Accept 'legacy' format modifiers

2020-07-30 Thread James Jones
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers. Existing Mesa
drivers are still aware of only these older
format modifiers which do not differentiate
between different variations of the block linear
layout. When the format modifier support flag was
flipped in the nouveau kernel driver, the X.org
modesetting driver began attempting to use its
format modifier-enabled framebuffer path. Because
the set of format modifiers advertised by the
kernel prior to this change do not intersect with
the set of format modifiers advertised by Mesa,
allocating GBM buffers using format modifiers
fails and the modesetting driver falls back to
non-modifier allocation. However, it still later
queries the modifier of the GBM buffer when
creating its DRM-KMS framebuffer object, receives
the old-format modifier from Mesa, and attempts
to create a framebuffer with it. Since the kernel
is still not aware of these formats, this fails.

Userspace should not be attempting to query format
modifiers of GBM buffers allocated with a non-
format-modifier-aware allocation path, but to
avoid breaking existing userspace behavior, this
change accepts the old-style format modifiers when
creating framebuffers and applying them to planes
by translating them to the equivalent new-style
modifier. To accomplish this, some layout
parameters must be assumed to match properties of
the device targeted by the relevant ioctls. To
avoid perpetuating misuse of the old-style
modifiers, this change does not advertise support
for them. Doing so would imply compatibility
between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
kmscube hacked to use linear mod, and sway 1.5

Reported-by: Kirill A. Shutemov 
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
 {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
 
+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }
+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
 }
 
+static const u64 legacy_modifiers[] = {
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
 
-   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
 
nouveau_decode_mod(drm, modifier, tile_mode, kind);
 
-- 
2.17.1
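
The last hunk of the patch above can be read as a two-stage lookup: a modifier is accepted if it is in the list the device advertises, or failing that, if it is one of the legacy values, which are accepted but never advertised. A minimal standalone sketch of that idea follows. The names sketch_list_has and sketch_validate_mod are hypothetical, and the sentinel value is my understanding of DRM_FORMAT_MOD_INVALID rather than something stated in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of DRM_FORMAT_MOD_INVALID; it terminates both lists. */
#define SKETCH_MOD_INVALID 0x00ffffffffffffffull

/* Return true if modifier appears before the sentinel in list. */
static bool sketch_list_has(const uint64_t *list, uint64_t modifier)
{
    assert(list);
    for (; *list != SKETCH_MOD_INVALID; list++)
        if (*list == modifier)
            return true;
    return false;
}

/*
 * Accept advertised modifiers first, then fall back to the legacy list.
 * Anything found in neither list is rejected with -EINVAL.
 */
static int sketch_validate_mod(const uint64_t *advertised,
                               const uint64_t *legacy, uint64_t modifier)
{
    if (sketch_list_has(advertised, modifier))
        return 0;
    if (sketch_list_has(legacy, modifier))
        return 0;
    return -EINVAL;
}
```

Keeping the legacy values in a separate array is what lets the driver accept them in this check without exposing them through the list of supported modifiers.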



Re: [Nouveau] [PATCH v2] drm/nouveau: Accept 'legacy' format modifiers

2020-07-27 Thread James Jones

On 7/23/20 9:06 PM, Ben Skeggs wrote:

On Sat, 18 Jul 2020 at 13:34, James Jones  wrote:


Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers. Existing Mesa
drivers are still aware of only these older
format modifiers which do not differentiate
between different variations of the block linear
layout. When the format modifier support flag was
flipped in the nouveau kernel driver, the X.org
modesetting driver began attempting to use its
format modifier-enabled framebuffer path. Because
the set of format modifiers advertised by the
kernel prior to this change do not intersect with
the set of format modifiers advertised by Mesa,
allocating GBM buffers using format modifiers
fails and the modesetting driver falls back to
non-modifier allocation. However, it still later
queries the modifier of the GBM buffer when
creating its DRM-KMS framebuffer object, receives
the old-format modifier from Mesa, and attempts
to create a framebuffer with it. Since the kernel
is still not aware of these formats, this fails.

Userspace should not be attempting to query format
modifiers of GBM buffers allocated with a non-
format-modifier-aware allocation path, but to
avoid breaking existing userspace behavior, this
change accepts the old-style format modifiers when
creating framebuffers and applying them to planes
by translating them to the equivalent new-style
modifier. To accomplish this, some layout
parameters must be assumed to match properties of
the device targeted by the relevant ioctls. To
avoid perpetuating misuse of the old-style
modifiers, this change does not advertise support
for them. Doing so would imply compatibility
between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Reported-by: Kirill A. Shutemov 
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones 
---
  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
  1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
uint32_t *tile_mode,
uint8_t *kind)
  {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
 BUG_ON(!tile_mode || !kind);

+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }

I believe this should be moved into the != MOD_LINEAR case.


Yes, of course, thanks.  I need to re-evaluate my testing yet again to 
make sure I hit that case too.  Preparing a v3...


Thanks,
-James


+
 if (modifier == DRM_FORMAT_MOD_LINEAR) {
 /* tile_mode will not be used in this case */
 *tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
 }
  }

+static const u64 legacy_modifiers[] = {
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
  static int
  nouveau_validate_decode_mod(struct nouveau_drm *drm,
 uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
  (disp->format_modifiers[mod] != modifier);
  mod++);

-   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }

 nouveau_decode_mod(drm, modifier, tile_mode, kind);

--
2.17.1

___
Nouveau mailing list
nouv...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau



Re: [Nouveau] [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones
Did you just cherry-pick my change, or were you running the latest 
drm-next or drm-fixes code?  There do appear to be various MM-related 
fixes that may be related to this in drm-fixes when I scroll down the 
log looking for nouveau stuff.  Shot in the dark, but might be worth 
trying with Dave's tree if you weren't already.  I was testing with 
drm-fixes-2020-07-17-1 from here:


git://anongit.freedesktop.org/drm/drm

Thanks,
-James

On 7/17/20 8:13 PM, James Jones wrote:

This doesn't look related.

-James

On 7/17/20 2:30 PM, Kirill A. Shutemov wrote:

On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote:

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Signed-off-by: James Jones 


I tried and it crashes. Not sure if it's related.

[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:vblank_disable_fn] disabling vblank on crtc 0
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_CPU_PREP
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
BUG: unable to handle page fault for address: 059c
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 0 P4D 0
Oops:  [#1] PREEMPT SMP PTI
CPU: 13 PID: 3351 Comm: alacritty Tainted: G  I   
5.8.0-rc5-00191-g086f86c033f9 #53
Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3 
Pro/X299 AORUS Gaming 3 Pro-CF, BIOS F5d 11/28/2019
RIP: 0010:kmem_cache_alloc_trace 
(/home/kas/linux/torvalds/mm/slub.c:272 
/home/kas/linux/torvalds/mm/slub.c:278 
/home/kas/linux/torvalds/mm/slub.c:292 
/home/kas/linux/torvalds/mm/slub.c:2791 
/home/kas/linux/torvalds/mm/slub.c:2832 
/home/kas/linux/torvalds/mm/slub.c:2849)
Code: 8b 51 08 48 89 c8 65 48 03 05 d4 0e ca 70 48 8b 70 08 48 39 f2 
75 e7 4c 8b 38 4d 85 ff 0f 84 8f 01 00 00 8b 45 20 48 8b 7d 00 <49> 8b 
1c 07 40 f6 c7 0f 0f 85 95 01 00 00 48 8d 8a 80 00 00 00 4c

All code

   0:  8b 51 08                mov    0x8(%rcx),%edx
   3:  48 89 c8                mov    %rcx,%rax
   6:  65 48 03 05 d4 0e ca    add    %gs:0x70ca0ed4(%rip),%rax    # 0x70ca0ee2
   d:  70
   e:  48 8b 70 08             mov    0x8(%rax),%rsi
  12:  48 39 f2                cmp    %rsi,%rdx
  15:  75 e7                   jne    0xfffe
  17:  4c 8b 38                mov    (%rax),%r15
  1a:  4d 85 ff                test   %r15,%r15
  1d:  0f 84 8f 01 00 00       je     0x1b2
  23:  8b 45 20                mov    0x20(%rbp),%eax
  26:  48 8b 7d 00             mov    0x0(%rbp),%rdi
  2a:* 49 8b 1c 07             mov    (%r15,%rax,1),%rbx    <-- trapping instruction
  2e:  40 f6 c7 0f             test   $0xf,%dil
  32:  0f 85 95 01 00 00       jne    0x1cd
  38:  48 8d 8a 80 00 00 00    lea    0x80(%rdx),%rcx
  3f:  4c                      rex.WR

Code starting with the faulting instruction
===
   0:  49 8b 1c 07             mov    (%r15,%rax,1),%rbx
   4:  40 f6 c7 0f             test   $0xf,%dil
   8:  0f 85 95 01 00 00       jne    0x1a3
   e:  48 8d 8a 80 00 00 00    lea    0x80(%rdx),%rcx
  15:  4c                      rex.WR
RSP: 0018:a8a381bcfba0 EFLAGS: 00010206
RAX: 0030 RBX: 9c0b15e05e00 RCX: 0003fe50
RDX: fc8d RSI: fc8d RDI: 0003fe50
RBP: 9c0b18407840 R08:  R09: 0001
R10: 9c0b06c28000 R11: 0001 R12: 0dc0
[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE
R13: 0060 R14: 8fa35a47 R15: 056c
FS:  7fbe7a8e3900() GS:9c0b1f88() 
knlGS:

CS:  0010 DS:  ES:  CR0: 80050033
CR2: 059c CR3: 00103c7fe004 CR4: 003606e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
nouveau_fence_new (/home/kas/linux/torvalds/include/linux/slab.h:555 
/home/kas/linux/torvalds/include/linux/slab.h:669 
/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_fence.c:423)

[drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
nouveau_gem_ioctl_pushbuf 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:852)
[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, 
DRM_IOCTL_CRTC_QUEUE_SEQUENCE
? nouveau_gem_ioctl_new 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680)
? drm_ioctl_kernel 
(/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793)
? nouveau_gem_ioctl_ne

Re: [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones

On 7/17/20 12:47 PM, Daniel Vetter wrote:

On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote:

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5


Just bikeshed, but maybe a few more words on what exactly is broken and
how this works around it. Specifically why we only accept these, but don't
advertise them.


Added quite a few words.



Signed-off-by: James Jones 


Needs Fixes: line here. Also nice to mention the bug reporter/link.


Done in v2.


---
  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
  1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
  {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
  
+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }


Hm I tried to understand what this magic does by looking at drm_fourcc.h,
but the drm_fourcc_canonicalize_nvidia_format_mod() in there implements
something else. Is that function wrong, or should we use it here instead?


Or is there something else going on entirely?

This may be slightly clearer with the expanded change description:

Canonicalize assumes the old modifiers are only used by certain Tegra 
revisions, because the Mesa patches were supposed to land and obliterate 
all uses beyond that.  That assumption means it can assume the specific 
page kind (0xfe) used by the display-engine-compatible layout on those 
specific devices.  There is no way to generally canonicalize a legacy 
modifier without referencing a specific device type, as is indirectly 
done here.


This code does a limited device-specific canonicalization: it 
substitutes the display-appropriate page kind used by this specific 
device, ensuring we derive the correct page kind later in the function. 
I iterated on the best way to accomplish this a few times, and this 
was the least-invasive thing I came up with, but it does require a 
pretty thorough understanding of the NVIDIA modifier macros.
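
For readers without drm_fourcc.h open, the substitution described above can be sketched as follows. The SKETCH_* macros restate the NVIDIA block-linear modifier layout as I understand it from that header (vendor in the top byte, block height in bits 0..3, page kind in bits 12..19); treat the exact field positions other than the kind mask, which appears in the patch, as assumptions. sketch_translate_legacy is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed restatement of the layout behind the drm_fourcc.h macros. */
#define SKETCH_VENDOR_NVIDIA 0x03ull
#define SKETCH_BLOCK_LINEAR_2D(c, s, g, k, h) \
    ((SKETCH_VENDOR_NVIDIA << 56) | 0x10ull | \
     ((uint64_t)(h) & 0xf) | \
     (((uint64_t)(k) & 0xff) << 12) | \
     (((uint64_t)(g) & 0x3) << 20) | \
     (((uint64_t)(s) & 0x1) << 22) | \
     (((uint64_t)(c) & 0x7) << 23))

/* Legacy form: only the block height is encoded, every other field is 0. */
#define SKETCH_16BX2_BLOCK(h) SKETCH_BLOCK_LINEAR_2D(0, 0, 0, 0, (h))

#define SKETCH_KIND_MASK (0xffull << 12)

/*
 * If the modifier carries no page kind, copy the kind from a modifier the
 * device is known to support. Only the kind field is copied; the other
 * fields of the incoming modifier are left untouched.
 */
static uint64_t sketch_translate_legacy(uint64_t modifier, uint64_t device_mod)
{
    assert((device_mod & SKETCH_KIND_MASK) != 0);  /* must supply a kind */

    if ((modifier & SKETCH_KIND_MASK) == 0)
        modifier |= device_mod & SKETCH_KIND_MASK;
    return modifier;
}
```

With a device modifier whose kind is 0xfe, SKETCH_16BX2_BLOCK(3) comes back with kind 0xfe and block height 3, while a modifier that already has a kind passes through unchanged.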


Thanks for the quick review.

-James



Cheers, Daniel


+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
  }
  
+static const u64 legacy_modifiers[] = {
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
  static int
  nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
  
-   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
  
  	nouveau_decode_mod(drm, modifier, tile_mode, kind);
  
-- 
2.17.1






[PATCH v2] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers. Existing Mesa
drivers are still aware of only these older
format modifiers which do not differentiate
between different variations of the block linear
layout. When the format modifier support flag was
flipped in the nouveau kernel driver, the X.org
modesetting driver began attempting to use its
format modifier-enabled framebuffer path. Because
the set of format modifiers advertised by the
kernel prior to this change do not intersect with
the set of format modifiers advertised by Mesa,
allocating GBM buffers using format modifiers
fails and the modesetting driver falls back to
non-modifier allocation. However, it still later
queries the modifier of the GBM buffer when
creating its DRM-KMS framebuffer object, receives
the old-format modifier from Mesa, and attempts
to create a framebuffer with it. Since the kernel
is still not aware of these formats, this fails.

Userspace should not be attempting to query format
modifiers of GBM buffers allocated with a non-
format-modifier-aware allocation path, but to
avoid breaking existing userspace behavior, this
change accepts the old-style format modifiers when
creating framebuffers and applying them to planes
by translating them to the equivalent new-style
modifier. To accomplish this, some layout
parameters must be assumed to match properties of
the device targeted by the relevant ioctls. To
avoid perpetuating misuse of the old-style
modifiers, this change does not advertise support
for them. Doing so would imply compatibility
between devices with incompatible memory layouts.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Reported-by: Kirill A. Shutemov 
Fixes: fa4f4c213f5f ("drm/nouveau/kms: Support NVIDIA format modifiers")
Link: https://lkml.org/lkml/2020/6/30/1251
Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
 {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
 
+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }
+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
 }
 
+static const u64 legacy_modifiers[] = {
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
 
-   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
 
nouveau_decode_mod(drm, modifier, tile_mode, kind);
 
-- 
2.17.1



Re: [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones

This doesn't look related.

-James

On 7/17/20 2:30 PM, Kirill A. Shutemov wrote:

On Fri, Jul 17, 2020 at 11:57:57AM -0700, James Jones wrote:

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Signed-off-by: James Jones 


I tried and it crashes. Not sure if it's related.

[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:vblank_disable_fn] disabling vblank on crtc 0
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_CPU_PREP
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
[drm:drm_ioctl] pid=3351, dev=0xe200, auth=1, NOUVEAU_GEM_PUSHBUF
BUG: unable to handle page fault for address: 059c
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 0 P4D 0
Oops:  [#1] PREEMPT SMP PTI
CPU: 13 PID: 3351 Comm: alacritty Tainted: G  I   
5.8.0-rc5-00191-g086f86c033f9 #53
Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3 Pro/X299 AORUS 
Gaming 3 Pro-CF, BIOS F5d 11/28/2019
RIP: 0010:kmem_cache_alloc_trace (/home/kas/linux/torvalds/mm/slub.c:272 
/home/kas/linux/torvalds/mm/slub.c:278 /home/kas/linux/torvalds/mm/slub.c:292 
/home/kas/linux/torvalds/mm/slub.c:2791 /home/kas/linux/torvalds/mm/slub.c:2832 
/home/kas/linux/torvalds/mm/slub.c:2849)
Code: 8b 51 08 48 89 c8 65 48 03 05 d4 0e ca 70 48 8b 70 08 48 39 f2 75 e7 4c 8b 38 
4d 85 ff 0f 84 8f 01 00 00 8b 45 20 48 8b 7d 00 <49> 8b 1c 07 40 f6 c7 0f 0f 85 
95 01 00 00 48 8d 8a 80 00 00 00 4c
All code

   0:  8b 51 08                mov    0x8(%rcx),%edx
   3:  48 89 c8                mov    %rcx,%rax
   6:  65 48 03 05 d4 0e ca    add    %gs:0x70ca0ed4(%rip),%rax    # 0x70ca0ee2
   d:  70
   e:  48 8b 70 08             mov    0x8(%rax),%rsi
  12:  48 39 f2                cmp    %rsi,%rdx
  15:  75 e7                   jne    0xfffe
  17:  4c 8b 38                mov    (%rax),%r15
  1a:  4d 85 ff                test   %r15,%r15
  1d:  0f 84 8f 01 00 00       je     0x1b2
  23:  8b 45 20                mov    0x20(%rbp),%eax
  26:  48 8b 7d 00             mov    0x0(%rbp),%rdi
  2a:* 49 8b 1c 07             mov    (%r15,%rax,1),%rbx    <-- trapping instruction
  2e:  40 f6 c7 0f             test   $0xf,%dil
  32:  0f 85 95 01 00 00       jne    0x1cd
  38:  48 8d 8a 80 00 00 00    lea    0x80(%rdx),%rcx
  3f:  4c                      rex.WR

Code starting with the faulting instruction
===
   0:  49 8b 1c 07             mov    (%r15,%rax,1),%rbx
   4:  40 f6 c7 0f             test   $0xf,%dil
   8:  0f 85 95 01 00 00       jne    0x1a3
   e:  48 8d 8a 80 00 00 00    lea    0x80(%rdx),%rcx
  15:  4c                      rex.WR
RSP: 0018:a8a381bcfba0 EFLAGS: 00010206
RAX: 0030 RBX: 9c0b15e05e00 RCX: 0003fe50
RDX: fc8d RSI: fc8d RDI: 0003fe50
RBP: 9c0b18407840 R08:  R09: 0001
R10: 9c0b06c28000 R11: 0001 R12: 0dc0
[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_GET_SEQUENCE
R13: 0060 R14: 8fa35a47 R15: 056c
FS:  7fbe7a8e3900() GS:9c0b1f88() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 059c CR3: 00103c7fe004 CR4: 003606e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
nouveau_fence_new (/home/kas/linux/torvalds/include/linux/slab.h:555 
/home/kas/linux/torvalds/include/linux/slab.h:669 
/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_fence.c:423)
[drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
nouveau_gem_ioctl_pushbuf 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:852)
[drm:drm_ioctl] pid=3327, dev=0xe200, auth=1, DRM_IOCTL_CRTC_QUEUE_SEQUENCE
? nouveau_gem_ioctl_new 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680)
? drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793)
? nouveau_gem_ioctl_new 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680)
drm_ioctl_kernel (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:793)
drm_ioctl (/home/kas/linux/torvalds/drivers/gpu/drm/drm_ioctl.c:888)
? nouveau_gem_ioctl_new 
(/home/kas/linux/torvalds/drivers/gpu/drm/nouveau/nouveau_gem.c:680)
? _raw_spin_unlock_irqrestore 
(/home/kas/linux/torvalds/arch/x86/include/asm/irqflags.h:41 
/home/kas/linux/torvalds/arch/x86/include/asm/irqflags.h:84 
/home/kas/linux/torvalds/include/linux/spinlock_api_smp.h:160 
/home/kas/linux/torvalds/k

Re: [PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones
This should resolve the inability to start X with the new NV format 
modifier support in nouveau.  FYI, I'm offline next week, but I'll check 
in tonight in case there are any review comments.  When I'm back, I'll 
get the associated userspace fixes cleaned up and out to the appropriate 
lists.


Thanks,
-James

On 7/17/20 11:57 AM, James Jones wrote:

Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Signed-off-by: James Jones 
---
  drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
  1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
  {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
  
+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }
+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
  }
  
+static const u64 legacy_modifiers[] = {
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
  static int
  nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
  
-   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
  
  	nouveau_decode_mod(drm, modifier, tile_mode, kind);
  




[PATCH] drm/nouveau: Accept 'legacy' format modifiers

2020-07-17 Thread James Jones
Accept the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()
family of modifiers to handle broken userspace
Xorg modesetting and Mesa drivers.

Tested with Xorg 1.20 modesetting driver,
weston@c46c70dac84a4b3030cd05b380f9f410536690fc,
gnome & KDE wayland desktops from Ubuntu 18.04,
and sway 1.5

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 26 +--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 496c4621cc78..31543086254b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -191,8 +191,14 @@ nouveau_decode_mod(struct nouveau_drm *drm,
   uint32_t *tile_mode,
   uint8_t *kind)
 {
+   struct nouveau_display *disp = nouveau_display(drm->dev);
BUG_ON(!tile_mode || !kind);
 
+   if ((modifier & (0xffull << 12)) == 0ull) {
+   /* Legacy modifier.  Translate to this device's 'kind.' */
+   modifier |= disp->format_modifiers[0] & (0xffull << 12);
+   }
+
if (modifier == DRM_FORMAT_MOD_LINEAR) {
/* tile_mode will not be used in this case */
*tile_mode = 0;
@@ -227,6 +233,16 @@ nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
}
 }
 
+static const u64 legacy_modifiers[] = {
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4),
+   DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5),
+   DRM_FORMAT_MOD_INVALID
+};
+
 static int
 nouveau_validate_decode_mod(struct nouveau_drm *drm,
uint64_t modifier,
@@ -247,8 +263,14 @@ nouveau_validate_decode_mod(struct nouveau_drm *drm,
 (disp->format_modifiers[mod] != modifier);
 mod++);
 
-   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
-   return -EINVAL;
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID) {
+   for (mod = 0;
+(legacy_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(legacy_modifiers[mod] != modifier);
+mod++);
+   if (legacy_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+   }
 
nouveau_decode_mod(drm, modifier, tile_mode, kind);
 
-- 
2.17.1
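
The translation step added to nouveau_decode_mod() above can be tried out in isolation. What follows is a minimal standalone sketch in C, not the driver code: the helper name sketch_translate_legacy() and the SKETCH_* constants are hypothetical, and the modifier values used to exercise it are illustrative only. The one thing taken directly from the patch is the 0xff << 12 mask selecting the 'kind' field. The sketch returns the linear modifier (assumed to be 0) untouched before translating; that ordering is a choice made for this sketch.

```c
#include <stdint.h>

/* Mask for the 'kind' field, as used in the patch (bits 12..19). */
#define SKETCH_KIND_MASK (0xffull << 12)
/* Assumption: the linear modifier has the value 0. */
#define SKETCH_MOD_LINEAR 0ull

/*
 * Return 'modifier' with the device's 'kind' filled in when the caller
 * passed a legacy modifier, i.e. one whose kind field is zero.
 * 'device_modifier' stands in for disp->format_modifiers[0].
 */
static uint64_t sketch_translate_legacy(uint64_t modifier,
					uint64_t device_modifier)
{
	/* Linear has no kind to translate, so leave it alone. */
	if (modifier == SKETCH_MOD_LINEAR)
		return modifier;

	/* Legacy modifier: borrow the kind bits from the device's modifier. */
	if ((modifier & SKETCH_KIND_MASK) == 0ull)
		modifier |= device_modifier & SKETCH_KIND_MASK;

	return modifier;
}
```

Calling sketch_translate_legacy() with a modifier whose kind bits are already set returns it unchanged, so only legacy values are rewritten.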



Re: [git pull] drm for 5.8-rc1

2020-07-14 Thread James Jones
Still testing.  I'll get a signed-off version out this week unless I find 
a problem.


Thanks,
-James

On 7/12/20 6:37 PM, Dave Airlie wrote:

How are we going with a fix for this regression I can commit?

Dave.




Re: [git pull] drm for 5.8-rc1

2020-07-03 Thread James Jones

On 7/2/20 2:14 PM, James Jones wrote:

On 7/2/20 1:22 AM, Daniel Stone wrote:

Hi,

On Wed, 1 Jul 2020 at 20:45, James Jones  wrote:

OK, I think I see what's going on.  In the Xorg modesetting driver, the
logic is basically:

if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) {
    drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm));
} else {
    drmModeAddFB(...);
}


I read this thread expecting to explain the correct behaviour we
implement in Weston and how modesetting needs to be fixed, but ...
that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we
used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'.


Yes, the hazards of reporting findings before verifying.  I now see 
modesetting does query the DRM-KMS modifiers and attempt to allocate 
with them if it found any.  However, I still see a lot of ways things 
can go wrong, but I'm not going to share my speculation again until I've 
actually verified it, which is taking a frustratingly long time.  The 
modesetting driver is not my friend right now.


OK, several hours of dumb build+config mistakes later, I was actually 
able to reproduce the failure and walk through things.  There is a 
trivial fix for the issues in the X modesetting driver, working off 
Daniel Stone's claim that gbm_bo_get_modifier() should only be called 
when the allocation was made with gbm_bo_create_with_modifiers(). 
The modesetting driver doesn't currently respect that requirement when the 
atomic modesetting path is disabled, which is always the case at present 
because that path is broken.  Respecting that requirement is a half-line 
change and allows X to start properly.


If I force the modesetting driver to use the atomic path, X still fails to 
start even with the above fix, validating the second theory I'd had:


-Current Mesa nouveau code basically ignores the modifier list passed in 
unless it is a single modifier requesting linear layout, and goes about 
allocating whatever layout it sees fit, and succeeds the allocation 
despite being passed a list of modifiers it knows nothing about.  Not 
great, fixed in my pending patches, obviously doesn't help existing 
deployed userspace.


-Current Mesa nouveau code, when asked what modifier it used for the 
above allocation, returns one of the "legacy" modifiers nouveau DRM-KMS 
knows nothing about.


-When the modeset driver tries to create an FB for that BO with the 
returned modifier, the nouveau kernel driver of course refuses.


I think it's probably worth fixing the modesetting driver for the 
reasons Daniel Vetter mentioned.  Then if I get my Mesa patches in 
before a new modesetting driver with working Atomic support is released, 
there'll be no need for long-term workarounds in the kernel.


Down to the real question of what to do in the kernel to support current 
userspace code: I still think the best fix is to accept the old 
modifiers but not advertise them.  However, Daniel Stone and others, if 
you think this will actually break userspace in other ways (Could you 
describe in a bit more detail or point me to test cases if so?), I 
suppose the only option would be to advertise & accept the old modifiers 
for now, add a config option at some point to phase the old 
ones out, and eventually drop them entirely.  This would be unfortunate, 
because as I mentioned, it could sometimes result in situations where 
apps think they can share a buffer between two devices but will get 
garbled data in practice.


I've included an initial version of the kernel patch inline below. 
Needs more testing, but I wanted to share it in case anyone has feedback 
on the idea, wants to see the general workflow, or wants to help test.



There's no attempt to verify the DRM-KMS device supports the modifier,
but then, why would there be?  GBM presumably chose a supported modifier
at buffer creation time, and we don't know which plane the FB is going
to be used with yet.  GBM doesn't actually ask the kernel which
modifiers it supports here either though.


Right, it doesn't ask, because userspace tells it which modifiers to
use. The correct behaviour is to take the list from the KMS
`IN_FORMATS` property and then pass that to
`gbm_(bo|surface)_create_with_modifiers`; GBM must then select from
that list and only that list. If that call does not succeed and Xorg
falls back to `gbm_surface_create`, then it must not call
`gbm_bo_get_modifier` - so that would be a modesetting bug. If that
call does succeed and `gbm_bo_get_modifier` subsequently reports a
modifier which was not in the list, that's a Mesa driver bug.


It just goes into Mesa via
DRI and reports the modifier (unpatched) Mesa chose on its own.  Mesa
just hard-codes the modifiers in its driver backends since it's thinking
in terms of a device's 3D engine, not display.  In theory, Mesa's DRI
drivers could query KMS for supported modifiers if allocating from GBM
using the non-modifiers path and the SCANOUT flag is set (perhaps some
drivers 

Re: [git pull] drm for 5.8-rc1

2020-07-02 Thread James Jones

On 7/2/20 1:22 AM, Daniel Stone wrote:

Hi,

On Wed, 1 Jul 2020 at 20:45, James Jones  wrote:

OK, I think I see what's going on.  In the Xorg modesetting driver, the
logic is basically:

if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) {
drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm));
} else {
drmModeAddFB(...);
}


I read this thread expecting to explain the correct behaviour we
implement in Weston and how modesetting needs to be fixed, but ...
that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we
used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'.


Yes, the hazards of reporting findings before verifying.  I now see 
modesetting does query the DRM-KMS modifiers and attempt to allocate 
with them if it found any.  However, I still see a lot of ways things 
can go wrong, but I'm not going to share my speculation again until I've 
actually verified it, which is taking a frustratingly long time.  The 
modesetting driver is not my friend right now.



There's no attempt to verify the DRM-KMS device supports the modifier,
but then, why would there be?  GBM presumably chose a supported modifier
at buffer creation time, and we don't know which plane the FB is going
to be used with yet.  GBM doesn't actually ask the kernel which
modifiers it supports here either though.


Right, it doesn't ask, because userspace tells it which modifiers to
use. The correct behaviour is to take the list from the KMS
`IN_FORMATS` property and then pass that to
`gbm_(bo|surface)_create_with_modifiers`; GBM must then select from
that list and only that list. If that call does not succeed and Xorg
falls back to `gbm_surface_create`, then it must not call
`gbm_bo_get_modifier` - so that would be a modesetting bug. If that
call does succeed and `gbm_bo_get_modifier` subsequently reports a
modifier which was not in the list, that's a Mesa driver bug.
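
The contract described above, that the modifier reported after a successful modifier-based allocation must be one of the modifiers that was offered, amounts to a simple membership test. The sketch below shows that check in standalone C; sketch_modifier_was_offered() is a hypothetical helper written for illustration and is not part of GBM, Mesa, or libdrm. In a real client the offered list would come from the KMS `IN_FORMATS` property and the chosen value from `gbm_bo_get_modifier`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if 'chosen' is one of the 'count' modifiers in 'offered'.
 * A false result after a successful modifier-based allocation would point
 * at the allocator having picked a modifier it was never offered.
 */
static bool sketch_modifier_was_offered(const uint64_t *offered, size_t count,
					uint64_t chosen)
{
	for (size_t i = 0; i < count; i++) {
		if (offered[i] == chosen)
			return true;
	}
	return false;
}
```

A compositor could run a check like this as a debugging aid before creating a framebuffer, though nothing in the thread says any existing client does so.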


It just goes into Mesa via
DRI and reports the modifier (unpatched) Mesa chose on its own.  Mesa
just hard-codes the modifiers in its driver backends since it's thinking
in terms of a device's 3D engine, not display.  In theory, Mesa's DRI
drivers could query KMS for supported modifiers if allocating from GBM
using the non-modifiers path and the SCANOUT flag is set (perhaps some
drivers do this or its equivalent?  Haven't checked.), but that seems
pretty gnarly and doesn't fix the modifier-based GBM allocation path
AFAIK.  Bit of a mess.


Two options for GBM users:
* call gbm_*_create_with_modifiers, it succeeds, call
gbm_bo_get_modifier, pass modifier into AddFB
* call gbm_*_create (without modifiers), it succeeds, do not call
gbm_bo_get_modifier, do not pass a modifier into AddFB

Anything else is a bug in the user. Note that falling back from 1 to 2
is fine: if `gbm_*_create_with_modifiers()` fails, you can fall back
to the non-modifier path, provided you don't later try to get a
modifier back out.
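
The two permitted paths, including the fallback from the first to the second, can be modelled as a small piece of bookkeeping: remember whether the modifier-based allocation succeeded, and only hand a modifier to AddFB when it did. The C sketch below is a model under that assumption and deliberately avoids calling GBM or libdrm; struct sketch_buffer, sketch_allocate() and sketch_choose_addfb_path() are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of how a buffer was allocated; not a GBM type. */
struct sketch_buffer {
	bool allocated_with_modifiers; /* the *_create_with_modifiers path succeeded */
	uint64_t modifier;             /* only meaningful when the flag above is set */
};

enum sketch_addfb_path {
	SKETCH_ADDFB_WITH_MODIFIER,    /* query the modifier and pass it to AddFB */
	SKETCH_ADDFB_WITHOUT_MODIFIER, /* neither query nor pass a modifier */
};

/*
 * Model the allocation: if the modifier-based call succeeds, record the
 * modifier it chose; otherwise fall back to the plain path and record
 * that no modifier may be queried later.
 */
static struct sketch_buffer sketch_allocate(bool modifier_alloc_succeeds,
					    uint64_t chosen_modifier)
{
	struct sketch_buffer buf = { false, 0 };

	if (modifier_alloc_succeeds) {
		buf.allocated_with_modifiers = true;
		buf.modifier = chosen_modifier;
	}
	return buf;
}

/* Decide which AddFB variant is legitimate for this buffer. */
static enum sketch_addfb_path
sketch_choose_addfb_path(const struct sketch_buffer *buf, uint64_t *modifier_out)
{
	assert(buf != NULL && modifier_out != NULL);

	if (buf->allocated_with_modifiers) {
		*modifier_out = buf->modifier;
		return SKETCH_ADDFB_WITH_MODIFIER;
	}
	/* Fallback path: *modifier_out is intentionally left untouched. */
	return SKETCH_ADDFB_WITHOUT_MODIFIER;
}
```

Keeping the flag next to the buffer means the decision is made from how the buffer was actually allocated rather than from a global "modifiers are available" setting.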


For a quick userspace fix that could probably be pushed out everywhere
(Only affects Xorg server 1.20+ AFAIK), just retrying
drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on
failure should be sufficient.


This would break other drivers.


I think this could be done in a way that wouldn't, though it wouldn't be 
quite as simple.  Let's see what the true root cause is first though.



Still need to verify as I'm having
trouble wrangling my Xorg build at the moment and I'm pressed for time.
A more complete fix would be quite involved, as modesetting isn't really
properly plumbed to validate GBM's modifiers against KMS planes, and it
doesn't seem like GBM/Mesa/DRI should be responsible for this as noted
above given the general modifier workflow/design.

Most importantly, options I've considered for fixing from the kernel side:

-Accept "legacy" modifiers in nouveau in addition to the new modifiers,
though avoid reporting them to userspace as supported to avoid further
proliferation.  This is pretty straightforward.  I'll need to modify
both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set
plane validation logic (nv50_plane_format_mod_supported), but it should
end up just being a few lines of code.


I do think that they should also be reported to userspace if they are
accepted. Other users can and do look at the modifier list to see if
the buffer is acceptable for a given plane, so the consistency is good
here. Of course, in Mesa you would want to prioritise the new
modifiers over the legacy ones, and not allocate or return the legacy
ones unless that was all you were asked for. This would involve
tracking the used modifier explicitly through Mesa, rather than
throwing it away at alloc time and then later divining it from the
tiling mode.


Reporting them as supported is equivalent to reporting support for a 
memory layout the chips don't actually support (It corresponds to a 
valid layout on Tegra chips, but not on discrete NV chips).  This is 
what the new modifier

Re: [git pull] drm for 5.8-rc1

2020-07-01 Thread James Jones
OK, I think I see what's going on.  In the Xorg modesetting driver, the 
logic is basically:


if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) {
  drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm));
} else {
  drmModeAddFB(...);
}

There's no attempt to verify the DRM-KMS device supports the modifier, 
but then, why would there be?  GBM presumably chose a supported modifier 
at buffer creation time, and we don't know which plane the FB is going 
to be used with yet.  GBM doesn't actually ask the kernel which 
modifiers it supports here either though.  It just goes into Mesa via 
DRI and reports the modifier (unpatched) Mesa chose on its own.  Mesa 
just hard-codes the modifiers in its driver backends since it's thinking 
in terms of a device's 3D engine, not display.  In theory, Mesa's DRI 
drivers could query KMS for supported modifiers if allocating from GBM 
using the non-modifiers path and the SCANOUT flag is set (perhaps some 
drivers do this or its equivalent?  Haven't checked.), but that seems 
pretty gnarly and doesn't fix the modifier-based GBM allocation path 
AFAIK.  Bit of a mess.


For a quick userspace fix that could probably be pushed out everywhere 
(Only affects Xorg server 1.20+ AFAIK), just retrying 
drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on 
failure should be sufficient.  Still need to verify as I'm having 
trouble wrangling my Xorg build at the moment and I'm pressed for time. 
A more complete fix would be quite involved, as modesetting isn't really 
properly plumbed to validate GBM's modifiers against KMS planes, and it 
doesn't seem like GBM/Mesa/DRI should be responsible for this as noted 
above given the general modifier workflow/design.


Most importantly, options I've considered for fixing from the kernel side:

-Accept "legacy" modifiers in nouveau in addition to the new modifiers, 
though avoid reporting them to userspace as supported to avoid further 
proliferation.  This is pretty straightforward.  I'll need to modify 
both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set 
plane validation logic (nv50_plane_format_mod_supported), but it should 
end up just being a few lines of code.


-Don't validate modifiers in AddFB.  This doesn't really gain anything 
because it just pushes the failure down to mode set time, so it's not 
that useful, so I don't plan on pursuing this.
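
The first option, accepting legacy modifiers without advertising them, boils down to validating against two lists: the one reported to userspace and a second, accept-only one. The standalone C sketch below illustrates that idea under stated assumptions. The names sketch_list_contains() and sketch_validate_modifier() are hypothetical, and SKETCH_MOD_INVALID is a local stand-in for the list terminator (assumed here to match the value of DRM_FORMAT_MOD_INVALID); none of this is the nouveau implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: list terminator, standing in for DRM_FORMAT_MOD_INVALID. */
#define SKETCH_MOD_INVALID 0x00ffffffffffffffull

/* Search a terminator-ended modifier list for 'modifier'. */
static bool sketch_list_contains(const uint64_t *list, uint64_t modifier)
{
	for (; *list != SKETCH_MOD_INVALID; list++) {
		if (*list == modifier)
			return true;
	}
	return false;
}

/*
 * Return 0 if 'modifier' may be used to create a framebuffer, or -EINVAL
 * otherwise.  'advertised' is what userspace is told about; 'legacy' is
 * accepted for compatibility but never reported as supported.
 */
static int sketch_validate_modifier(const uint64_t *advertised,
				    const uint64_t *legacy,
				    uint64_t modifier)
{
	if (sketch_list_contains(advertised, modifier))
		return 0;
	if (sketch_list_contains(legacy, modifier))
		return 0;
	return -EINVAL;
}
```

Because only the advertised list would be exposed to userspace, new clients never learn about the legacy values, while existing clients that already pass them keep working.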


As noted, need to run just now, but I should have a kernel patch to test 
out either tonight or tomorrow.


If anyone's curious, the reason my testing missed this was I did most of 
my verification of "old" code against the Xorg 1.19 build included with 
my distro.  I did hack up a Xorg 1.20-ish build to test as well that 
would have included this path, but I must not have properly configured 
it with GBM modifier support somehow.  I was pretty focused on just 
testing the forcibly-disabled atomic path in the modesetting driver in 
this build, so I didn't look too closely at things beyond that.


Thanks,
-James

On 7/1/20 12:59 AM, Kirill A. Shutemov wrote:

On Wed, Jul 01, 2020 at 10:57:19AM +0300, Kirill A. Shutemov wrote:

On Tue, Jun 30, 2020 at 09:40:19PM -0700, James Jones wrote:

This implies something is trying to use one of the old
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without
first checking whether it is supported by the kernel.  I had tried to force
an Xorg+Mesa stack without my userspace patches to hit this error when
testing, but must have missed some permutation.  If the stalled Mesa patches
go in, this would stop happening of course, but those were held up for a
long time in review, and are now waiting on me to make some modifications.

Are you using the modesetting driver in X? If so, with glamor I presume?


Yes and yes. I attached Xorg.log.


Attached now.




Re: [git pull] drm for 5.8-rc1

2020-07-01 Thread James Jones

On 7/1/20 10:04 AM, Karol Herbst wrote:

On Wed, Jul 1, 2020 at 6:01 PM Daniel Vetter  wrote:


On Wed, Jul 1, 2020 at 5:51 PM James Jones  wrote:


On 7/1/20 4:24 AM, Karol Herbst wrote:

On Wed, Jul 1, 2020 at 6:45 AM James Jones  wrote:


This implies something is trying to use one of the old
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without
first checking whether it is supported by the kernel.  I had tried to
force an Xorg+Mesa stack without my userspace patches to hit this error
when testing, but must have missed some permutation.  If the stalled
Mesa patches go in, this would stop happening of course, but those were
held up for a long time in review, and are now waiting on me to make
some modifications.



that's completely irrelevant. If a kernel change breaks userspace,
it's a kernel bug.


Agreed it is unacceptable to break userspace, but I don't think it's
irrelevant.  Perhaps the musings on pending userspace patches are.

My intent here was to point out it appears at first glance that
something isn't behaving as expected in userspace, so fixing this would
likely require some sort of work-around for broken userspace rather than
straight-forward fixing of a bug in the kernel logic.  My intent was not
to shift blame to something besides my code & testing for the
regression, though I certainly see how it could be interpreted that way.

Regardless, I'm looking into it.




I assume the MR you were talking about is
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724 ?


Correct.


I am
also aware of the tegra driver being broken on my jetson nano and I am
now curious if this MR could fix this bug as well... and sorry for the
harsh reply, I was just annoyed by the fact that "everything
modifier related is just breaking things", first tegra and that nobody
is looking into fixing it and then apparently the userspace code being
quite broken as well :/

Anyway, yeah I trust you guys on figuring out how to keep "broken"
userspace happy from the kernel side and maybe I can help out with
reviewing the mesa bits. I am just wondering if it could help with the
tegra situation giving me more reasons to look into it as this would
solve other issues I should be working on :)


Not sure if you're claiming this, but if there's Tegra breakage 
attributable to this patch series, I'd love to hear more details there 
as well.  The Tegra patches did have backwards-compat code to handle the 
old modifiers, since Tegra was the only working use case I could find 
for them within the kernel itself.  However, the Tegra kernel patches 
are independent (and haven't even been reviewed yet to my knowledge), so 
Tegra shouldn't be affected at all given it uses TegraDRM rather than 
Nouveau's modesetting driver.


If there are just general existing issues with modifier support on 
Tegra, let's take that to a smaller venue.  I probably won't be as much 
help there, but I can at least try to help get some eyes on it.


Thanks,
-James


If we do need to have a kernel workaround I'm happy to help out, I've
done a bunch of these and occasionally it's good to get rather
creative :-)

Ideally we'd also push a minimal fix in userspace to all stable
branches and make sure distros upgrade (might need releases if some
distro is stuck on old horrors), so that we don't have to keep the
hack in place for 10+ years or so. Definitely if the hack amounts to
disabling modifiers on nouveau, that would be kinda sad.
-Daniel



Thanks,
-James


Are you using the modesetting driver in X?  If so, with glamor I
presume?  What version of Mesa?  Any distro patches?  Any non-default
xorg.conf options that would affect modesetting, your X driver if it
isn't modesetting, or glamor?

Thanks,
-James

On 6/30/20 4:08 PM, Kirill A. Shutemov wrote:

On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote:

James Jones (4):

...

 drm/nouveau/kms: Support NVIDIA format modifiers


This commit is the first one that breaks Xorg startup for my setup:
GTX 1080 + Dell UP2414Q (4K DP MST monitor).

I believe this is the crucial part of dmesg (full dmesg is attached):

[   29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x314
[   29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer
[   29.997145] [drm:drm_ioctl] pid=3393, ret = -22

Any suggestions?







--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch




Re: [git pull] drm for 5.8-rc1

2020-07-01 Thread James Jones

On 7/1/20 4:24 AM, Karol Herbst wrote:

On Wed, Jul 1, 2020 at 6:45 AM James Jones  wrote:


This implies something is trying to use one of the old
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without
first checking whether it is supported by the kernel.  I had tried to
force an Xorg+Mesa stack without my userspace patches to hit this error
when testing, but must have missed some permutation.  If the stalled
Mesa patches go in, this would stop happening of course, but those were
held up for a long time in review, and are now waiting on me to make
some modifications.



that's completely irrelevant. If a kernel change breaks userspace,
it's a kernel bug.


Agreed it is unacceptable to break userspace, but I don't think it's 
irrelevant.  Perhaps the musings on pending userspace patches are.


My intent here was to point out it appears at first glance that 
something isn't behaving as expected in userspace, so fixing this would 
likely require some sort of work-around for broken userspace rather than 
straight-forward fixing of a bug in the kernel logic.  My intent was not 
to shift blame to something besides my code & testing for the 
regression, though I certainly see how it could be interpreted that way.


Regardless, I'm looking into it.

Thanks,
-James


Are you using the modesetting driver in X?  If so, with glamor I
presume?  What version of Mesa?  Any distro patches?  Any non-default
xorg.conf options that would affect modesetting, your X driver if it
isn't modesetting, or glamor?

Thanks,
-James

On 6/30/20 4:08 PM, Kirill A. Shutemov wrote:

On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote:

James Jones (4):

...

drm/nouveau/kms: Support NVIDIA format modifiers


This commit is the first one that breaks Xorg startup for my setup:
GTX 1080 + Dell UP2414Q (4K DP MST monitor).

I believe this is the crucial part of dmesg (full dmesg is attached):

[   29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x314
[   29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer
[   29.997145] [drm:drm_ioctl] pid=3393, ret = -22

Any suggestions?




Re: [git pull] drm for 5.8-rc1

2020-06-30 Thread James Jones
This implies something is trying to use one of the old 
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifiers with DRM-KMS without 
first checking whether it is supported by the kernel.  I had tried to 
force an Xorg+Mesa stack without my userspace patches to hit this error 
when testing, but must have missed some permutation.  If the stalled 
Mesa patches go in, this would stop happening of course, but those were 
held up for a long time in review, and are now waiting on me to make 
some modifications.


Are you using the modesetting driver in X?  If so, with glamor I 
presume?  What version of Mesa?  Any distro patches?  Any non-default 
xorg.conf options that would affect modesetting, your X driver if it 
isn't modesetting, or glamor?


Thanks,
-James

On 6/30/20 4:08 PM, Kirill A. Shutemov wrote:

On Tue, Jun 02, 2020 at 04:06:32PM +1000, Dave Airlie wrote:

James Jones (4):

...

   drm/nouveau/kms: Support NVIDIA format modifiers


This commit is the first one that breaks Xorg startup for my setup:
GTX 1080 + Dell UP2414Q (4K DP MST monitor).

I believe this is the crucial part of dmesg (full dmesg is attached):

[   29.997140] [drm:nouveau_framebuffer_new] Unsupported modifier: 0x314
[   29.997143] [drm:drm_internal_framebuffer_create] could not create framebuffer
[   29.997145] [drm:drm_ioctl] pid=3393, ret = -22

Any suggestions?




Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer

2020-02-11 Thread James Jones




On 2/10/20 3:35 PM, Ben Skeggs wrote:

On Tue, 11 Feb 2020 at 09:17, James Jones  wrote:


On 2/10/20 12:25 AM, Thomas Zimmermann wrote:

Hi

Am 10.02.20 um 09:20 schrieb Ben Skeggs:

On Sat, 8 Feb 2020 at 07:10, James Jones  wrote:


I've sent out a v4 version of the format modifier patches which avoid
caching values in the nouveau_framebuffer struct.  It will have a few
trivial conflicts with your series, but should make them structurally
compatible.

I'm fine with either v3 or v4 of my series personally, but if these
cleanup patches are taken, only v4 will work.

I've taken Thomas' cleanup patches in my tree, and will take James'
also once they've been fixed up to work on top of the cleanup.


Thanks!


After applying this series locally, I'm hitting a NULL deref loading the
nouveau module with fbconsole caused by patch 3/4.  I've sent out a
trivial fix for review separately.  Please have a look, and Ben, feel
free to squash it with Thomas's original patch if you prefer.

Oops.  Squashed!





James, are you happy for me to take the drm_fourcc.h patch that's on
dri-devel through my tree for the next merge window too?


Yes, that would be great.  I couldn't find a public version of your tree
with Thomas's patches applied, but I pulled them in locally and rebased
my series on top of that as v5, resolving all the remaining trivial
conflicts.  Apologies for all the patch spam this generated.

I've pulled in your patches now too.


Awesome.  Thanks!

-James


Thank you!
Ben.


Thanks,
-James


Ben.



Thanks,
-James

On 2/6/20 8:45 AM, James Jones wrote:

Yes, that's certainly viable.  If that's the general preference in
direction, I'll rework the patches to do so.

Thanks,
-James

On 2/6/20 7:49 AM, Thomas Zimmermann wrote:

Hi James

Am 06.02.20 um 16:17 schrieb James Jones:

Note I'm adding some fields to nouveau_framebuffer in the series
"drm/nouveau: Support NVIDIA format modifiers."  I sent out v3 of that
yesterday.  It would probably still be possible to avoid them by
re-extracting the relevant data from the format modifier on the fly when
needed, but it is simpler and likely less error-prone with the wrapper
struct.


Thanks for the note.

I just took a look at your patchset. I think struct nouveau_framebuffer
should not store tile_mode and kind. AFAICT there are only two trivial
places where these values are used and they can be extracted from the
framebuffer at any time.

I'd suggest to expand nouveau_decode_mod() to take a drm_framebuffer and
return the correct values. Kind of what you do in
nouveau_framebuffer_new() near line 330.
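
The suggestion above, deriving tile_mode and kind from the framebuffer whenever they are needed instead of caching them in a wrapper struct, might look roughly like the following standalone C sketch. It makes several assumptions that are not confirmed by this thread: struct sketch_framebuffer is a hypothetical stand-in carrying only a modifier, the block-height field is assumed to sit in the low 4 bits of the modifier, the kind is assumed to sit in bits 12..19 (consistent with the 0xff << 12 mask used elsewhere in this series), and the linear modifier is assumed to be 0. The real driver may apply further per-chipset adjustments that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one framebuffer field this sketch needs. */
struct sketch_framebuffer {
	uint64_t modifier;
};

/*
 * Decode tile_mode and kind from the framebuffer's modifier on demand,
 * so that nothing has to be cached alongside the framebuffer.
 */
static void sketch_decode_fb(const struct sketch_framebuffer *fb,
			     uint32_t *tile_mode, uint8_t *kind)
{
	assert(fb != NULL && tile_mode != NULL && kind != NULL);

	if (fb->modifier == 0) {
		/* Assumed linear layout: neither value is used. */
		*tile_mode = 0;
		*kind = 0;
		return;
	}

	/* Assumed field positions; see the note above the code. */
	*tile_mode = (uint32_t)(fb->modifier & 0xf);
	*kind = (uint8_t)((fb->modifier >> 12) & 0xff);
}
```

Since the decode is a couple of shifts and masks, recomputing it at each of the few call sites costs little and removes the need to keep cached copies in sync.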

Thoughts?

Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/70786/#rev3



Thanks,
-James

On 2/6/20 2:19 AM, Thomas Zimmermann wrote:

After its cleanup, struct nouveau_framebuffer is only a wrapper around
struct drm_framebuffer. Use the latter directly.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/nouveau/dispnv50/wndw.c   | 26
+++
 drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++--
 drivers/gpu/drm/nouveau/nouveau_display.h | 12 +--
 drivers/gpu/drm/nouveau/nouveau_fbcon.c   | 14 ++--
 4 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index ba1399965a1c..4a67a656e007 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma
*ctxdma)
 }
   static struct nv50_wndw_ctxdma *
-nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct
nouveau_framebuffer *fb)
+nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer
*fb)
 {
-struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
+struct nouveau_drm *drm = nouveau_drm(fb->dev);
 struct nv50_wndw_ctxdma *ctxdma;
-struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
 const u8kind = nvbo->kind;
 const u32 handle = 0xfb00 | kind;
 struct {
@@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw
*wndw, bool modeset,
struct nv50_wndw_atom *asyw,
struct nv50_head_atom *asyh)
 {
-struct nouveau_framebuffer *fb =
nouveau_framebuffer(asyw->state.fb);
+struct drm_framebuffer *fb = asyw->state.fb;
 struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev);
-struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
 int ret;
   NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name);
 -if (asyw->state.fb != armw->state.fb || !armw->visible ||
modeset) {
-asyw->image.w = fb->base.width;
-asyw->image.h = fb->base.height;
+if (fb != armw->

Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer

2020-02-10 Thread James Jones

On 2/10/20 12:25 AM, Thomas Zimmermann wrote:

Hi

Am 10.02.20 um 09:20 schrieb Ben Skeggs:

On Sat, 8 Feb 2020 at 07:10, James Jones  wrote:


I've sent out a v4 version of the format modifier patches which avoid
caching values in the nouveau_framebuffer struct.  It will have a few
trivial conflicts with your series, but should make them structurally
compatible.

I'm fine with either v3 or v4 of my series personally, but if these
cleanup patches are taken, only v4 will work.

I've taken Thomas' cleanup patches in my tree, and will take James'
also once they've been fixed up to work on top of the cleanup.


Thanks!


After applying this series locally, I'm hitting a NULL deref loading the 
nouveau module with fbconsole caused by patch 3/4.  I've sent out a 
trivial fix for review separately.  Please have a look, and Ben, feel 
free to squash it with Thomas's original patch if you prefer.




James, are you happy for me to take the drm_fourcc.h patch that's on
dri-devel through my tree for the next merge window too?


Yes, that would be great.  I couldn't find a public version of your tree 
with Thomas's patches applied, but I pulled them in locally and rebased 
my series on top of that as v5, resolving all the remaining trivial 
conflicts.  Apologies for all the patch spam this generated.


Thanks,
-James


Ben.



Thanks,
-James

On 2/6/20 8:45 AM, James Jones wrote:

Yes, that's certainly viable.  If that's the general preference in
direction, I'll rework the patches to do so.

Thanks,
-James

On 2/6/20 7:49 AM, Thomas Zimmermann wrote:

Hi James

Am 06.02.20 um 16:17 schrieb James Jones:

Note I'm adding some fields to nouveau_framebuffer in the series
"drm/nouveau: Support NVIDIA format modifiers."  I sent out v3 of that
yesterday.  It would probably still be possible to avoid them by
re-extracting the relevant data from the format modifier on the fly when
needed, but it is simpler and likely less error-prone with the wrapper
struct.


Thanks for the note.

I just took a look at your patchset. I think struct nouveau_framebuffer
should not store tile_mode and kind. AFAICT there are only two trivial
places where these values are used and they can be extracted from the
framebuffer at any time.

I'd suggest to expand nouveau_decode_mod() to take a drm_framebuffer and
return the correct values. Kind of what you do in
nouveau_framebuffer_new() near line 330.

Thoughts?

Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/70786/#rev3



Thanks,
-James

On 2/6/20 2:19 AM, Thomas Zimmermann wrote:

After its cleanup, struct nouveau_framebuffer is only a wrapper around
struct drm_framebuffer. Use the latter directly.

Signed-off-by: Thomas Zimmermann 
---
drivers/gpu/drm/nouveau/dispnv50/wndw.c   | 26
+++
drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++--
drivers/gpu/drm/nouveau/nouveau_display.h | 12 +--
drivers/gpu/drm/nouveau/nouveau_fbcon.c   | 14 ++--
4 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index ba1399965a1c..4a67a656e007 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma
*ctxdma)
}
  static struct nv50_wndw_ctxdma *
-nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct
nouveau_framebuffer *fb)
+nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer
*fb)
{
-struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
+struct nouveau_drm *drm = nouveau_drm(fb->dev);
struct nv50_wndw_ctxdma *ctxdma;
-struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
const u8    kind = nvbo->kind;
const u32 handle = 0xfb00 | kind;
struct {
@@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw
*wndw, bool modeset,
   struct nv50_wndw_atom *asyw,
   struct nv50_head_atom *asyh)
{
-struct nouveau_framebuffer *fb =
nouveau_framebuffer(asyw->state.fb);
+struct drm_framebuffer *fb = asyw->state.fb;
struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev);
-struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
int ret;
  NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name);
-if (asyw->state.fb != armw->state.fb || !armw->visible ||
modeset) {
-asyw->image.w = fb->base.width;
-asyw->image.h = fb->base.height;
+if (fb != armw->state.fb || !armw->visible || modeset) {
+asyw->image.w = fb->width;
+asyw->image.h = fb->height;
asyw->image.kind = nvbo->kind;
  ret = nv50_wndw_ato

[PATCH v5 3/3] drm/nouveau: Support NVIDIA format modifiers

2020-02-10 Thread James Jones
Allow setting the block layout of a nouveau FB
object using DRM format modifiers.  When
specified, the format modifier block layout and
kind override the GEM buffer's implicit layout
and kind.  The specified format modifier is
validated against the list of modifiers supported
by the target display hardware.

v2: Used Tesla family instead of NV50 chipset compare
v4: Do not cache kind, tile_mode in nouveau_framebuffer
v5: Resolved against nouveau_framebuffer cleanup

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/dispnv50/wndw.c   | 20 +++--
 drivers/gpu/drm/nouveau/nouveau_display.c | 89 ++-
 drivers/gpu/drm/nouveau/nouveau_display.h |  4 +
 3 files changed, 104 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index 8d6ef70602e1..6821195d65b7 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -44,9 +44,9 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct 
drm_framebuffer *fb)
 {
struct nouveau_drm *drm = nouveau_drm(fb->dev);
struct nv50_wndw_ctxdma *ctxdma;
-   struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
-   const u8    kind = nvbo->kind;
-   const u32 handle = 0xfb00 | kind;
+   u32 handle;
+   u32 unused;
+   u8  kind;
struct {
struct nv_dma_v0 base;
union {
@@ -58,6 +58,9 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct 
drm_framebuffer *fb)
u32 argc = sizeof(args.base);
int ret;
 
+   nouveau_framebuffer_get_layout(fb, &unused, &kind);
+   handle = 0xfb00 | kind;
+
list_for_each_entry(ctxdma, &wndw->ctxdma.list, head) {
if (ctxdma->object.handle == handle)
return ctxdma;
@@ -238,15 +241,18 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, 
bool modeset,
 {
struct drm_framebuffer *fb = asyw->state.fb;
struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev);
-   struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
+   uint8_t kind;
+   uint32_t tile_mode;
int ret;
 
NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name);
 
if (fb != armw->state.fb || !armw->visible || modeset) {
+   nouveau_framebuffer_get_layout(fb, &tile_mode, &kind);
+
asyw->image.w = fb->width;
asyw->image.h = fb->height;
-   asyw->image.kind = nvbo->kind;
+   asyw->image.kind = kind;
 
ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
if (ret) {
@@ -258,9 +264,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
if (asyw->image.kind) {
asyw->image.layout = 0;
if (drm->client.device.info.chipset >= 0xc0)
-   asyw->image.blockh = nvbo->mode >> 4;
+   asyw->image.blockh = tile_mode >> 4;
else
-   asyw->image.blockh = nvbo->mode;
+   asyw->image.blockh = tile_mode;
asyw->image.blocks[0] = fb->pitches[0] / 64;
asyw->image.pitch[0] = 0;
} else {
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 3048a43a8d36..616c9e486efb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -203,6 +203,76 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = drm_gem_fb_create_handle,
 };
 
+static void
+nouveau_decode_mod(struct nouveau_drm *drm,
+  uint64_t modifier,
+  uint32_t *tile_mode,
+  uint8_t *kind)
+{
+   BUG_ON(!tile_mode || !kind);
+
+   if (modifier == DRM_FORMAT_MOD_LINEAR) {
+   /* tile_mode will not be used in this case */
+   *tile_mode = 0;
+   *kind = 0;
+   } else {
+   /*
+* Extract the block height and kind from the corresponding
+* modifier fields.  See drm_fourcc.h for details.
+*/
+   *tile_mode = (uint32_t)(modifier & 0xF);
+   *kind = (uint8_t)((modifier >> 12) & 0xFF);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   *tile_mode <<= 4;
+   }
+}
+
+void
+nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
+  uint32_t *tile_mode,
+  uint8_t *kind)
+{
+   if (fb->flags & DRM_MODE_FB_MODIFIERS) {
+   struct nouveau_drm *drm = nouveau_drm(fb->dev);
+
+ 
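
For anyone who wants to poke at the field extraction in nouveau_decode_mod() outside the kernel, here is a minimal userspace sketch. It only mirrors the masks and shifts visible in the hunk above; SKETCH_MOD_LINEAR, sketch_decode_mod() and the fermi_or_newer flag are hypothetical names standing in for DRM_FORMAT_MOD_LINEAR, the real function and the chipset >= 0xc0 test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for DRM_FORMAT_MOD_LINEAR. */
#define SKETCH_MOD_LINEAR 0ULL

/*
 * Mirrors the extraction in the patch: the low four bits of the modifier
 * are the log2 block height, and eight bits starting at bit 12 are the
 * page kind.  fermi_or_newer is an assumption replacing the chipset test.
 */
static void sketch_decode_mod(uint64_t modifier, int fermi_or_newer,
                              uint32_t *tile_mode, uint8_t *kind)
{
    assert(tile_mode && kind);

    if (modifier == SKETCH_MOD_LINEAR) {
        /* tile_mode is not used for linear buffers */
        *tile_mode = 0;
        *kind = 0;
        return;
    }

    *tile_mode = (uint32_t)(modifier & 0xF);
    *kind = (uint8_t)((modifier >> 12) & 0xFF);

    /* The patch shifts the block height up by four bits on 0xc0+ chipsets. */
    if (fermi_or_newer)
        *tile_mode <<= 4;
}
```

With a modifier whose kind field is 0x7a and whose block height field is 3, this yields tile_mode 3 on the older path and 0x30 on the newer one, which matches the ">> 4" applied again in nv50_wndw_atomic_check_acquire().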

[PATCH v5 2/3] drm/nouveau: Check framebuffer size against bo

2020-02-10 Thread James Jones
Make sure framebuffer dimensions and tiling
parameters will not result in accesses beyond the
end of the GEM buffer they are bound to.

v3: Return EINVAL when creating FB against BO with
unsupported tiling
v5: Resolved against nouveau_framebuffer cleanup

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 98 +++
 1 file changed, 98 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 94f7fd48e1cf..3048a43a8d36 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -203,6 +203,76 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = drm_gem_fb_create_handle,
 };
 
+static inline uint32_t
+nouveau_get_width_in_blocks(uint32_t stride)
+{
+   /* GOBs per block in the x direction is always one, and GOBs are
+* 64 bytes wide
+*/
+   static const uint32_t log_block_width = 6;
+
+   return (stride + (1 << log_block_width) - 1) >> log_block_width;
+}
+
+static inline uint32_t
+nouveau_get_height_in_blocks(struct nouveau_drm *drm,
+uint32_t height,
+uint32_t log_block_height_in_gobs)
+{
+   uint32_t log_gob_height;
+   uint32_t log_block_height;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   log_gob_height = 2;
+   else
+   log_gob_height = 3;
+
+   log_block_height = log_block_height_in_gobs + log_gob_height;
+
+   return (height + (1 << log_block_height) - 1) >> log_block_height;
+}
+
+static int
+nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo,
+ uint32_t offset, uint32_t stride, uint32_t h,
+ uint32_t tile_mode)
+{
+   uint32_t gob_size, bw, bh;
+   uint64_t bl_size;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.chipset >= 0xc0) {
+   if (tile_mode & 0xF)
+   return -EINVAL;
+   tile_mode >>= 4;
+   }
+
+   if (tile_mode & 0xFFF0)
+   return -EINVAL;
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   gob_size = 256;
+   else
+   gob_size = 512;
+
+   bw = nouveau_get_width_in_blocks(stride);
+   bh = nouveau_get_height_in_blocks(drm, h, tile_mode);
+
+   bl_size = bw * bh * (1 << tile_mode) * gob_size;
+
+   DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u 
gob_size=%u bl_size=%llu size=%lu\n",
+ offset, stride, h, tile_mode, bw, bh, gob_size, bl_size,
+ nvbo->bo.mem.size);
+
+   if (bl_size + offset > nvbo->bo.mem.size)
+   return -ERANGE;
+
+   return 0;
+}
+
 int
 nouveau_framebuffer_new(struct drm_device *dev,
const struct drm_mode_fb_cmd2 *mode_cmd,
@@ -210,7 +280,10 @@ nouveau_framebuffer_new(struct drm_device *dev,
struct drm_framebuffer **pfb)
 {
struct nouveau_drm *drm = nouveau_drm(dev);
+   struct nouveau_bo *nvbo = nouveau_gem_object(gem);
struct drm_framebuffer *fb;
+   const struct drm_format_info *info;
+   unsigned int width, height, i;
int ret;
 
 /* YUV overlays have special requirements pre-NV50 */
@@ -233,6 +306,31 @@ nouveau_framebuffer_new(struct drm_device *dev,
return -EINVAL;
}
 
+   info = drm_get_format_info(dev, mode_cmd);
+
+   for (i = 0; i < info->num_planes; i++) {
+   width = drm_format_info_plane_width(info,
+   mode_cmd->width,
+   i);
+   height = drm_format_info_plane_height(info,
+ mode_cmd->height,
+ i);
+
+   if (nvbo->kind) {
+   ret = nouveau_check_bl_size(drm, nvbo,
+   mode_cmd->offsets[i],
+   mode_cmd->pitches[i],
+   height, nvbo->mode);
+   if (ret)
+   return ret;
+   } else {
+   uint32_t size = mode_cmd->pitches[i] * height;
+
+   if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size)
+   return -ERANGE;
+   }
+   }
+
if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL)
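
The arithmetic in nouveau_check_bl_size() is easy to sanity-check in userspace. The following is a sketch under stated assumptions, not the driver code: the sketch_* names are hypothetical, and the GOB height and GOB size are passed in as parameters instead of being derived from the device family (the patch uses 2 and 256 before Fermi, 3 and 512 from Fermi on).

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to a whole number of units of size 2^log2_unit. */
static uint32_t sketch_round_up_units(uint32_t value, uint32_t log2_unit)
{
    return (value + (1u << log2_unit) - 1) >> log2_unit;
}

/*
 * Bytes occupied by one block-linear plane.
 * log2_block_height_in_gobs is the tile mode after normalisation,
 * i.e. after the ">> 4" the patch applies on 0xc0+ chipsets.
 */
static uint64_t sketch_bl_size(uint32_t stride, uint32_t height,
                               uint32_t log2_block_height_in_gobs,
                               uint32_t log2_gob_height, uint32_t gob_size)
{
    /* Blocks are one GOB wide and GOBs are 64 bytes wide. */
    uint32_t bw = sketch_round_up_units(stride, 6);
    uint32_t bh = sketch_round_up_units(height,
                                        log2_block_height_in_gobs +
                                        log2_gob_height);

    /* Widen first so the product is computed in 64 bits. */
    return (uint64_t)bw * bh * (1u << log2_block_height_in_gobs) * gob_size;
}

/* Nonzero when the plane, placed at offset, fits inside the buffer. */
static int sketch_plane_fits(uint64_t bl_size, uint32_t offset,
                             uint64_t bo_size)
{
    return bl_size + offset <= bo_size;
}
```

As a worked case: a 1920x1080 plane at 4 bytes per pixel (stride 7680) with 16-GOB-high blocks and the Fermi-style parameters gives bw = 120, bh = 9 and a size of 120 * 9 * 16 * 512 = 8847360 bytes, against 8294400 bytes for the same plane laid out linearly. One deliberate difference from the hunk above: the sketch casts to 64 bits before multiplying, whereas the patch multiplies 32-bit operands and stores the result in a uint64_t. I have not checked whether the 32-bit product can actually wrap for sizes that pass the earlier checks.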

[PATCH v5 0/3] drm/nouveau: Support NVIDIA format modifiers

2020-02-10 Thread James Jones
This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available on the
Mesa-dev gitlab merge request 3724:

  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3724

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

v2: Used Tesla family instead of NV50 chipset compare to avoid treating
oddly numbered NV4x-class chipsets as NV50+ GPUs.  Other instances
of compares with chipset number in the series were audited, deemed
safe, and left as-is for consistency with existing code.

v3: -Rebased on nouveau linux-5.6 @ 137c4ba7163ad9d5696b9fde78b1c0898a9c115b
-Noted corresponding Mesa patches are production-worthy now
-Better validate bo tile_mode when checking framebuffer size.

v4: Do not cache kind, tile_mode in nouveau_framebuffer

v5: Resolved against nouveau_framebuffer cleanup

James Jones (3):
  drm/nouveau: Add format mod prop to base/ovly/nvdisp
  drm/nouveau: Check framebuffer size against bo
  drm/nouveau: Support NVIDIA format modifiers

 drivers/gpu/drm/nouveau/dispnv50/base507c.c |   7 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c |  59 +++
 drivers/gpu/drm/nouveau/dispnv50/disp.h |   4 +
 drivers/gpu/drm/nouveau/dispnv50/wndw.c |  47 -
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c |  17 ++
 drivers/gpu/drm/nouveau/nouveau_display.c   | 183 
 drivers/gpu/drm/nouveau/nouveau_display.h   |   6 +
 7 files changed, 312 insertions(+), 11 deletions(-)

-- 
2.17.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp

2020-02-10 Thread James Jones
Advertise support for the full list of format
modifiers supported by each class of NVIDIA
desktop GPU display hardware.  Stash the array
of modifiers in the nouveau_display struct for
use when validating userspace framebuffer
creation requests, which will be supported in
a subsequent change.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/dispnv50/base507c.c |  7 +--
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 +
 drivers/gpu/drm/nouveau/dispnv50/disp.h |  4 ++
 drivers/gpu/drm/nouveau/dispnv50/wndw.c | 27 +-
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++
 drivers/gpu/drm/nouveau/nouveau_display.h   |  2 +
 6 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c 
b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
index 00a85f1e1a4a..025b8f996a0a 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
@@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
struct nv50_disp_base_channel_dma_v0 args = {
.head = head,
};
-   struct nv50_disp *disp = nv50_disp(drm->dev);
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   struct nv50_disp *disp50 = nv50_disp(drm->dev);
struct nv50_wndw *wndw;
int ret;
 
@@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
if (*pwndw = wndw, ret)
return ret;
 
-   ret = nv50_dmac_create(&drm->client.device, &disp->disp->object,
+   ret = nv50_dmac_create(&drm->client.device, &disp->disp.object,
   &oclass, head, &args, sizeof(args),
-  disp->sync->bo.offset, &wndw->wndw);
+  disp50->sync->bo.offset, &wndw->wndw);
if (ret) {
NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret);
return ret;
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index a3dc2ba19fb2..f017d05072b8 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2481,6 +2481,15 @@ nv50_display_create(struct drm_device *dev)
if (ret)
goto out;
 
+   /* Assign the correct format modifiers */
+   if (disp->disp->object.oclass >= TU102_DISP)
+   nouveau_display(dev)->format_modifiers = wndwc57e_modifiers;
+   else
+   if (disp->disp->object.oclass >= GF110_DISP)
+   nouveau_display(dev)->format_modifiers = disp90xx_modifiers;
+   else
+   nouveau_display(dev)->format_modifiers = disp50xx_modifiers;
+
/* create crtc objects to represent the hw heads */
if (disp->disp->object.oclass >= GV100_DISP)
crtcs = nvif_rd32(>object, 0x610060) & 0xff;
@@ -2576,3 +2585,53 @@ nv50_display_create(struct drm_device *dev)
nv50_display_destroy(dev);
return ret;
 }
+
+/**
+ * Format modifiers
+ */
+
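+/***********************************************************
+ *    Log2(block height) -------------------------------+  *
+ *    Page Kind -------------------------------------+  |  *
+ *    Gob Height/Page Kind Generation ---------+     |  |  *
+ *                  Sector layout ----------+  |     |  |  *
+ *                  Compression ---------+  |  |     |  |  */
+const u64 disp50xx_modifiers[] = { /*    |  |  |     |  |  */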
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4),
+   DRM_FORMAT_MOD_NVIDIA_B
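
To make the table above easier to read, here is a sketch of how the five macro arguments might be packed into a 64-bit modifier. The block height and kind positions follow from the decode masks used elsewhere in this series. The positions of the other three fields, the 0x10 marker bit and the vendor value are recalled from drm_fourcc.h and are assumptions to be checked against that header (it comes from the separate "Generalized NV Block Linear DRM format mod" patch). The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed vendor code for NVIDIA; verify against drm_fourcc.h. */
#define SKETCH_VENDOR_NVIDIA 0x03ULL

/*
 * c: compression, s: sector layout, g: GOB height/page kind generation,
 * k: page kind, h: log2(block height), in the same order as the table.
 */
static uint64_t sketch_nv_block_linear_2d(unsigned int c, unsigned int s,
                                          unsigned int g, unsigned int k,
                                          unsigned int h)
{
    uint64_t val = 0x10ULL
                 | (uint64_t)(h & 0xf)
                 | ((uint64_t)(k & 0xff) << 12)
                 | ((uint64_t)(g & 0x3) << 20)   /* assumed position */
                 | ((uint64_t)(s & 0x1) << 22)   /* assumed position */
                 | ((uint64_t)(c & 0x7) << 23);  /* assumed position */

    /* Vendor in the top byte, payload in the low 56 bits. */
    return (SKETCH_VENDOR_NVIDIA << 56) | (val & 0x00ffffffffffffffULL);
}
```

Under those assumptions, the entry (0, 1, 1, 0x7a, 5) packs to 0x030000000057a015, and masking it with 0xF or shifting it right by 12 recovers the block height and kind the way nouveau_decode_mod() does.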

[PATCH] drm/nouveau: Fix NULL ptr access in nv50_wndw_prepare_fb()

2020-02-10 Thread James Jones
This fixes a kernel oops when loading the nouveau
module with fb console enabled after the change:

  drm/nouveau: Remove field nvbo from struct nouveau_framebuffer

state->fb may be NULL in nv50_wndw_prepare_fb(),
so defer initializing nvbo from its obj[] array
until after the NULL check.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/dispnv50/wndw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index 4a67a656e007..68c0dc2dc2d3 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -490,7 +490,7 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct 
drm_plane_state *state)
struct nouveau_drm *drm = nouveau_drm(plane->dev);
struct nv50_wndw *wndw = nv50_wndw(plane);
struct nv50_wndw_atom *asyw = nv50_wndw_atom(state);
-   struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
+   struct nouveau_bo *nvbo;
struct nv50_head_atom *asyh;
struct nv50_wndw_ctxdma *ctxdma;
int ret;
@@ -499,6 +499,7 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct 
drm_plane_state *state)
if (!asyw->state.fb)
return 0;
 
+   nvbo = nouveau_gem_object(fb->obj[0]);
ret = nouveau_bo_pin(nvbo, TTM_PL_FLAG_VRAM, true);
if (ret)
return ret;
-- 
2.17.1
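
The shape of the fix, reduced to a userspace sketch: the dereference of the framebuffer pointer moves below the NULL check. The struct and function names here are hypothetical stand-ins, not the nouveau types, and the pinning step is replaced by simply handing back the object pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for drm_framebuffer and drm_plane_state. */
struct sketch_fb {
    void *obj[4];
};

struct sketch_state {
    struct sketch_fb *fb;
};

/*
 * Returns 0 in both cases, like an early "nothing to do" return followed
 * by a successful pin.  *pinned is only written when a framebuffer exists.
 */
static int sketch_prepare_fb(const struct sketch_state *state, void **pinned)
{
    void *bo;

    /* A plane state without a framebuffer has nothing to pin. */
    if (!state->fb)
        return 0;

    /* Reading through state->fb is only safe after the check above. */
    bo = state->fb->obj[0];
    *pinned = bo;
    return 0;
}
```

Initialising bo in its declaration, as the code did before this fix, would read obj[0] through a NULL pointer whenever the state has no framebuffer attached.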



Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer

2020-02-07 Thread James Jones
I've sent out a v4 version of the format modifier patches which avoid 
caching values in the nouveau_framebuffer struct.  It will have a few 
trivial conflicts with your series, but should make them structurally 
compatible.


I'm fine with either v3 or v4 of my series personally, but if these 
cleanup patches are taken, only v4 will work.


Thanks,
-James

On 2/6/20 8:45 AM, James Jones wrote:
Yes, that's certainly viable.  If that's the general preference in 
direction, I'll rework the patches to do so.


Thanks,
-James

On 2/6/20 7:49 AM, Thomas Zimmermann wrote:

Hi James

Am 06.02.20 um 16:17 schrieb James Jones:

Note I'm adding some fields to nouveau_framebuffer in the series
"drm/nouveau: Support NVIDIA format modifiers."  I sent out v3 of that
yesterday.  It would probably still be possible to avoid them by
re-extracting the relevant data from the format modifier on the fly when
needed, but it is simpler and likely less error-prone with the wrapper
struct.


Thanks for the note.

I just took a look at your patchset. I think struct nouveau_framebuffer
should not store tile_mode and kind. AFAICT there are only two trivial
places where these values are used and they can be extracted from the
framebuffer at any time.

I'd suggest to expand nouveau_decode_mod() to take a drm_framebuffer and
return the correct values. Kind of what you do in
nouveau_framebuffer_new() near line 330.

Thoughts?

Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/70786/#rev3



Thanks,
-James

On 2/6/20 2:19 AM, Thomas Zimmermann wrote:

After its cleanup, struct nouveau_framebuffer is only a wrapper around
struct drm_framebuffer. Use the latter directly.

Signed-off-by: Thomas Zimmermann 
---
   drivers/gpu/drm/nouveau/dispnv50/wndw.c   | 26 
+++

   drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++--
   drivers/gpu/drm/nouveau/nouveau_display.h | 12 +--
   drivers/gpu/drm/nouveau/nouveau_fbcon.c   | 14 ++--
   4 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index ba1399965a1c..4a67a656e007 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma 
*ctxdma)

   }
     static struct nv50_wndw_ctxdma *
-nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct
nouveau_framebuffer *fb)
+nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer 
*fb)

   {
-    struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
+    struct nouveau_drm *drm = nouveau_drm(fb->dev);
   struct nv50_wndw_ctxdma *ctxdma;
-    struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+    struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
   const u8    kind = nvbo->kind;
   const u32 handle = 0xfb00 | kind;
   struct {
@@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw
*wndw, bool modeset,
  struct nv50_wndw_atom *asyw,
  struct nv50_head_atom *asyh)
   {
-    struct nouveau_framebuffer *fb =
nouveau_framebuffer(asyw->state.fb);
+    struct drm_framebuffer *fb = asyw->state.fb;
   struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev);
-    struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+    struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
   int ret;
     NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name);
   -    if (asyw->state.fb != armw->state.fb || !armw->visible ||
modeset) {
-    asyw->image.w = fb->base.width;
-    asyw->image.h = fb->base.height;
+    if (fb != armw->state.fb || !armw->visible || modeset) {
+    asyw->image.w = fb->width;
+    asyw->image.h = fb->height;
   asyw->image.kind = nvbo->kind;
     ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
@@ -261,13 +261,13 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw
*wndw, bool modeset,
   asyw->image.blockh = nvbo->mode >> 4;
   else
   asyw->image.blockh = nvbo->mode;
-    asyw->image.blocks[0] = fb->base.pitches[0] / 64;
+    asyw->image.blocks[0] = fb->pitches[0] / 64;
   asyw->image.pitch[0] = 0;
   } else {
   asyw->image.layout = 1;
   asyw->image.blockh = 0;
   asyw->image.blocks[0] = 0;
-    asyw->image.pitch[0] = fb->base.pitches[0];
+    asyw->image.pitch[0] = fb->pitches[0];
   }
     if (!asyh->state.async_flip)
@@ -486,16 +486,16 @@ nv50_wndw_cleanup_fb(struct drm_plane *plane,
struct drm_plane_state *old_state)
   static int
   nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state
*state)
   {
-    struct nouveau_framebuffer

[PATCH v4 2/3] drm/nouveau: Check framebuffer size against bo

2020-02-07 Thread James Jones
Make sure framebuffer dimensions and tiling
parameters will not result in accesses beyond the
end of the GEM buffer they are bound to.

v3: Return EINVAL when creating FB against BO with
unsupported tiling

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 97 +++
 1 file changed, 97 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 53f9bceaf17a..4273d9387cda 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,76 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = nouveau_user_framebuffer_create_handle,
 };
 
+static inline uint32_t
+nouveau_get_width_in_blocks(uint32_t stride)
+{
+   /* GOBs per block in the x direction is always one, and GOBs are
+* 64 bytes wide
+*/
+   static const uint32_t log_block_width = 6;
+
+   return (stride + (1 << log_block_width) - 1) >> log_block_width;
+}
+
+static inline uint32_t
+nouveau_get_height_in_blocks(struct nouveau_drm *drm,
+uint32_t height,
+uint32_t log_block_height_in_gobs)
+{
+   uint32_t log_gob_height;
+   uint32_t log_block_height;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   log_gob_height = 2;
+   else
+   log_gob_height = 3;
+
+   log_block_height = log_block_height_in_gobs + log_gob_height;
+
+   return (height + (1 << log_block_height) - 1) >> log_block_height;
+}
+
+static int
+nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo,
+ uint32_t offset, uint32_t stride, uint32_t h,
+ uint32_t tile_mode)
+{
+   uint32_t gob_size, bw, bh;
+   uint64_t bl_size;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.chipset >= 0xc0) {
+   if (tile_mode & 0xF)
+   return -EINVAL;
+   tile_mode >>= 4;
+   }
+
+   if (tile_mode & 0xFFF0)
+   return -EINVAL;
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   gob_size = 256;
+   else
+   gob_size = 512;
+
+   bw = nouveau_get_width_in_blocks(stride);
+   bh = nouveau_get_height_in_blocks(drm, h, tile_mode);
+
+   bl_size = bw * bh * (1 << tile_mode) * gob_size;
+
+   DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u 
gob_size=%u bl_size=%llu size=%lu\n",
+ offset, stride, h, tile_mode, bw, bh, gob_size, bl_size,
+ nvbo->bo.mem.size);
+
+   if (bl_size + offset > nvbo->bo.mem.size)
+   return -ERANGE;
+
+   return 0;
+}
+
 int
 nouveau_framebuffer_new(struct drm_device *dev,
const struct drm_mode_fb_cmd2 *mode_cmd,
@@ -232,6 +302,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
 {
struct nouveau_drm *drm = nouveau_drm(dev);
struct nouveau_framebuffer *fb;
+   const struct drm_format_info *info;
+   unsigned int width, height, i;
int ret;
 
 /* YUV overlays have special requirements pre-NV50 */
@@ -254,6 +326,31 @@ nouveau_framebuffer_new(struct drm_device *dev,
return -EINVAL;
}
 
+   info = drm_get_format_info(dev, mode_cmd);
+
+   for (i = 0; i < info->num_planes; i++) {
+   width = drm_format_info_plane_width(info,
+   mode_cmd->width,
+   i);
+   height = drm_format_info_plane_height(info,
+ mode_cmd->height,
+ i);
+
+   if (nvbo->kind) {
+   ret = nouveau_check_bl_size(drm, nvbo,
+   mode_cmd->offsets[i],
+   mode_cmd->pitches[i],
+   height, nvbo->mode);
+   if (ret)
+   return ret;
+   } else {
+   uint32_t size = mode_cmd->pitches[i] * height;
+
+   if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size)
+   return -ERANGE;
+   }
+   }
+
if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL)))
return -ENOMEM;
 
-- 
2.17.1



[PATCH v4 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp

2020-02-07 Thread James Jones
Advertise support for the full list of format
modifiers supported by each class of NVIDIA
desktop GPU display hardware.  Stash the array
of modifiers in the nouveau_display struct for
use when validating userspace framebuffer
creation requests, which will be supported in
a subsequent change.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/dispnv50/base507c.c |  7 +--
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 +
 drivers/gpu/drm/nouveau/dispnv50/disp.h |  4 ++
 drivers/gpu/drm/nouveau/dispnv50/wndw.c | 27 +-
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++
 drivers/gpu/drm/nouveau/nouveau_display.h   |  2 +
 6 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c 
b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
index 00a85f1e1a4a..025b8f996a0a 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
@@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
struct nv50_disp_base_channel_dma_v0 args = {
.head = head,
};
-   struct nv50_disp *disp = nv50_disp(drm->dev);
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   struct nv50_disp *disp50 = nv50_disp(drm->dev);
struct nv50_wndw *wndw;
int ret;
 
@@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
if (*pwndw = wndw, ret)
return ret;
 
-   ret = nv50_dmac_create(&drm->client.device, &disp->disp->object,
+   ret = nv50_dmac_create(&drm->client.device, &disp->disp.object,
   &oclass, head, &args, sizeof(args),
-  disp->sync->bo.offset, &wndw->wndw);
+  disp50->sync->bo.offset, &wndw->wndw);
if (ret) {
NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret);
return ret;
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index a3dc2ba19fb2..f017d05072b8 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2481,6 +2481,15 @@ nv50_display_create(struct drm_device *dev)
if (ret)
goto out;
 
+   /* Assign the correct format modifiers */
+   if (disp->disp->object.oclass >= TU102_DISP)
+   nouveau_display(dev)->format_modifiers = wndwc57e_modifiers;
+   else
+   if (disp->disp->object.oclass >= GF110_DISP)
+   nouveau_display(dev)->format_modifiers = disp90xx_modifiers;
+   else
+   nouveau_display(dev)->format_modifiers = disp50xx_modifiers;
+
/* create crtc objects to represent the hw heads */
if (disp->disp->object.oclass >= GV100_DISP)
crtcs = nvif_rd32(>object, 0x610060) & 0xff;
@@ -2576,3 +2585,53 @@ nv50_display_create(struct drm_device *dev)
nv50_display_destroy(dev);
return ret;
 }
+
+/**
+ * Format modifiers
+ */
+
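+/***********************************************************
+ *    Log2(block height) -------------------------------+  *
+ *    Page Kind -------------------------------------+  |  *
+ *    Gob Height/Page Kind Generation ---------+     |  |  *
+ *                  Sector layout ----------+  |     |  |  *
+ *                  Compression ---------+  |  |     |  |  */
+const u64 disp50xx_modifiers[] = { /*    |  |  |     |  |  */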
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4),
+   DRM_FORMAT_MOD_NVIDIA_B

[PATCH v4 0/3] drm/nouveau: Support NVIDIA format modifiers

2020-02-07 Thread James Jones
This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available on the
Mesa-dev gitlab merge request 3724:

  https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3724

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

v2: Used Tesla family instead of NV50 chipset compare to avoid treating
oddly numbered NV4x-class chipsets as NV50+ GPUs.  Other instances
of compares with chipset number in the series were audited, deemed
safe, and left as-is for consistency with existing code.

v3: -Rebased on nouveau linux-5.6 @ 137c4ba7163ad9d5696b9fde78b1c0898a9c115b
-Noted corresponding Mesa patches are production-worthy now
-Better validate bo tile_mode when checking framebuffer size.

v4: Do not cache kind, tile_mode in nouveau_framebuffer

James Jones (3):
  drm/nouveau: Add format mod prop to base/ovly/nvdisp
  drm/nouveau: Check framebuffer size against bo
  drm/nouveau: Support NVIDIA format modifiers

 drivers/gpu/drm/nouveau/dispnv50/base507c.c |   7 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c |  59 +++
 drivers/gpu/drm/nouveau/dispnv50/disp.h |   4 +
 drivers/gpu/drm/nouveau/dispnv50/wndw.c |  45 -
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c |  17 ++
 drivers/gpu/drm/nouveau/nouveau_display.c   | 183 
 drivers/gpu/drm/nouveau/nouveau_display.h   |   6 +
 7 files changed, 312 insertions(+), 9 deletions(-)

-- 
2.17.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v4 3/3] drm/nouveau: Support NVIDIA format modifiers

2020-02-07 Thread James Jones
Allow setting the block layout of a nouveau FB
object using DRM format modifiers.  When
specified, the format modifier block layout and
kind override the GEM buffer's implicit layout
and kind.  The specified format modifier is
validated against the list of modifiers supported
by the target display hardware.

v2: Used Tesla family instead of NV50 chipset compare
v4: Do not cache kind, tile_mode in nouveau_framebuffer

Signed-off-by: James Jones 
---
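For readers skimming the diff, the decode step can be sketched stand-alone
as below.  The bit positions are taken from the patch itself (bits 3:0 for
log2 of the block height in GOBs, bits 19:12 for the kind).  The function
name and the is_fermi_or_newer flag are hypothetical; the flag stands in
for the driver's "chipset >= 0xc0" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Split a block-linear modifier into the two values the display code
 * needs.  A linear modifier (value 0) naturally decodes to tile_mode 0
 * and kind 0 with these masks.
 */
void sketch_decode_mod(uint64_t modifier, bool is_fermi_or_newer,
		       uint32_t *tile_mode, uint8_t *kind)
{
	assert(tile_mode && kind);

	*tile_mode = (uint32_t)(modifier & 0xF);
	*kind = (uint8_t)((modifier >> 12) & 0xFF);

	/* Fermi and newer keep the block height in the upper nibble. */
	if (is_fermi_or_newer)
		*tile_mode <<= 4;
}
```

The shift on newer hardware is what lets the existing "tile_mode >> 4" in
wndw.c keep working unchanged.
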
 drivers/gpu/drm/nouveau/dispnv50/wndw.c   | 18 +++--
 drivers/gpu/drm/nouveau/nouveau_display.c | 90 ++-
 drivers/gpu/drm/nouveau/nouveau_display.h |  4 +
 3 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index a424ecfdf8e9..064e8825d451 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -43,8 +43,9 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct 
nouveau_framebuffer *fb)
 {
struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
struct nv50_wndw_ctxdma *ctxdma;
-   const u8 kind = fb->nvbo->kind;
-   const u32 handle = 0xfb00 | kind;
+   u32 handle;
+   u32 unused;
+   u8  kind;
struct {
struct nv_dma_v0 base;
union {
@@ -56,6 +57,9 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct 
nouveau_framebuffer *fb)
u32 argc = sizeof(args.base);
int ret;
 
+   nouveau_framebuffer_get_layout(&fb->base, &unused, &kind);
+   handle = 0xfb00 | kind;
+
list_for_each_entry(ctxdma, &wndw->ctxdma.list, head) {
if (ctxdma->object.handle == handle)
return ctxdma;
@@ -236,14 +240,18 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, 
bool modeset,
 {
struct nouveau_framebuffer *fb = nouveau_framebuffer(asyw->state.fb);
struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev);
+   uint8_t kind;
+   uint32_t tile_mode;
int ret;
 
NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name);
 
if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) {
+   nouveau_framebuffer_get_layout(&fb->base, &tile_mode, &kind);
+
asyw->image.w = fb->base.width;
asyw->image.h = fb->base.height;
-   asyw->image.kind = fb->nvbo->kind;
+   asyw->image.kind = kind;
 
ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
if (ret) {
@@ -255,9 +263,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
if (asyw->image.kind) {
asyw->image.layout = 0;
if (drm->client.device.info.chipset >= 0xc0)
-   asyw->image.blockh = fb->nvbo->mode >> 4;
+   asyw->image.blockh = tile_mode >> 4;
else
-   asyw->image.blockh = fb->nvbo->mode;
+   asyw->image.blockh = tile_mode;
asyw->image.blocks[0] = fb->base.pitches[0] / 64;
asyw->image.pitch[0] = 0;
} else {
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 4273d9387cda..da8319182cf0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,76 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = nouveau_user_framebuffer_create_handle,
 };
 
+static void
+nouveau_decode_mod(struct nouveau_drm *drm,
+  uint64_t modifier,
+  uint32_t *tile_mode,
+  uint8_t *kind)
+{
+   BUG_ON(!tile_mode || !kind);
+
+   if (modifier == DRM_FORMAT_MOD_LINEAR) {
+   /* tile_mode will not be used in this case */
+   *tile_mode = 0;
+   *kind = 0;
+   } else {
+   /*
+* Extract the block height and kind from the corresponding
+* modifier fields.  See drm_fourcc.h for details.
+*/
+   *tile_mode = (uint32_t)(modifier & 0xF);
+   *kind = (uint8_t)((modifier >> 12) & 0xFF);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   *tile_mode <<= 4;
+   }
+}
+
+void
+nouveau_framebuffer_get_layout(struct drm_framebuffer *fb,
+  uint32_t *tile_mode,
+  uint8_t *kind)
+{
+   if (fb->flags & DRM_MODE_FB_MODIFIERS) {
+   struct nouveau_drm *drm = nouveau_drm(fb->dev);
+
+   nouveau_decode_mod(drm, fb->modifier, tile_mode, kind);
+   } else {
+   

Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer

2020-02-06 Thread James Jones
Yes, that's certainly viable.  If that's the general preference in 
direction, I'll rework the patches to do so.


Thanks,
-James

On 2/6/20 7:49 AM, Thomas Zimmermann wrote:

Hi James

Am 06.02.20 um 16:17 schrieb James Jones:

Note I'm adding some fields to nouveau_framebuffer in the series
"drm/nouveau: Support NVIDIA format modifiers."  I sent out v3 of that
yesterday.  It would probably still be possible to avoid them by
re-extracting the relevant data from the format modifier on the fly when
needed, but it is simpler and likely less error-prone with the wrapper
struct.


Thanks for the note.

I just took a look at your patchset. I think struct nouveau_framebuffer
should not store tile_mode and kind. AFAICT there are only two trivial
places where these values are used and they can be extracted from the
framebuffer at any time.

I'd suggest to expand nouveau_decode_mod() to take a drm_framebuffer and
return the correct values. Kind of what you do in
nouveau_framebuffer_new() near line 330.
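
A minimal sketch of that idea, assuming the fallback when no modifier was
supplied is the buffer object's own mode and kind (which is what the
pre-patch code reads).  The structs, the flag value and the function name
are hypothetical stand-ins, not the driver's types.

```c
#include <stdint.h>

/* Stand-in for DRM_MODE_FB_MODIFIERS. */
#define SKETCH_FB_MODIFIERS (1u << 1)

struct sketch_bo {
	uint32_t mode;	/* implicit tile mode of the buffer */
	uint8_t kind;	/* implicit kind of the buffer */
};

struct sketch_fb {
	uint32_t flags;
	uint64_t modifier;
	const struct sketch_bo *bo;
};

/*
 * Derive tile_mode and kind on demand instead of caching them in a
 * wrapper struct: prefer the modifier when one was given, otherwise
 * fall back to the buffer's implicit layout.
 */
void sketch_fb_get_layout(const struct sketch_fb *fb, int is_fermi_or_newer,
			  uint32_t *tile_mode, uint8_t *kind)
{
	if (fb->flags & SKETCH_FB_MODIFIERS) {
		*tile_mode = (uint32_t)(fb->modifier & 0xF);
		*kind = (uint8_t)((fb->modifier >> 12) & 0xFF);
		if (is_fermi_or_newer)
			*tile_mode <<= 4;
	} else {
		*tile_mode = fb->bo->mode;
		*kind = fb->bo->kind;
	}
}
```
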

Thoughts?

Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/70786/#rev3



Thanks,
-James

On 2/6/20 2:19 AM, Thomas Zimmermann wrote:

After its cleanup, struct nouveau_framebuffer is only a wrapper around
struct drm_framebuffer. Use the latter directly.

Signed-off-by: Thomas Zimmermann 
---
   drivers/gpu/drm/nouveau/dispnv50/wndw.c   | 26 +++
   drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++--
   drivers/gpu/drm/nouveau/nouveau_display.h | 12 +--
   drivers/gpu/drm/nouveau/nouveau_fbcon.c   | 14 ++--
   4 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index ba1399965a1c..4a67a656e007 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma *ctxdma)
   }
     static struct nv50_wndw_ctxdma *
-nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct
nouveau_framebuffer *fb)
+nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb)
   {
-    struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
+    struct nouveau_drm *drm = nouveau_drm(fb->dev);
   struct nv50_wndw_ctxdma *ctxdma;
-    struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+    struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
   const u8    kind = nvbo->kind;
   const u32 handle = 0xfb00 | kind;
   struct {
@@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw
*wndw, bool modeset,
  struct nv50_wndw_atom *asyw,
  struct nv50_head_atom *asyh)
   {
-    struct nouveau_framebuffer *fb =
nouveau_framebuffer(asyw->state.fb);
+    struct drm_framebuffer *fb = asyw->state.fb;
   struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev);
-    struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+    struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
   int ret;
     NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name);
   -    if (asyw->state.fb != armw->state.fb || !armw->visible ||
modeset) {
-    asyw->image.w = fb->base.width;
-    asyw->image.h = fb->base.height;
+    if (fb != armw->state.fb || !armw->visible || modeset) {
+    asyw->image.w = fb->width;
+    asyw->image.h = fb->height;
   asyw->image.kind = nvbo->kind;
     ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
@@ -261,13 +261,13 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw
*wndw, bool modeset,
   asyw->image.blockh = nvbo->mode >> 4;
   else
   asyw->image.blockh = nvbo->mode;
-    asyw->image.blocks[0] = fb->base.pitches[0] / 64;
+    asyw->image.blocks[0] = fb->pitches[0] / 64;
   asyw->image.pitch[0] = 0;
   } else {
   asyw->image.layout = 1;
   asyw->image.blockh = 0;
   asyw->image.blocks[0] = 0;
-    asyw->image.pitch[0] = fb->base.pitches[0];
+    asyw->image.pitch[0] = fb->pitches[0];
   }
     if (!asyh->state.async_flip)
@@ -486,16 +486,16 @@ nv50_wndw_cleanup_fb(struct drm_plane *plane,
struct drm_plane_state *old_state)
   static int
   nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state
*state)
   {
-    struct nouveau_framebuffer *fb = nouveau_framebuffer(state->fb);
+    struct drm_framebuffer *fb = state->fb;
   struct nouveau_drm *drm = nouveau_drm(plane->dev);
   struct nv50_wndw *wndw = nv50_wndw(plane);
   struct nv50_wndw_atom *asyw = nv50_wndw_atom(state);
-    struct nouveau_bo *nvbo = nouveau_gem_object(state->fb->obj[0]);
+    struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
 

Re: [Nouveau] [PATCH 4/4] drm/nouveau: Remove struct nouveau_framebuffer

2020-02-06 Thread James Jones
Note I'm adding some fields to nouveau_framebuffer in the series 
"drm/nouveau: Support NVIDIA format modifiers."  I sent out v3 of that 
yesterday.  It would probably still be possible to avoid them by 
re-extracting the relevant data from the format modifier on the fly when 
needed, but it is simpler and likely less error-prone with the wrapper 
struct.


Thanks,
-James

On 2/6/20 2:19 AM, Thomas Zimmermann wrote:

After its cleanup, struct nouveau_framebuffer is only a wrapper around
struct drm_framebuffer. Use the latter directly.

Signed-off-by: Thomas Zimmermann 
---
  drivers/gpu/drm/nouveau/dispnv50/wndw.c   | 26 +++
  drivers/gpu/drm/nouveau/nouveau_display.c | 14 ++--
  drivers/gpu/drm/nouveau/nouveau_display.h | 12 +--
  drivers/gpu/drm/nouveau/nouveau_fbcon.c   | 14 ++--
  4 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index ba1399965a1c..4a67a656e007 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -40,11 +40,11 @@ nv50_wndw_ctxdma_del(struct nv50_wndw_ctxdma *ctxdma)
  }
  
  static struct nv50_wndw_ctxdma *

-nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct nouveau_framebuffer *fb)
+nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb)
  {
-   struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
+   struct nouveau_drm *drm = nouveau_drm(fb->dev);
struct nv50_wndw_ctxdma *ctxdma;
-   struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+   struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
const u8 kind = nvbo->kind;
const u32 handle = 0xfb00 | kind;
struct {
@@ -236,16 +236,16 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, 
bool modeset,
   struct nv50_wndw_atom *asyw,
   struct nv50_head_atom *asyh)
  {
-   struct nouveau_framebuffer *fb = nouveau_framebuffer(asyw->state.fb);
+   struct drm_framebuffer *fb = asyw->state.fb;
struct nouveau_drm *drm = nouveau_drm(wndw->plane.dev);
-   struct nouveau_bo *nvbo = nouveau_gem_object(fb->base.obj[0]);
+   struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
int ret;
  
  	NV_ATOMIC(drm, "%s acquire\n", wndw->plane.name);
  
-	if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) {

-   asyw->image.w = fb->base.width;
-   asyw->image.h = fb->base.height;
+   if (fb != armw->state.fb || !armw->visible || modeset) {
+   asyw->image.w = fb->width;
+   asyw->image.h = fb->height;
asyw->image.kind = nvbo->kind;
  
  		ret = nv50_wndw_atomic_check_acquire_rgb(asyw);

@@ -261,13 +261,13 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, 
bool modeset,
asyw->image.blockh = nvbo->mode >> 4;
else
asyw->image.blockh = nvbo->mode;
-   asyw->image.blocks[0] = fb->base.pitches[0] / 64;
+   asyw->image.blocks[0] = fb->pitches[0] / 64;
asyw->image.pitch[0] = 0;
} else {
asyw->image.layout = 1;
asyw->image.blockh = 0;
asyw->image.blocks[0] = 0;
-   asyw->image.pitch[0] = fb->base.pitches[0];
+   asyw->image.pitch[0] = fb->pitches[0];
}
  
  		if (!asyh->state.async_flip)

@@ -486,16 +486,16 @@ nv50_wndw_cleanup_fb(struct drm_plane *plane, struct 
drm_plane_state *old_state)
  static int
  nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state)
  {
-   struct nouveau_framebuffer *fb = nouveau_framebuffer(state->fb);
+   struct drm_framebuffer *fb = state->fb;
struct nouveau_drm *drm = nouveau_drm(plane->dev);
struct nv50_wndw *wndw = nv50_wndw(plane);
struct nv50_wndw_atom *asyw = nv50_wndw_atom(state);
-   struct nouveau_bo *nvbo = nouveau_gem_object(state->fb->obj[0]);
+   struct nouveau_bo *nvbo = nouveau_gem_object(fb->obj[0]);
struct nv50_head_atom *asyh;
struct nv50_wndw_ctxdma *ctxdma;
int ret;
  
-	NV_ATOMIC(drm, "%s prepare: %p\n", plane->name, state->fb);

+   NV_ATOMIC(drm, "%s prepare: %p\n", plane->name, fb);
if (!asyw->state.fb)
return 0;
  
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c

index bbbff55eb5d5..94f7fd48e1cf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -207,10 +207,10 @@ int
  nouveau_framebuffer_new(struct drm_device *dev,
const struct drm_mode_fb_cmd2 *mode_cmd,
struct drm_gem_object *gem,
- 

Re: [Nouveau] [PATCH v2 0/3] drm/nouveau: Support NVIDIA format modifiers

2020-02-05 Thread James Jones




On 1/6/20 3:27 PM, Ben Skeggs wrote:

On Tue, 7 Jan 2020 at 05:17, James Jones  wrote:


On 1/5/20 5:30 PM, Ben Skeggs wrote:

On Tue, 17 Dec 2019 at 10:44, James Jones  wrote:


This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available here:

https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work

But those need a bit of cleanup before they're ready to submit.

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

v2: Used Tesla family instead of NV50 chipset compare to avoid treating
  oddly numbered NV4x-class chipsets as NV50+ GPUs.  Other instances
  of compares with chipset number in the series were audited, deemed
  safe, and left as-is for consistency with existing code.

Hey James,

These look OK to me, with the minor issue I mentioned on one of the
patches dealt with.  I'll hold off merging anything until I get the
go-ahead that the modifier definitions are definitely set in stone /
userspace is ready for inclusion.


Thanks for having a look.  I'll try to get the userspace changes
finalized soon.  I think from the NV side, we consider the modifier
definition itself (the v3 version of the patch) final, so if there's any
stand-alone feedback from yourself or other drm/nouveau developers on
that layout, we'd be eager to hear it.  I don't want it rushed in, but
we do have several projects blocked on getting that approved & committed.

I assume the sequencing should be:

* Fix the minor issue you identified here/complete review of nouveau
kernel patches
* Complete review of the related TegraDRM new modifier support patch
* Finalize and complete review of userspace/Mesa nouveau modifier
support patches
* Get drm_fourcc.h updates committed
* Get these patches and TegraDRM patches committed
* Integrate final drm_fourcc.h to Mesa patches and get Mesa patches
committed

Does that sound right to you?

Seems very reasonable!


Thanks.  I needed to do more cleanup than I expected (a rewrite in the 
end), but the corresponding Mesa patches are out for review now, and 
I've sent out v3 of this patchset to address the remaining issue raised 
here.


Thanks,
-James


Ben.



Thanks,
-James


Thanks,
Ben.



James Jones (3):
drm/nouveau: Add format mod prop to base/ovly/nvdisp
drm/nouveau: Check framebuffer size against bo
drm/nouveau: Support NVIDIA format modifiers

   drivers/gpu/drm/nouveau/dispnv50/base507c.c |   7 +-
   drivers/gpu/drm/nouveau/dispnv50/disp.c |  59 
   drivers/gpu/drm/nouveau/dispnv50/disp.h |   4 +
   drivers/gpu/drm/nouveau/dispnv50/wndw.c |  35 -
   drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c |  17 +++
   drivers/gpu/drm/nouveau/nouveau_display.c   | 154 
   drivers/gpu/drm/nouveau/nouveau_display.h   |   4 +
   7 files changed, 272 insertions(+), 8 deletions(-)

--
2.17.1

___
Nouveau mailing list
nouv...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau



[PATCH v3 3/3] drm/nouveau: Support NVIDIA format modifiers

2020-02-05 Thread James Jones
Allow setting the block layout of a nouveau FB
object using DRM format modifiers.  When
specified, the format modifier block layout and
kind override the GEM buffer's implicit layout
and kind.  The specified format modifier is
validated against the list of modifiers supported
by the target display hardware.

v2: Used Tesla family instead of NV50 chipset compare

Signed-off-by: James Jones 
---
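The validation described above amounts to a linear search of a
sentinel-terminated array.  A minimal stand-alone sketch follows;
SKETCH_MOD_INVALID is a local stand-in for DRM_FORMAT_MOD_INVALID, and the
function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for DRM_FORMAT_MOD_INVALID, which terminates the list. */
#define SKETCH_MOD_INVALID 0x00ffffffffffffffULL

/*
 * Return true if modifier appears in list before the terminator.  The
 * terminator itself is never treated as a supported modifier.
 */
bool sketch_modifier_supported(const uint64_t *list, uint64_t modifier)
{
	for (; *list != SKETCH_MOD_INVALID; list++) {
		if (*list == modifier)
			return true;
	}
	return false;
}
```

Unlike the loop in the patch, the sketch returns a boolean rather than
leaving the index at the terminator; the accepted set is the same.
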
 drivers/gpu/drm/nouveau/dispnv50/wndw.c   |  8 +--
 drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++-
 drivers/gpu/drm/nouveau/nouveau_display.h |  2 +
 3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index a424ecfdf8e9..0047ba710da0 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct 
nouveau_framebuffer *fb)
 {
struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
struct nv50_wndw_ctxdma *ctxdma;
-   const u8 kind = fb->nvbo->kind;
+   const u8 kind = fb->kind;
const u32 handle = 0xfb00 | kind;
struct {
struct nv_dma_v0 base;
@@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) {
asyw->image.w = fb->base.width;
asyw->image.h = fb->base.height;
-   asyw->image.kind = fb->nvbo->kind;
+   asyw->image.kind = fb->kind;
 
ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
if (ret) {
@@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
if (asyw->image.kind) {
asyw->image.layout = 0;
if (drm->client.device.info.chipset >= 0xc0)
-   asyw->image.blockh = fb->nvbo->mode >> 4;
+   asyw->image.blockh = fb->tile_mode >> 4;
else
-   asyw->image.blockh = fb->nvbo->mode;
+   asyw->image.blockh = fb->tile_mode;
asyw->image.blocks[0] = fb->base.pitches[0] / 64;
asyw->image.pitch[0] = 0;
} else {
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 4273d9387cda..05bb077a9dd9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = nouveau_user_framebuffer_create_handle,
 };
 
+static int
+nouveau_decode_mod(struct nouveau_drm *drm,
+  uint64_t modifier,
+  uint32_t *tile_mode,
+  uint8_t *kind)
+{
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   int mod;
+
+   BUG_ON(!tile_mode || !kind);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA) {
+   return -EINVAL;
+   }
+
+   BUG_ON(!disp->format_modifiers);
+
+   for (mod = 0;
+(disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(disp->format_modifiers[mod] != modifier);
+mod++);
+
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+
+   if (modifier == DRM_FORMAT_MOD_LINEAR) {
+   /* tile_mode will not be used in this case */
+   *tile_mode = 0;
+   *kind = 0;
+   } else {
+   /*
+* Extract the block height and kind from the corresponding
+* modifier fields.  See drm_fourcc.h for details.
+*/
+   *tile_mode = (uint32_t)(modifier & 0xF);
+   *kind = (uint8_t)((modifier >> 12) & 0xFF);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   *tile_mode <<= 4;
+   }
+
+   return 0;
+}
+
 static inline uint32_t
 nouveau_get_width_in_blocks(uint32_t stride)
 {
@@ -304,6 +348,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
struct nouveau_framebuffer *fb;
const struct drm_format_info *info;
unsigned int width, height, i;
+   uint32_t tile_mode;
+   uint8_t kind;
int ret;
 
 /* YUV overlays have special requirements pre-NV50 */
@@ -326,6 +372,18 @@ nouveau_framebuffer_new(struct drm_device *dev,
return -EINVAL;
}
 
+   if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) {
+   if (nouveau_decode_mod(drm, mode_cmd->modifier[0], &tile_mode,
+   

[PATCH v3 0/3] drm/nouveau: Support NVIDIA format modifiers

2020-02-05 Thread James Jones
This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available on the
Mesa-dev mailing list as the series:

  nouveau: Improved format modifier support

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

v2: Used Tesla family instead of NV50 chipset compare to avoid treating
oddly numbered NV4x-class chipsets as NV50+ GPUs.  Other instances
of compares with chipset number in the series were audited, deemed
safe, and left as-is for consistency with existing code.

v3: -Rebased on nouveau linux-5.6 @ 137c4ba7163ad9d5696b9fde78b1c0898a9c115b
-Noted corresponding Mesa patches are production-worthy now
-Better validate bo tile_mode when checking framebuffer size.

James Jones (3):
  drm/nouveau: Add format mod prop to base/ovly/nvdisp
  drm/nouveau: Check framebuffer size against bo
  drm/nouveau: Support NVIDIA format modifiers

 drivers/gpu/drm/nouveau/dispnv50/base507c.c |   7 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c |  59 
 drivers/gpu/drm/nouveau/dispnv50/disp.h |   4 +
 drivers/gpu/drm/nouveau/dispnv50/wndw.c |  35 -
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c |  17 +++
 drivers/gpu/drm/nouveau/nouveau_display.c   | 158 
 drivers/gpu/drm/nouveau/nouveau_display.h   |   4 +
 7 files changed, 276 insertions(+), 8 deletions(-)

-- 
2.17.1



[PATCH v3 2/3] drm/nouveau: Check framebuffer size against bo

2020-02-05 Thread James Jones
Make sure framebuffer dimensions and tiling
parameters will not result in accesses beyond the
end of the GEM buffer they are bound to.

v3: Return EINVAL when creating FB against BO with
unsupported tiling

Signed-off-by: James Jones 
---
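As a worked example of the size check, the sketch below computes the
block-linear footprint of one plane using the constants from the patch:
64-byte-wide GOBs, 4-row/256-byte GOBs before Fermi and 8-row/512-byte
GOBs from Fermi on.  The function names and the is_fermi_or_newer flag are
hypothetical.  The sketch also widens to 64 bits before multiplying, which
is a choice of the sketch rather than something the patch does, and it
assumes the caller has already rejected block heights above 15.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes needed for a block-linear surface of the given stride and height. */
uint64_t sketch_bl_size(uint32_t stride, uint32_t height,
			uint32_t log_block_height_in_gobs,
			bool is_fermi_or_newer)
{
	const uint32_t log_gob_width = 6;	/* GOBs are 64 bytes wide */
	const uint32_t log_gob_height = is_fermi_or_newer ? 3 : 2;
	const uint64_t gob_size = is_fermi_or_newer ? 512 : 256;
	const uint32_t log_block_height =
		log_block_height_in_gobs + log_gob_height;
	uint64_t bw, bh;

	/* Round both dimensions up to whole blocks. */
	bw = ((uint64_t)stride + (1u << log_gob_width) - 1) >> log_gob_width;
	bh = ((uint64_t)height + (1u << log_block_height) - 1) >>
		log_block_height;

	return bw * bh * (1ull << log_block_height_in_gobs) * gob_size;
}

/* True if the surface, placed at offset, stays inside the buffer. */
bool sketch_fb_fits(uint64_t bl_size, uint32_t offset, uint64_t bo_size)
{
	return bl_size + offset <= bo_size;
}
```

For a 7680-byte stride (1920 pixels at 4 bytes each), 1080 rows and a
block height of 16 GOBs, this gives 120 x 9 blocks of 8192 bytes on Fermi
and newer, i.e. 8847360 bytes, and 120 x 17 blocks of 4096 bytes on Tesla,
i.e. 8355840 bytes.
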
 drivers/gpu/drm/nouveau/nouveau_display.c | 97 +++
 1 file changed, 97 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 53f9bceaf17a..4273d9387cda 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,76 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = nouveau_user_framebuffer_create_handle,
 };
 
+static inline uint32_t
+nouveau_get_width_in_blocks(uint32_t stride)
+{
+   /* GOBs per block in the x direction is always one, and GOBs are
+* 64 bytes wide
+*/
+   static const uint32_t log_block_width = 6;
+
+   return (stride + (1 << log_block_width) - 1) >> log_block_width;
+}
+
+static inline uint32_t
+nouveau_get_height_in_blocks(struct nouveau_drm *drm,
+uint32_t height,
+uint32_t log_block_height_in_gobs)
+{
+   uint32_t log_gob_height;
+   uint32_t log_block_height;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   log_gob_height = 2;
+   else
+   log_gob_height = 3;
+
+   log_block_height = log_block_height_in_gobs + log_gob_height;
+
+   return (height + (1 << log_block_height) - 1) >> log_block_height;
+}
+
+static int
+nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo,
+ uint32_t offset, uint32_t stride, uint32_t h,
+ uint32_t tile_mode)
+{
+   uint32_t gob_size, bw, bh;
+   uint64_t bl_size;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.chipset >= 0xc0) {
+   if (tile_mode & 0xF)
+   return -EINVAL;
+   tile_mode >>= 4;
+   }
+
+   if (tile_mode & 0xFFF0)
+   return -EINVAL;
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   gob_size = 256;
+   else
+   gob_size = 512;
+
+   bw = nouveau_get_width_in_blocks(stride);
+   bh = nouveau_get_height_in_blocks(drm, h, tile_mode);
+
+   bl_size = bw * bh * (1 << tile_mode) * gob_size;
+
+   DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u 
gob_size=%u bl_size=%llu size=%lu\n",
+ offset, stride, h, tile_mode, bw, bh, gob_size, bl_size,
+ nvbo->bo.mem.size);
+
+   if (bl_size + offset > nvbo->bo.mem.size)
+   return -ERANGE;
+
+   return 0;
+}
+
 int
 nouveau_framebuffer_new(struct drm_device *dev,
const struct drm_mode_fb_cmd2 *mode_cmd,
@@ -232,6 +302,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
 {
struct nouveau_drm *drm = nouveau_drm(dev);
struct nouveau_framebuffer *fb;
+   const struct drm_format_info *info;
+   unsigned int width, height, i;
int ret;
 
 /* YUV overlays have special requirements pre-NV50 */
@@ -254,6 +326,31 @@ nouveau_framebuffer_new(struct drm_device *dev,
return -EINVAL;
}
 
+   info = drm_get_format_info(dev, mode_cmd);
+
+   for (i = 0; i < info->num_planes; i++) {
+   width = drm_format_info_plane_width(info,
+   mode_cmd->width,
+   i);
+   height = drm_format_info_plane_height(info,
+ mode_cmd->height,
+ i);
+
+   if (nvbo->kind) {
+   ret = nouveau_check_bl_size(drm, nvbo,
+   mode_cmd->offsets[i],
+   mode_cmd->pitches[i],
+   height, nvbo->mode);
+   if (ret)
+   return ret;
+   } else {
+   uint32_t size = mode_cmd->pitches[i] * height;
+
+   if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size)
+   return -ERANGE;
+   }
+   }
+
if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL)))
return -ENOMEM;
 
-- 
2.17.1



[PATCH v3 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp

2020-02-05 Thread James Jones
Advertise support for the full list of format
modifiers supported by each class of NVIDIA
desktop GPU display hardware.  Stash the array
of modifiers in the nouveau_display struct for
use when validating userspace framebuffer
creation requests, which will be supported in
a subsequent change.

Signed-off-by: James Jones 
---
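The per-class selection can be pictured as a lookup from display
generation to a sentinel-terminated modifier table.  The sketch below is
only illustrative: the enum is a hypothetical ordering of generations (not
the real class IDs), the tables are placeholders holding just a linear
entry (value 0), and SKETCH_MOD_INVALID is a local stand-in for
DRM_FORMAT_MOD_INVALID.

```c
#include <stdint.h>

/* Local stand-in for DRM_FORMAT_MOD_INVALID, which terminates each table. */
#define SKETCH_MOD_INVALID 0x00ffffffffffffffULL

/* Hypothetical display generations, ordered oldest to newest. */
enum sketch_disp_class {
	SKETCH_DISP_NV50,
	SKETCH_DISP_GF110,
	SKETCH_DISP_TU102,
};

/* Placeholder tables; real ones would list each class's modifiers. */
const uint64_t sketch_mods_nv50[]  = { 0, SKETCH_MOD_INVALID };
const uint64_t sketch_mods_gf110[] = { 0, SKETCH_MOD_INVALID };
const uint64_t sketch_mods_tu102[] = { 0, SKETCH_MOD_INVALID };

/* Pick the newest table whose generation does not exceed oclass. */
const uint64_t *sketch_pick_modifiers(enum sketch_disp_class oclass)
{
	if (oclass >= SKETCH_DISP_TU102)
		return sketch_mods_tu102;
	if (oclass >= SKETCH_DISP_GF110)
		return sketch_mods_gf110;
	return sketch_mods_nv50;
}
```
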
 drivers/gpu/drm/nouveau/dispnv50/base507c.c |  7 +--
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 +
 drivers/gpu/drm/nouveau/dispnv50/disp.h |  4 ++
 drivers/gpu/drm/nouveau/dispnv50/wndw.c | 27 +-
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++
 drivers/gpu/drm/nouveau/nouveau_display.h   |  2 +
 6 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c 
b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
index 00a85f1e1a4a..025b8f996a0a 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
@@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
struct nv50_disp_base_channel_dma_v0 args = {
.head = head,
};
-   struct nv50_disp *disp = nv50_disp(drm->dev);
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   struct nv50_disp *disp50 = nv50_disp(drm->dev);
struct nv50_wndw *wndw;
int ret;
 
@@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
if (*pwndw = wndw, ret)
return ret;
 
-   ret = nv50_dmac_create(&drm->client.device, &disp->disp->object,
+   ret = nv50_dmac_create(&drm->client.device, &disp->disp.object,
   &oclass, head, &args, sizeof(args),
-  disp->sync->bo.offset, &wndw->wndw);
+  disp50->sync->bo.offset, &wndw->wndw);
if (ret) {
NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret);
return ret;
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index a3dc2ba19fb2..f017d05072b8 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2481,6 +2481,15 @@ nv50_display_create(struct drm_device *dev)
if (ret)
goto out;
 
+   /* Assign the correct format modifiers */
+   if (disp->disp->object.oclass >= TU102_DISP)
+   nouveau_display(dev)->format_modifiers = wndwc57e_modifiers;
+   else
+   if (disp->disp->object.oclass >= GF110_DISP)
+   nouveau_display(dev)->format_modifiers = disp90xx_modifiers;
+   else
+   nouveau_display(dev)->format_modifiers = disp50xx_modifiers;
+
/* create crtc objects to represent the hw heads */
if (disp->disp->object.oclass >= GV100_DISP)
crtcs = nvif_rd32(&device->object, 0x610060) & 0xff;
@@ -2576,3 +2585,53 @@ nv50_display_create(struct drm_device *dev)
nv50_display_destroy(dev);
return ret;
 }
+
+/**
+ * Format modifiers
+ */
+
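+/****************************************************************
+ *            Log2(block height) ----------------------------+  *
+ *            Page Kind ----------------------------------+  |  *
+ *            Gob Height/Page Kind Generation ------+     |  |  *
+ *                          Sector layout -------+  |     |  |  *
+ *                          Compression ------+  |  |     |  |  */
+const u64 disp50xx_modifiers[] = { /*         |  |  |     |  |  */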
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4),
+   DRM_FORMAT_MOD_NVIDIA_B
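
The modifier tables in this patch are later validated against by walking the array until a DRM_FORMAT_MOD_INVALID terminator is reached (see the lookup loop in patch 3/3). A minimal userspace sketch of that kind of lookup follows. The sentinel value and the helper name modifier_supported() are stand-ins chosen for illustration and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DRM_FORMAT_MOD_INVALID; the real value comes from drm_fourcc.h. */
#define EXAMPLE_MOD_INVALID 0x00ffffffffffffffULL

/*
 * Hypothetical helper: returns true if 'modifier' appears in a list that is
 * terminated by EXAMPLE_MOD_INVALID.
 */
static bool modifier_supported(const uint64_t *list, uint64_t modifier)
{
	size_t i;

	if (!list)
		return false;

	for (i = 0; list[i] != EXAMPLE_MOD_INVALID; i++) {
		if (list[i] == modifier)
			return true;
	}

	/* Reached the terminator without a match. */
	return false;
}
```

Returning a boolean for a missing list, rather than asserting, keeps the helper safe to call on hardware that never populated a table.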

Re: [Nouveau] [PATCH v2 0/3] drm/nouveau: Support NVIDIA format modifiers

2020-01-06 Thread James Jones

On 1/5/20 5:30 PM, Ben Skeggs wrote:

On Tue, 17 Dec 2019 at 10:44, James Jones  wrote:


This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available here:

https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work

But those need a bit of cleanup before they're ready to submit.

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

v2: Used Tesla family instead of NV50 chipset compare to avoid treating
 oddly numbered NV4x-class chipsets as NV50+ GPUs.  Other instances
 of compares with chipset number in the series were audited, deemed
 safe, and left as-is for consistency with existing code.

Hey James,

These look OK to me, with the minor issue I mentioned on one of the
patches dealt with.  I'll hold off merging anything until I get the
go-ahead that the modifier definitions are definitely set in stone /
userspace is ready for inclusion.


Thanks for having a look.  I'll try to get the userspace changes 
finalized soon.  I think from the NV side, we consider the modifier 
definition itself (the v3 version of the patch) final, so if there's any 
stand-alone feedback from yourself or other drm/nouveau developers on 
that layout, we'd be eager to hear it.  I don't want it rushed in, but 
we do have several projects blocked on getting that approved & committed.


I assume the sequencing should be:

* Fix the minor issue you identified here/complete review of nouveau 
kernel patches

* Complete review of the related TegraDRM new modifier support patch
* Finalize and complete review of userspace/Mesa nouveau modifier 
support patches

* Get drm_fourcc.h updates committed
* Get these patches and TegraDRM patches committed
* Integrate final drm_fourcc.h to Mesa patches and get Mesa patches 
committed


Does that sound right to you?

Thanks,
-James


Thanks,
Ben.



James Jones (3):
   drm/nouveau: Add format mod prop to base/ovly/nvdisp
   drm/nouveau: Check framebuffer size against bo
   drm/nouveau: Support NVIDIA format modifiers

  drivers/gpu/drm/nouveau/dispnv50/base507c.c |   7 +-
  drivers/gpu/drm/nouveau/dispnv50/disp.c |  59 
  drivers/gpu/drm/nouveau/dispnv50/disp.h |   4 +
  drivers/gpu/drm/nouveau/dispnv50/wndw.c |  35 -
  drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c |  17 +++
  drivers/gpu/drm/nouveau/nouveau_display.c   | 154 
  drivers/gpu/drm/nouveau/nouveau_display.h   |   4 +
  7 files changed, 272 insertions(+), 8 deletions(-)

--
2.17.1

___
Nouveau mailing list
nouv...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau



Re: [Nouveau] [PATCH v2 2/3] drm/nouveau: Check framebuffer size against bo

2020-01-06 Thread James Jones

On 1/5/20 5:25 PM, Ben Skeggs wrote:

On Tue, 17 Dec 2019 at 10:45, James Jones  wrote:


Make sure framebuffer dimensions and tiling
parameters will not result in accesses beyond the
end of the GEM buffer they are bound to.

Signed-off-by: James Jones 
---
  drivers/gpu/drm/nouveau/nouveau_display.c | 93 +++
  1 file changed, 93 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 6f038511a03a..f1509392d7b7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,72 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
 .create_handle = nouveau_user_framebuffer_create_handle,
  };

+static inline uint32_t
+nouveau_get_width_in_blocks(uint32_t stride)
+{
+   /* GOBs per block in the x direction is always one, and GOBs are
+* 64 bytes wide
+*/
+   static const uint32_t log_block_width = 6;
+
+   return (stride + (1 << log_block_width) - 1) >> log_block_width;
+}
+
+static inline uint32_t
+nouveau_get_height_in_blocks(struct nouveau_drm *drm,
+uint32_t height,
+uint32_t log_block_height_in_gobs)
+{
+   uint32_t log_gob_height;
+   uint32_t log_block_height;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   log_gob_height = 2;
+   else
+   log_gob_height = 3;
+
+   log_block_height = log_block_height_in_gobs + log_gob_height;
+
+   return (height + (1 << log_block_height) - 1) >> log_block_height;
+}
+
+static int
+nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo,
+ uint32_t offset, uint32_t stride, uint32_t h,
+ uint32_t tile_mode)
+{
+   uint32_t gob_size, bw, bh;
+   uint64_t bl_size;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   tile_mode >>= 4;
+
+   BUG_ON(tile_mode & 0xFFF0);

As far as I can tell, tile_mode can be fed into this function
unsanitised from userspace, so we probably want something different to
a BUG_ON() here.


Good catch.  I had assumed nouveau_bo::mode was validated at creation 
time.  I'll get this fixed up.


Thanks,
-James


+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   gob_size = 256;
+   else
+   gob_size = 512;
+
+   bw = nouveau_get_width_in_blocks(stride);
+   bh = nouveau_get_height_in_blocks(drm, h, tile_mode);
+
+   bl_size = bw * bh * (1 << tile_mode) * gob_size;
+
+   DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u 
gob_size=%u bl_size=%llu size=%lu\n",
+ offset, stride, h, tile_mode, bw, bh, gob_size, bl_size,
+ nvbo->bo.mem.size);
+
+   if (bl_size + offset > nvbo->bo.mem.size)
+   return -ERANGE;
+
+   return 0;
+}
+
  int
  nouveau_framebuffer_new(struct drm_device *dev,
 const struct drm_mode_fb_cmd2 *mode_cmd,
@@ -232,6 +298,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
  {
 struct nouveau_drm *drm = nouveau_drm(dev);
 struct nouveau_framebuffer *fb;
+   const struct drm_format_info *info;
+   unsigned int width, height, i;
 int ret;

  /* YUV overlays have special requirements pre-NV50 */
@@ -254,6 +322,31 @@ nouveau_framebuffer_new(struct drm_device *dev,
 return -EINVAL;
 }

+   info = drm_get_format_info(dev, mode_cmd);
+
+   for (i = 0; i < info->num_planes; i++) {
+   width = drm_format_info_plane_width(info,
+   mode_cmd->width,
+   i);
+   height = drm_format_info_plane_height(info,
+ mode_cmd->height,
+ i);
+
+   if (nvbo->kind) {
+   ret = nouveau_check_bl_size(drm, nvbo,
+   mode_cmd->offsets[i],
+   mode_cmd->pitches[i],
+   height, nvbo->mode);
+   if (ret)
+   return ret;
+   } else {
+   uint32_t size = mode_cmd->pitches[i] * height;
+
+   if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size)
+   return -ERANGE;
+   }
+   }
+
 if (!(fb = *pfb = kzalloc(si
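
The review comment above points out that tile_mode can arrive unsanitised from userspace, so an out-of-range value should produce an error rather than trip BUG_ON(). One possible shape for such a check is sketched below. The helper name check_tile_mode(), the fermi_or_newer flag, and the full 32-bit mask are assumptions made for illustration; this is not the fix that was posted in a later revision.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: normalise a tile mode and reject out-of-range values
 * with an error code instead of asserting on them.
 */
static int check_tile_mode(uint32_t tile_mode, bool fermi_or_newer,
			   uint32_t *log_block_height)
{
	/* On Fermi and newer the patch shifts the block height down by 4. */
	if (fermi_or_newer)
		tile_mode >>= 4;

	/* Only a 4-bit log2(block height) is meaningful here. */
	if (tile_mode & 0xFFFFFFF0u)
		return -EINVAL;

	*log_block_height = tile_mode;
	return 0;
}
```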

[PATCH] drm/nouveau: Add correct turing page kinds

2019-12-16 Thread James Jones
Turing introduced a new simplified page kind
scheme, reducing the number of possible page
kinds from 256 to 16.  It also is the first
NVIDIA GPU in which the highest possible page
kind value is not reserved as an "invalid" page
kind.

To address this, the invalid page kind is made
an explicit property of the MMU HAL, and a new
table of page kinds is added to the tu102 MMU
HAL.

One hardware change not addressed here is that
0x00 is technically no longer a supported page
kind, and pitch surfaces are instead intended to
share the block-linear generic page kind 0x06.
However, because that will be a rather invasive
change to nouveau and 0x00 still works fine in
practice on Turing hardware, addressing this new
behavior is deferred.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/include/nvif/if0008.h|  2 +-
 drivers/gpu/drm/nouveau/include/nvif/mmu.h   |  4 ++--
 drivers/gpu/drm/nouveau/nvif/mmu.c   |  1 +
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c  |  3 ++-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c  |  3 ++-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/nv50.c   |  3 ++-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/priv.h   |  8 
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/tu102.c  | 16 +++-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/ummu.c   |  7 +--
 .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmgf100.c   |  6 +++---
 .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c   |  6 +++---
 .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c|  6 +++---
 12 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvif/if0008.h 
b/drivers/gpu/drm/nouveau/include/nvif/if0008.h
index 8450127420f5..c21d09f04f1d 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/if0008.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/if0008.h
@@ -35,7 +35,7 @@ struct nvif_mmu_type_v0 {
 
 struct nvif_mmu_kind_v0 {
__u8  version;
-   __u8  pad01[1];
+   __u8  kind_inv;
__u16 count;
__u8  data[];
 };
diff --git a/drivers/gpu/drm/nouveau/include/nvif/mmu.h 
b/drivers/gpu/drm/nouveau/include/nvif/mmu.h
index 747ecf67e403..cec1e88a0a05 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/mmu.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/mmu.h
@@ -7,6 +7,7 @@ struct nvif_mmu {
u8  dmabits;
u8  heap_nr;
u8  type_nr;
+   u8  kind_inv;
u16 kind_nr;
s32 mem;
 
@@ -36,9 +37,8 @@ void nvif_mmu_fini(struct nvif_mmu *);
 static inline bool
 nvif_mmu_kind_valid(struct nvif_mmu *mmu, u8 kind)
 {
-   const u8 invalid = mmu->kind_nr - 1;
if (kind) {
-   if (kind >= mmu->kind_nr || mmu->kind[kind] == invalid)
+   if (kind >= mmu->kind_nr || mmu->kind[kind] == mmu->kind_inv)
return false;
}
return true;
diff --git a/drivers/gpu/drm/nouveau/nvif/mmu.c 
b/drivers/gpu/drm/nouveau/nvif/mmu.c
index 5641bda2046d..47efc408efa6 100644
--- a/drivers/gpu/drm/nouveau/nvif/mmu.c
+++ b/drivers/gpu/drm/nouveau/nvif/mmu.c
@@ -121,6 +121,7 @@ nvif_mmu_init(struct nvif_object *parent, s32 oclass, 
struct nvif_mmu *mmu)
   kind, argc);
if (ret == 0)
memcpy(mmu->kind, kind->data, kind->count);
+   mmu->kind_inv = kind->kind_inv;
kfree(kind);
}
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c
index 2d075246dc46..2cd5ec81c0d0 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gf100.c
@@ -30,7 +30,7 @@
  * The value 0xff represents an invalid storage type.
  */
 const u8 *
-gf100_mmu_kind(struct nvkm_mmu *mmu, int *count)
+gf100_mmu_kind(struct nvkm_mmu *mmu, int *count, u8 *invalid)
 {
static const u8
kind[256] = {
@@ -69,6 +69,7 @@ gf100_mmu_kind(struct nvkm_mmu *mmu, int *count)
};
 
*count = ARRAY_SIZE(kind);
+   *invalid = 0xff;
return kind;
 }
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c
index dbf644ebac97..83990c83f9f8 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/gm200.c
@@ -27,7 +27,7 @@
 #include 
 
 const u8 *
-gm200_mmu_kind(struct nvkm_mmu *mmu, int *count)
+gm200_mmu_kind(struct nvkm_mmu *mmu, int *count, u8 *invalid)
 {
static const u8
kind[256] = {
@@ -65,6 +65,7 @@ gm200_mmu_kind(struct nvkm_mmu *mmu, int *count)
0xfe, 0xfe, 0xfe, 0xfe, 0xff, 0xfd, 0xfe, 0xff
};
*count = ARRAY_SIZE(kind);
+   *invalid = 0xff;
return kind;
 }
 
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/nv50.c 
b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/nv50.c
index db3dfbbb2aa0..c0083ddda65a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/nv50.c
+++ b/driver
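
The commit message above describes making the invalid page kind an explicit property instead of deriving it from the table size. A simplified userspace sketch of the resulting validity check follows. The struct example_mmu type and the table contents are hypothetical stand-ins for struct nvif_mmu and the real kind tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the MMU state used in the patch. */
struct example_mmu {
	const uint8_t *kind;	/* maps page kind -> storage type */
	uint16_t kind_nr;	/* number of entries in 'kind' */
	uint8_t kind_inv;	/* storage type value that marks an invalid kind */
};

static bool example_kind_valid(const struct example_mmu *mmu, uint8_t kind)
{
	if (kind == 0)
		return true;	/* kind 0 is always accepted, as in the patch */
	if (kind >= mmu->kind_nr)
		return false;

	/* Compare against the explicit marker, not against kind_nr - 1. */
	return mmu->kind[kind] != mmu->kind_inv;
}
```

With a table whose highest value is a legitimate kind, the older "kind_nr - 1" rule would reject that entry, which is the situation the commit message describes for Turing.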

[PATCH] drm/nouveau: Fix ttm move init with multiple GPUs

2019-12-16 Thread James Jones
The pointer used to walk the table of move ops
and pick the right one for the current GPU was
declared static, meaning its state was carried
over between invocations of the function, which also
made the function non-reentrant and thread-unsafe.
Since the table is ordered such that newer GPU
methods are listed first, the result of this was
that initializing newer GPUs after older GPUs
would result in no suitable ttm move acceleration
operations being found, and ttm would fall back
to CPU blits on the older GPUs.

This change declares the walking pointer
separately from the table and makes it non-static
to fix the logic.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index f8015e0318d7..1b62ccc57aef 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1162,7 +1162,7 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int 
evict, bool intr,
 void
 nouveau_bo_move_init(struct nouveau_drm *drm)
 {
-   static const struct {
+   static const struct _method_table {
const char *name;
int engine;
s32 oclass;
@@ -1192,7 +1192,8 @@ nouveau_bo_move_init(struct nouveau_drm *drm)
{  "M2MF", 0, 0x0039, nv04_bo_move_m2mf, nv04_bo_move_init },
{},
{ "CRYPT", 0, 0x88b4, nv98_bo_move_exec, nv50_bo_move_init },
-   }, *mthd = _methods;
+   };
+   const struct _method_table *mthd = _methods;
const char *name = "CPU";
int ret;
 
-- 
2.17.1
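
The effect of a static walking pointer can be reproduced outside the kernel. The sketch below is a userspace illustration only: the table contents, class numbers, and function names are made up for the example, and the loop shape is simplified compared to nouveau_bo_move_init().

```c
#include <stddef.h>

struct example_method {
	const char *name;
	int oclass;	/* 0 terminates the table */
};

/* Newer classes first, mirroring the ordering described in the commit message. */
static const struct example_method example_methods[] = {
	{ "COPY_NEW", 0xc0b5 },
	{ "COPY_OLD", 0x90b5 },
	{ NULL, 0 },
};

/* Buggy variant: the cursor is static, so it keeps its position across calls. */
static const char *pick_method_static(int oclass)
{
	static const struct example_method *mthd = example_methods;

	for (; mthd->oclass; mthd++) {
		if (mthd->oclass == oclass)
			return mthd->name;
	}
	return "CPU";
}

/* Fixed variant: the cursor starts at the top of the table on every call. */
static const char *pick_method(int oclass)
{
	const struct example_method *mthd;

	for (mthd = example_methods; mthd->oclass; mthd++) {
		if (mthd->oclass == oclass)
			return mthd->name;
	}
	return "CPU";
}
```

In the static variant, a lookup for the older class leaves the cursor partway down the table, so a following lookup for the newer class never sees its entry and falls through to "CPU".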



[PATCH] drm/tegra: Use more descriptive format modifiers

2019-12-16 Thread James Jones
Advertise and accept both the existing
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based format
modifiers and the more descriptive
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D()-based
format modifiers, preserving backwards
compatibility with existing userspace drivers, but
providing forwards compatibility with future
userspace drivers that also make use of the more
descriptive modifiers to enable differentiation
between desktop and tegra, as well as compressed
and non-compressed surfaces.

This patch depends on the "[PATCH v3] drm: Generalized NV Block Linear DRM
format mod" patch submitted to dri-devel.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/tegra/dc.c  | 10 ++
 drivers/gpu/drm/tegra/fb.c  | 14 +++---
 drivers/gpu/drm/tegra/hub.c | 10 ++
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index fbf57bc3cdab..a2cc687dc2d8 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -588,6 +588,16 @@ static const u32 tegra124_primary_formats[] = {
 
 static const u64 tegra124_modifiers[] = {
DRM_FORMAT_MOD_LINEAR,
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5),
+   /*
+* For backwards compatibility with older userspace that may have
+* baked in usage of the less-descriptive modifiers
+*/
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c
index e34325c83d28..d04e0b1c61ea 100644
--- a/drivers/gpu/drm/tegra/fb.c
+++ b/drivers/gpu/drm/tegra/fb.c
@@ -44,7 +44,7 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer,
 {
uint64_t modifier = framebuffer->modifier;
 
-   switch (modifier) {
+   switch (drm_fourcc_canonicalize_nvidia_format_mod(modifier)) {
case DRM_FORMAT_MOD_LINEAR:
tiling->mode = TEGRA_BO_TILING_MODE_PITCH;
tiling->value = 0;
@@ -55,32 +55,32 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer,
tiling->value = 0;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 0;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 1;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 2;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 3;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 4;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 5;
break;
diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c
index 839b49c40e51..03c97b10b122 100644
--- a/drivers/gpu/drm/tegra/hub.c
+++ b/drivers/gpu/drm/tegra/hub.c
@@ -49,6 +49,16 @@ static const u32 tegra_shared_plane_formats[] = {
 
 static const u64 tegra_shared_plane_modifiers[] = {
DRM_FORMAT_MOD_LINEAR,
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5),
+   /*
+* For backwards compatibility with older userspace that may have
+* baked in usage of the less-descriptive modifiers
+*/
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
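
The fb.c hunk above relies on canonicalizing a legacy modifier before the switch, so that both spellings reach the same case label. The sketch below shows one way such a mapping could work. It assumes a layout in which the low nibble holds log2(block height), bit 4 marks block-linear, and bits 12..19 hold the page kind; the kind position matches the decode code elsewhere in this series, but the macros and the function ex_canonicalize() are local stand-ins (vendor bits omitted), not the definitions from drm_fourcc.h.

```c
#include <stdint.h>

/* Assumed layout, for illustration only (vendor bits omitted). */
#define EX_MOD_LEGACY(h) \
	((uint64_t)(0x10 | ((h) & 0xf)))
#define EX_MOD_WITH_KIND(k, h) \
	((uint64_t)(0x10 | ((h) & 0xf) | (((k) & 0xff) << 12)))

/* Hypothetical stand-in for drm_fourcc_canonicalize_nvidia_format_mod(). */
static uint64_t ex_canonicalize(uint64_t modifier)
{
	/* Not block-linear, or a page kind is already present: leave as-is. */
	if (!(modifier & 0x10) || (modifier & (0xffULL << 12)))
		return modifier;

	/* Legacy form: fill in the generic page kind 0xfe used by the patch. */
	return modifier | (0xfeULL << 12);
}
```

Canonicalizing once at the top of the switch keeps a single set of case labels while still accepting modifiers from older userspace.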

Re: [Nouveau] [PATCH 3/3] drm/nouveau: Support NVIDIA format modifiers

2019-12-16 Thread James Jones

On 12/12/19 6:51 PM, James Jones wrote:

On 12/11/19 1:13 PM, Ilia Mirkin wrote:

On Wed, Dec 11, 2019 at 4:04 PM James Jones  wrote:


Allow setting the block layout of a nouveau FB
object using DRM format modifiers.  When
specified, the format modifier block layout and
kind override the GEM buffer's implicit layout
and kind.  The specified format modifier is
validated against the list of modifiers supported
by the target display hardware.

Signed-off-by: James Jones 
---
  drivers/gpu/drm/nouveau/dispnv50/wndw.c   |  8 +--
  drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++-
  drivers/gpu/drm/nouveau/nouveau_display.h |  2 +
  3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c

index 70ad64cb2d34..06c1b18479c1 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct 
nouveau_framebuffer *fb)

  {
 struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
 struct nv50_wndw_ctxdma *ctxdma;
-   const u8    kind = fb->nvbo->kind;
+   const u8    kind = fb->kind;
 const u32 handle = 0xfb00 | kind;
 struct {
 struct nv_dma_v0 base;
@@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw 
*wndw, bool modeset,
 if (asyw->state.fb != armw->state.fb || !armw->visible || 
modeset) {

 asyw->image.w = fb->base.width;
 asyw->image.h = fb->base.height;
-   asyw->image.kind = fb->nvbo->kind;
+   asyw->image.kind = fb->kind;

 ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
 if (ret) {
@@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw 
*wndw, bool modeset,

 if (asyw->image.kind) {
 asyw->image.layout = 0;
 if (drm->client.device.info.chipset >= 0xc0)
-   asyw->image.blockh = fb->nvbo->mode 
>> 4;

+   asyw->image.blockh = fb->tile_mode >> 4;
 else
-   asyw->image.blockh = fb->nvbo->mode;
+   asyw->image.blockh = fb->tile_mode;
 asyw->image.blocks[0] = fb->base.pitches[0] 
/ 64;

 asyw->image.pitch[0] = 0;
 } else {
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c

index f1509392d7b7..351b58410e1a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {

 .create_handle = nouveau_user_framebuffer_create_handle,
  };

+static int
+nouveau_decode_mod(struct nouveau_drm *drm,
+  uint64_t modifier,
+  uint32_t *tile_mode,
+  uint8_t *kind)
+{
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   int mod;
+
+   BUG_ON(!tile_mode || !kind);
+
+   if (drm->client.device.info.chipset < 0x50) {


Not a full review, but you want to go off the family (chip_class iirc?
something like that, should be obvious). Sadly 0x67/0x68 are higher
than 0x50 numerically, but are logically part of the nv4x generation.


Good catch.  I'll get this fixed and send out an updated patchset.


I fixed this one instance in the v2 series, and I didn't see any other 
potentially dangerous uses of chipset, so I left the others as-is, as 
they seemed to better match surrounding code or existing checks used for 
a given bit of functionality.



Thanks,
-James


+   return -EINVAL;
+   }
+
+   BUG_ON(!disp->format_modifiers);
+
+   for (mod = 0;
+    (disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+    (disp->format_modifiers[mod] != modifier);
+    mod++);
+
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+
+   if (modifier == DRM_FORMAT_MOD_LINEAR) {
+   /* tile_mode will not be used in this case */
+   *tile_mode = 0;
+   *kind = 0;
+   } else {
+   /*
+    * Extract the block height and kind from the 
corresponding

+    * modifier fields.  See drm_fourcc.h for details.
+    */
+   *tile_mode = (uint32_t)(modifier & 0xF);
+   *kind = (uint8_t)((modifier >> 12) & 0xFF);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   *tile_mode <<= 4;
+   }
+
+   return 0;
+}
+
  static inline u
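
The field extraction in nouveau_decode_mod() above can be tried in isolation. The sketch below repeats the two masks used in the patch (low nibble for log2 of the block height, bits 12..19 for the page kind) in a small userspace program. The struct and function names are hypothetical, and the fermi_or_newer flag stands in for the chipset comparison in the patch.

```c
#include <stdint.h>

struct example_layout {
	uint32_t log_block_height;	/* log2 of block height in GOBs */
	uint8_t kind;			/* page kind */
};

/* Pull the two fields the patch reads out of a block-linear modifier. */
static struct example_layout example_decode(uint64_t modifier)
{
	struct example_layout l;

	l.log_block_height = (uint32_t)(modifier & 0xF);
	l.kind = (uint8_t)((modifier >> 12) & 0xFF);
	return l;
}

/* Hypothetical helper mirroring the patch's shift on Fermi and newer. */
static uint32_t example_tile_mode(uint32_t log_block_height, int fermi_or_newer)
{
	return fermi_or_newer ? log_block_height << 4 : log_block_height;
}
```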

[PATCH v2 2/3] drm/nouveau: Check framebuffer size against bo

2019-12-16 Thread James Jones
Make sure framebuffer dimensions and tiling
parameters will not result in accesses beyond the
end of the GEM buffer they are bound to.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 93 +++
 1 file changed, 93 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 6f038511a03a..f1509392d7b7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,72 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = nouveau_user_framebuffer_create_handle,
 };
 
+static inline uint32_t
+nouveau_get_width_in_blocks(uint32_t stride)
+{
+   /* GOBs per block in the x direction is always one, and GOBs are
+* 64 bytes wide
+*/
+   static const uint32_t log_block_width = 6;
+
+   return (stride + (1 << log_block_width) - 1) >> log_block_width;
+}
+
+static inline uint32_t
+nouveau_get_height_in_blocks(struct nouveau_drm *drm,
+uint32_t height,
+uint32_t log_block_height_in_gobs)
+{
+   uint32_t log_gob_height;
+   uint32_t log_block_height;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   log_gob_height = 2;
+   else
+   log_gob_height = 3;
+
+   log_block_height = log_block_height_in_gobs + log_gob_height;
+
+   return (height + (1 << log_block_height) - 1) >> log_block_height;
+}
+
+static int
+nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo,
+ uint32_t offset, uint32_t stride, uint32_t h,
+ uint32_t tile_mode)
+{
+   uint32_t gob_size, bw, bh;
+   uint64_t bl_size;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   tile_mode >>= 4;
+
+   BUG_ON(tile_mode & 0xFFF0);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   gob_size = 256;
+   else
+   gob_size = 512;
+
+   bw = nouveau_get_width_in_blocks(stride);
+   bh = nouveau_get_height_in_blocks(drm, h, tile_mode);
+
+   bl_size = bw * bh * (1 << tile_mode) * gob_size;
+
+   DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u 
gob_size=%u bl_size=%llu size=%lu\n",
+ offset, stride, h, tile_mode, bw, bh, gob_size, bl_size,
+ nvbo->bo.mem.size);
+
+   if (bl_size + offset > nvbo->bo.mem.size)
+   return -ERANGE;
+
+   return 0;
+}
+
 int
 nouveau_framebuffer_new(struct drm_device *dev,
const struct drm_mode_fb_cmd2 *mode_cmd,
@@ -232,6 +298,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
 {
struct nouveau_drm *drm = nouveau_drm(dev);
struct nouveau_framebuffer *fb;
+   const struct drm_format_info *info;
+   unsigned int width, height, i;
int ret;
 
 /* YUV overlays have special requirements pre-NV50 */
@@ -254,6 +322,31 @@ nouveau_framebuffer_new(struct drm_device *dev,
return -EINVAL;
}
 
+   info = drm_get_format_info(dev, mode_cmd);
+
+   for (i = 0; i < info->num_planes; i++) {
+   width = drm_format_info_plane_width(info,
+   mode_cmd->width,
+   i);
+   height = drm_format_info_plane_height(info,
+ mode_cmd->height,
+ i);
+
+   if (nvbo->kind) {
+   ret = nouveau_check_bl_size(drm, nvbo,
+   mode_cmd->offsets[i],
+   mode_cmd->pitches[i],
+   height, nvbo->mode);
+   if (ret)
+   return ret;
+   } else {
+   uint32_t size = mode_cmd->pitches[i] * height;
+
+   if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size)
+   return -ERANGE;
+   }
+   }
+
if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL)))
return -ENOMEM;
 
-- 
2.17.1
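
The size check in nouveau_check_bl_size() rounds the surface up to whole blocks in both directions and multiplies by the block size in bytes. A standalone sketch of that arithmetic follows. The function name and the fermi_or_newer flag are hypothetical, and unlike the patch the sketch widens to 64 bits before multiplying, which is a deliberate difference rather than a description of the posted code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for a block-linear surface.
 * 'log_block_height_in_gobs' is expected to be a 4-bit value.
 */
static uint64_t example_bl_size(uint32_t stride, uint32_t height,
				uint32_t log_block_height_in_gobs,
				int fermi_or_newer)
{
	const uint32_t log_gob_width = 6;	/* GOBs are 64 bytes wide */
	const uint32_t log_gob_height = fermi_or_newer ? 3 : 2;
	const uint32_t gob_size = fermi_or_newer ? 512 : 256;
	const uint32_t log_block_height =
		log_block_height_in_gobs + log_gob_height;

	/* Round up to whole blocks in each direction. */
	uint64_t bw = ((uint64_t)stride + (1u << log_gob_width) - 1)
			>> log_gob_width;
	uint64_t bh = ((uint64_t)height + (1u << log_block_height) - 1)
			>> log_block_height;

	/* Bytes per block = GOB size * GOBs per block (one GOB wide). */
	return bw * bh * ((uint64_t)gob_size << log_block_height_in_gobs);
}
```

For a 4096-byte stride, 1080 rows, and a block height of 16 GOBs on Fermi or newer, this gives 64 blocks across, 9 blocks down, and 8192 bytes per block.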



[PATCH v2 0/3] drm/nouveau: Support NVIDIA format modifiers

2019-12-16 Thread James Jones
This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available here:

https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work

But those need a bit of cleanup before they're ready to submit.

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

v2: Used Tesla family instead of NV50 chipset compare to avoid treating
oddly numbered NV4x-class chipsets as NV50+ GPUs.  Other instances
of compares with chipset number in the series were audited, deemed
safe, and left as-is for consistency with existing code.

James Jones (3):
  drm/nouveau: Add format mod prop to base/ovly/nvdisp
  drm/nouveau: Check framebuffer size against bo
  drm/nouveau: Support NVIDIA format modifiers

 drivers/gpu/drm/nouveau/dispnv50/base507c.c |   7 +-
 drivers/gpu/drm/nouveau/dispnv50/disp.c |  59 
 drivers/gpu/drm/nouveau/dispnv50/disp.h |   4 +
 drivers/gpu/drm/nouveau/dispnv50/wndw.c |  35 -
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c |  17 +++
 drivers/gpu/drm/nouveau/nouveau_display.c   | 154 
 drivers/gpu/drm/nouveau/nouveau_display.h   |   4 +
 7 files changed, 272 insertions(+), 8 deletions(-)

-- 
2.17.1
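
The v2 note above, about comparing the family rather than the chipset number, can be illustrated with a small sketch. The enum below is an illustrative ordering with stand-in names, not the kernel's NV_DEVICE_INFO_V0_* definitions, and the struct is hypothetical.

```c
#include <stdbool.h>

/* Illustrative family ordering; the names are stand-ins. */
enum example_family {
	EX_FAMILY_CURIE,	/* NV4x generation */
	EX_FAMILY_TESLA,	/* NV50 generation */
	EX_FAMILY_FERMI,
};

struct example_gpu {
	unsigned int chipset;
	enum example_family family;
};

/* Fragile: NV4x parts numbered 0x67/0x68 compare as >= 0x50. */
static bool is_nv50_plus_by_chipset(const struct example_gpu *gpu)
{
	return gpu->chipset >= 0x50;
}

/* Robust: compare the logical family instead of the raw chipset number. */
static bool is_nv50_plus_by_family(const struct example_gpu *gpu)
{
	return gpu->family >= EX_FAMILY_TESLA;
}
```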



[PATCH v2 3/3] drm/nouveau: Support NVIDIA format modifiers

2019-12-16 Thread James Jones
Allow setting the block layout of a nouveau FB
object using DRM format modifiers.  When
specified, the format modifier block layout and
kind override the GEM buffer's implicit layout
and kind.  The specified format modifier is
validated against the list of modifiers supported
by the target display hardware.

v2: Used Tesla family instead of NV50 chipset compare

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/dispnv50/wndw.c   |  8 +--
 drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++-
 drivers/gpu/drm/nouveau/nouveau_display.h |  2 +
 3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index 70ad64cb2d34..06c1b18479c1 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct 
nouveau_framebuffer *fb)
 {
struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
struct nv50_wndw_ctxdma *ctxdma;
-   const u8 kind = fb->nvbo->kind;
+   const u8 kind = fb->kind;
const u32 handle = 0xfb000000 | kind;
struct {
struct nv_dma_v0 base;
@@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) {
asyw->image.w = fb->base.width;
asyw->image.h = fb->base.height;
-   asyw->image.kind = fb->nvbo->kind;
+   asyw->image.kind = fb->kind;
 
ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
if (ret) {
@@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
if (asyw->image.kind) {
asyw->image.layout = 0;
if (drm->client.device.info.chipset >= 0xc0)
-   asyw->image.blockh = fb->nvbo->mode >> 4;
+   asyw->image.blockh = fb->tile_mode >> 4;
else
-   asyw->image.blockh = fb->nvbo->mode;
+   asyw->image.blockh = fb->tile_mode;
asyw->image.blocks[0] = fb->base.pitches[0] / 64;
asyw->image.pitch[0] = 0;
} else {
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index f1509392d7b7..50e055adebd4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = nouveau_user_framebuffer_create_handle,
 };
 
+static int
+nouveau_decode_mod(struct nouveau_drm *drm,
+  uint64_t modifier,
+  uint32_t *tile_mode,
+  uint8_t *kind)
+{
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   int mod;
+
+   BUG_ON(!tile_mode || !kind);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA) {
+   return -EINVAL;
+   }
+
+   BUG_ON(!disp->format_modifiers);
+
+   for (mod = 0;
+(disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(disp->format_modifiers[mod] != modifier);
+mod++);
+
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+
+   if (modifier == DRM_FORMAT_MOD_LINEAR) {
+   /* tile_mode will not be used in this case */
+   *tile_mode = 0;
+   *kind = 0;
+   } else {
+   /*
+* Extract the block height and kind from the corresponding
+* modifier fields.  See drm_fourcc.h for details.
+*/
+   *tile_mode = (uint32_t)(modifier & 0xF);
+   *kind = (uint8_t)((modifier >> 12) & 0xFF);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   *tile_mode <<= 4;
+   }
+
+   return 0;
+}
+
 static inline uint32_t
 nouveau_get_width_in_blocks(uint32_t stride)
 {
@@ -300,6 +344,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
struct nouveau_framebuffer *fb;
const struct drm_format_info *info;
unsigned int width, height, i;
+   uint32_t tile_mode;
+   uint8_t kind;
int ret;
 
 /* YUV overlays have special requirements pre-NV50 */
@@ -322,6 +368,18 @@ nouveau_framebuffer_new(struct drm_device *dev,
return -EINVAL;
}
 
+   if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) {
+   if (nouveau_decode_mod(drm, mode_cmd->modifier[0], &tile_mode,
+   

[PATCH v2 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp

2019-12-16 Thread James Jones
Advertise support for the full list of format
modifiers supported by each class of NVIDIA
desktop GPU display hardware.  Stash the array
of modifiers in the nouveau_display struct for
use when validating userspace framebuffer
creation requests, which will be supported in
a subsequent change.
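
As a rough illustration of how the stashed array is meant to be
consumed, here is a sketch of a lookup over a list terminated by
DRM_FORMAT_MOD_INVALID, which is how the validation loop in patch 3/3
walks it.  The function name and the local EXAMPLE_MOD_INVALID define
are made up for this example; the define is assumed to mirror the
drm_fourcc.h value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for DRM_FORMAT_MOD_INVALID from drm_fourcc.h. */
#define EXAMPLE_MOD_INVALID 0x00ffffffffffffffULL

/*
 * Walk a sentinel-terminated modifier list and report whether the
 * requested modifier is present.  The sentinel itself never matches.
 */
static bool
example_mod_supported(const uint64_t *mods, uint64_t modifier)
{
	size_t i;

	for (i = 0; mods[i] != EXAMPLE_MOD_INVALID; i++) {
		if (mods[i] == modifier)
			return true;
	}

	return false;
}
```

Keeping the terminator in the array means the list can be handed
around as a bare pointer without a separate length.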

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/dispnv50/base507c.c |  7 +--
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 +
 drivers/gpu/drm/nouveau/dispnv50/disp.h |  4 ++
 drivers/gpu/drm/nouveau/dispnv50/wndw.c | 27 +-
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++
 drivers/gpu/drm/nouveau/nouveau_display.h   |  2 +
 6 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c 
b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
index 00a85f1e1a4a..025b8f996a0a 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
@@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
struct nv50_disp_base_channel_dma_v0 args = {
.head = head,
};
-   struct nv50_disp *disp = nv50_disp(drm->dev);
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   struct nv50_disp *disp50 = nv50_disp(drm->dev);
struct nv50_wndw *wndw;
int ret;
 
@@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
if (*pwndw = wndw, ret)
return ret;
 
-   ret = nv50_dmac_create(&drm->client.device, &disp->disp->object,
+   ret = nv50_dmac_create(&drm->client.device, &disp->disp.object,
   &oclass, head, &args, sizeof(args),
-  disp->sync->bo.offset, &wndw->wndw);
+  disp50->sync->bo.offset, &wndw->wndw);
if (ret) {
NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret);
return ret;
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 064a69d161e3..0956367d27a2 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2337,6 +2337,15 @@ nv50_display_create(struct drm_device *dev)
if (ret)
goto out;
 
+   /* Assign the correct format modifiers */
+   if (disp->disp->object.oclass >= TU102_DISP)
+   nouveau_display(dev)->format_modifiers = wndwc57e_modifiers;
+   else
+   if (disp->disp->object.oclass >= GF110_DISP)
+   nouveau_display(dev)->format_modifiers = disp90xx_modifiers;
+   else
+   nouveau_display(dev)->format_modifiers = disp50xx_modifiers;
+
/* create crtc objects to represent the hw heads */
if (disp->disp->object.oclass >= GV100_DISP)
crtcs = nvif_rd32(&device->object, 0x610060) & 0xff;
@@ -2404,3 +2413,53 @@ nv50_display_create(struct drm_device *dev)
nv50_display_destroy(dev);
return ret;
 }
+
+/**
+ * Format modifiers
+ */
+
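+/***********************************************************
+ * Log2(block height) ----------------------------------+  *
+ * Page Kind -------------------------------------+     |  *
+ * Gob Height/Page Kind Generation ------------+  |     |  *
+ * Sector layout ---------------------------+  |  |     |  *
+ * Compression --------------------------+  |  |  |     |  */
+const u64 disp50xx_modifiers[] = { /*    |  |  |  |     |  */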
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4),
+   DRM_FORMAT_MOD_NVIDIA_B

Re: [Nouveau] [PATCH 3/3] drm/nouveau: Support NVIDIA format modifiers

2019-12-12 Thread James Jones

On 12/11/19 1:13 PM, Ilia Mirkin wrote:

On Wed, Dec 11, 2019 at 4:04 PM James Jones  wrote:


Allow setting the block layout of a nouveau FB
object using DRM format modifiers.  When
specified, the format modifier block layout and
kind override the GEM buffer's implicit layout
and kind.  The specified format modifier is
validated against the list of modifiers supported
by the target display hardware.

Signed-off-by: James Jones 
---
  drivers/gpu/drm/nouveau/dispnv50/wndw.c   |  8 +--
  drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++-
  drivers/gpu/drm/nouveau/nouveau_display.h |  2 +
  3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index 70ad64cb2d34..06c1b18479c1 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct 
nouveau_framebuffer *fb)
  {
 struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
 struct nv50_wndw_ctxdma *ctxdma;
-   const u8 kind = fb->nvbo->kind;
+   const u8 kind = fb->kind;
 const u32 handle = 0xfb000000 | kind;
 struct {
 struct nv_dma_v0 base;
@@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
 if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) {
 asyw->image.w = fb->base.width;
 asyw->image.h = fb->base.height;
-   asyw->image.kind = fb->nvbo->kind;
+   asyw->image.kind = fb->kind;

 ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
 if (ret) {
@@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
 if (asyw->image.kind) {
 asyw->image.layout = 0;
 if (drm->client.device.info.chipset >= 0xc0)
-   asyw->image.blockh = fb->nvbo->mode >> 4;
+   asyw->image.blockh = fb->tile_mode >> 4;
 else
-   asyw->image.blockh = fb->nvbo->mode;
+   asyw->image.blockh = fb->tile_mode;
 asyw->image.blocks[0] = fb->base.pitches[0] / 64;
 asyw->image.pitch[0] = 0;
 } else {
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index f1509392d7b7..351b58410e1a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
 .create_handle = nouveau_user_framebuffer_create_handle,
  };

+static int
+nouveau_decode_mod(struct nouveau_drm *drm,
+  uint64_t modifier,
+  uint32_t *tile_mode,
+  uint8_t *kind)
+{
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   int mod;
+
+   BUG_ON(!tile_mode || !kind);
+
+   if (drm->client.device.info.chipset < 0x50) {


Not a full review, but you want to go off the family (chip_class iirc?
something like that, should be obvious). Sadly 0x67/0x68 are higher
than 0x50 numerically, but are logically part of the nv4x generation.


Good catch.  I'll get this fixed and send out an updated patchset.

Thanks,
-James
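
To make the review comment above concrete, here is a small sketch of
why a raw chipset compare misclassifies the 0x67/0x68 parts while a
family compare does not.  The enum, struct and helper names are
hypothetical and the enum values are arbitrary; only the ordering
matters.  The real family constants live in nouveau's own headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical family ordering, oldest first. */
enum example_family {
	EXAMPLE_FAMILY_CURIE,	/* NV4x generation */
	EXAMPLE_FAMILY_TESLA,	/* NV5x generation */
	EXAMPLE_FAMILY_FERMI,
};

struct example_device {
	uint8_t chipset;
	enum example_family family;
};

/* The check from v1: wrong for 0x67/0x68, which are NV4x parts. */
static bool
example_is_nv50_by_chipset(const struct example_device *d)
{
	return d->chipset >= 0x50;
}

/* The check suggested in review: compare the logical family instead. */
static bool
example_is_nv50_by_family(const struct example_device *d)
{
	return d->family >= EXAMPLE_FAMILY_TESLA;
}
```

With a device of { 0x67, EXAMPLE_FAMILY_CURIE } the chipset check
returns true and the family check returns false, which is the
mismatch pointed out in the review.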


+   return -EINVAL;
+   }
+
+   BUG_ON(!disp->format_modifiers);
+
+   for (mod = 0;
+(disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(disp->format_modifiers[mod] != modifier);
+mod++);
+
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+
+   if (modifier == DRM_FORMAT_MOD_LINEAR) {
+   /* tile_mode will not be used in this case */
+   *tile_mode = 0;
+   *kind = 0;
+   } else {
+   /*
+* Extract the block height and kind from the corresponding
+* modifier fields.  See drm_fourcc.h for details.
+*/
+   *tile_mode = (uint32_t)(modifier & 0xF);
+   *kind = (uint8_t)((modifier >> 12) & 0xFF);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   *tile_mode <<= 4;
+   }
+
+   return 0;
+}
+
  static inline uint32_t
  nouveau_get_width_in_blocks(uint32_t stride)
  {
@@ -300,6 +344,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
 struct nouveau_framebuffer *fb;
 const struct drm_format_info *info;
 unsigned int width, height, i;
+   uint32_t tile_mode;
+   uint8_t kind;
 int ret;

[PATCH 3/3] drm/nouveau: Support NVIDIA format modifiers

2019-12-11 Thread James Jones
Allow setting the block layout of a nouveau FB
object using DRM format modifiers.  When
specified, the format modifier block layout and
kind override the GEM buffer's implicit layout
and kind.  The specified format modifier is
validated against the list of modifiers supported
by the target display hardware.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/dispnv50/wndw.c   |  8 +--
 drivers/gpu/drm/nouveau/nouveau_display.c | 65 ++-
 drivers/gpu/drm/nouveau/nouveau_display.h |  2 +
 3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c 
b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index 70ad64cb2d34..06c1b18479c1 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -43,7 +43,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct 
nouveau_framebuffer *fb)
 {
struct nouveau_drm *drm = nouveau_drm(fb->base.dev);
struct nv50_wndw_ctxdma *ctxdma;
-   const u8 kind = fb->nvbo->kind;
+   const u8 kind = fb->kind;
const u32 handle = 0xfb000000 | kind;
struct {
struct nv_dma_v0 base;
@@ -243,7 +243,7 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
if (asyw->state.fb != armw->state.fb || !armw->visible || modeset) {
asyw->image.w = fb->base.width;
asyw->image.h = fb->base.height;
-   asyw->image.kind = fb->nvbo->kind;
+   asyw->image.kind = fb->kind;
 
ret = nv50_wndw_atomic_check_acquire_rgb(asyw);
if (ret) {
@@ -255,9 +255,9 @@ nv50_wndw_atomic_check_acquire(struct nv50_wndw *wndw, bool 
modeset,
if (asyw->image.kind) {
asyw->image.layout = 0;
if (drm->client.device.info.chipset >= 0xc0)
-   asyw->image.blockh = fb->nvbo->mode >> 4;
+   asyw->image.blockh = fb->tile_mode >> 4;
else
-   asyw->image.blockh = fb->nvbo->mode;
+   asyw->image.blockh = fb->tile_mode;
asyw->image.blocks[0] = fb->base.pitches[0] / 64;
asyw->image.pitch[0] = 0;
} else {
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index f1509392d7b7..351b58410e1a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,50 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = nouveau_user_framebuffer_create_handle,
 };
 
+static int
+nouveau_decode_mod(struct nouveau_drm *drm,
+  uint64_t modifier,
+  uint32_t *tile_mode,
+  uint8_t *kind)
+{
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   int mod;
+
+   BUG_ON(!tile_mode || !kind);
+
+   if (drm->client.device.info.chipset < 0x50) {
+   return -EINVAL;
+   }
+
+   BUG_ON(!disp->format_modifiers);
+
+   for (mod = 0;
+(disp->format_modifiers[mod] != DRM_FORMAT_MOD_INVALID) &&
+(disp->format_modifiers[mod] != modifier);
+mod++);
+
+   if (disp->format_modifiers[mod] == DRM_FORMAT_MOD_INVALID)
+   return -EINVAL;
+
+   if (modifier == DRM_FORMAT_MOD_LINEAR) {
+   /* tile_mode will not be used in this case */
+   *tile_mode = 0;
+   *kind = 0;
+   } else {
+   /*
+* Extract the block height and kind from the corresponding
+* modifier fields.  See drm_fourcc.h for details.
+*/
+   *tile_mode = (uint32_t)(modifier & 0xF);
+   *kind = (uint8_t)((modifier >> 12) & 0xFF);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   *tile_mode <<= 4;
+   }
+
+   return 0;
+}
+
 static inline uint32_t
 nouveau_get_width_in_blocks(uint32_t stride)
 {
@@ -300,6 +344,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
struct nouveau_framebuffer *fb;
const struct drm_format_info *info;
unsigned int width, height, i;
+   uint32_t tile_mode;
+   uint8_t kind;
int ret;
 
 /* YUV overlays have special requirements pre-NV50 */
@@ -322,6 +368,18 @@ nouveau_framebuffer_new(struct drm_device *dev,
return -EINVAL;
}
 
+   if (mode_cmd->flags & DRM_MODE_FB_MODIFIERS) {
+   if (nouveau_decode_mod(drm, mode_cmd->modifier[0], &tile_mode,
+  &kind)) {
+   DRM_DEBUG_KMS("Unsupported mo

[PATCH 2/3] drm/nouveau: Check framebuffer size against bo

2019-12-11 Thread James Jones
Make sure framebuffer dimensions and tiling
parameters will not result in accesses beyond the
end of the GEM buffer they are bound to.
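
The size calculation can be checked by hand with a standalone sketch
like the one below.  It follows the same arithmetic as the helpers in
this patch (GOBs 64 bytes wide, 4 rows and 256 bytes before Fermi,
8 rows and 512 bytes from Fermi on), but the function names and the
is_fermi_or_newer flag are hypothetical, and the sketch does its
multiplication in 64 bits, which is a choice made here rather than
something taken from the patch.

```c
#include <stdint.h>

/*
 * Bytes needed by a block linear surface, rounded up to whole blocks.
 * log_block_height_in_gobs is the block height field (0-5 in the
 * advertised modifiers).
 */
static uint64_t
example_bl_size(uint32_t stride, uint32_t height,
		uint32_t log_block_height_in_gobs, int is_fermi_or_newer)
{
	const uint32_t log_gob_width = 6;	/* GOBs are 64 bytes wide */
	const uint32_t log_gob_height = is_fermi_or_newer ? 3 : 2;
	const uint64_t gob_size = is_fermi_or_newer ? 512 : 256;
	const uint32_t log_block_height =
		log_block_height_in_gobs + log_gob_height;
	uint64_t bw, bh;

	/* Width and height in blocks, both rounded up. */
	bw = ((uint64_t)stride + (1u << log_gob_width) - 1) >> log_gob_width;
	bh = ((uint64_t)height + (1u << log_block_height) - 1) >>
		log_block_height;

	return bw * bh * (1ull << log_block_height_in_gobs) * gob_size;
}

/* Non-zero when the surface at the given offset fits inside the buffer. */
static int
example_fb_fits(uint64_t bo_size, uint32_t offset, uint64_t fb_size)
{
	return fb_size + offset <= bo_size;
}
```

For a 1920x1080 surface at 4 bytes per pixel (stride 7680) with a
block height of 16 GOBs, this gives 120 * 9 * 16 * 512 = 8847360 bytes
on Fermi and later, which is more than the 8294400 bytes a linear
allocation of the same surface would have, so such a buffer would be
rejected.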

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 93 +++
 1 file changed, 93 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 6f038511a03a..f1509392d7b7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -224,6 +224,72 @@ static const struct drm_framebuffer_funcs 
nouveau_framebuffer_funcs = {
.create_handle = nouveau_user_framebuffer_create_handle,
 };
 
+static inline uint32_t
+nouveau_get_width_in_blocks(uint32_t stride)
+{
+   /* GOBs per block in the x direction is always one, and GOBs are
+* 64 bytes wide
+*/
+   static const uint32_t log_block_width = 6;
+
+   return (stride + (1 << log_block_width) - 1) >> log_block_width;
+}
+
+static inline uint32_t
+nouveau_get_height_in_blocks(struct nouveau_drm *drm,
+uint32_t height,
+uint32_t log_block_height_in_gobs)
+{
+   uint32_t log_gob_height;
+   uint32_t log_block_height;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   log_gob_height = 2;
+   else
+   log_gob_height = 3;
+
+   log_block_height = log_block_height_in_gobs + log_gob_height;
+
+   return (height + (1 << log_block_height) - 1) >> log_block_height;
+}
+
+static int
+nouveau_check_bl_size(struct nouveau_drm *drm, struct nouveau_bo *nvbo,
+ uint32_t offset, uint32_t stride, uint32_t h,
+ uint32_t tile_mode)
+{
+   uint32_t gob_size, bw, bh;
+   uint64_t bl_size;
+
+   BUG_ON(drm->client.device.info.family < NV_DEVICE_INFO_V0_TESLA);
+
+   if (drm->client.device.info.chipset >= 0xc0)
+   tile_mode >>= 4;
+
+   BUG_ON(tile_mode & 0xFFFFFFF0);
+
+   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_FERMI)
+   gob_size = 256;
+   else
+   gob_size = 512;
+
+   bw = nouveau_get_width_in_blocks(stride);
+   bh = nouveau_get_height_in_blocks(drm, h, tile_mode);
+
+   bl_size = bw * bh * (1 << tile_mode) * gob_size;
+
+   DRM_DEBUG_KMS("offset=%u stride=%u h=%u tile_mode=0x%02x bw=%u bh=%u gob_size=%u bl_size=%llu size=%lu\n",
+ offset, stride, h, tile_mode, bw, bh, gob_size, bl_size,
+ nvbo->bo.mem.size);
+
+   if (bl_size + offset > nvbo->bo.mem.size)
+   return -ERANGE;
+
+   return 0;
+}
+
 int
 nouveau_framebuffer_new(struct drm_device *dev,
const struct drm_mode_fb_cmd2 *mode_cmd,
@@ -232,6 +298,8 @@ nouveau_framebuffer_new(struct drm_device *dev,
 {
struct nouveau_drm *drm = nouveau_drm(dev);
struct nouveau_framebuffer *fb;
+   const struct drm_format_info *info;
+   unsigned int width, height, i;
int ret;
 
 /* YUV overlays have special requirements pre-NV50 */
@@ -254,6 +322,31 @@ nouveau_framebuffer_new(struct drm_device *dev,
return -EINVAL;
}
 
+   info = drm_get_format_info(dev, mode_cmd);
+
+   for (i = 0; i < info->num_planes; i++) {
+   width = drm_format_info_plane_width(info,
+   mode_cmd->width,
+   i);
+   height = drm_format_info_plane_height(info,
+ mode_cmd->height,
+ i);
+
+   if (nvbo->kind) {
+   ret = nouveau_check_bl_size(drm, nvbo,
+   mode_cmd->offsets[i],
+   mode_cmd->pitches[i],
+   height, nvbo->mode);
+   if (ret)
+   return ret;
+   } else {
+   uint32_t size = mode_cmd->pitches[i] * height;
+
+   if (size + mode_cmd->offsets[i] > nvbo->bo.mem.size)
+   return -ERANGE;
+   }
+   }
+
if (!(fb = *pfb = kzalloc(sizeof(*fb), GFP_KERNEL)))
return -ENOMEM;
 
-- 
2.17.1



[PATCH 0/3] drm/nouveau: Support NVIDIA format modifiers

2019-12-11 Thread James Jones
This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available here:

https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work

But those need a bit of cleanup before they're ready to submit.

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/tegra/dc.c  | 10 ++
 drivers/gpu/drm/tegra/fb.c  | 14 +++---
 drivers/gpu/drm/tegra/hub.c | 10 ++
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index fbf57bc3cdab..a2cc687dc2d8 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -588,6 +588,16 @@ static const u32 tegra124_primary_formats[] = {
 
 static const u64 tegra124_modifiers[] = {
DRM_FORMAT_MOD_LINEAR,
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5),
+   /*
+* For backwards compatibility with older userspace that may have
+* baked in usage of the less-descriptive modifiers
+*/
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c
index e34325c83d28..d04e0b1c61ea 100644
--- a/drivers/gpu/drm/tegra/fb.c
+++ b/drivers/gpu/drm/tegra/fb.c
@@ -44,7 +44,7 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer,
 {
uint64_t modifier = framebuffer->modifier;
 
-   switch (modifier) {
+   switch (drm_fourcc_canonicalize_nvidia_format_mod(modifier)) {
case DRM_FORMAT_MOD_LINEAR:
tiling->mode = TEGRA_BO_TILING_MODE_PITCH;
tiling->value = 0;
@@ -55,32 +55,32 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer,
tiling->value = 0;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 0;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 1;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 2;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 3;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 4;
break;
 
-   case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5):
+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 5;
break;
diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c
index 839b49c40e51..03c97b10b122 100644
--- a/drivers/gpu/drm/tegra/hub.c
+++ b/drivers/gpu/drm/tegra/hub.c
@@ -49,6 +49,16 @@ static const u32 tegra_shared_plane_formats[] = {
 
 static const u64 tegra_shared_plane_modifiers[] = {
DRM_FORMAT_MOD_LINEAR,
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5),
+   /*
+* For backwards compatibility with older userspace that may have
+* baked in usage of the less-descriptive modifiers
+*/
DRM_FORMAT_MOD_NVIDIA_1

[PATCH 1/3] drm/nouveau: Add format mod prop to base/ovly/nvdisp

2019-12-11 Thread James Jones
Advertise support for the full list of format
modifiers supported by each class of NVIDIA
desktop GPU display hardware.  Stash the array
of modifiers in the nouveau_display struct for
use when validating userspace framebuffer
creation requests, which will be supported in
a subsequent change.

Signed-off-by: James Jones 
---
 drivers/gpu/drm/nouveau/dispnv50/base507c.c |  7 +--
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 59 +
 drivers/gpu/drm/nouveau/dispnv50/disp.h |  4 ++
 drivers/gpu/drm/nouveau/dispnv50/wndw.c | 27 +-
 drivers/gpu/drm/nouveau/dispnv50/wndwc57e.c | 17 ++
 drivers/gpu/drm/nouveau/nouveau_display.h   |  2 +
 6 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/base507c.c 
b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
index 00a85f1e1a4a..025b8f996a0a 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/base507c.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/base507c.c
@@ -262,7 +262,8 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
struct nv50_disp_base_channel_dma_v0 args = {
.head = head,
};
-   struct nv50_disp *disp = nv50_disp(drm->dev);
+   struct nouveau_display *disp = nouveau_display(drm->dev);
+   struct nv50_disp *disp50 = nv50_disp(drm->dev);
struct nv50_wndw *wndw;
int ret;
 
@@ -272,9 +273,9 @@ base507c_new_(const struct nv50_wndw_func *func, const u32 
*format,
if (*pwndw = wndw, ret)
return ret;
 
-   ret = nv50_dmac_create(&drm->client.device, &disp->disp->object,
+   ret = nv50_dmac_create(&drm->client.device, &disp->disp.object,
   &oclass, head, &args, sizeof(args),
-  disp->sync->bo.offset, &wndw->wndw);
+  disp50->sync->bo.offset, &wndw->wndw);
if (ret) {
NV_ERROR(drm, "base%04x allocation failed: %d\n", oclass, ret);
return ret;
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 064a69d161e3..0956367d27a2 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2337,6 +2337,15 @@ nv50_display_create(struct drm_device *dev)
if (ret)
goto out;
 
+   /* Assign the correct format modifiers */
+   if (disp->disp->object.oclass >= TU102_DISP)
+   nouveau_display(dev)->format_modifiers = wndwc57e_modifiers;
+   else
+   if (disp->disp->object.oclass >= GF110_DISP)
+   nouveau_display(dev)->format_modifiers = disp90xx_modifiers;
+   else
+   nouveau_display(dev)->format_modifiers = disp50xx_modifiers;
+
/* create crtc objects to represent the hw heads */
if (disp->disp->object.oclass >= GV100_DISP)
crtcs = nvif_rd32(&device->object, 0x610060) & 0xff;
@@ -2404,3 +2413,53 @@ nv50_display_create(struct drm_device *dev)
nv50_display_destroy(dev);
return ret;
 }
+
+/**
+ * Format modifiers
+ */
+
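+/***********************************************************
+ * Log2(block height) ----------------------------------+  *
+ * Page Kind -------------------------------------+     |  *
+ * Gob Height/Page Kind Generation ------------+  |     |  *
+ * Sector layout ---------------------------+  |  |     |  *
+ * Compression --------------------------+  |  |  |     |  */
+const u64 disp50xx_modifiers[] = { /*    |  |  |  |     |  */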
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x7a, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x78, 5),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 1, 0x70, 4),
+   DRM_FORMAT_MOD_NVIDIA_B

Re: [PATCH 0/3] drm/nouveau: Support NVIDIA format modifiers

2019-12-11 Thread James Jones
Please ignore the tegra diff on the bottom of this.  I never fail to 
find a way to mess up git-send-email.


-James

On 12/11/19 12:59 PM, James Jones wrote:

This series modifies the NV5x+ nouveau display backends to advertise
appropriate format modifiers on their display planes in atomic mode
setting blobs.

Corresponding modifications to Mesa/userspace are available here:

https://gitlab.freedesktop.org/cubanismo/mesa/tree/nouveau_work

But those need a bit of cleanup before they're ready to submit.

I've tested this on Tesla, Kepler, Pascal, and Turing-class hardware
using various formats and all the exposed format modifiers, plus some
negative testing with invalid ones.

NOTE: this series depends on the "[PATCH v3] drm: Generalized NV Block
Linear DRM format mod" patch submitted to dri-devel.

Signed-off-by: James Jones 
---
  drivers/gpu/drm/tegra/dc.c  | 10 ++
  drivers/gpu/drm/tegra/fb.c  | 14 +++---
  drivers/gpu/drm/tegra/hub.c | 10 ++
  3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index fbf57bc3cdab..a2cc687dc2d8 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -588,6 +588,16 @@ static const u32 tegra124_primary_formats[] = {
  
  static const u64 tegra124_modifiers[] = {

DRM_FORMAT_MOD_LINEAR,
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5),
+   /*
+* For backwards compatibility with older userspace that may have
+* baked in usage of the less-descriptive modifiers
+*/
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0),
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1),
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2),
diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/drm/tegra/fb.c
index e34325c83d28..d04e0b1c61ea 100644
--- a/drivers/gpu/drm/tegra/fb.c
+++ b/drivers/gpu/drm/tegra/fb.c
@@ -44,7 +44,7 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer,
  {
uint64_t modifier = framebuffer->modifier;
  
-	switch (modifier) {

+   switch (drm_fourcc_canonicalize_nvidia_format_mod(modifier)) {
case DRM_FORMAT_MOD_LINEAR:
tiling->mode = TEGRA_BO_TILING_MODE_PITCH;
tiling->value = 0;
@@ -55,32 +55,32 @@ int tegra_fb_get_tiling(struct drm_framebuffer *framebuffer,
tiling->value = 0;
break;
  
-	case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0):

+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 0;
break;
  
-	case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1):

+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 1;
break;
  
-	case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2):

+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 2;
break;
  
-	case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3):

+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 3;
break;
  
-	case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4):

+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 4;
break;
  
-	case DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5):

+   case DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5):
tiling->mode = TEGRA_BO_TILING_MODE_BLOCK;
tiling->value = 5;
break;
diff --git a/drivers/gpu/drm/tegra/hub.c b/drivers/gpu/drm/tegra/hub.c
index 839b49c40e51..03c97b10b122 100644
--- a/drivers/gpu/drm/tegra/hub.c
+++ b/drivers/gpu/drm/tegra/hub.c
@@ -49,6 +49,16 @@ static const u32 tegra_shared_plane_formats[] = {
  
  static const u64 tegra_shared_plane_modifiers[] = {

DRM_FORMAT_MOD_LINEAR,
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 0),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 1),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 2),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 3),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 4),
+   DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0xfe, 5),
+   /*
+* For backwards compatibility with older userspace that 

[PATCH v3] drm: Generalized NV Block Linear DRM format mod

2019-12-11 Thread James Jones
Builds upon the existing NVIDIA 16Bx2 block linear
format modifiers by adding more "fields" to the
existing parameterized
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
macro that allow fully defining a unique-across-
all-NVIDIA-hardware bit layout using a minimal
set of fields and values.  The new modifier macro
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
effectively backwards compatible with the existing
macro, introducing a superset of the previously
definable format modifiers.

Backwards compatibility has two quirks.  First,
the zero value for the "kind" field, which is
implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
macro, must be special cased in drivers and
assumed to map to the pre-Turing generic kind of
0xfe, since a kind of "zero" is reserved for
linear buffer layouts on all GPUs.

Second, it is assumed backwards compatibility
is only needed when running on Tegra GPUs, and
specifically Tegra GPUs prior to Xavier.  This
is based on two assertions:

-Tegra GPUs prior to Xavier used a slightly
 different raw bit layout than desktop GPUs,
 making it impossible to directly share block
 linear buffers between the two.

-Support for the existing block linear modifiers
 was incomplete, making them useful only for
 exporting buffers created by nouveau and
 importing them to Tegra DRM as framebuffers for
 scan out.  There was no support for adding
 framebuffers using format modifiers in nouveau,
 nor importing dma-buf/PRIME GEM objects into
 nouveau userspace drivers with modifiers in Mesa.

Hence it is assumed the prior modifiers were not
intended for use on desktop GPUs, and as a
corollary, were not intended to support sharing
block linear buffers across two different NVIDIA
GPUs.
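
To illustrate the first quirk, here is a minimal sketch of how a
driver might fold a legacy 16Bx2 modifier (kind field of zero) into
the new layout.  It is not the helper added by this patch: the macro
and function names are hypothetical, and only the bit positions
(bit 4 for block linear, bits 19:12 for kind) and the 0xfe generic
kind are taken from the description here.  It also assumes the caller
has already checked that the modifier carries the NVIDIA vendor code.

```c
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the layout above. */
#define NV_MOD_BLOCK_LINEAR_BIT (UINT64_C(1) << 4)  /* bit 4:4, must be 1 */
#define NV_MOD_KIND_SHIFT       12                  /* bits 19:12 */
#define NV_MOD_KIND_MASK        (UINT64_C(0xff) << NV_MOD_KIND_SHIFT)
#define NV_MOD_KIND_GENERIC     UINT64_C(0xfe)      /* pre-Turing generic kind */

/*
 * Sketch: map a legacy modifier (block linear, kind == 0) to the
 * equivalent modifier with an explicit generic kind.  Anything else
 * is returned unchanged.
 */
static inline uint64_t nv_mod_canonicalize_sketch(uint64_t modifier)
{
	/* Not block linear (e.g. linear): nothing to fix up. */
	if (!(modifier & NV_MOD_BLOCK_LINEAR_BIT))
		return modifier;

	/* Kind already specified: this is a new-style modifier. */
	if (modifier & NV_MOD_KIND_MASK)
		return modifier;

	/* Legacy modifier: kind 0 is reserved for linear, so assume 0xfe. */
	return modifier | (NV_MOD_KIND_GENERIC << NV_MOD_KIND_SHIFT);
}
```

A driver on a pre-Xavier Tegra GPU could run incoming modifiers
through such a function before comparing them against its supported
list, so old and new userspace end up at the same value.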

v2:
  - Added canonicalize helper function

v3:
  - Added additional bit to compression field to
support Tesla (NV5x,G8x,G9x,GT1xx,GT2xx) class
    chips.

Signed-off-by: James Jones 
---
 include/uapi/drm/drm_fourcc.h | 122 +++---
 1 file changed, 114 insertions(+), 8 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..4330d930bdbb 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -497,7 +497,113 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
 
 /*
- * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
+ * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
+ * and Tegra GPUs starting with Tegra K1.
+ *
+ * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
+ * based on the architecture generation.  GOBs themselves are then arranged in
+ * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
+ * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
+ * a block depth or height of "4").
+ *
+ * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
+ * in full detail.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
+ *             size).  Must be zero.
+ *
+ *             Note there is no log2(width) parameter.  Some portions of the
+ *             hardware support a block width of two gobs, but it is impractical
+ *             to use due to lack of support elsewhere, and has no known
+ *             benefits.
+ *
+ * 11:9  -     Reserved (To support 2D-array textures with variable array stride
+ *             in blocks, specified via log2(tile width in blocks)).  Must be
+ *             zero.
+ *
+ * 19:12 k     Page Kind.  This value directly maps to a field in the page
+ *             tables of all GPUs >= NV50.  It affects the exact layout of bits
+ *             in memory and can be derived from the tuple
+ *
+ *               (format, GPU model, compression type, samples per pixel)
+ *
+ *             Where compression type is defined below.  If GPU model were
+ *             implied by the format modifier, format, or memory buffer, page
+ *             kind would not need to be included in the modifier itself, but
+ *             since the modifier should define the layout of the associated
+ *             memory buffer independent from any device or other context, it
+ *             must be included here.
+ *
+ * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB changed
+ *  

Re: [PATCH] drm: Generalized NV Block Linear DRM format mod

2019-10-16 Thread James Jones

On 10/15/19 8:42 AM, Daniel Vetter wrote:

On Tue, Oct 15, 2019 at 5:14 PM James Jones  wrote:


On 10/15/19 7:19 AM, Daniel Vetter wrote:

On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:

Builds upon the existing NVIDIA 16Bx2 block linear
format modifiers by adding more "fields" to the
existing parameterized
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
macro that allow fully defining a unique-across-
all-NVIDIA-hardware bit layout using a minimal
set of fields and values.  The new modifier macro
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
effectively backwards compatible with the existing
macro, introducing a superset of the previously
definable format modifiers.

Backwards compatibility has two quirks.  First,
the zero value for the "kind" field, which is
implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
macro, must be special cased in drivers and
assumed to map to the pre-Turing generic kind of
0xfe, since a kind of "zero" is reserved for
linear buffer layouts on all GPUs.

Second, it is assumed backwards compatibility
is only needed when running on Tegra GPUs, and
specifically Tegra GPUs prior to Xavier.  This
is based on two assertions:

-Tegra GPUs prior to Xavier used a slightly
   different raw bit layout than desktop GPUs,
   making it impossible to directly share block
   linear buffers between the two.

-Support for the existing block linear modifiers
   was incomplete, making them useful only for
   exporting buffers created by nouveau and
   importing them to Tegra DRM as framebuffers for
   scan out.  There was no support for adding
   framebuffers using format modifiers in nouveau,
   nor importing dma-buf/PRIME GEM objects into
   nouveau userspace drivers with modifiers in Mesa.

Hence it is assumed the prior modifiers were not
intended for use on desktop GPUs, and as a
corollary, were not intended to support sharing
block linear buffers across two different NVIDIA
GPUs.

Signed-off-by: James Jones 
---
   include/uapi/drm/drm_fourcc.h | 108 +++---
   1 file changed, 100 insertions(+), 8 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..cc9853d42a24 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -497,7 +497,99 @@ extern "C" {
   #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)

   /*
- * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
+ * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
+ * and Tegra GPUs starting with Tegra K1.
+ *
+ * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
+ * based on the architecture generation.  GOBs themselves are then arranged in
+ * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
+ * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
+ * a block depth or height of "4").
+ *
+ * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
+ * in full detail.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
+ *             size).  Must be zero.
+ *
+ *             Note there is no log2(width) parameter.  Some portions of the
+ *             hardware support a block width of two gobs, but it is impractical
+ *             to use due to lack of support elsewhere, and has no known
+ *             benefits.
+ *
+ * 11:9  -     Reserved (To support 2D-array textures with variable array stride
+ *             in blocks, specified via log2(tile width in blocks)).  Must be
+ *             zero.
+ *
+ * 19:12 k     Page Kind.  This value directly maps to a field in the page
+ *             tables of all GPUs >= NV50.  It affects the exact layout of bits
+ *             in memory and can be derived from the tuple
+ *
+ *               (format, GPU model, compression type, samples per pixel)
+ *
+ *             Where compression type is defined below.  If GPU model were
+ *             implied by the format modifier, format, or memory buffer, page
+ *             kind would not need to be included in the modifier itself, but
+ *             since the modifier should define the layout of the associated
+ *             memory buffer independent from any device or other context, it
+ *             must be included here.
+ *
+ *             To grandfat

[PATCH v2] drm: Generalized NV Block Linear DRM format mod

2019-10-16 Thread James Jones
Builds upon the existing NVIDIA 16Bx2 block linear
format modifiers by adding more "fields" to the
existing parameterized
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
macro that allow fully defining a unique-across-
all-NVIDIA-hardware bit layout using a minimal
set of fields and values.  The new modifier macro
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
effectively backwards compatible with the existing
macro, introducing a superset of the previously
definable format modifiers.

Backwards compatibility has two quirks.  First,
the zero value for the "kind" field, which is
implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
macro, must be special cased in drivers and
assumed to map to the pre-Turing generic kind of
0xfe, since a kind of "zero" is reserved for
linear buffer layouts on all GPUs.

Second, it is assumed backwards compatibility
is only needed when running on Tegra GPUs, and
specifically Tegra GPUs prior to Xavier.  This
is based on two assertions:

-Tegra GPUs prior to Xavier used a slightly
 different raw bit layout than desktop GPUs,
 making it impossible to directly share block
 linear buffers between the two.

-Support for the existing block linear modifiers
 was incomplete, making them useful only for
 exporting buffers created by nouveau and
 importing them to Tegra DRM as framebuffers for
 scan out.  There was no support for adding
 framebuffers using format modifiers in nouveau,
 nor importing dma-buf/PRIME GEM objects into
 nouveau userspace drivers with modifiers in Mesa.

Hence it is assumed the prior modifiers were not
intended for use on desktop GPUs, and as a
corollary, were not intended to support sharing
block linear buffers across two different NVIDIA
GPUs.

v2:
  - Added canonicalize helper function

Signed-off-by: James Jones 
---
 include/uapi/drm/drm_fourcc.h | 116 +++---
 1 file changed, 108 insertions(+), 8 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..56c8fe30caab 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -497,7 +497,107 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
 
 /*
- * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
+ * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
+ * and Tegra GPUs starting with Tegra K1.
+ *
+ * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
+ * based on the architecture generation.  GOBs themselves are then arranged in
+ * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
+ * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
+ * a block depth or height of "4").
+ *
+ * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
+ * in full detail.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
+ *             size).  Must be zero.
+ *
+ *             Note there is no log2(width) parameter.  Some portions of the
+ *             hardware support a block width of two gobs, but it is impractical
+ *             to use due to lack of support elsewhere, and has no known
+ *             benefits.
+ *
+ * 11:9  -     Reserved (To support 2D-array textures with variable array stride
+ *             in blocks, specified via log2(tile width in blocks)).  Must be
+ *             zero.
+ *
+ * 19:12 k     Page Kind.  This value directly maps to a field in the page
+ *             tables of all GPUs >= NV50.  It affects the exact layout of bits
+ *             in memory and can be derived from the tuple
+ *
+ *               (format, GPU model, compression type, samples per pixel)
+ *
+ *             Where compression type is defined below.  If GPU model were
+ *             implied by the format modifier, format, or memory buffer, page
+ *             kind would not need to be included in the modifier itself, but
+ *             since the modifier should define the layout of the associated
+ *             memory buffer independent from any device or other context, it
+ *             must be included here.
+ *
+ * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB changed
+ *             starting with Fermi GPUs.  Additionally, the mapping between page
+ *             kind and bit layout has changed

Re: [PATCH] drm: Generalized NV Block Linear DRM format mod

2019-10-15 Thread James Jones

On 10/15/19 7:19 AM, Daniel Vetter wrote:

On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:

Builds upon the existing NVIDIA 16Bx2 block linear
format modifiers by adding more "fields" to the
existing parameterized
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
macro that allow fully defining a unique-across-
all-NVIDIA-hardware bit layout using a minimal
set of fields and values.  The new modifier macro
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
effectively backwards compatible with the existing
macro, introducing a superset of the previously
definable format modifiers.

Backwards compatibility has two quirks.  First,
the zero value for the "kind" field, which is
implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
macro, must be special cased in drivers and
assumed to map to the pre-Turing generic kind of
0xfe, since a kind of "zero" is reserved for
linear buffer layouts on all GPUs.

Second, it is assumed backwards compatibility
is only needed when running on Tegra GPUs, and
specifically Tegra GPUs prior to Xavier.  This
is based on two assertions:

-Tegra GPUs prior to Xavier used a slightly
  different raw bit layout than desktop GPUs,
  making it impossible to directly share block
  linear buffers between the two.

-Support for the existing block linear modifiers
  was incomplete, making them useful only for
  exporting buffers created by nouveau and
  importing them to Tegra DRM as framebuffers for
  scan out.  There was no support for adding
  framebuffers using format modifiers in nouveau,
  nor importing dma-buf/PRIME GEM objects into
  nouveau userspace drivers with modifiers in Mesa.

Hence it is assumed the prior modifiers were not
intended for use on desktop GPUs, and as a
corollary, were not intended to support sharing
block linear buffers across two different NVIDIA
GPUs.

Signed-off-by: James Jones 
---
  include/uapi/drm/drm_fourcc.h | 108 +++---
  1 file changed, 100 insertions(+), 8 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..cc9853d42a24 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -497,7 +497,99 @@ extern "C" {
  #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
  
  /*

- * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
+ * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
+ * and Tegra GPUs starting with Tegra K1.
+ *
+ * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
+ * based on the architecture generation.  GOBs themselves are then arranged in
+ * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
+ * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
+ * a block depth or height of "4").
+ *
+ * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
+ * in full detail.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
+ *             size).  Must be zero.
+ *
+ *             Note there is no log2(width) parameter.  Some portions of the
+ *             hardware support a block width of two gobs, but it is impractical
+ *             to use due to lack of support elsewhere, and has no known
+ *             benefits.
+ *
+ * 11:9  -     Reserved (To support 2D-array textures with variable array stride
+ *             in blocks, specified via log2(tile width in blocks)).  Must be
+ *             zero.
+ *
+ * 19:12 k     Page Kind.  This value directly maps to a field in the page
+ *             tables of all GPUs >= NV50.  It affects the exact layout of bits
+ *             in memory and can be derived from the tuple
+ *
+ *               (format, GPU model, compression type, samples per pixel)
+ *
+ *             Where compression type is defined below.  If GPU model were
+ *             implied by the format modifier, format, or memory buffer, page
+ *             kind would not need to be included in the modifier itself, but
+ *             since the modifier should define the layout of the associated
+ *             memory buffer independent from any device or other context, it
+ *             must be included here.
+ *
+ *             To grandfather in prior block linear format modifiers to this
+ *             layout, the page kind "0", w

[PATCH] drm: Generalized NV Block Linear DRM format mod

2019-10-14 Thread James Jones
Beyond general review, I'm looking for feedback on a few things
specifically here:

-Is the level of backwards compatibility described here sufficient?
 Technically I can make the user space drivers support the old
 modifiers too, but that would mean the layout they specify would
 morph based on the GPU they're being used on, and sharing buffers
 between two different NV GPUs, which would appear to be possible,
 would result in corruption on one side or the other.

-I used "magic" numbers for all the bit shifting.  Would it be
 better to use __fourcc_XXX constants like the broadcom modifiers
 do?  I wasn't sure which style was preferred.  The nouveau code is
 full of magic numbers, but that's a bit lower level than this file.

If preferred, I can send this out as part of a patchset that adds
support for the modifiers to nouveau and TegraDRM, but I have some
things to clean up there before it's ready for proper review, and
I didn't want to block review of the basic modifier layout on that
work.
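
For reference on the second question, here is a rough sketch of what
the named-constant style could look like for the fields defined so
far.  All names below are hypothetical and are not taken from the
patch or from the broadcom modifiers; only the bit ranges (3:0 for
log2 block height, bit 4 set for block linear, 19:12 for page kind)
come from the layout described in the patch.  The vendor code in the
top bits is left out.

```c
#include <stdint.h>

/* Hypothetical constant names; only the bit ranges come from the patch. */
#define NV_BL_HEIGHT_SHIFT 0   /* bits 3:0: log2(block height in GOBs) */
#define NV_BL_HEIGHT_BITS  4
#define NV_BL_FLAG_SHIFT   4   /* bit 4: must be 1 for block linear */
#define NV_BL_KIND_SHIFT   12  /* bits 19:12: page kind */
#define NV_BL_KIND_BITS    8

#define NV_BL_FIELD_MASK(bits) ((UINT64_C(1) << (bits)) - 1)

/* Pack the page kind and log2 block height into the modifier value. */
static inline uint64_t nv_bl_pack(uint32_t kind, uint32_t log2_height)
{
	return ((uint64_t)(log2_height & NV_BL_FIELD_MASK(NV_BL_HEIGHT_BITS))
			<< NV_BL_HEIGHT_SHIFT) |
	       (UINT64_C(1) << NV_BL_FLAG_SHIFT) |
	       ((uint64_t)(kind & NV_BL_FIELD_MASK(NV_BL_KIND_BITS))
			<< NV_BL_KIND_SHIFT);
}

/* Extract the page kind field from a packed value. */
static inline uint32_t nv_bl_kind(uint64_t value)
{
	return (uint32_t)((value >> NV_BL_KIND_SHIFT) &
			  NV_BL_FIELD_MASK(NV_BL_KIND_BITS));
}

/* Extract the log2 block height field from a packed value. */
static inline uint32_t nv_bl_log2_height(uint64_t value)
{
	return (uint32_t)((value >> NV_BL_HEIGHT_SHIFT) &
			  NV_BL_FIELD_MASK(NV_BL_HEIGHT_BITS));
}
```

The trade-off is mostly readability: the shifts and widths are stated
once and reused by both the packing and unpacking helpers, at the cost
of a few more lines in the header.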



[PATCH] drm: Generalized NV Block Linear DRM format mod

2019-10-14 Thread James Jones
Builds upon the existing NVIDIA 16Bx2 block linear
format modifiers by adding more "fields" to the
existing parameterized
DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
macro that allow fully defining a unique-across-
all-NVIDIA-hardware bit layout using a minimal
set of fields and values.  The new modifier macro
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
effectively backwards compatible with the existing
macro, introducing a superset of the previously
definable format modifiers.

Backwards compatibility has two quirks.  First,
the zero value for the "kind" field, which is
implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
macro, must be special cased in drivers and
assumed to map to the pre-Turing generic kind of
0xfe, since a kind of "zero" is reserved for
linear buffer layouts on all GPUs.

Second, it is assumed backwards compatibility
is only needed when running on Tegra GPUs, and
specifically Tegra GPUs prior to Xavier.  This
is based on two assertions:

-Tegra GPUs prior to Xavier used a slightly
 different raw bit layout than desktop GPUs,
 making it impossible to directly share block
 linear buffers between the two.

-Support for the existing block linear modifiers
 was incomplete, making them useful only for
 exporting buffers created by nouveau and
 importing them to Tegra DRM as framebuffers for
 scan out.  There was no support for adding
 framebuffers using format modifiers in nouveau,
 nor importing dma-buf/PRIME GEM objects into
 nouveau userspace drivers with modifiers in Mesa.

Hence it is assumed the prior modifiers were not
intended for use on desktop GPUs, and as a
corollary, were not intended to support sharing
block linear buffers across two different NVIDIA
GPUs.

Signed-off-by: James Jones 
---
 include/uapi/drm/drm_fourcc.h | 108 +++---
 1 file changed, 100 insertions(+), 8 deletions(-)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..cc9853d42a24 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -497,7 +497,99 @@ extern "C" {
 #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
 
 /*
- * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
+ * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
+ * and Tegra GPUs starting with Tegra K1.
+ *
+ * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
+ * based on the architecture generation.  GOBs themselves are then arranged in
+ * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
+ * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
+ * a block depth or height of "4").
+ *
+ * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
+ * in full detail.
+ *
+ *       Macro
+ * Bits  Param Description
+ * ----  ----- -----------------------------------------------------------------
+ *
+ *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
+ *             compatibility with the existing
+ *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
+ *
+ *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
+ *             size).  Must be zero.
+ *
+ *             Note there is no log2(width) parameter.  Some portions of the
+ *             hardware support a block width of two gobs, but it is impractical
+ *             to use due to lack of support elsewhere, and has no known
+ *             benefits.
+ *
+ * 11:9  -     Reserved (To support 2D-array textures with variable array stride
+ *             in blocks, specified via log2(tile width in blocks)).  Must be
+ *             zero.
+ *
+ * 19:12 k     Page Kind.  This value directly maps to a field in the page
+ *             tables of all GPUs >= NV50.  It affects the exact layout of bits
+ *             in memory and can be derived from the tuple
+ *
+ *               (format, GPU model, compression type, samples per pixel)
+ *
+ *             Where compression type is defined below.  If GPU model were
+ *             implied by the format modifier, format, or memory buffer, page
+ *             kind would not need to be included in the modifier itself, but
+ *             since the modifier should define the layout of the associated
+ *             memory buffer independent from any device or other context, it
+ *             must be included here.
+ *
+ *             To grandfather in prior block linear format modifiers to this
+ *             layout, the page kind "0", which corresponds to "pitch/linear"
+ *             and hence is unusable with block-li

Re: XDC allocator workshop and Wayland dmabuf hints

2019-10-14 Thread James Jones

On 10/13/19 2:05 PM, Scott Anderson wrote:

(Sorry to CCs for spam, I made an error in my first posting)

Hi,

There were certainly some interesting changes discussed at the allocator
workshop during XDC this year, and I'd like to just summarise my
thoughts on it and make sure everybody is on the same page.

For those who don't know who I am or my stake in this, I'm the
maintainer of the DRM and graphics code for the wlroots Wayland
compositor library. I'm ascent12 on Github and Freenode.


My understanding of the issue Nvidia was trying to solve was the
in-place transition between different format modifiers. E.g. if a client
is to be scanned out, the buffer would need to be transitioned to a
non-compressed format that the display controller can work with, but if
the client is to be composited, a compressed format would be used,
saving on memory bandwidth. Hardware may have more efficient ways to
transition between different formats, so it would be good if we can use
these and not rely on having to perform a blit if we don't need to. The
problem is more general than this, but that was just the example given.

The original solution proposed in James' talk was to add functions to
EGL/OpenGL/Vulkan and have the display server perform transitions where
required.


FWIW, I didn't intend to imply the display server should be the thing 
doing transitions.  It is a possible implementation, but I assumed 
display servers would only do these transitions in fallback paths or as 
part of some in-between period before clients picked up on the need for 
them.  Beyond the design goals you imply below, I wanted to note that 
it's more optimal to perform transitions in the client, and since 
transitions were intended to be persistent (paralleling Vulkan layout 
transitions), the compositor would need to transition back to the 
client's view of the image if the client hadn't picked up on the 
transition and agreed to handle it anyway, which would not be ideal and 
could cost additional perf in some cases.



Early discussions during the workshop leaned toward having libliftoff
handle all of this, but that would require libliftoff to have its own
rendering context, which I think bloats the purpose of the library.
Also discussed was having libliftoff ask the compositor to perform the
transition if it thinks one is possible.


Another suggestion I made was to make use of Simon's dmabuf hints patch
to the wp_linux_dmabuf protocol [1] and leave it up to the client's GPU
driver to handle any transitions. This wasn't adequately represented in
the lightning talk summarising the workshop, so I'll go over it here
now, making sure everyone understands what it is and why I think it is
the way we should go forward.

Right now, a Wayland compositor will advertise all of the
format+modifier pairs that it supports, but does not provide any
context for clients as to which one they should actually choose.
Whether a client can be scanned out is basically up to chance, which
is likely to lead to several suboptimal situations.

The dmabuf hints patch adds a way to suggest a better format to use,
based on the current context. This is dynamic, and can be sent multiple
times over the lifetime of a surface. The patch also adds a way for the
compositor to tell the client which GPU it's using, which is useful for
clients to know in multi GPU situations.

These hints are in various "tranches", which are just groups of
format+modifier pairs of the same preference. The tranches are ordered
from most optimal to least optimal. The most optimal tranche would imply
direct scanout, while a less optimal tranche would imply compositing,
but is not actually defined like that in the protocol.

If a client becomes fullscreen, we would send the format+modifier pairs
for the primary plane as the most optimal tranche. If a client is
eligible to be scanned out on an overlay plane, we would send the
format+modifier pairs for that plane. If a client is partially occluded
or otherwise not possible to be scanned out, we'd just have the normal
format+modifier pairs that we can use as a texture. Note that the
compositor won't send format+modifier pairs which we cannot texture
from, even if the plane advertises it's supported. We always need to be
able to fall back to compositing.
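
To make the tranche ordering concrete, here is a minimal client-side
sketch of picking a format+modifier pair from the hints. The struct
and function names are hypothetical and do not come from the protocol
patch; the only behaviour taken from the description above is that
tranches are ordered from most to least optimal and that pairs within
a tranche share the same preference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not part of any protocol header. */
struct fmt_mod_pair {
	uint32_t format;    /* DRM fourcc code */
	uint64_t modifier;  /* DRM format modifier */
};

struct tranche {
	const struct fmt_mod_pair *pairs;
	size_t n_pairs;
};

/*
 * Walk the tranches from most to least preferred and return the first
 * pair the client can also allocate, or NULL if there is no overlap.
 */
static const struct fmt_mod_pair *
pick_pair(const struct tranche *tranches, size_t n_tranches,
	  const struct fmt_mod_pair *supported, size_t n_supported)
{
	for (size_t t = 0; t < n_tranches; t++) {
		for (size_t p = 0; p < tranches[t].n_pairs; p++) {
			const struct fmt_mod_pair *hint = &tranches[t].pairs[p];

			for (size_t s = 0; s < n_supported; s++) {
				if (supported[s].format == hint->format &&
				    supported[s].modifier == hint->modifier)
					return hint;
			}
		}
	}
	return NULL;
}
```

Because the compositor can resend hints at any time, a client would
rerun something like this whenever new tranches arrive and reallocate
its buffers if the result changes.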


The hard part of figuring out which clients are "eligible" for being
scanned out on an overlay plane could be handled by libliftoff (or
something similar) and given back to the compositor to forward to
clients. For libliftoff to make a properly informed decision, I think
the atomic KMS API needs to be changed. We can only TEST_ONLY valid
buffers, which tests the immediate configuration but doesn't allow us to
test a configuration we WANT to go to. We need some sort of fake
framebuffer that is not backed by any real memory but that we can still
TEST_ONLY. Without this, we may tell the client format+modifier pairs
that we think will work for scanout, but don't due to whatever 

Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

2018-02-26 Thread James Jones

On 02/22/2018 01:16 PM, Alex Deucher wrote:

On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen wrote:

On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg  wrote:

On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher  wrote:

On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace wrote:

On Thu 21 Dec 2017, Daniel Vetter wrote:

On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <hoegsb...@google.com> wrote:

On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicom...@nvidia.com> wrote:

On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <hoegsb...@gmail.com> wrote:

I'd like to see concrete examples of actual display controllers
supporting more format layouts than what can be specified with a 64
bit modifier.


The main problem is our tiling and other metadata parameters can't
generally fit in a modifier, so we find passing a blob of metadata a
more suitable mechanism.


I understand that you may have n knobs with a total of more than 56 bits
that configure your tiling/swizzling for color buffers. What I don't
buy is that you need all those combinations when passing buffers around
between codecs, cameras and display controllers. Even if you're sharing
between the same 3D drivers in different processes, I expect just locking
down, say, 64 different combinations (you can add more over time) and
assigning each a modifier would be sufficient. I doubt you'd extract
meaningful performance gains from going all the way to a blob.
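
For context on the 56-bit figure: a DRM format modifier is a 64-bit
value whose top 8 bits carry a vendor code, leaving 56 bits for the
vendor-defined payload. The sketch below mirrors that split. The
helper names are hypothetical stand-ins rather than the kernel's own
macros, and the vendor code used in the usage note is only an example.

```c
#include <stdint.h>

/* Hypothetical helper names; the 8/56 split is the modifier layout. */
#define MOD_VENDOR_SHIFT 56
#define MOD_VALUE_MASK   UINT64_C(0x00ffffffffffffff)

/* Combine an 8-bit vendor code with a 56-bit vendor-defined value. */
static inline uint64_t mod_code(uint8_t vendor, uint64_t value)
{
	return ((uint64_t)vendor << MOD_VENDOR_SHIFT) |
	       (value & MOD_VALUE_MASK);
}

/* Extract the vendor code from a modifier. */
static inline uint8_t mod_vendor(uint64_t modifier)
{
	return (uint8_t)(modifier >> MOD_VENDOR_SHIFT);
}

/* Extract the 56-bit vendor-defined value from a modifier. */
static inline uint64_t mod_value(uint64_t modifier)
{
	return modifier & MOD_VALUE_MASK;
}
```

For example, mod_code(0x03, 0x10) yields 0x0300000000000010, and any
payload bits above bit 55 are silently dropped, which is why a layout
needing more than 56 bits of parameters cannot be encoded directly.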


I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilities.

I summarized this opinion in VK_EXT_image_drm_format_modifier,
where I wrote an "introduction to modifiers" section. Here's an excerpt:

 One goal of modifiers in the Linux ecosystem is to enumerate for each
 vendor a reasonably sized set of tiling formats that are appropriate for
 images shared across processes, APIs, and/or devices, where each
 participating component may possibly be from different vendors.
 A non-goal is to enumerate all tiling formats supported by all vendors.
 Some tiling formats used internally by vendors are inappropriate for
 sharing; no modifiers should be assigned to such tiling formats.



Where it gets tricky is how to select that subset?  Our tiling modes
are defined more by the asic specific constraints than the tiling mode
itself.  At a high level we have basically 3 tiling modes (out of 16
possible) that would be the minimum we'd want to expose for gfx6-8.
gfx9 uses a completely new scheme.
1. Linear (per asic stride requirements, not usable by many hw blocks)
2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
tile split (7 possible), sample split (4 possible), num banks (4
possible), bank width (4 possible), bank height (4 possible), macro
tile aspect (4 possible) all of which are asic config specific)
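
As a rough sanity check on the size of that parameter space, the counts listed above can be multiplied out. This is only a sketch: it assumes the five 1D layouts combine freely with every 2D parameter, which gives an upper bound, since the valid combinations are asic config specific.

```python
import math

# Number of possible values for each 2D tiling parameter, as listed above.
two_d_parameters = {
    "pipe config": 18,
    "tile split": 7,
    "sample split": 4,
    "num banks": 4,
    "bank width": 4,
    "bank height": 4,
    "macro tile aspect": 4,
}
micro_tile_layouts = 5  # displayable, depth, thin, rotated, thick


def bits_for(count: int) -> int:
    """Bits needed to number `count` distinct values starting from zero."""
    return (count - 1).bit_length()


# Upper bound on distinct 2D layouts if every combination were valid.
combinations = micro_tile_layouts * math.prod(two_d_parameters.values())

# Bits if the combinations are numbered densely.
dense_bits = bits_for(combinations)

# Bits if each parameter gets its own bit field.
field_bits = bits_for(micro_tile_layouts) + sum(
    bits_for(count) for count in two_d_parameters.values()
)

print(combinations, dense_bits, field_bits)
```

Either way the count is far beyond what anyone would want to spell out as individually named modifiers, which is the difficulty described here.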



I guess we could do something like:
AMD_GFX6_LINEAR_ALIGNED_64B
AMD_GFX6_LINEAR_ALIGNED_256B
AMD_GFX6_LINEAR_ALIGNED_512B
AMD_GFX6_1D_THIN_DISPLAY
AMD_GFX6_1D_THIN_DEPTH
AMD_GFX6_1D_THIN_ROTATED
AMD_GFX6_1D_THIN_THIN
AMD_GFX6_1D_THIN_THICK


AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1

etc.



We probably only need 40 bits to encode all of the tiling parameters,
so we could do family plus tiling encoding, but that still seems unwieldy
to deal with from an application 

Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

2018-01-03 Thread James Jones

On 12/28/2017 10:24 AM, Miguel Angel Vico wrote:

(Adding dri-devel back, and trying to respond to some comments from
the different forks)

James Jones wrote:


Your worst case analysis above isn't far off from our HW, give or take
some bits and axes here and there.  We've started an internal discussion
about how to lay out all the bits we need.  It's hard to even enumerate
them all without having a complete understanding of what capability sets
are going to include, a fully-optimized implementation of the mechanism
on our HW, and lots of test scenarios though.


(thanks James for most of the info below)

To elaborate a bit, if we want to share an allocation across GPUs for 3D
rendering, it seems we would need 12 bits to express our
swizzling/tiling memory layouts for fermi+. In addition to that,
maxwell uses 3 more bits for this, and we need an extra bit to identify
pre-fermi representations.

We also need one bit to differentiate between Tegra and desktop, and
another one to indicate whether the layout is otherwise linear.

Then things like whether compression is used (one more bit), and we can
probably get by with 3 bits for the type of compression if we are
creative. However, it'd be way easier to just track arch + page kind,
which would be like 32 bits on its own.


Not clear if this is an NV-only term, so for those not familiar, page 
kind is very loosely the equivalent of a format modifier our HW uses 
internally in its memory management subsystem.  The value mappings vary 
a bit for each HW generation.



Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
bits.

If device-local properties are included, we might need a couple more
bits for caching.

We may also need to express locality information, which may take at
least another 2 or 3 bits.

If we want to share array textures too, you also need to pass the array
pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
its own.

So yes, as James mentioned, with some effort, we could technically fit
our current allocation parameters in a modifier, but I'm still not
convinced this is as future proof as it could be as our hardware grows
in capabilities.
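
A quick tally of the figures above, as a sketch only. The field names are just labels for this example, "a couple more bits" is taken as 2 and "2 or 3 bits" as 3; the 56-bit payload size assumes the usual 8-bit vendor prefix in a 64-bit modifier.

```python
# Bit counts quoted in the message above; names are labels for this sketch.
fields = [
    ("fermi+ swizzling/tiling", 12),
    ("maxwell additions", 3),
    ("pre-fermi flag", 1),
    ("Tegra vs. desktop", 1),
    ("linear flag", 1),
    ("compression enabled", 1),
    ("compression type", 3),
    ("Z-cull / zero-bandwidth clears", 3),
    ("caching (assumed 2)", 2),
    ("locality (assumed 3)", 3),
]

PAYLOAD_BITS = 56      # 64-bit modifier minus the 8-bit vendor ID
ARRAY_PITCH_BITS = 64  # array pitch, if it had to travel in the modifier

layout_bits = sum(width for _, width in fields)
fits_without_pitch = layout_bits <= PAYLOAD_BITS
fits_with_pitch = layout_bits + ARRAY_PITCH_BITS <= PAYLOAD_BITS

print(layout_bits, fits_without_pitch, fits_with_pitch)
```

Under these assumptions the layout parameters alone leave some headroom, while the array pitch cannot be carried in the same modifier at all.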


Daniel Stone wrote:


So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.


I'm a bit confused. Can't modifiers be specified by vendors and only
interpreted by drivers? My understanding was that modifiers could
actually be treated as opaque 64-bit data, in which case they would
qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
scalable. What am I missing?


Daniel Vetter wrote:


I think in the interim figuring out how to expose kms capabilities
better (and necessarily standardizing at least some of them which
matter at the compositor level, like size limits of framebuffers)
feels like the place to push the ecosystem forward. In some way
Miguel's proposal looks a bit backwards, since it adds the pitch
capabilities to addfb, but at addfb time you've allocated everything
already, so way too late to fix things up. With modifiers we've added
a very simple per-plane property to list which modifiers can be
combined with which pixel formats. Tiny start, but obviously very far
from all that we'll need.


Not sure whether I might be misunderstanding your statement, but one of
the allocator's main features is negotiation of nearly optimal allocation
parameters given a set of uses on different devices/engines by the
capability merge operation. A client should have queried what every
device/engine is capable of for the given uses, found the optimal set of
capabilities, and used it for allocating a buffer. By the time these
parameters are given to KMS, they are expected to be good. If they
aren't, the client didn't do things right.


Rob Clark wrote:


It does seem like, if possible, starting out with modifiers for now at
the kernel interface would make life easier, vs trying to reinvent
both kernel and userspace APIs at the same time.  Userspace APIs are
easier to change or throw away.  Presumably by the time we get to the
point of changing kernel uabi, we are already using, and pretty happy
with, serialized liballoc data over the wire in userspace so it is
only a matter of changing the kernel interface.


I guess we can indeed start with modifiers for now, if that's what it
takes to get the allocator mechanisms rolling. However, it seems to me
that we won't be able to encode the same type of information included
in capability sets with modifiers in all cases. For instance, if we end
up encoding usage transition information in capability sets, how would
that translate to modifiers?

I assume display doesn't really care about a lot o

Re: [rfc repost] drm sync objects - a new beginning (make ickle happier?)

2017-04-19 Thread James Jones

On 04/19/2017 05:07 AM, Christian König wrote:

Am 13.04.2017 um 03:41 schrieb Dave Airlie:

Okay I've taken Chris's suggestions to heart and reworked things
around a sem_file to see how they might look.

This means the drm_syncobj are currently only useful for semaphores,
the flags field could be used in future to use it for other things,
and we can reintroduce some of the API then if needed.

This refactors sync_file first to add some basic rcu wrappers
around the fence pointer; as this pointer never updates, this should
all be fine unlocked.

It then creates the sem_file with a mutex, and uses that to
track the semaphores with reduced fops and the replace and
get APIs.

Then it reworks the drm stuff on top, and fixes amdgpu bug
with old_fence.

Let's see if anyone prefers one approach over the other.


Yeah, I clearly prefer keeping only one object type for synchronization
in the kernel.

As I wrote in the other mail, the argument for using the sync file for
semaphores was to be able to use it as an in-fence with atomic mode
setting as well.


This may introduce incompatibilities in userspace though, as the 
response to Dave's original series pointed out.  For example, the 
Vulkan extensions that allow importing sync files expect them to behave 
as sync files currently do, not as these new objects do.  Introducing 
the new behavior would invalidate language in those specifications, 
causing problems with the very use case I suspect these changes are 
trying to address.  Those specs are not finalized, so it could be fixed, 
but I think that highlights the general concern.



That a wait consumes a previous signal should be a specific behavior of
the operation and not the property of the object.

In other words I'm fine with using the sync_file in a 1:1 fashion with
Vulkan, but for the atomic API we probably want 1:N to be able to flip a
rendering result on multiple CRTCs at the same time.
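
The distinction between "property of the object" and "behavior of the operation" can be modeled in a few lines. This is a toy userspace model, not the kernel API: the class and method names are hypothetical, and only the idea of an object holding a replaceable fence comes from the series description.

```python
class Fence:
    """Toy stand-in for a fence: signals once, never resets."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.signaled = False

    def signal(self) -> None:
        self.signaled = True


class SemObject:
    """Toy sync object holding at most one fence (names are hypothetical)."""

    def __init__(self) -> None:
        self._fence = None

    def replace(self, fence) -> None:
        """Signal side: install the fence a later wait will depend on."""
        self._fence = fence

    def wait_consume(self):
        """Semaphore-style wait: take the fence and reset the object (1:1)."""
        fence, self._fence = self._fence, None
        return fence

    def peek(self):
        """Non-consuming lookup: the same fence can feed many waiters (1:N)."""
        return self._fence


sem = SemObject()
render_done = Fence("render-done")
sem.replace(render_done)

# 1:N use: three CRTCs each take the same in-fence without consuming it.
crtc_in_fences = [sem.peek() for _ in range(3)]

# 1:1 use: a Vulkan-style wait consumes the signal and resets the object.
consumed = sem.wait_consume()
```

The object is the same in both cases; whether the signal is consumed depends only on which operation is used.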


Agreed, this usage seems valuable too.  Sem files still have a fence in 
them, and that doesn't seem like an implementation detail that needs to 
be hidden from userspace.  Vulkan solved this very issue by letting 
applications directly extract the sync_file fd from a Vulkan semaphore 
so they could use it with native operations that specifically require a 
sync file, via the experimental external semaphore extensions.  Perhaps 
there could be a sem file -> sync file conversion operation with 
semantics similar to a Vulkan semaphore -> sync file export operation? 
Note the Vulkan semantics for this are in churn, so it might be worth 
holding off a bit on adding that interface if this is the path you use, 
but it shouldn't need to block this series from my high-level read.


Thanks,
-James


Regards,
Christian.



Dave.
___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Static inline DRM functions calling into GPL-only code

2017-04-11 Thread James Jones

On 04/11/2017 09:09 AM, Harry Wentland wrote:

On 2017-04-11 11:15 AM, James Jones wrote:

On 04/10/2017 11:20 PM, Daniel Vetter wrote:

On Tue, Apr 11, 2017 at 7:52 AM, Daniel Vetter <dan...@ffwll.ch> wrote:

On Tue, Apr 11, 2017 at 6:14 AM, Nikhil Mahale <nmah...@nvidia.com>
wrote:

My name is Nikhil Mahale, and I work at NVIDIA in the Linux drivers
team.

I have been working on adding DRM KMS support to our driver. The
NVIDIA
GPU driver package (364.12 and higher) provides a kernel module,
nvidia-drm.ko, which is licensed as "MIT". This module registers a DRM
driver with the DRM subsystem of the Linux kernel and advertises KMS
capability on Linux kernel v4.1 or higher, with CONFIG_DRM and
CONFIG_DRM_KMS_HELPER enabled.

We have been able to maintain compatibility between nvidia-drm.ko and
Linux kernels from v2.6.9 to v4.10. Unfortunately
with release candidates of v4.11:

* Commit 10383aea2f445bce9b2a2b308def08134b438c8e changed the kernel's
kref implementation to use refcount_inc and refcount_dec_and_test.
* Commit 29dee3c03abce04cd527878ef5f9e5f91b7b83f4 made refcount_inc
and
refcount_dec_and_test EXPORT_SYMBOL_GPL.

DRM drivers call refcount_inc through static inline function
callchains
such as:

drm_crtc_commit_put() => kref_put() => refcount_dec_and_test()
drm_crtc_commit_get() => kref_get() => refcount_inc()

drm_atomic_state_put() => kref_put() => refcount_dec_and_test()
drm_atomic_state_get() => kref_get() => refcount_inc()

drm_gem_object_reference() => kref_get => refcount_inc()

This causes nvidia-drm.ko to inadvertently pick up references to
EXPORT_SYMBOL_GPL symbols.

There is no interest in relaxing the export of refcount_inc, and
changing the license of nvidia-drm.ko isn't viable right now.

So, the remaining options we see are:

* Make these static inline DRM functions EXPORT_SYMBOL instead of
inline.

* Make these static inline DRM functions not use kref.

* Make nvidia-drm.ko not use these static inline DRM functions.

None of those seem good, though the first might be least bad.  Do
any of
those seem reasonable?


* Open-source the nvidia kernel driver? tbh I'm not sure how much you
can still make the case that your driver is fully an independent thing
if you're adopting stuff like atomic modesetting. Might be better to
make all the glue/remapping code from linux atomic to the shared
cross-os code at least open


As the original message stated, this code is already open (MIT license).



Just out of curiosity, can I find this on any public repo or webpage?


This is our usual Linux driver download landing page:

https://www.nvidia.com/object/unix.html

We don't break out the nvidia-drm source into a separate package like we 
do for some of our other open-source components, but it's included when 
you download the full driver.  You can unpack it without installing, e.g.:


  $ sh ~/Downloads/NVIDIA-Linux-x86_64-378.13.run -x

Then it will be in ./NVIDIA-Linux-x86_64-378.13/kernel/nvidia-drm/

Feedback welcome.

Thanks,
-James


If inlining is the issue it looks like this is not used by any upstream
DRM driver (or DAL) directly but only from a bunch of atomic functions,
none of which are inline.

If this is an issue for NVidia would this also be an issue for any other
MIT licensed code, such as drm_atomic_helper.c?

Harry


Thanks,
-James


... And atomic is pretty much guaranteed
to change all the time anyway, we're definitely not going to make a
stable kabi for you folks, so you might want to do that for practical
reasons anyway.

Just my 2cents, personal opinion, not reflecting intel's, not legal
advice, yadayada and all that :-)


Apparently coffee didn't work yet, so let me retry the more serious
part of my reply. I'd go with a shim that essentially remaps the linux
atomic to whatever cross-os datastructures and semantics you have in
the blob. That also has the benefit of insulating you a bit more from
upstream changes in atomic (which will happen), and enthusiasts might
get around to porting to new kernels before you do. Essentially pick
the architecture of amd's DAL, then fully open the glue layer. With my
maintainer hat on I'm at least not inclined to add the "is this fair
use or not" hacks on upstream's side, simply because sooner or later
we'll break them and then we have the angry users, instead of nvidia.
And that's the wrong place for bug reports for blobs :-)
-Daniel



Re: Static inline DRM functions calling into GPL-only code

2017-04-11 Thread James Jones

On 04/10/2017 11:20 PM, Daniel Vetter wrote:

On Tue, Apr 11, 2017 at 7:52 AM, Daniel Vetter  wrote:

On Tue, Apr 11, 2017 at 6:14 AM, Nikhil Mahale  wrote:

My name is Nikhil Mahale, and I work at NVIDIA in the Linux drivers
team.

I have been working on adding DRM KMS support to our driver. The NVIDIA
GPU driver package (364.12 and higher) provides a kernel module,
nvidia-drm.ko, which is licensed as "MIT". This module registers a DRM
driver with the DRM subsystem of the Linux kernel and advertises KMS
capability on Linux kernel v4.1 or higher, with CONFIG_DRM and
CONFIG_DRM_KMS_HELPER enabled.

We have been able to maintain compatibility between nvidia-drm.ko and
Linux kernels from v2.6.9 to v4.10. Unfortunately
with release candidates of v4.11:

* Commit 10383aea2f445bce9b2a2b308def08134b438c8e changed the kernel's
kref implementation to use refcount_inc and refcount_dec_and_test.
* Commit 29dee3c03abce04cd527878ef5f9e5f91b7b83f4 made refcount_inc and
refcount_dec_and_test EXPORT_SYMBOL_GPL.

DRM drivers call refcount_inc through static inline function callchains
such as:

drm_crtc_commit_put() => kref_put() => refcount_dec_and_test()
drm_crtc_commit_get() => kref_get() => refcount_inc()

drm_atomic_state_put() => kref_put() => refcount_dec_and_test()
drm_atomic_state_get() => kref_get() => refcount_inc()

drm_gem_object_reference() => kref_get => refcount_inc()

This causes nvidia-drm.ko to inadvertently pick up references to
EXPORT_SYMBOL_GPL symbols.

There is no interest in relaxing the export of refcount_inc, and
changing the license of nvidia-drm.ko isn't viable right now.

So, the remaining options we see are:

* Make these static inline DRM functions EXPORT_SYMBOL instead of
inline.

* Make these static inline DRM functions not use kref.

* Make nvidia-drm.ko not use these static inline DRM functions.

None of those seem good, though the first might be least bad.  Do any of
those seem reasonable?


* Open-source the nvidia kernel driver? tbh I'm not sure how much you
can still make the case that your driver is fully an independent thing
if you're adopting stuff like atomic modesetting. Might be better to
make all the glue/remapping code from linux atomic to the shared
cross-os code at least open


As the original message stated, this code is already open (MIT license).

Thanks,
-James


... And atomic is pretty much guaranteed
to change all the time anyway, we're definitely not going to make a
stable kabi for you folks, so you might want to do that for practical
reasons anyway.

Just my 2cents, personal opinion, not reflecting intel's, not legal
advice, yadayada and all that :-)


Apparently coffee didn't work yet, so let me retry the more serious
part of my reply. I'd go with a shim that essentially remaps the linux
atomic to whatever cross-os datastructures and semantics you have in
the blob. That also has the benefit of insulating you a bit more from
upstream changes in atomic (which will happen), and enthusiasts might
get around to porting to new kernels before you do. Essentially pick
the architecture of amd's DAL, then fully open the glue layer. With my
maintainer hat on I'm at least not inclined to add the "is this fair
use or not" hacks on upstream's side, simply because sooner or later
we'll break them and then we have the angry users, instead of nvidia.
And that's the wrong place for bug reports for blobs :-)
-Daniel




Re: Vulkan WSI+VK_KHR_display for KMS/DRM?

2017-04-11 Thread James Jones

On 04/10/2017 12:32 PM, Jason Ekstrand wrote:

On April 10, 2017 12:29:12 PM Chad Versace 
wrote:


On Tue 04 Apr 2017, Keith Packard wrote:

Jason Ekstrand  writes:

> Interesting question.  To my knowledge, no one has actually implemented
> the Vulkan WSI direct-to-display extensions.  (I tried to prevent them
> from getting released with 1.0 but failed.)  I believe the correct
> answer is to use the external memory dma-buf stuff that chad and I have
> been using and talk directly to KMS.

Sounds good, and minimizes the amount of code I have to write too :-)


I found an implementation. Nvidia's 2017-04-06 Linux driver release
notes claim newly added support for VK_EXT_direct_mode_display, which is
layered atop VK_KHR_display.


If it's useful to do so, we can always pull Keith's work into Mesa or
even put it in a layer.  Let's start with an implementation and figure
out the Vulkan bits later.  If there's something interesting in NVIDIA's
extensions, we can let that guide the design of course.


http://www.nvidia.com/download/driverResults.aspx/117741/en-us
https://www.khronos.org/registry/vulkan/specs/1.0-extensions/html/vkspec.html#VK_EXT_direct_mode_display



> I see no good reason to have a large abstraction in
> the middle.

Other than 'it's a standard', neither do I.


Yup.


There's one good technical reason, at least on NVIDIA HW but I suspect 
others, and it's the same reason that spawned the EGLStream Vs. raw 
DRM-KMS debate:  dma-buf+KMS doesn't let you transition to the 
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout, so rendering to/texturing from 
the dma-buf images won't be as optimal as rendering to VK_KHR_display 
images.


You could solve that (and I intend to) with the combination of Vulkan + 
the generic allocator stuff we started discussing at XDC last year, but 
it'll take more work.  No, I haven't stopped working on that, I just 
haven't had much time for it lately.  I'll have updates from my side 
there soon.


Besides that, the abstraction's primary purpose is the same as any 
abstraction: portability.  Applications targeting it will work on 
platforms that don't have DRM-KMS.  That's more useful if there's a 
DRM-KMS implementation too.  I fully expect that you could implement it 
via a Vulkan implicit layer as suggested here once the external memory 
and dma-buf stuff is complete, and there'd be nothing sub-optimal about 
that if you could properly transition the layouts.  Nothing wrong with 
that implementation path.  It also shouldn't be a lot of code to add a 
native DRM-KMS implementation in Mesa and then lift it to a layer later, 
or write it as a Vulkan layer now and add optimization once the generic 
allocator + Vulkan interactions are worked out.  Clean interaction with 
DRM-KMS was one of the goals of the spec.


I know of two (maybe three, but I haven't confirmed the last) other 
shipping implementations besides ours BTW, so this isn't a de-facto 
NVIDIA-ism dressed up like a standard.  I don't think the other 
implementations are currently publicly available.


Thanks,
-James





Unix Device Memory Allocation project

2017-01-03 Thread James Jones
On 01/03/2017 04:06 PM, Marek Olšák wrote:
> On Wed, Jan 4, 2017 at 12:43 AM, James Jones  wrote:
>> On 01/03/2017 03:38 PM, Marek Olšák wrote:
>>>
>>> On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter  wrote:
>>>>
>>>> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák  wrote:
>>>>>>>
>>>>>>> We've had per buffer metadata in Radeon since KMS, which I believe
>>>>>>> first
>>>>>>> appeared in 2009. It's 4 bytes large and is used to communicate tiling
>>>>>>> flags between Mesa, DDX, and the kernel display code. It was a widely
>>>>>>> accepted solution back then and Red Hat was the main developer. So
>>>>>>> yeah,
>>>>>>> pretty much all people except Intel were collaborating on "sneaking"
>>>>>>> this
>>>>>>> in in 2009. I think radeon driver developers deserve an apology for
>>>>>>> that
>>>>>>> language.
>>>>>>>
>>>>>>> Amdgpu extended that metadata to 8 bytes and it's used in the same way
>>>>>>> as
>>>>>>> radeon. Additionally, amdgpu added opaque metadata having 256 bytes
>>>>>>> for use
>>>>>>> by userspace drivers only. The kernel driver isn't supposed to read it
>>>>>>> or
>>>>>>> parse it. The format is negotiated between userspace driver developers
>>>>>>> for
>>>>>>> sharing of more complex allocations than 2D displayable surfaces.
>>>>>>
>>>>>>
>>>>>> Metadata needed for kms (what Christian also pointed out) is what
>>>>>> everyone
>>>>>> did (intel included) and I think that's perfectly reasonable. And I was
>>>>>> aware of that radeon is doing that since the dawn of ages since
>>>>>> forever.
>>>>>>
>>>>>> What I think is not really ok is opaque metadata blobs that the kernel
>>>>>> never ever inspects, but just carries around. That essentially means
>>>>>> you're
>>>>>> reimplementing some bad form of IPC, and I don't think that's something
>>>>>> the
>>>>>> drm subsystem (or dma-buf) really should be doing. Because you still
>>>>>> have
>>>>>> that real protocol in userspace (dri2/3, wayland, whatever), but now
>>>>>> with
>>>>>> a side channel with no documented ordering and synchronization. It gets
>>>>>> the job done for single-vendor buffer metadata transport, but as soon
>>>>>> as
>>>>>> there's more than one vendor, or as soon as you need to reallocate
>>>>>> buffers
>>>>>> dynamically because the usage changes it gets bad imo (and I've seen
>>>>>> what
>>>>>
>>>>>
>>>>> The metadata is immutable after allocation, so it's not a
>>>>> communication channel. There is no synchronization or ordering needed
>>>>> for immutable metadata. That implies that a shared buffer can't be
>>>>> reused for an entirely different purpose. It can only be used as-is or
>>>>> freed.
>>>>>
>>>>> For suballocated memory, the idea is to reallocate it as a separate
>>>>> buffer on the first "handle" export, so that shared suballocated
>>>>> buffers don't exist.
>>>>
>>>>
>>>> Yeah, once it becomes mutable the fun starts imo. I didn't realize
>>>> that you're treating it strictly immutable since at least the kernel
>>>> ioctl has both set and get (and that's the thing I looked at).
>>>> Immutable stuff shouldn't be any problem (except that of course it
>>>> won't work cross-driver in any fashion)
>>>>
>>>>>> that looks like on android in various forms). And that consensus (at
>>>>>> least
>>>>>> among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
>>>>>> meeting we've had over 5 years ago. Not sure we're gaining anything
>>>>>> with a
>>>>>> "who's older" competition.
>>>>>>
>>>>>> Anyways it's there and it's uabi so will never disappear. Just wanted
>>>>>> to
>>>>>> make sure it's clea

Unix Device Memory Allocation project

2017-01-03 Thread James Jones
On 01/03/2017 03:38 PM, Marek Olšák wrote:
> On Thu, Oct 20, 2016 at 8:31 AM, Daniel Vetter  wrote:
>> On Wed, Oct 19, 2016 at 6:46 PM, Marek Olšák  wrote:
> We've had per buffer metadata in Radeon since KMS, which I believe first
> appeared in 2009. It's 4 bytes large and is used to communicate tiling
> flags between Mesa, DDX, and the kernel display code. It was a widely
> accepted solution back then and Red Hat was the main developer. So yeah,
> pretty much all people except Intel were collaborating on "sneaking" this
> in in 2009. I think radeon driver developers deserve an apology for that
> language.
>
> Amdgpu extended that metadata to 8 bytes and it's used in the same way as
> radeon. Additionally, amdgpu added opaque metadata having 256 bytes for 
> use
> by userspace drivers only. The kernel driver isn't supposed to read it or
> parse it. The format is negotiated between userspace driver developers for
> sharing of more complex allocations than 2D displayable surfaces.

 Metadata needed for kms (what Christian also pointed out) is what everyone
 did (intel included) and I think that's perfectly reasonable. And I was
 aware of that radeon is doing that since the dawn of ages since forever.

 What I think is not really ok is opaque metadata blobs that the kernel
 never ever inspects, but just carries around. That essentially means you're
 reimplementing some bad form of IPC, and I don't think that's something the
 drm subsystem (or dma-buf) really should be doing. Because you still have
 that real protocol in userspace (dri2/3, wayland, whatever), but now with
 a side channel with no documented ordering and synchronization. It gets
 the job done for single-vendor buffer metadata transport, but as soon as
 there's more than one vendor, or as soon as you need to reallocate buffers
 dynamically because the usage changes it gets bad imo (and I've seen what
>>>
>>> The metadata is immutable after allocation, so it's not a
>>> communication channel. There is no synchronization or ordering needed
>>> for immutable metadata. That implies that a shared buffer can't be
>>> reused for an entirely different purpose. It can only be used as-is or
>>> freed.
>>>
>>> For suballocated memory, the idea is to reallocate it as a separate
>>> buffer on the first "handle" export, so that shared suballocated
>>> buffers don't exist.
>>
>> Yeah, once it becomes mutable the fun starts imo. I didn't realize
>> that you're treating it strictly immutable since at least the kernel
>> ioctl has both set and get (and that's the thing I looked at).
>> Immutable stuff shouldn't be any problem (except that of course it
>> won't work cross-driver in any fashion)
>>
 that looks like on android in various forms). And that consensus (at least
 among folks involved in dma-buf) goes back to the dma-buf kickoff 3-day
 meeting we've had over 5 years ago. Not sure we're gaining anything with a
 "who's older" competition.

 Anyways it's there and it's uabi so will never disappear. Just wanted to
 make sure it's clear that for dma-buf we've discussed this years ago, and
 decided it wasn't a great idea. And I think that's still correct.
>>>
>>> The arguments against blob metadata sound reasonable to me. I'm pretty
>>> sceptical that window system protocols will make driver-specific
>>> metadata blobs redundant anytime soon though. It seems the protocols
>>> don't get much attention nowadays and there is no incentive to do
>>> things differently in that area. At least that's how it appears to me,
>>> but I'm not involved in that.
>>
>> Folks are working on protocols again, at least I think the plan is to
>> make all that shared buffer allocation dance also work over
>> compositor/client situation (would be a bit pointless without that).
>> And agreed there'll always be driver-specific stuff which is opaque to
>> everyone else, but I hope at least in the future that all gets
>> shuffled around through protocol extensions. And not in the way every
>> Android gfx stack seems to work, where everyone has their own
>> vendor-private ipc-over-dma-buf thing. Wayland definitely got this
>> right, both protocol versioning and being able to add any kind of
>> new/vendor-private protocol endpoints to any wayland protocol. X is a
>> lot more pain, but since it finally looks like the world is switching
>> away from it we might get away with a simpler protocol there. At
>> least all the tricky reallocation dances seem to matter a lot more on
>> mobile/tablets/phones, and there Wayland starts to rule.
>
> I've been thinking about it, and it looks like we're gonna continue
> using immutable per-BO metadata (buffer layout, tiling description,
> compression flags). The reasons are that everything else is less
> economical, and the current "modifier" work done in EGL/GBM is
> insufficient for our hardware - we need 

Unix Device Memory Allocation project

2016-10-18 Thread James Jones
Thanks for the detailed writeup, and it was good to meet you at XDC.  Below:

On 10/18/2016 04:40 PM, Marek Olšák wrote:
> Hi,
>
> The text below describes how open source AMDGPU buffer sharing works.
> I hope you'll find some useful bits in it.
>
>
> Producer = allocates a buffer (or texture), and exports its handle
> (DMABUF, etc.), and can use the buffer in various ways
>
> Consumer = imports the handle, and can use the buffer in various ways
>
>
> *** Producer-consumer interaction. ***
>
> 1) On handle export, the producer receives these flags:
>
> - READ, WRITE, READ+WRITE: Describe the expected usage in the consumer.
>   * The producer decides if it needs to disable compression based on
> those flags.
>
> - EXPLICIT_FLUSH flag: Meaning that the producer will explicitly
> receive a "flush_resource" call before the consumer starts using the
> buffer. This is a hint that the producer doesn't have to keep track of
> "when to do decompression" when sharing the buffer with the consumer.
>
>
> 2) Passing metadata (tiling, pixel ordering, format, layout) info
> between the producer and consumer:
>
> - All AMDGPU buffer/texture allocations have 256 bytes (64 dwords) of
> internal per-allocation metadata storage that lives in the kernel
> space. There are amdgpu-specific ioctls that can "set" and "get" the
> metadata. Any process that has a buffer handle can do that.
>   * The producer writes the metadata, the consumer reads it.
>
> - The producer-consumer interop API doesn't know about the metadata.
> All you need to pass around is a buffer handle. (KMS, DMABUF, etc.)
>   * There was a note during the talk that DMABUF doesn't have any
> metadata. Well, I just told you that it has, but it's private to
> amdgpu and possibly accessible to other kernel drivers too.

OK.  I believe someone pointed this out during my talk or afterwards as 
well.  Some drivers are using this method, but there seems to be some 
debate over whether this is the preferred general design.  Others have 
told me this isn't the right mechanism to store this sort of metadata, 
but I'm not familiar with the specific counter arguments.
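
For readers following along, the per-allocation metadata store described in the quoted text can be modeled in a few lines of C. This is a user-space toy, not the amdgpu interface: the struct and function names are hypothetical, and only the 64-dword (256-byte) capacity and the set/get split are taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define METADATA_DWORDS 64u  /* 64 dwords = 256 bytes, per the description */

/* Hypothetical stand-in for a kernel buffer object with opaque metadata. */
struct model_bo {
    uint32_t metadata[METADATA_DWORDS];
    uint32_t metadata_dwords;  /* number of valid dwords */
};

_Static_assert(sizeof(((struct model_bo *)0)->metadata) == 256,
               "metadata storage should be 256 bytes");

/* "set": what the producer does. Returns 0 on success, -1 if too large. */
static int model_set_metadata(struct model_bo *bo, const uint32_t *data,
                              uint32_t ndwords)
{
    assert(bo != NULL && (data != NULL || ndwords == 0));
    if (ndwords > METADATA_DWORDS)
        return -1;
    memset(bo->metadata, 0, sizeof(bo->metadata));
    if (ndwords)
        memcpy(bo->metadata, data, ndwords * sizeof(uint32_t));
    bo->metadata_dwords = ndwords;
    return 0;
}

/* "get": what the consumer does. Copies at most cap dwords into out and
 * returns how many dwords were copied. */
static uint32_t model_get_metadata(const struct model_bo *bo, uint32_t *out,
                                   uint32_t cap)
{
    assert(bo != NULL && (out != NULL || cap == 0));
    uint32_t n = bo->metadata_dwords < cap ? bo->metadata_dwords : cap;
    if (n)
        memcpy(out, bo->metadata, n * sizeof(uint32_t));
    return n;
}
```

The point of the model is that the interop API only ever carries the handle (here, the pointer to the object); the blob travels with the allocation and stays opaque to everything except the drivers that write and read it.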

>   * We can build upon this idea. I think the worst thing to do would
> be to add metadata handling to driver-agnostic userspace APIs. Really,
> driver-agnostic APIs shouldn't know about that, because they can't
> understand all the hw-specific information encoded in the metadata.
> Also, when you want to change the metadata format, you only have to
> update the affected drivers, not userspace APIs.

How does this kernel-side metadata interact with userspace driver 
suballocation, or application-managed suballocation in APIs such as Vulkan?

Thanks,
-James

> 3) Internal AMDGPU metadata storage format
> - The header contains: Vendor ID, PCI ID, and version number.
> - The header is followed by PCI-ID-specific data. The PCI ID and the
> version number define the format.
> - If the consumer runs on a different device, it must read the header
> and parse the metadata based on that. It implies that the
> driver-specific consumer code needs to know about all potential
> producer devices.
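
The header-then-payload scheme in section 3 implies a lookup on the consumer side. A minimal sketch follows, assuming one dword per header field in the order listed; that layout, the payload format, the device ID, and all names are made up for illustration (0x1002 is AMD's PCI vendor ID, the rest is not real).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout: one dword per field, in this order. */
enum { HDR_VENDOR = 0, HDR_PCI_ID = 1, HDR_VERSION = 2, HDR_DWORDS = 3 };

struct layout { uint32_t tile_mode; };  /* hypothetical decoded result */

typedef int (*parse_fn)(const uint32_t *payload, uint32_t ndwords,
                        struct layout *out);

struct known_producer {
    uint32_t vendor_id, pci_id, version;
    parse_fn parse;
};

/* Made-up payload format: dword 0 holds the tile mode. */
static int parse_example_v1(const uint32_t *payload, uint32_t ndwords,
                            struct layout *out)
{
    if (ndwords < 1)
        return -1;
    out->tile_mode = payload[0];
    return 0;
}

/* The consumer has to list every producer format it understands. */
static const struct known_producer known_producers[] = {
    { 0x1002, 0xABCD, 1, parse_example_v1 },  /* device ID is made up */
};

/* Returns 0 on success, -1 for a short blob or an unknown producer. */
static int consumer_parse_metadata(const uint32_t *md, uint32_t ndwords,
                                   struct layout *out)
{
    assert(md != NULL && out != NULL);
    if (ndwords < HDR_DWORDS)
        return -1;
    for (size_t i = 0; i < sizeof(known_producers) / sizeof(known_producers[0]); i++) {
        const struct known_producer *k = &known_producers[i];
        if (md[HDR_VENDOR] == k->vendor_id && md[HDR_PCI_ID] == k->pci_id &&
            md[HDR_VERSION] == k->version)
            return k->parse(md + HDR_DWORDS, ndwords - HDR_DWORDS, out);
    }
    return -1;
}
```

The table makes the cost noted in the text visible: each new producer device or format version needs a new entry in every consumer driver that wants to import from it.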
>
>
> Bottom line: DMABUF handles alone are fully sufficient for sharing
> buffers/textures between devices and processes from the AMDGPU point
> of view.
>
> HW driver implementation: The driver doesn't know anything about the
> users of exported or imported buffers. It only acts based on the few
> flags described in section 1. So far that's all we've needed.
>
>
> *** Use cases ***
>
> 1) DRI (producer: application; consumer: X server)
> - The producer receives these flags: READ, EXPLICIT_FLUSH. The X
> server will treat the shared "texture" as read-only. EXPLICIT_FLUSH
> ensures the texture can be compressed, and "flush_resource" will be
> called as part of SwapBuffers and "glFlush: GL_FRONT".
> - The X server can run on a different device. In that case, the window
> system API passes the "LINEAR" flag to the driver during allocation.
> That's suboptimal and fixable.
>
>
> 2) OpenGL-OpenCL interop (OpenGL always exports handles, OpenCL always
> imports handles)
> - Possible flags: READ, WRITE, READ+WRITE
> - OpenCL doesn't give us any other flags, so we are stuck with those.
> - Inter-device sharing is possible if the consumer understands the
> producer's metadata and tiling layouts.
>
> (amdgpu actually stores 2 different metadata blocks per allocation,
> but the simpler one is too limited and has only 8 bytes)
>
> Marek
>
>
> On Wed, Oct 5, 2016 at 1:47 AM, James Jones  wrote:
>> Hello everyone,
>>
>> As many are aware, we took up the issue of surface/memory a

Unix Device Memory Allocation project

2016-10-04 Thread James Jones
Hello everyone,

As many are aware, we took up the issue of surface/memory allocation at 
XDC this year.  The outcome of that discussion was the beginnings of a 
design proposal for a library that would serve as a cross-device, 
cross-process surface allocator.  In the past week I've started to 
condense some of my notes from that discussion down to code & a design 
document.  I've posted the first pieces to a github repository here:

   https://github.com/cubanismo/allocator

This isn't anything close to usable code yet.  Just headers and docs, 
and incomplete ones at that.  However, feel free to check it out if 
you're interested in discussing the design.

Thanks,
-James


[RFC] Explicit synchronization for Nouveau

2014-09-29 Thread James Jones
On 9/29/14 8:42 AM, Jerome Glisse wrote:
> On Mon, Sep 29, 2014 at 09:43:02AM +0200, Daniel Vetter wrote:
>> On Fri, Sep 26, 2014 at 01:00:05PM +0300, Lauri Peltonen wrote:
>>>
>>> Hi guys,
>>>
>>>
>>> I'd like to start a new thread about explicit fence synchronization.  This
>>> time with a Nouveau twist. :-)
>>>
>>> First, let me define what I understand by implicit/explicit sync:
>>>
>>> Implicit synchronization
>>> * Fences are attached to buffers
>>> * Kernel manages fences automatically based on buffer read/write access
>>>
>>> Explicit synchronization
>>> * Fences are passed around independently
>>> * Kernel takes and emits fences to/from user space when submitting work
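
The two definitions above differ mainly in where the fence lives. The toy model below illustrates that difference only; the types and function names are hypothetical, a fence is reduced to a job number, and every access is treated as a write for simplicity. It is not how any real driver is structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: the number of the job that signals it; 0 = none. */
typedef uint64_t fence_t;

/* Implicit model: the fence is attached to the buffer. */
struct buffer { fence_t last_access; };

/*
 * Implicit sync: the "kernel" finds the fence to wait on by looking at the
 * buffer, then stores the new job's fence back on the buffer.
 */
static fence_t submit_implicit(struct buffer *buf, fence_t job,
                               fence_t *waited_on)
{
    assert(buf != NULL && waited_on != NULL);
    *waited_on = buf->last_access;
    buf->last_access = job;
    return job;
}

/*
 * Explicit sync: user space hands in the fence to wait on and receives the
 * new fence back. No buffer state is read or written.
 */
static fence_t submit_explicit(fence_t in_fence, fence_t job,
                               fence_t *waited_on)
{
    assert(waited_on != NULL);
    *waited_on = in_fence;
    return job;
}
```

In the implicit path, two submissions that touch the same buffer are ordered whether or not the application needed that; in the explicit path the caller decides, which is the property the rest of the proposal relies on.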
>>>
>>> Implicit synchronization is already implemented in open source drivers, and
>>> works well for most use cases.  I don't seek to change any of that.  My
>>> proposal aims at allowing some drm drivers to operate in explicit sync mode
>>> to get maximal performance, while still remaining fully compatible with the
>>> implicit paradigm.
>>
>> Yeah, pretty much what we have in mind on the i915 side too. I didn't look
>> too closely at your patches, so just a few high level comments on your rfc
>> here.
>>
>>> I will try to explain why I think we should support the explicit model
>>> as well.
>>>
>>>
>>> 1. Bindless graphics
>>>
>>> Bindless graphics is a central concept when trying to reduce the OpenGL
>>> driver overhead.  The idea is that the application can bind a large set of
>>> buffers to the working set up front using extensions such as
>>> GL_ARB_bindless_texture, and they remain resident until the application
>>> releases them (note that compute APIs typically have similar semantics).
>>> These working sets can be huge, hundreds or even thousands of buffers, so
>>> we would like to opt out from the per-submit overhead of acquiring locks,
>>> waiting for fences, and storing fences.  Automatically synchronizing these
>>> working sets in kernel will also prevent parallelism between channels that
>>> are sharing the working set (in fact sharing just one buffer from the
>>> working set will cause the jobs of the two channels to be serialized).
>>>
>>> 2. Evolution of graphics APIs
>>>
>>> The graphics API evolution seems to be going in a direction where game
>>> engine and middleware vendors demand more control over work submission and
>>> synchronization.  We expect that this trend will continue, and more and more
>>> synchronization decisions will be pushed to the API level.  OpenGL and EGL
>>> already provide good explicit command stream level synchronization
>>> primitives: glFenceSync and EGL_KHR_wait_sync.  Their use is also
>>> encouraged - for example the EGL_KHR_image_base spec clearly states that the
>>> application is responsible for synchronizing accesses to EGLImages.  If the
>>> API that is exposed to developers gives the control over synchronization to
>>> the developer, then implicit waits that are inserted by the kernel are
>>> unnecessary and unexpected, and can severely hurt performance.  It also
>>> makes it easy for the developer to write code that happens to work on Linux
>>> because of implicit sync, but will fail on other platforms.
>>>
>>> 3. Suballocation
>>>
>>> Using user space suballocation can help reduce the overhead when a large
>>> number of small textures are used.  Synchronizing suballocated surfaces
>>> implicitly in kernel doesn't make sense - many channels should be able to
>>> access the same kernel-level buffer object simultaneously.
>>>
>>> 4. Buffer sharing complications
>>>
>>> This is not really an argument for explicit sync as such, but I'd like to
>>> point out that sharing buffers across SoC engines is often much more complex
>>> than just exporting and importing a dma-buf and waiting for the dma-buf
>>> fences.  Sometimes we need to do color format or tiling layout conversion.
>>> Sometimes, at least on Tegra, we need to decompress buffers when we pass
>>> them from the GPU to an engine that doesn't support framebuffer compression.
>>> These things are not uncommon, particularly when we have SoCs that combine
>>> licensed IP blocks from different vendors.  My point is that user space is
>>> already heavily involved when sharing buffers between drivers, and giving it
>>> some more control over synchronization is not adding that much complexity.
>>>
>>>
>>> Because of the above arguments, I think it makes sense to let some user
>>> space drm drivers opt out from implicit synchronization, while allowing them
>>> to still remain fully compatible with the rest of the drm world that uses
>>> implicit synchronization.  In practice, this would require three things:
>>>
>>> (1) Support passing fences (that are not tied to buffer objects) between
>>>     kernel and user space.
>>>
>>> (2) Stop automatically storing fences to the buffers that user space wants 
>>> to
>>>