Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On 8/9/23 18:53, Boris Brezillon wrote: +enum drm_panthor_sync_op_flags { + /** @DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_MASK: Synchronization handle type mask. */ + DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_MASK = 0xff, + + /** @DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_SYNCOBJ: Synchronization object type. */ + DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_SYNCOBJ = 0, + + /** +* @DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ: Timeline synchronization +* object type. +*/ + DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ = 1, + + /** @DRM_PANTHOR_SYNC_OP_WAIT: Wait operation. */ + DRM_PANTHOR_SYNC_OP_WAIT = 0 << 31, + + /** @DRM_PANTHOR_SYNC_OP_SIGNAL: Signal operation. */ + DRM_PANTHOR_SYNC_OP_SIGNAL = 1 << 31, +}; We get an issue with --pedantic here: warning: enumerator value for 'DRM_PANTHOR_SYNC_OP_SIGNAL' is not an integer constant expression [-Wpedantic] Would be god to get rid of this, so user space can include this header without disabling pedantic. Either we can stop using the top most bit or a cast value like "(int)(1U << 31)" + /** +* @DRM_PANTHOR_VM_BIND_OP_TYPE_MASK: Mask used to determine the type of operation. +*/ + DRM_PANTHOR_VM_BIND_OP_TYPE_MASK = 0xf << 28, Same issue for this member. Either not use the top most bit or cast value like "(int)(0xfU << 28)" avoids the pedantic warning. -- Regards, Ketil Johnsen
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On 04/09/2023 17:25, Robin Murphy wrote: > On 2023-09-04 17:16, Boris Brezillon wrote: >> On Mon, 4 Sep 2023 16:22:19 +0100 >> Steven Price wrote: >> >>> On 04/09/2023 10:26, Boris Brezillon wrote: On Mon, 4 Sep 2023 08:42:08 +0100 Steven Price wrote: > On 01/09/2023 17:10, Boris Brezillon wrote: >> On Wed, 9 Aug 2023 18:53:15 +0200 >> Boris Brezillon wrote: >> >>> +/** >>> + * DOC: MMIO regions exposed to userspace. >>> + * >>> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET >>> + * >>> + * File offset for all MMIO regions being exposed to userspace. >>> Don't use >>> + * this value directly, use DRM_PANTHOR_USER__OFFSET >>> values instead. >>> + * >>> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET >>> + * >>> + * File offset for the LATEST_FLUSH_ID register. The Userspace >>> driver controls >>> + * GPU cache flushling through CS instructions, but the flush >>> reduction >>> + * mechanism requires a flush_id. This flush_id could be queried >>> with an >>> + * ioctl, but Arm provides a well-isolated register page >>> containing only this >>> + * read-only register, so let's expose this page through a >>> static mmap offset >>> + * and allow direct mapping of this MMIO region so we can avoid the >>> + * user <-> kernel round-trip. >>> + */ >>> +#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56) >> >> I'm playing with a 32-bit kernel/userspace, and this is problematic, >> because vm_pgoff is limited to 32-bit there, meaning we can only >> map up >> to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let >> userspace set the mmio range? > > Hmm, I was rather hoping we could ignore 32 bit these days ;) But > while > I can't see why anyone would be running a 32 bit kernel, I guess 32 > bit > user space is likely to still be needed. Uh, I just hit a new problem with 32-bit kernels: the io-pgtable interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any chance you could advise me on what to do here? 1. assume this limitation is here for a good reason, and limit the GPU VA space to 32-bits on 32-bit kernels or 2. update the interface to make iova an u64 >>> >>> I'm not sure I can answer the question from a technical perspective, >>> hopefully Robin will be able to. >> >> Had a quick chat with Robin, and he's recommending going for #1 too. >> >>> >>> But why do we care about 32-bit kernels on a platform which is new >>> enough to have a CSF-GPU (and by extension a recent 64-bit CPU)? >> >> Apparently the memory you save by switching to a 32-bit kernel matters >> to some people. To clarify, the CPU is aarch64, but they want to use it >> in 32-bit mode. >> >>> >>> Given the other limitations present in a 32-bit kernel I'd be tempted to >>> say '1' just for simplicity. Especially since apparently we've lived >>> with this for panfrost which presumably has the same limitation (even >>> though all Bifrost/Midgard GPUs have at least 33 bits of VA space). >> >> Well, Panfrost is simpler in that you don't have this kernel VA range, >> and, IIRC, we are using the old format that naturally limits the GPU VA >> space to 4G. > > FWIW the legacy pagetable format itself should be fine going up to > however many bits the GPU supports, however there were various ISA > limitations around crossing 4GB boundaries, and the easiest way to avoid > having to think about those was to just not use more than 4GB of VA at > all (minus chunks at the ends for similar weird ISA reasons). Exactly right. The legacy pagetable format supports the full range of VA. Indeed kbase used the legacy format for a long time. However the ISA has special handling for addresses where bits 31:12 are all 0 or all 1, so we have to avoid executable code landing on these regions. kbase has a modified version of 'unmapped_area_topdown'[1] to handle these additional restrictions. Steve [1] kbase_unmapped_area_topdown() https://android.googlesource.com/kernel/google-modules/gpu/+/refs/tags/android-12.0.0_r0.42/mali_kbase/thirdparty/mali_kbase_mmap.c#97
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On 2023-09-04 17:16, Boris Brezillon wrote: On Mon, 4 Sep 2023 16:22:19 +0100 Steven Price wrote: On 04/09/2023 10:26, Boris Brezillon wrote: On Mon, 4 Sep 2023 08:42:08 +0100 Steven Price wrote: On 01/09/2023 17:10, Boris Brezillon wrote: On Wed, 9 Aug 2023 18:53:15 +0200 Boris Brezillon wrote: +/** + * DOC: MMIO regions exposed to userspace. + * + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET + * + * File offset for all MMIO regions being exposed to userspace. Don't use + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead. + * + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET + * + * File offset for the LATEST_FLUSH_ID register. The Userspace driver controls + * GPU cache flushling through CS instructions, but the flush reduction + * mechanism requires a flush_id. This flush_id could be queried with an + * ioctl, but Arm provides a well-isolated register page containing only this + * read-only register, so let's expose this page through a static mmap offset + * and allow direct mapping of this MMIO region so we can avoid the + * user <-> kernel round-trip. + */ +#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56) I'm playing with a 32-bit kernel/userspace, and this is problematic, because vm_pgoff is limited to 32-bit there, meaning we can only map up to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let userspace set the mmio range? Hmm, I was rather hoping we could ignore 32 bit these days ;) But while I can't see why anyone would be running a 32 bit kernel, I guess 32 bit user space is likely to still be needed. Uh, I just hit a new problem with 32-bit kernels: the io-pgtable interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any chance you could advise me on what to do here? 1. assume this limitation is here for a good reason, and limit the GPU VA space to 32-bits on 32-bit kernels or 2. update the interface to make iova an u64 I'm not sure I can answer the question from a technical perspective, hopefully Robin will be able to. Had a quick chat with Robin, and he's recommending going for #1 too. But why do we care about 32-bit kernels on a platform which is new enough to have a CSF-GPU (and by extension a recent 64-bit CPU)? Apparently the memory you save by switching to a 32-bit kernel matters to some people. To clarify, the CPU is aarch64, but they want to use it in 32-bit mode. Given the other limitations present in a 32-bit kernel I'd be tempted to say '1' just for simplicity. Especially since apparently we've lived with this for panfrost which presumably has the same limitation (even though all Bifrost/Midgard GPUs have at least 33 bits of VA space). Well, Panfrost is simpler in that you don't have this kernel VA range, and, IIRC, we are using the old format that naturally limits the GPU VA space to 4G. FWIW the legacy pagetable format itself should be fine going up to however many bits the GPU supports, however there were various ISA limitations around crossing 4GB boundaries, and the easiest way to avoid having to think about those was to just not use more than 4GB of VA at all (minus chunks at the ends for similar weird ISA reasons). Cheers, Robin.
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On Mon, 4 Sep 2023 16:22:19 +0100 Steven Price wrote: > On 04/09/2023 10:26, Boris Brezillon wrote: > > On Mon, 4 Sep 2023 08:42:08 +0100 > > Steven Price wrote: > > > >> On 01/09/2023 17:10, Boris Brezillon wrote: > >>> On Wed, 9 Aug 2023 18:53:15 +0200 > >>> Boris Brezillon wrote: > >>> > +/** > + * DOC: MMIO regions exposed to userspace. > + * > + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET > + * > + * File offset for all MMIO regions being exposed to userspace. Don't > use > + * this value directly, use DRM_PANTHOR_USER__OFFSET values > instead. > + * > + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET > + * > + * File offset for the LATEST_FLUSH_ID register. The Userspace driver > controls > + * GPU cache flushling through CS instructions, but the flush reduction > + * mechanism requires a flush_id. This flush_id could be queried with an > + * ioctl, but Arm provides a well-isolated register page containing > only this > + * read-only register, so let's expose this page through a static mmap > offset > + * and allow direct mapping of this MMIO region so we can avoid the > + * user <-> kernel round-trip. > + */ > +#define DRM_PANTHOR_USER_MMIO_OFFSET(0x1ull << 56) > >>> > >>> I'm playing with a 32-bit kernel/userspace, and this is problematic, > >>> because vm_pgoff is limited to 32-bit there, meaning we can only map up > >>> to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let > >>> userspace set the mmio range? > >> > >> Hmm, I was rather hoping we could ignore 32 bit these days ;) But while > >> I can't see why anyone would be running a 32 bit kernel, I guess 32 bit > >> user space is likely to still be needed. > > > > Uh, I just hit a new problem with 32-bit kernels: the io-pgtable > > interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning > > the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any > > chance you could advise me on what to do here? > > > > 1. assume this limitation is here for a good reason, and limit the GPU > > VA space to 32-bits on 32-bit kernels > > > > or > > > > 2. update the interface to make iova an u64 > > I'm not sure I can answer the question from a technical perspective, > hopefully Robin will be able to. Had a quick chat with Robin, and he's recommending going for #1 too. > > But why do we care about 32-bit kernels on a platform which is new > enough to have a CSF-GPU (and by extension a recent 64-bit CPU)? Apparently the memory you save by switching to a 32-bit kernel matters to some people. To clarify, the CPU is aarch64, but they want to use it in 32-bit mode. > > Given the other limitations present in a 32-bit kernel I'd be tempted to > say '1' just for simplicity. Especially since apparently we've lived > with this for panfrost which presumably has the same limitation (even > though all Bifrost/Midgard GPUs have at least 33 bits of VA space). Well, Panfrost is simpler in that you don't have this kernel VA range, and, IIRC, we are using the old format that naturally limits the GPU VA space to 4G.
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On 2023-08-09 17:53, Boris Brezillon wrote: [...] +/** + * struct drm_panthor_vm_create - Arguments passed to DRM_PANTHOR_IOCTL_VM_CREATE + */ +struct drm_panthor_vm_create { + /** @flags: VM flags, MBZ. */ + __u32 flags; + + /** @id: Returned VM ID. */ + __u32 id; + + /** +* @kernel_va_range: Size of the VA space reserved for kernel objects. +* +* If kernel_va_range is zero, we pick half of the VA space for kernel objects. +* +* Kernel VA space is always placed at the top of the supported VA range. +*/ + __u64 kernel_va_range; Off the back of the "IOVA as unsigned long" concern, Boris and I reasoned through the 64-bit vs. 32-bit vs. compat cases on IRC, and it seems like this kernel_va_range argument is a source of much of the pain. Rather than have userspace specify a quantity which it shouldn't care about and depend on assumptions of kernel behaviour to infer the quantity which *is* relevant (i.e. how large the usable range of the VM will actually be), I think it would be considerably more logical for userspace to simply request the size of usable VM it actually wants. Then it would be straightforward and consistent to define the default value in terms of the minimum of half the GPU VA size or TASK_SIZE (the latter being the largest *meaningful* value in all 3 cases), and it's still easy enough for the kernel to deduce for itself whether there's a reasonable amount of space left between the requested limit and ULONG_MAX for it to use. 32-bit kernels should then get at least 1GB to play with, for compat the kernel BOs can get well out of the way into the >32-bit range, and it's only really 64-bit where userspace is liable to see "kernel" VA space impinging on usable process VAs. Even then we're not sure that's a significant concern beyond OpenCL SVM. Thanks, Robin.
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On 04/09/2023 10:26, Boris Brezillon wrote: > On Mon, 4 Sep 2023 08:42:08 +0100 > Steven Price wrote: > >> On 01/09/2023 17:10, Boris Brezillon wrote: >>> On Wed, 9 Aug 2023 18:53:15 +0200 >>> Boris Brezillon wrote: >>> +/** + * DOC: MMIO regions exposed to userspace. + * + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET + * + * File offset for all MMIO regions being exposed to userspace. Don't use + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead. + * + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET + * + * File offset for the LATEST_FLUSH_ID register. The Userspace driver controls + * GPU cache flushling through CS instructions, but the flush reduction + * mechanism requires a flush_id. This flush_id could be queried with an + * ioctl, but Arm provides a well-isolated register page containing only this + * read-only register, so let's expose this page through a static mmap offset + * and allow direct mapping of this MMIO region so we can avoid the + * user <-> kernel round-trip. + */ +#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56) >>> >>> I'm playing with a 32-bit kernel/userspace, and this is problematic, >>> because vm_pgoff is limited to 32-bit there, meaning we can only map up >>> to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let >>> userspace set the mmio range? >> >> Hmm, I was rather hoping we could ignore 32 bit these days ;) But while >> I can't see why anyone would be running a 32 bit kernel, I guess 32 bit >> user space is likely to still be needed. > > Uh, I just hit a new problem with 32-bit kernels: the io-pgtable > interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning > the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any > chance you could advise me on what to do here? > > 1. assume this limitation is here for a good reason, and limit the GPU > VA space to 32-bits on 32-bit kernels > > or > > 2. update the interface to make iova an u64 I'm not sure I can answer the question from a technical perspective, hopefully Robin will be able to. But why do we care about 32-bit kernels on a platform which is new enough to have a CSF-GPU (and by extension a recent 64-bit CPU)? Given the other limitations present in a 32-bit kernel I'd be tempted to say '1' just for simplicity. Especially since apparently we've lived with this for panfrost which presumably has the same limitation (even though all Bifrost/Midgard GPUs have at least 33 bits of VA space). Steve
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On Mon, 4 Sep 2023 08:42:08 +0100 Steven Price wrote: > On 01/09/2023 17:10, Boris Brezillon wrote: > > On Wed, 9 Aug 2023 18:53:15 +0200 > > Boris Brezillon wrote: > > > >> +/** > >> + * DOC: MMIO regions exposed to userspace. > >> + * > >> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET > >> + * > >> + * File offset for all MMIO regions being exposed to userspace. Don't use > >> + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead. > >> + * > >> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET > >> + * > >> + * File offset for the LATEST_FLUSH_ID register. The Userspace driver > >> controls > >> + * GPU cache flushling through CS instructions, but the flush reduction > >> + * mechanism requires a flush_id. This flush_id could be queried with an > >> + * ioctl, but Arm provides a well-isolated register page containing only > >> this > >> + * read-only register, so let's expose this page through a static mmap > >> offset > >> + * and allow direct mapping of this MMIO region so we can avoid the > >> + * user <-> kernel round-trip. > >> + */ > >> +#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56) > > > > I'm playing with a 32-bit kernel/userspace, and this is problematic, > > because vm_pgoff is limited to 32-bit there, meaning we can only map up > > to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let > > userspace set the mmio range? > > Hmm, I was rather hoping we could ignore 32 bit these days ;) But while > I can't see why anyone would be running a 32 bit kernel, I guess 32 bit > user space is likely to still be needed. Uh, I just hit a new problem with 32-bit kernels: the io-pgtable interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any chance you could advise me on what to do here? 1. assume this limitation is here for a good reason, and limit the GPU VA space to 32-bits on 32-bit kernels or 2. update the interface to make iova an u64
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On Mon, 4 Sep 2023 08:42:08 +0100 Steven Price wrote: > On 01/09/2023 17:10, Boris Brezillon wrote: > > On Wed, 9 Aug 2023 18:53:15 +0200 > > Boris Brezillon wrote: > > > >> +/** > >> + * DOC: MMIO regions exposed to userspace. > >> + * > >> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET > >> + * > >> + * File offset for all MMIO regions being exposed to userspace. Don't use > >> + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead. > >> + * > >> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET > >> + * > >> + * File offset for the LATEST_FLUSH_ID register. The Userspace driver > >> controls > >> + * GPU cache flushling through CS instructions, but the flush reduction > >> + * mechanism requires a flush_id. This flush_id could be queried with an > >> + * ioctl, but Arm provides a well-isolated register page containing only > >> this > >> + * read-only register, so let's expose this page through a static mmap > >> offset > >> + * and allow direct mapping of this MMIO region so we can avoid the > >> + * user <-> kernel round-trip. > >> + */ > >> +#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56) > > > > I'm playing with a 32-bit kernel/userspace, and this is problematic, > > because vm_pgoff is limited to 32-bit there, meaning we can only map up > > to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let > > userspace set the mmio range? > > Hmm, I was rather hoping we could ignore 32 bit these days ;) But while > I can't see why anyone would be running a 32 bit kernel, I guess 32 bit > user space is likely to still be needed. Well, I can tell you some people are using 32-bit kernels ;-). > > I can't really think of anything better than letting user space set the > MMIO range. Having an ioctl which returned a special fd just for MMIO > would be one option (which would preserve the full 44 bit GPU VA) but > seems somewhat overkill. Yeah, I don't think I like the separate-fd approach. Just feels like it goes against the DRM-way of doing things. And, with 32-bit userspace, we'd be limited by the CPU VA range anyway. Of course it's orthogonal to the max file offset, and just because we can't map all buffers at once, doesn't mean we don't want to be able to address more than 4G of memory. But with 43-bit left (I think I'd prefer if we enforce a log2 value for the mmio offset/size, meaning that the max MMIO range would be 1ull << 43), that means we're still able to address 8TB of memory. I guess that's more than enough for 32-bit users... > Hiding the mmap within an ioctl would of course > be bad as it breaks tools like Valgrind. Don't like this idea either. > > Oh and please do make it a range - user space submission will be adding > to the MMIO range ;) Yeah, that was the plan (I keep usermode submission in the back of my mind ;-)).
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On 01/09/2023 17:10, Boris Brezillon wrote: > On Wed, 9 Aug 2023 18:53:15 +0200 > Boris Brezillon wrote: > >> +/** >> + * DOC: MMIO regions exposed to userspace. >> + * >> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET >> + * >> + * File offset for all MMIO regions being exposed to userspace. Don't use >> + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead. >> + * >> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET >> + * >> + * File offset for the LATEST_FLUSH_ID register. The Userspace driver >> controls >> + * GPU cache flushling through CS instructions, but the flush reduction >> + * mechanism requires a flush_id. This flush_id could be queried with an >> + * ioctl, but Arm provides a well-isolated register page containing only >> this >> + * read-only register, so let's expose this page through a static mmap >> offset >> + * and allow direct mapping of this MMIO region so we can avoid the >> + * user <-> kernel round-trip. >> + */ >> +#define DRM_PANTHOR_USER_MMIO_OFFSET(0x1ull << 56) > > I'm playing with a 32-bit kernel/userspace, and this is problematic, > because vm_pgoff is limited to 32-bit there, meaning we can only map up > to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let > userspace set the mmio range? Hmm, I was rather hoping we could ignore 32 bit these days ;) But while I can't see why anyone would be running a 32 bit kernel, I guess 32 bit user space is likely to still be needed. I can't really think of anything better than letting user space set the MMIO range. Having an ioctl which returned a special fd just for MMIO would be one option (which would preserve the full 44 bit GPU VA) but seems somewhat overkill. Hiding the mmap within an ioctl would of course be bad as it breaks tools like Valgrind. Oh and please do make it a range - user space submission will be adding to the MMIO range ;) Steve >> +#define DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET >> (DRM_PANTHOR_USER_MMIO_OFFSET | 0)
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On Wed, 9 Aug 2023 18:53:15 +0200 Boris Brezillon wrote: > +/** > + * DOC: MMIO regions exposed to userspace. > + * > + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET > + * > + * File offset for all MMIO regions being exposed to userspace. Don't use > + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead. > + * > + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET > + * > + * File offset for the LATEST_FLUSH_ID register. The Userspace driver > controls > + * GPU cache flushling through CS instructions, but the flush reduction > + * mechanism requires a flush_id. This flush_id could be queried with an > + * ioctl, but Arm provides a well-isolated register page containing only this > + * read-only register, so let's expose this page through a static mmap offset > + * and allow direct mapping of this MMIO region so we can avoid the > + * user <-> kernel round-trip. > + */ > +#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56) I'm playing with a 32-bit kernel/userspace, and this is problematic, because vm_pgoff is limited to 32-bit there, meaning we can only map up to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let userspace set the mmio range? > +#define DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET > (DRM_PANTHOR_USER_MMIO_OFFSET | 0)
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
Hi Boris, On Wed, Aug 09, 2023 at 06:53:15PM +0200, Boris Brezillon wrote: > Panthor follows the lead of other recently submitted drivers with > ioctls allowing us to support modern Vulkan features, like sparse memory > binding: > > - Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET), > with the 'exclusive-VM' bit to speed-up BO reservation on job submission > - VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND > ioctl is loosely based on the Xe model, and can handle both > asynchronous and synchronous requests > - GPU execution context creation/destruction, tiler heap context creation > and job submission. Those ioctls reflect how the hardware/scheduler > works and are thus driver specific. > > We also have a way to expose IO regions, such that the usermode driver > can directly access specific/well-isolate registers, like the > LATEST_FLUSH register used to implement cache-flush reduction. > > This uAPI intentionally keeps usermode queues out of the scope, which > explains why doorbell registers and command stream ring-buffers are not > directly exposed to userspace. > > v2: > - Rename the driver (pancsf -> panthor) > - Change the license (GPL2 -> MIT + GPL2) > - Split the driver addition commit > - Turn the VM_{MAP,UNMAP} ioctls into a VM_BIND ioctl > - Add the concept of exclusive_vm at BO creation time > - Add missing padding fields > - Add documentation > > Signed-off-by: Boris Brezillon Minor fixes in addition to what Steve has alread flagged. > --- > Documentation/gpu/driver-uapi.rst | 5 + > include/uapi/drm/panthor_drm.h| 862 ++ > 2 files changed, 867 insertions(+) > create mode 100644 include/uapi/drm/panthor_drm.h > > diff --git a/Documentation/gpu/driver-uapi.rst > b/Documentation/gpu/driver-uapi.rst > index c08bcbb95fb3..7a667901830f 100644 > --- a/Documentation/gpu/driver-uapi.rst > +++ b/Documentation/gpu/driver-uapi.rst > @@ -17,3 +17,8 @@ VM_BIND / EXEC uAPI > :doc: Overview > > .. kernel-doc:: include/uapi/drm/nouveau_drm.h > + > +drm/panthor uAPI > + > + > +.. kernel-doc:: include/uapi/drm/panthor_drm.h > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > new file mode 100644 > index ..e217eb5ad198 > --- /dev/null > +++ b/include/uapi/drm/panthor_drm.h > @@ -0,0 +1,862 @@ > +/* SPDX-License-Identifier: MIT */ > +/* Copyright (C) 2023 Collabora ltd. */ > +#ifndef _PANTHOR_DRM_H_ > +#define _PANTHOR_DRM_H_ > + > +#include "drm.h" > + > +#if defined(__cplusplus) > +extern "C" { > +#endif > + > +/** > + * DOC: Introduction > + * > + * This documentation decribes the Panthor IOCTLs. > + * > + * Just a few generic rules about the data passed to the Panthor IOCTLs: > + * > + * - Structures must be aligned on 64-bit/8-byte. If the object is not > + * naturally aligned, a padding field must be added. > + * - Fields must be explicity aligned to their natural type alignment with > + * pad[0..N] fields. > + * - All padding fields will be checked by the driver to make sure they are > + * zeroed. > + * - Flags can be added, but not removed/replaced. > + * - New fields can be added to the main structures (the structures > + * directly passed to the ioctl). Those fiels can be added at the end of > + * the structure, or replace existing padding fields. Any new field being > + * added must preserve the behavior that existed before those fields were > + * added when a value of zero is passed. > + * - New fields can be added to indirect objects (objects pointed by the > + * main structure), iff those objects are passed a size to reflect the > + * size known by the userspace driver (see drm_panthor_obj_array::stride > + * or drm_panthor_dev_query::size). > + * - If the kernel driver is too old to know some fields, those will > + * be ignored (input) and set back to zero (output). > + * - If userspace is too old to know some fields, those will be zeroed > + * (input) before the structure is parsed by the kernel driver. > + * - Each new flag/field addition must come with a driver version update so > + * the userspace driver doesn't have to trial and error to know which > + * flags are supported. > + * - Structures should not contain unions, as this would defeat the > + * extensibility of such structures. > + * - IOCTLs can't be removed or replaced. New IOCTL IDs should be placed > + * at the end of the drm_panthor_ioctl_id enum. > + */ > + > +/** > + * DOC: MMIO regions exposed to userspace. > + * > + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET > + * > + * File offset for all MMIO regions being exposed to userspace. Don't use > + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead. > + * > + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET > + * > + * File offset for the LATEST_FLUSH_ID register. The Userspace driver > controls > + * GPU cache flushling through CS instructions, but the flush
Re: [PATCH v2 02/15] drm/panthor: Add uAPI
On 09/08/2023 17:53, Boris Brezillon wrote: > Panthor follows the lead of other recently submitted drivers with > ioctls allowing us to support modern Vulkan features, like sparse memory > binding: > > - Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET), > with the 'exclusive-VM' bit to speed-up BO reservation on job submission > - VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND > ioctl is loosely based on the Xe model, and can handle both > asynchronous and synchronous requests > - GPU execution context creation/destruction, tiler heap context creation > and job submission. Those ioctls reflect how the hardware/scheduler > works and are thus driver specific. > > We also have a way to expose IO regions, such that the usermode driver > can directly access specific/well-isolate registers, like the > LATEST_FLUSH register used to implement cache-flush reduction. > > This uAPI intentionally keeps usermode queues out of the scope, which > explains why doorbell registers and command stream ring-buffers are not > directly exposed to userspace. > > v2: > - Rename the driver (pancsf -> panthor) > - Change the license (GPL2 -> MIT + GPL2) > - Split the driver addition commit > - Turn the VM_{MAP,UNMAP} ioctls into a VM_BIND ioctl > - Add the concept of exclusive_vm at BO creation time > - Add missing padding fields > - Add documentation > > Signed-off-by: Boris Brezillon Looks good, just documentation typos/corrections below. With those fixed Reviewed-by: Steven Price > --- > Documentation/gpu/driver-uapi.rst | 5 + > include/uapi/drm/panthor_drm.h| 862 ++ > 2 files changed, 867 insertions(+) > create mode 100644 include/uapi/drm/panthor_drm.h > > diff --git a/Documentation/gpu/driver-uapi.rst > b/Documentation/gpu/driver-uapi.rst > index c08bcbb95fb3..7a667901830f 100644 > --- a/Documentation/gpu/driver-uapi.rst > +++ b/Documentation/gpu/driver-uapi.rst > @@ -17,3 +17,8 @@ VM_BIND / EXEC uAPI > :doc: Overview > > .. kernel-doc:: include/uapi/drm/nouveau_drm.h > + > +drm/panthor uAPI > + > + > +.. kernel-doc:: include/uapi/drm/panthor_drm.h > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > new file mode 100644 > index ..e217eb5ad198 > --- /dev/null > +++ b/include/uapi/drm/panthor_drm.h > @@ -0,0 +1,862 @@ > +/* SPDX-License-Identifier: MIT */ > +/* Copyright (C) 2023 Collabora ltd. */ > +#ifndef _PANTHOR_DRM_H_ > +#define _PANTHOR_DRM_H_ > + > +#include "drm.h" > + > +#if defined(__cplusplus) > +extern "C" { > +#endif > + > +/** > + * DOC: Introduction > + * > + * This documentation decribes the Panthor IOCTLs. describes > + * > + * Just a few generic rules about the data passed to the Panthor IOCTLs: > + * > + * - Structures must be aligned on 64-bit/8-byte. If the object is not > + * naturally aligned, a padding field must be added. > + * - Fields must be explicity aligned to their natural type alignment with ^ explicitly > + * pad[0..N] fields. > + * - All padding fields will be checked by the driver to make sure they are > + * zeroed. > + * - Flags can be added, but not removed/replaced. > + * - New fields can be added to the main structures (the structures > + * directly passed to the ioctl). Those fiels can be added at the end of ^ fields > + * the structure, or replace existing padding fields. Any new field being > + * added must preserve the behavior that existed before those fields were > + * added when a value of zero is passed. > + * - New fields can be added to indirect objects (objects pointed by the > + * main structure), iff those objects are passed a size to reflect the > + * size known by the userspace driver (see drm_panthor_obj_array::stride > + * or drm_panthor_dev_query::size). > + * - If the kernel driver is too old to know some fields, those will > + * be ignored (input) and set back to zero (output). I presume this should be "will be ignored if zero (input)" and rejected if non-zero? > + * - If userspace is too old to know some fields, those will be zeroed > + * (input) before the structure is parsed by the kernel driver. > + * - Each new flag/field addition must come with a driver version update so > + * the userspace driver doesn't have to trial and error to know which > + * flags are supported. > + * - Structures should not contain unions, as this would defeat the > + * extensibility of such structures. > + * - IOCTLs can't be removed or replaced. New IOCTL IDs should be placed > + * at the end of the drm_panthor_ioctl_id enum. > + */ > + > +/** > + * DOC: MMIO regions exposed to userspace. > + * > + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET > + * > + * File offset for all MMIO regions being exposed to userspace. Don't use > + * this value directly, use
[PATCH v2 02/15] drm/panthor: Add uAPI
Panthor follows the lead of other recently submitted drivers with ioctls allowing us to support modern Vulkan features, like sparse memory binding: - Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET), with the 'exclusive-VM' bit to speed-up BO reservation on job submission - VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND ioctl is loosely based on the Xe model, and can handle both asynchronous and synchronous requests - GPU execution context creation/destruction, tiler heap context creation and job submission. Those ioctls reflect how the hardware/scheduler works and are thus driver specific. We also have a way to expose IO regions, such that the usermode driver can directly access specific/well-isolate registers, like the LATEST_FLUSH register used to implement cache-flush reduction. This uAPI intentionally keeps usermode queues out of the scope, which explains why doorbell registers and command stream ring-buffers are not directly exposed to userspace. v2: - Rename the driver (pancsf -> panthor) - Change the license (GPL2 -> MIT + GPL2) - Split the driver addition commit - Turn the VM_{MAP,UNMAP} ioctls into a VM_BIND ioctl - Add the concept of exclusive_vm at BO creation time - Add missing padding fields - Add documentation Signed-off-by: Boris Brezillon --- Documentation/gpu/driver-uapi.rst | 5 + include/uapi/drm/panthor_drm.h| 862 ++ 2 files changed, 867 insertions(+) create mode 100644 include/uapi/drm/panthor_drm.h diff --git a/Documentation/gpu/driver-uapi.rst b/Documentation/gpu/driver-uapi.rst index c08bcbb95fb3..7a667901830f 100644 --- a/Documentation/gpu/driver-uapi.rst +++ b/Documentation/gpu/driver-uapi.rst @@ -17,3 +17,8 @@ VM_BIND / EXEC uAPI :doc: Overview .. kernel-doc:: include/uapi/drm/nouveau_drm.h + +drm/panthor uAPI + + +.. kernel-doc:: include/uapi/drm/panthor_drm.h diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h new file mode 100644 index ..e217eb5ad198 --- /dev/null +++ b/include/uapi/drm/panthor_drm.h @@ -0,0 +1,862 @@ +/* SPDX-License-Identifier: MIT */ +/* Copyright (C) 2023 Collabora ltd. */ +#ifndef _PANTHOR_DRM_H_ +#define _PANTHOR_DRM_H_ + +#include "drm.h" + +#if defined(__cplusplus) +extern "C" { +#endif + +/** + * DOC: Introduction + * + * This documentation decribes the Panthor IOCTLs. + * + * Just a few generic rules about the data passed to the Panthor IOCTLs: + * + * - Structures must be aligned on 64-bit/8-byte. If the object is not + * naturally aligned, a padding field must be added. + * - Fields must be explicity aligned to their natural type alignment with + * pad[0..N] fields. + * - All padding fields will be checked by the driver to make sure they are + * zeroed. + * - Flags can be added, but not removed/replaced. + * - New fields can be added to the main structures (the structures + * directly passed to the ioctl). Those fiels can be added at the end of + * the structure, or replace existing padding fields. Any new field being + * added must preserve the behavior that existed before those fields were + * added when a value of zero is passed. + * - New fields can be added to indirect objects (objects pointed by the + * main structure), iff those objects are passed a size to reflect the + * size known by the userspace driver (see drm_panthor_obj_array::stride + * or drm_panthor_dev_query::size). + * - If the kernel driver is too old to know some fields, those will + * be ignored (input) and set back to zero (output). + * - If userspace is too old to know some fields, those will be zeroed + * (input) before the structure is parsed by the kernel driver. + * - Each new flag/field addition must come with a driver version update so + * the userspace driver doesn't have to trial and error to know which + * flags are supported. + * - Structures should not contain unions, as this would defeat the + * extensibility of such structures. + * - IOCTLs can't be removed or replaced. New IOCTL IDs should be placed + * at the end of the drm_panthor_ioctl_id enum. + */ + +/** + * DOC: MMIO regions exposed to userspace. + * + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET + * + * File offset for all MMIO regions being exposed to userspace. Don't use + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead. + * + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET + * + * File offset for the LATEST_FLUSH_ID register. The Userspace driver controls + * GPU cache flushling through CS instructions, but the flush reduction + * mechanism requires a flush_id. This flush_id could be queried with an + * ioctl, but Arm provides a well-isolated register page containing only this + * read-only register, so let's expose this page through a static mmap offset + * and allow direct mapping of this MMIO region so we can avoid the + * user <-> kernel round-trip. + */ +#define