Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-06 Thread Ketil Johnsen

On 8/9/23 18:53, Boris Brezillon wrote:


+enum drm_panthor_sync_op_flags {
+   /** @DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_MASK: Synchronization handle type 
mask. */
+   DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_MASK = 0xff,
+
+   /** @DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_SYNCOBJ: Synchronization object 
type. */
+   DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_SYNCOBJ = 0,
+
+   /**
+* @DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ: Timeline 
synchronization
+* object type.
+*/
+   DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ = 1,
+
+   /** @DRM_PANTHOR_SYNC_OP_WAIT: Wait operation. */
+   DRM_PANTHOR_SYNC_OP_WAIT = 0 << 31,
+
+   /** @DRM_PANTHOR_SYNC_OP_SIGNAL: Signal operation. */
+   DRM_PANTHOR_SYNC_OP_SIGNAL = 1 << 31,
+};


We get an issue with --pedantic here:

warning: enumerator value for 'DRM_PANTHOR_SYNC_OP_SIGNAL' is not an 
integer constant expression [-Wpedantic]


Would be god to get rid of this, so user space can include this header 
without disabling pedantic. Either we can stop using the top most bit or 
a cast value like "(int)(1U << 31)"



+   /**
+* @DRM_PANTHOR_VM_BIND_OP_TYPE_MASK: Mask used to determine the type 
of operation.
+*/
+   DRM_PANTHOR_VM_BIND_OP_TYPE_MASK = 0xf << 28,


Same issue for this member. Either not use the top most bit or cast 
value like "(int)(0xfU << 28)" avoids the pedantic warning.


--
Regards,
Ketil Johnsen


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-06 Thread Steven Price
On 04/09/2023 17:25, Robin Murphy wrote:
> On 2023-09-04 17:16, Boris Brezillon wrote:
>> On Mon, 4 Sep 2023 16:22:19 +0100
>> Steven Price  wrote:
>>
>>> On 04/09/2023 10:26, Boris Brezillon wrote:
 On Mon, 4 Sep 2023 08:42:08 +0100
 Steven Price  wrote:
   
> On 01/09/2023 17:10, Boris Brezillon wrote:
>> On Wed,  9 Aug 2023 18:53:15 +0200
>> Boris Brezillon  wrote:
>> 
>>> +/**
>>> + * DOC: MMIO regions exposed to userspace.
>>> + *
>>> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
>>> + *
>>> + * File offset for all MMIO regions being exposed to userspace.
>>> Don't use
>>> + * this value directly, use DRM_PANTHOR_USER__OFFSET
>>> values instead.
>>> + *
>>> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
>>> + *
>>> + * File offset for the LATEST_FLUSH_ID register. The Userspace
>>> driver controls
>>> + * GPU cache flushling through CS instructions, but the flush
>>> reduction
>>> + * mechanism requires a flush_id. This flush_id could be queried
>>> with an
>>> + * ioctl, but Arm provides a well-isolated register page
>>> containing only this
>>> + * read-only register, so let's expose this page through a
>>> static mmap offset
>>> + * and allow direct mapping of this MMIO region so we can avoid the
>>> + * user <-> kernel round-trip.
>>> + */
>>> +#define DRM_PANTHOR_USER_MMIO_OFFSET    (0x1ull << 56)
>>
>> I'm playing with a 32-bit kernel/userspace, and this is problematic,
>> because vm_pgoff is limited to 32-bit there, meaning we can only
>> map up
>> to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
>> userspace set the mmio range?
>
> Hmm, I was rather hoping we could ignore 32 bit these days ;) But
> while
> I can't see why anyone would be running a 32 bit kernel, I guess 32
> bit
> user space is likely to still be needed.

 Uh, I just hit a new problem with 32-bit kernels: the io-pgtable
 interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning
 the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any
 chance you could advise me on what to do here?

 1. assume this limitation is here for a good reason, and limit the GPU
 VA space to 32-bits on 32-bit kernels

 or

 2. update the interface to make iova an u64
>>>
>>> I'm not sure I can answer the question from a technical perspective,
>>> hopefully Robin will be able to.
>>
>> Had a quick chat with Robin, and he's recommending going for #1 too.
>>
>>>
>>> But why do we care about 32-bit kernels on a platform which is new
>>> enough to have a CSF-GPU (and by extension a recent 64-bit CPU)?
>>
>> Apparently the memory you save by switching to a 32-bit kernel matters
>> to some people. To clarify, the CPU is aarch64, but they want to use it
>> in 32-bit mode.
>>
>>>
>>> Given the other limitations present in a 32-bit kernel I'd be tempted to
>>> say '1' just for simplicity. Especially since apparently we've lived
>>> with this for panfrost which presumably has the same limitation (even
>>> though all Bifrost/Midgard GPUs have at least 33 bits of VA space).
>>
>> Well, Panfrost is simpler in that you don't have this kernel VA range,
>> and, IIRC, we are using the old format that naturally limits the GPU VA
>> space to 4G.
> 
> FWIW the legacy pagetable format itself should be fine going up to
> however many bits the GPU supports, however there were various ISA
> limitations around crossing 4GB boundaries, and the easiest way to avoid
> having to think about those was to just not use more than 4GB of VA at
> all (minus chunks at the ends for similar weird ISA reasons).

Exactly right. The legacy pagetable format supports the full range of
VA. Indeed kbase used the legacy format for a long time.

However the ISA has special handling for addresses where bits 31:12 are
all 0 or all 1, so we have to avoid executable code landing on these
regions. kbase has a modified version of 'unmapped_area_topdown'[1] to
handle these additional restrictions.

Steve

[1] kbase_unmapped_area_topdown()
https://android.googlesource.com/kernel/google-modules/gpu/+/refs/tags/android-12.0.0_r0.42/mali_kbase/thirdparty/mali_kbase_mmap.c#97


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Robin Murphy

On 2023-09-04 17:16, Boris Brezillon wrote:

On Mon, 4 Sep 2023 16:22:19 +0100
Steven Price  wrote:


On 04/09/2023 10:26, Boris Brezillon wrote:

On Mon, 4 Sep 2023 08:42:08 +0100
Steven Price  wrote:
   

On 01/09/2023 17:10, Boris Brezillon wrote:

On Wed,  9 Aug 2023 18:53:15 +0200
Boris Brezillon  wrote:
 

+/**
+ * DOC: MMIO regions exposed to userspace.
+ *
+ * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
+ *
+ * File offset for all MMIO regions being exposed to userspace. Don't use
+ * this value directly, use DRM_PANTHOR_USER__OFFSET values instead.
+ *
+ * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
+ *
+ * File offset for the LATEST_FLUSH_ID register. The Userspace driver controls
+ * GPU cache flushling through CS instructions, but the flush reduction
+ * mechanism requires a flush_id. This flush_id could be queried with an
+ * ioctl, but Arm provides a well-isolated register page containing only this
+ * read-only register, so let's expose this page through a static mmap offset
+ * and allow direct mapping of this MMIO region so we can avoid the
+ * user <-> kernel round-trip.
+ */
+#define DRM_PANTHOR_USER_MMIO_OFFSET   (0x1ull << 56)


I'm playing with a 32-bit kernel/userspace, and this is problematic,
because vm_pgoff is limited to 32-bit there, meaning we can only map up
to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
userspace set the mmio range?


Hmm, I was rather hoping we could ignore 32 bit these days ;) But while
I can't see why anyone would be running a 32 bit kernel, I guess 32 bit
user space is likely to still be needed.


Uh, I just hit a new problem with 32-bit kernels: the io-pgtable
interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning
the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any
chance you could advise me on what to do here?

1. assume this limitation is here for a good reason, and limit the GPU
VA space to 32-bits on 32-bit kernels

or

2. update the interface to make iova an u64


I'm not sure I can answer the question from a technical perspective,
hopefully Robin will be able to.


Had a quick chat with Robin, and he's recommending going for #1 too.



But why do we care about 32-bit kernels on a platform which is new
enough to have a CSF-GPU (and by extension a recent 64-bit CPU)?


Apparently the memory you save by switching to a 32-bit kernel matters
to some people. To clarify, the CPU is aarch64, but they want to use it
in 32-bit mode.



Given the other limitations present in a 32-bit kernel I'd be tempted to
say '1' just for simplicity. Especially since apparently we've lived
with this for panfrost which presumably has the same limitation (even
though all Bifrost/Midgard GPUs have at least 33 bits of VA space).


Well, Panfrost is simpler in that you don't have this kernel VA range,
and, IIRC, we are using the old format that naturally limits the GPU VA
space to 4G.


FWIW the legacy pagetable format itself should be fine going up to 
however many bits the GPU supports, however there were various ISA 
limitations around crossing 4GB boundaries, and the easiest way to avoid 
having to think about those was to just not use more than 4GB of VA at 
all (minus chunks at the ends for similar weird ISA reasons).


Cheers,
Robin.


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Boris Brezillon
On Mon, 4 Sep 2023 16:22:19 +0100
Steven Price  wrote:

> On 04/09/2023 10:26, Boris Brezillon wrote:
> > On Mon, 4 Sep 2023 08:42:08 +0100
> > Steven Price  wrote:
> >   
> >> On 01/09/2023 17:10, Boris Brezillon wrote:  
> >>> On Wed,  9 Aug 2023 18:53:15 +0200
> >>> Boris Brezillon  wrote:
> >>> 
>  +/**
>  + * DOC: MMIO regions exposed to userspace.
>  + *
>  + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
>  + *
>  + * File offset for all MMIO regions being exposed to userspace. Don't 
>  use
>  + * this value directly, use DRM_PANTHOR_USER__OFFSET values 
>  instead.
>  + *
>  + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
>  + *
>  + * File offset for the LATEST_FLUSH_ID register. The Userspace driver 
>  controls
>  + * GPU cache flushling through CS instructions, but the flush reduction
>  + * mechanism requires a flush_id. This flush_id could be queried with an
>  + * ioctl, but Arm provides a well-isolated register page containing 
>  only this
>  + * read-only register, so let's expose this page through a static mmap 
>  offset
>  + * and allow direct mapping of this MMIO region so we can avoid the
>  + * user <-> kernel round-trip.
>  + */
>  +#define DRM_PANTHOR_USER_MMIO_OFFSET(0x1ull << 56)
> >>>
> >>> I'm playing with a 32-bit kernel/userspace, and this is problematic,
> >>> because vm_pgoff is limited to 32-bit there, meaning we can only map up
> >>> to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
> >>> userspace set the mmio range?
> >>
> >> Hmm, I was rather hoping we could ignore 32 bit these days ;) But while
> >> I can't see why anyone would be running a 32 bit kernel, I guess 32 bit
> >> user space is likely to still be needed.  
> > 
> > Uh, I just hit a new problem with 32-bit kernels: the io-pgtable
> > interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning
> > the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any
> > chance you could advise me on what to do here?
> > 
> > 1. assume this limitation is here for a good reason, and limit the GPU
> > VA space to 32-bits on 32-bit kernels
> > 
> > or
> > 
> > 2. update the interface to make iova an u64  
> 
> I'm not sure I can answer the question from a technical perspective,
> hopefully Robin will be able to.

Had a quick chat with Robin, and he's recommending going for #1 too.

> 
> But why do we care about 32-bit kernels on a platform which is new
> enough to have a CSF-GPU (and by extension a recent 64-bit CPU)?

Apparently the memory you save by switching to a 32-bit kernel matters
to some people. To clarify, the CPU is aarch64, but they want to use it
in 32-bit mode.

> 
> Given the other limitations present in a 32-bit kernel I'd be tempted to
> say '1' just for simplicity. Especially since apparently we've lived
> with this for panfrost which presumably has the same limitation (even
> though all Bifrost/Midgard GPUs have at least 33 bits of VA space).

Well, Panfrost is simpler in that you don't have this kernel VA range,
and, IIRC, we are using the old format that naturally limits the GPU VA
space to 4G.


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Robin Murphy

On 2023-08-09 17:53, Boris Brezillon wrote:
[...]

+/**
+ * struct drm_panthor_vm_create - Arguments passed to 
DRM_PANTHOR_IOCTL_VM_CREATE
+ */
+struct drm_panthor_vm_create {
+   /** @flags: VM flags, MBZ. */
+   __u32 flags;
+
+   /** @id: Returned VM ID. */
+   __u32 id;
+
+   /**
+* @kernel_va_range: Size of the VA space reserved for kernel objects.
+*
+* If kernel_va_range is zero, we pick half of the VA space for kernel 
objects.
+*
+* Kernel VA space is always placed at the top of the supported VA 
range.
+*/
+   __u64 kernel_va_range;


Off the back of the "IOVA as unsigned long" concern, Boris and I 
reasoned through the 64-bit vs. 32-bit vs. compat cases on IRC, and it 
seems like this kernel_va_range argument is a source of much of the pain.


Rather than have userspace specify a quantity which it shouldn't care 
about and depend on assumptions of kernel behaviour to infer the 
quantity which *is* relevant (i.e. how large the usable range of the VM 
will actually be), I think it would be considerably more logical for 
userspace to simply request the size of usable VM it actually wants. 
Then it would be straightforward and consistent to define the default 
value in terms of the minimum of half the GPU VA size or TASK_SIZE (the 
latter being the largest *meaningful* value in all 3 cases), and it's 
still easy enough for the kernel to deduce for itself whether there's a 
reasonable amount of space left between the requested limit and 
ULONG_MAX for it to use. 32-bit kernels should then get at least 1GB to 
play with, for compat the kernel BOs can get well out of the way into 
the >32-bit range, and it's only really 64-bit where userspace is liable 
to see "kernel" VA space impinging on usable process VAs. Even then 
we're not sure that's a significant concern beyond OpenCL SVM.


Thanks,
Robin.


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Steven Price
On 04/09/2023 10:26, Boris Brezillon wrote:
> On Mon, 4 Sep 2023 08:42:08 +0100
> Steven Price  wrote:
> 
>> On 01/09/2023 17:10, Boris Brezillon wrote:
>>> On Wed,  9 Aug 2023 18:53:15 +0200
>>> Boris Brezillon  wrote:
>>>   
 +/**
 + * DOC: MMIO regions exposed to userspace.
 + *
 + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
 + *
 + * File offset for all MMIO regions being exposed to userspace. Don't use
 + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead.
 + *
 + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
 + *
 + * File offset for the LATEST_FLUSH_ID register. The Userspace driver 
 controls
 + * GPU cache flushling through CS instructions, but the flush reduction
 + * mechanism requires a flush_id. This flush_id could be queried with an
 + * ioctl, but Arm provides a well-isolated register page containing only 
 this
 + * read-only register, so let's expose this page through a static mmap 
 offset
 + * and allow direct mapping of this MMIO region so we can avoid the
 + * user <-> kernel round-trip.
 + */
 +#define DRM_PANTHOR_USER_MMIO_OFFSET  (0x1ull << 56)  
>>>
>>> I'm playing with a 32-bit kernel/userspace, and this is problematic,
>>> because vm_pgoff is limited to 32-bit there, meaning we can only map up
>>> to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
>>> userspace set the mmio range?  
>>
>> Hmm, I was rather hoping we could ignore 32 bit these days ;) But while
>> I can't see why anyone would be running a 32 bit kernel, I guess 32 bit
>> user space is likely to still be needed.
> 
> Uh, I just hit a new problem with 32-bit kernels: the io-pgtable
> interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning
> the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any
> chance you could advise me on what to do here?
> 
> 1. assume this limitation is here for a good reason, and limit the GPU
> VA space to 32-bits on 32-bit kernels
> 
> or
> 
> 2. update the interface to make iova an u64

I'm not sure I can answer the question from a technical perspective,
hopefully Robin will be able to.

But why do we care about 32-bit kernels on a platform which is new
enough to have a CSF-GPU (and by extension a recent 64-bit CPU)?

Given the other limitations present in a 32-bit kernel I'd be tempted to
say '1' just for simplicity. Especially since apparently we've lived
with this for panfrost which presumably has the same limitation (even
though all Bifrost/Midgard GPUs have at least 33 bits of VA space).

Steve



Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Boris Brezillon
On Mon, 4 Sep 2023 08:42:08 +0100
Steven Price  wrote:

> On 01/09/2023 17:10, Boris Brezillon wrote:
> > On Wed,  9 Aug 2023 18:53:15 +0200
> > Boris Brezillon  wrote:
> >   
> >> +/**
> >> + * DOC: MMIO regions exposed to userspace.
> >> + *
> >> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
> >> + *
> >> + * File offset for all MMIO regions being exposed to userspace. Don't use
> >> + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead.
> >> + *
> >> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
> >> + *
> >> + * File offset for the LATEST_FLUSH_ID register. The Userspace driver 
> >> controls
> >> + * GPU cache flushling through CS instructions, but the flush reduction
> >> + * mechanism requires a flush_id. This flush_id could be queried with an
> >> + * ioctl, but Arm provides a well-isolated register page containing only 
> >> this
> >> + * read-only register, so let's expose this page through a static mmap 
> >> offset
> >> + * and allow direct mapping of this MMIO region so we can avoid the
> >> + * user <-> kernel round-trip.
> >> + */
> >> +#define DRM_PANTHOR_USER_MMIO_OFFSET  (0x1ull << 56)  
> > 
> > I'm playing with a 32-bit kernel/userspace, and this is problematic,
> > because vm_pgoff is limited to 32-bit there, meaning we can only map up
> > to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
> > userspace set the mmio range?  
> 
> Hmm, I was rather hoping we could ignore 32 bit these days ;) But while
> I can't see why anyone would be running a 32 bit kernel, I guess 32 bit
> user space is likely to still be needed.

Uh, I just hit a new problem with 32-bit kernels: the io-pgtable
interface (io_pgtable_ops) passes device VAs as unsigned longs, meaning
the GPU VA space is limited to 4G on a 32-bit build :-(. Robin, any
chance you could advise me on what to do here?

1. assume this limitation is here for a good reason, and limit the GPU
VA space to 32-bits on 32-bit kernels

or

2. update the interface to make iova an u64


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Boris Brezillon
On Mon, 4 Sep 2023 08:42:08 +0100
Steven Price  wrote:

> On 01/09/2023 17:10, Boris Brezillon wrote:
> > On Wed,  9 Aug 2023 18:53:15 +0200
> > Boris Brezillon  wrote:
> >   
> >> +/**
> >> + * DOC: MMIO regions exposed to userspace.
> >> + *
> >> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
> >> + *
> >> + * File offset for all MMIO regions being exposed to userspace. Don't use
> >> + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead.
> >> + *
> >> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
> >> + *
> >> + * File offset for the LATEST_FLUSH_ID register. The Userspace driver 
> >> controls
> >> + * GPU cache flushling through CS instructions, but the flush reduction
> >> + * mechanism requires a flush_id. This flush_id could be queried with an
> >> + * ioctl, but Arm provides a well-isolated register page containing only 
> >> this
> >> + * read-only register, so let's expose this page through a static mmap 
> >> offset
> >> + * and allow direct mapping of this MMIO region so we can avoid the
> >> + * user <-> kernel round-trip.
> >> + */
> >> +#define DRM_PANTHOR_USER_MMIO_OFFSET  (0x1ull << 56)  
> > 
> > I'm playing with a 32-bit kernel/userspace, and this is problematic,
> > because vm_pgoff is limited to 32-bit there, meaning we can only map up
> > to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
> > userspace set the mmio range?  
> 
> Hmm, I was rather hoping we could ignore 32 bit these days ;) But while
> I can't see why anyone would be running a 32 bit kernel, I guess 32 bit
> user space is likely to still be needed.

Well, I can tell you some people are using 32-bit kernels ;-).

> 
> I can't really think of anything better than letting user space set the
> MMIO range. Having an ioctl which returned a special fd just for MMIO
> would be one option (which would preserve the full 44 bit GPU VA) but
> seems somewhat overkill.

Yeah, I don't think I like the separate-fd approach. Just feels like it
goes against the DRM-way of doing things. And, with 32-bit userspace,
we'd be limited by the CPU VA range anyway. Of course it's orthogonal
to the max file offset, and just because we can't map all buffers at
once, doesn't mean we don't want to be able to address more than 4G of
memory. But with 43-bit left (I think I'd prefer if we enforce a log2
value for the mmio offset/size, meaning that the max MMIO range would be
1ull << 43), that means we're still able to address 8TB of memory. I
guess that's more than enough for 32-bit users...

> Hiding the mmap within an ioctl would of course
> be bad as it breaks tools like Valgrind.

Don't like this idea either.

> 
> Oh and please do make it a range - user space submission will be adding
> to the MMIO range ;)

Yeah, that was the plan (I keep usermode submission in the back of my
mind ;-)).


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-04 Thread Steven Price
On 01/09/2023 17:10, Boris Brezillon wrote:
> On Wed,  9 Aug 2023 18:53:15 +0200
> Boris Brezillon  wrote:
> 
>> +/**
>> + * DOC: MMIO regions exposed to userspace.
>> + *
>> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
>> + *
>> + * File offset for all MMIO regions being exposed to userspace. Don't use
>> + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead.
>> + *
>> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
>> + *
>> + * File offset for the LATEST_FLUSH_ID register. The Userspace driver 
>> controls
>> + * GPU cache flushling through CS instructions, but the flush reduction
>> + * mechanism requires a flush_id. This flush_id could be queried with an
>> + * ioctl, but Arm provides a well-isolated register page containing only 
>> this
>> + * read-only register, so let's expose this page through a static mmap 
>> offset
>> + * and allow direct mapping of this MMIO region so we can avoid the
>> + * user <-> kernel round-trip.
>> + */
>> +#define DRM_PANTHOR_USER_MMIO_OFFSET(0x1ull << 56)
> 
> I'm playing with a 32-bit kernel/userspace, and this is problematic,
> because vm_pgoff is limited to 32-bit there, meaning we can only map up
> to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
> userspace set the mmio range?

Hmm, I was rather hoping we could ignore 32 bit these days ;) But while
I can't see why anyone would be running a 32 bit kernel, I guess 32 bit
user space is likely to still be needed.

I can't really think of anything better than letting user space set the
MMIO range. Having an ioctl which returned a special fd just for MMIO
would be one option (which would preserve the full 44 bit GPU VA) but
seems somewhat overkill. Hiding the mmap within an ioctl would of course
be bad as it breaks tools like Valgrind.

Oh and please do make it a range - user space submission will be adding
to the MMIO range ;)

Steve

>> +#define DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET   
>> (DRM_PANTHOR_USER_MMIO_OFFSET | 0)



Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-01 Thread Boris Brezillon
On Wed,  9 Aug 2023 18:53:15 +0200
Boris Brezillon  wrote:

> +/**
> + * DOC: MMIO regions exposed to userspace.
> + *
> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
> + *
> + * File offset for all MMIO regions being exposed to userspace. Don't use
> + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead.
> + *
> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
> + *
> + * File offset for the LATEST_FLUSH_ID register. The Userspace driver 
> controls
> + * GPU cache flushling through CS instructions, but the flush reduction
> + * mechanism requires a flush_id. This flush_id could be queried with an
> + * ioctl, but Arm provides a well-isolated register page containing only this
> + * read-only register, so let's expose this page through a static mmap offset
> + * and allow direct mapping of this MMIO region so we can avoid the
> + * user <-> kernel round-trip.
> + */
> +#define DRM_PANTHOR_USER_MMIO_OFFSET (0x1ull << 56)

I'm playing with a 32-bit kernel/userspace, and this is problematic,
because vm_pgoff is limited to 32-bit there, meaning we can only map up
to (1ull << (PAGE_SHIFT + 32)) - 1. Should we add a DEV_QUERY to let
userspace set the mmio range?

> +#define DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
> (DRM_PANTHOR_USER_MMIO_OFFSET | 0)


Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-09-01 Thread Liviu Dudau
Hi Boris,

On Wed, Aug 09, 2023 at 06:53:15PM +0200, Boris Brezillon wrote:
> Panthor follows the lead of other recently submitted drivers with
> ioctls allowing us to support modern Vulkan features, like sparse memory
> binding:
> 
> - Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET),
>   with the 'exclusive-VM' bit to speed-up BO reservation on job submission
> - VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND
>   ioctl is loosely based on the Xe model, and can handle both
>   asynchronous and synchronous requests
> - GPU execution context creation/destruction, tiler heap context creation
>   and job submission. Those ioctls reflect how the hardware/scheduler
>   works and are thus driver specific.
> 
> We also have a way to expose IO regions, such that the usermode driver
> can directly access specific/well-isolate registers, like the
> LATEST_FLUSH register used to implement cache-flush reduction.
> 
> This uAPI intentionally keeps usermode queues out of the scope, which
> explains why doorbell registers and command stream ring-buffers are not
> directly exposed to userspace.
> 
> v2:
> - Rename the driver (pancsf -> panthor)
> - Change the license (GPL2 -> MIT + GPL2)
> - Split the driver addition commit
> - Turn the VM_{MAP,UNMAP} ioctls into a VM_BIND ioctl
> - Add the concept of exclusive_vm at BO creation time
> - Add missing padding fields
> - Add documentation
> 
> Signed-off-by: Boris Brezillon 

Minor fixes in addition to what Steve has alread flagged.

> ---
>  Documentation/gpu/driver-uapi.rst |   5 +
>  include/uapi/drm/panthor_drm.h| 862 ++
>  2 files changed, 867 insertions(+)
>  create mode 100644 include/uapi/drm/panthor_drm.h
> 
> diff --git a/Documentation/gpu/driver-uapi.rst 
> b/Documentation/gpu/driver-uapi.rst
> index c08bcbb95fb3..7a667901830f 100644
> --- a/Documentation/gpu/driver-uapi.rst
> +++ b/Documentation/gpu/driver-uapi.rst
> @@ -17,3 +17,8 @@ VM_BIND / EXEC uAPI
>  :doc: Overview
>  
>  .. kernel-doc:: include/uapi/drm/nouveau_drm.h
> +
> +drm/panthor uAPI
> +
> +
> +.. kernel-doc:: include/uapi/drm/panthor_drm.h
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> new file mode 100644
> index ..e217eb5ad198
> --- /dev/null
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -0,0 +1,862 @@
> +/* SPDX-License-Identifier: MIT */
> +/* Copyright (C) 2023 Collabora ltd. */
> +#ifndef _PANTHOR_DRM_H_
> +#define _PANTHOR_DRM_H_
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +/**
> + * DOC: Introduction
> + *
> + * This documentation decribes the Panthor IOCTLs.
> + *
> + * Just a few generic rules about the data passed to the Panthor IOCTLs:
> + *
> + * - Structures must be aligned on 64-bit/8-byte. If the object is not
> + *   naturally aligned, a padding field must be added.
> + * - Fields must be explicity aligned to their natural type alignment with
> + *   pad[0..N] fields.
> + * - All padding fields will be checked by the driver to make sure they are
> + *   zeroed.
> + * - Flags can be added, but not removed/replaced.
> + * - New fields can be added to the main structures (the structures
> + *   directly passed to the ioctl). Those fiels can be added at the end of
> + *   the structure, or replace existing padding fields. Any new field being
> + *   added must preserve the behavior that existed before those fields were
> + *   added when a value of zero is passed.
> + * - New fields can be added to indirect objects (objects pointed by the
> + *   main structure), iff those objects are passed a size to reflect the
> + *   size known by the userspace driver (see drm_panthor_obj_array::stride
> + *   or drm_panthor_dev_query::size).
> + * - If the kernel driver is too old to know some fields, those will
> + *   be ignored (input) and set back to zero (output).
> + * - If userspace is too old to know some fields, those will be zeroed
> + *   (input) before the structure is parsed by the kernel driver.
> + * - Each new flag/field addition must come with a driver version update so
> + *   the userspace driver doesn't have to trial and error to know which
> + *   flags are supported.
> + * - Structures should not contain unions, as this would defeat the
> + *   extensibility of such structures.
> + * - IOCTLs can't be removed or replaced. New IOCTL IDs should be placed
> + *   at the end of the drm_panthor_ioctl_id enum.
> + */
> +
> +/**
> + * DOC: MMIO regions exposed to userspace.
> + *
> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
> + *
> + * File offset for all MMIO regions being exposed to userspace. Don't use
> + * this value directly, use DRM_PANTHOR_USER__OFFSET values instead.
> + *
> + * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
> + *
> + * File offset for the LATEST_FLUSH_ID register. The Userspace driver 
> controls
> + * GPU cache flushling through CS instructions, but the flush 

Re: [PATCH v2 02/15] drm/panthor: Add uAPI

2023-08-11 Thread Steven Price
On 09/08/2023 17:53, Boris Brezillon wrote:
> Panthor follows the lead of other recently submitted drivers with
> ioctls allowing us to support modern Vulkan features, like sparse memory
> binding:
> 
> - Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET),
>   with the 'exclusive-VM' bit to speed-up BO reservation on job submission
> - VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND
>   ioctl is loosely based on the Xe model, and can handle both
>   asynchronous and synchronous requests
> - GPU execution context creation/destruction, tiler heap context creation
>   and job submission. Those ioctls reflect how the hardware/scheduler
>   works and are thus driver specific.
> 
> We also have a way to expose IO regions, such that the usermode driver
> can directly access specific/well-isolate registers, like the
> LATEST_FLUSH register used to implement cache-flush reduction.
> 
> This uAPI intentionally keeps usermode queues out of the scope, which
> explains why doorbell registers and command stream ring-buffers are not
> directly exposed to userspace.
> 
> v2:
> - Rename the driver (pancsf -> panthor)
> - Change the license (GPL2 -> MIT + GPL2)
> - Split the driver addition commit
> - Turn the VM_{MAP,UNMAP} ioctls into a VM_BIND ioctl
> - Add the concept of exclusive_vm at BO creation time
> - Add missing padding fields
> - Add documentation
> 
> Signed-off-by: Boris Brezillon 

Looks good, just documentation typos/corrections below. With those fixed

Reviewed-by: Steven Price 

> ---
>  Documentation/gpu/driver-uapi.rst |   5 +
>  include/uapi/drm/panthor_drm.h| 862 ++
>  2 files changed, 867 insertions(+)
>  create mode 100644 include/uapi/drm/panthor_drm.h
> 
> diff --git a/Documentation/gpu/driver-uapi.rst 
> b/Documentation/gpu/driver-uapi.rst
> index c08bcbb95fb3..7a667901830f 100644
> --- a/Documentation/gpu/driver-uapi.rst
> +++ b/Documentation/gpu/driver-uapi.rst
> @@ -17,3 +17,8 @@ VM_BIND / EXEC uAPI
>  :doc: Overview
>  
>  .. kernel-doc:: include/uapi/drm/nouveau_drm.h
> +
> +drm/panthor uAPI
> +
> +
> +.. kernel-doc:: include/uapi/drm/panthor_drm.h
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> new file mode 100644
> index ..e217eb5ad198
> --- /dev/null
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -0,0 +1,862 @@
> +/* SPDX-License-Identifier: MIT */
> +/* Copyright (C) 2023 Collabora ltd. */
> +#ifndef _PANTHOR_DRM_H_
> +#define _PANTHOR_DRM_H_
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +/**
> + * DOC: Introduction
> + *
> + * This documentation decribes the Panthor IOCTLs.
  describes
> + *
> + * Just a few generic rules about the data passed to the Panthor IOCTLs:
> + *
> + * - Structures must be aligned on 64-bit/8-byte. If the object is not
> + *   naturally aligned, a padding field must be added.
> + * - Fields must be explicity aligned to their natural type alignment with
   ^ explicitly

> + *   pad[0..N] fields.
> + * - All padding fields will be checked by the driver to make sure they are
> + *   zeroed.
> + * - Flags can be added, but not removed/replaced.
> + * - New fields can be added to the main structures (the structures
> + *   directly passed to the ioctl). Those fiels can be added at the end of
 ^ fields

> + *   the structure, or replace existing padding fields. Any new field being
> + *   added must preserve the behavior that existed before those fields were
> + *   added when a value of zero is passed.
> + * - New fields can be added to indirect objects (objects pointed by the
> + *   main structure), iff those objects are passed a size to reflect the
> + *   size known by the userspace driver (see drm_panthor_obj_array::stride
> + *   or drm_panthor_dev_query::size).
> + * - If the kernel driver is too old to know some fields, those will
> + *   be ignored (input) and set back to zero (output).

I presume this should be "will be ignored if zero (input)" and rejected
if non-zero?

> + * - If userspace is too old to know some fields, those will be zeroed
> + *   (input) before the structure is parsed by the kernel driver.
> + * - Each new flag/field addition must come with a driver version update so
> + *   the userspace driver doesn't have to trial and error to know which
> + *   flags are supported.
> + * - Structures should not contain unions, as this would defeat the
> + *   extensibility of such structures.
> + * - IOCTLs can't be removed or replaced. New IOCTL IDs should be placed
> + *   at the end of the drm_panthor_ioctl_id enum.
> + */
> +
> +/**
> + * DOC: MMIO regions exposed to userspace.
> + *
> + * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
> + *
> + * File offset for all MMIO regions being exposed to userspace. Don't use
> + * this value directly, use 

[PATCH v2 02/15] drm/panthor: Add uAPI

2023-08-09 Thread Boris Brezillon
Panthor follows the lead of other recently submitted drivers with
ioctls allowing us to support modern Vulkan features, like sparse memory
binding:

- Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET),
  with the 'exclusive-VM' bit to speed-up BO reservation on job submission
- VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND
  ioctl is loosely based on the Xe model, and can handle both
  asynchronous and synchronous requests
- GPU execution context creation/destruction, tiler heap context creation
  and job submission. Those ioctls reflect how the hardware/scheduler
  works and are thus driver specific.

We also have a way to expose IO regions, such that the usermode driver
can directly access specific/well-isolate registers, like the
LATEST_FLUSH register used to implement cache-flush reduction.

This uAPI intentionally keeps usermode queues out of the scope, which
explains why doorbell registers and command stream ring-buffers are not
directly exposed to userspace.

v2:
- Rename the driver (pancsf -> panthor)
- Change the license (GPL2 -> MIT + GPL2)
- Split the driver addition commit
- Turn the VM_{MAP,UNMAP} ioctls into a VM_BIND ioctl
- Add the concept of exclusive_vm at BO creation time
- Add missing padding fields
- Add documentation

Signed-off-by: Boris Brezillon 
---
 Documentation/gpu/driver-uapi.rst |   5 +
 include/uapi/drm/panthor_drm.h| 862 ++
 2 files changed, 867 insertions(+)
 create mode 100644 include/uapi/drm/panthor_drm.h

diff --git a/Documentation/gpu/driver-uapi.rst 
b/Documentation/gpu/driver-uapi.rst
index c08bcbb95fb3..7a667901830f 100644
--- a/Documentation/gpu/driver-uapi.rst
+++ b/Documentation/gpu/driver-uapi.rst
@@ -17,3 +17,8 @@ VM_BIND / EXEC uAPI
 :doc: Overview
 
 .. kernel-doc:: include/uapi/drm/nouveau_drm.h
+
+drm/panthor uAPI
+
+
+.. kernel-doc:: include/uapi/drm/panthor_drm.h
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
new file mode 100644
index ..e217eb5ad198
--- /dev/null
+++ b/include/uapi/drm/panthor_drm.h
@@ -0,0 +1,862 @@
+/* SPDX-License-Identifier: MIT */
+/* Copyright (C) 2023 Collabora ltd. */
+#ifndef _PANTHOR_DRM_H_
+#define _PANTHOR_DRM_H_
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+/**
+ * DOC: Introduction
+ *
+ * This documentation decribes the Panthor IOCTLs.
+ *
+ * Just a few generic rules about the data passed to the Panthor IOCTLs:
+ *
+ * - Structures must be aligned on 64-bit/8-byte. If the object is not
+ *   naturally aligned, a padding field must be added.
+ * - Fields must be explicity aligned to their natural type alignment with
+ *   pad[0..N] fields.
+ * - All padding fields will be checked by the driver to make sure they are
+ *   zeroed.
+ * - Flags can be added, but not removed/replaced.
+ * - New fields can be added to the main structures (the structures
+ *   directly passed to the ioctl). Those fiels can be added at the end of
+ *   the structure, or replace existing padding fields. Any new field being
+ *   added must preserve the behavior that existed before those fields were
+ *   added when a value of zero is passed.
+ * - New fields can be added to indirect objects (objects pointed by the
+ *   main structure), iff those objects are passed a size to reflect the
+ *   size known by the userspace driver (see drm_panthor_obj_array::stride
+ *   or drm_panthor_dev_query::size).
+ * - If the kernel driver is too old to know some fields, those will
+ *   be ignored (input) and set back to zero (output).
+ * - If userspace is too old to know some fields, those will be zeroed
+ *   (input) before the structure is parsed by the kernel driver.
+ * - Each new flag/field addition must come with a driver version update so
+ *   the userspace driver doesn't have to trial and error to know which
+ *   flags are supported.
+ * - Structures should not contain unions, as this would defeat the
+ *   extensibility of such structures.
+ * - IOCTLs can't be removed or replaced. New IOCTL IDs should be placed
+ *   at the end of the drm_panthor_ioctl_id enum.
+ */
+
+/**
+ * DOC: MMIO regions exposed to userspace.
+ *
+ * .. c:macro:: DRM_PANTHOR_USER_MMIO_OFFSET
+ *
+ * File offset for all MMIO regions being exposed to userspace. Don't use
+ * this value directly, use DRM_PANTHOR_USER__OFFSET values instead.
+ *
+ * .. c:macro:: DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET
+ *
+ * File offset for the LATEST_FLUSH_ID register. The Userspace driver controls
+ * GPU cache flushling through CS instructions, but the flush reduction
+ * mechanism requires a flush_id. This flush_id could be queried with an
+ * ioctl, but Arm provides a well-isolated register page containing only this
+ * read-only register, so let's expose this page through a static mmap offset
+ * and allow direct mapping of this MMIO region so we can avoid the
+ * user <-> kernel round-trip.
+ */
+#define