Re: [RFC] Host1x/TegraDRM UAPI (drm_tegra_submit_command)

2020-06-30 Thread Mikko Perttunen

On 6/29/20 3:00 AM, Dmitry Osipenko wrote:

28.06.2020 13:28, Mikko Perttunen пишет:

On 6/28/20 1:32 AM, Dmitry Osipenko wrote:

23.06.2020 15:09, Mikko Perttunen пишет:

/* Command is an opcode gather from a GEM handle */
#define DRM_TEGRA_SUBMIT_COMMAND_GATHER 0
/* Command is an opcode gather from a user pointer */
#define DRM_TEGRA_SUBMIT_COMMAND_GATHER_UPTR    1
/* Command is a wait for syncpt fence completion */
#define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCPT    2
/* Command is a wait for SYNC_FILE FD completion */
#define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNC_FILE 3
/* Command is a wait for DRM syncobj completion */
#define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCOBJ   4

/*
   * Allow driver to skip command execution if engine
   * was not accessed by another channel between
   * submissions.
   */
#define DRM_TEGRA_SUBMIT_CONTEXT_SETUP    (1<<0)

struct drm_tegra_submit_command {
  __u16 type;
  __u16 flags;


Shouldn't the "packed" attribute be needed if a non-32bit aligned fields
are used?


I guess we should change these to u32 for consistency.




  union {
  struct {
  /* GEM handle */
  __u32 handle;

  /*
   * Offset into GEM object in bytes.
   * Must be aligned by 4.
   */
  __u64 offset;


64bits for a gather offset is a bit too much, in most cases gathers are
under 4K.

u32 should be more than enough (maybe even u16 if offset is given in a
dword granularity).


I guess this can be changed to u32, though I don't think there is any
particularly pressing reason not to use u64.

In any case, I think we concluded that we'll drop the GEM gather command
for now.


Sure, I commented this just in a case, for the future :)


Yep, OK :)






  /*
   * Length of gather in bytes.
   * Must be aligned by 4.
   */
  __u64 length;


u32/16


  } gather;



  struct {
  __u32 reserved[1];

  /*
   * Pointer to gather data.
   * Must be aligned by 4 bytes.
   */
  __u64 base;
  /*
   * Length of gather in bytes.
   * Must be aligned by 4.
   */
  __u64 length;
  } gather_uptr;


What about to inline the UPTR gather data and relocs into the
drm_tegra_submit_command[] buffer:

struct drm_tegra_submit_command {
 struct {
     u16 num_words;
     u16 num_relocs;

     gather_data[];
     drm_tegra_submit_relocation relocs[];
 } gather_uptr;
};

struct drm_tegra_channel_submit {
  __u32 num_syncpt_incrs;
  __u32 syncpt_idx;

  __u64 commands_ptr;
 __u32 commands_size;
...
};

struct drm_tegra_submit_command example[] = {
 cmd.gather_uptr{},
 ...
 gather_data[],
 gather_relocs[],
 cmd.wait_syncpt{},
 ...
};

This way we will have only a single copy_from_user() for the whole
cmdstream, which should be more efficient to do and nicer from both
userspace and kernel perspectives in regards to forming and parsing the
commands.



I'm not sure I agree it being nicer with regard to forming and parsing
- Can you actually place multiple unsized arrays in a struct without
pointers?


No :) But there are no unsized arrays here. The parser will read first
command and then move pointer to the next command based on size of the
parsed command.


- gather_data's length is a multiple of 4, and so since we have u64's in
drm_tegra_submit_relocation, alignment needs to be taken care of
manually, both when forming and kernel needs to validate it while
parsing. Depending on number of words in the gather, padding would need
to be inserted. We could swap the two around but it still feels more
brittle.


Yes, there will be unaligned reads on ARM64, but I don't think it's a
problem. Isn't it?


It's not a big problem, but I think it would be less error-prone to have 
naturally aligned data.





Also, if we read the gather from a separate address, userspace doesn't
necessarily need to know the length of the gather (and number of relocs)
upfront, so it's easier to have a builder pattern without needing to
copy things later.


Usually there are 2-3 relocs per gather, so userspace could simply
maintain a fixed-sized static buffer for the relocs (say 32 relocs).
Once gather is sliced by userspace, the stashed relocs will be appended
to the comands buffer and next gather will be formed.

The relocs copying doesn't really costs anything for userspace, the
price is incomparable with the cost of UPTR copying of each gather for
the kernel.

Well, the UP

Re: [RFC] Host1x/TegraDRM UAPI (drm_tegra_submit_command)

2020-06-28 Thread Dmitry Osipenko
28.06.2020 13:28, Mikko Perttunen пишет:
> On 6/28/20 1:32 AM, Dmitry Osipenko wrote:
>> 23.06.2020 15:09, Mikko Perttunen пишет:
>>> /* Command is an opcode gather from a GEM handle */
>>> #define DRM_TEGRA_SUBMIT_COMMAND_GATHER 0
>>> /* Command is an opcode gather from a user pointer */
>>> #define DRM_TEGRA_SUBMIT_COMMAND_GATHER_UPTR    1
>>> /* Command is a wait for syncpt fence completion */
>>> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCPT    2
>>> /* Command is a wait for SYNC_FILE FD completion */
>>> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNC_FILE 3
>>> /* Command is a wait for DRM syncobj completion */
>>> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCOBJ   4
>>>
>>> /*
>>>   * Allow driver to skip command execution if engine
>>>   * was not accessed by another channel between
>>>   * submissions.
>>>   */
>>> #define DRM_TEGRA_SUBMIT_CONTEXT_SETUP    (1<<0)
>>>
>>> struct drm_tegra_submit_command {
>>>  __u16 type;
>>>  __u16 flags;
>>
>> Shouldn't the "packed" attribute be needed if a non-32bit aligned fields
>> are used?
> 
> I guess we should change these to u32 for consistency.
> 
>>
>>>  union {
>>>  struct {
>>>  /* GEM handle */
>>>  __u32 handle;
>>>
>>>  /*
>>>   * Offset into GEM object in bytes.
>>>   * Must be aligned by 4.
>>>   */
>>>  __u64 offset;
>>
>> 64bits for a gather offset is a bit too much, in most cases gathers are
>> under 4K.
>>
>> u32 should be more than enough (maybe even u16 if offset is given in a
>> dword granularity).
> 
> I guess this can be changed to u32, though I don't think there is any
> particularly pressing reason not to use u64.
> 
> In any case, I think we concluded that we'll drop the GEM gather command
> for now.

Sure, I commented this just in a case, for the future :)

>>
>>>  /*
>>>   * Length of gather in bytes.
>>>   * Must be aligned by 4.
>>>   */
>>>  __u64 length;
>>
>> u32/16
>>
>>>  } gather;
>>
>>>  struct {
>>>  __u32 reserved[1];
>>>
>>>  /*
>>>   * Pointer to gather data.
>>>   * Must be aligned by 4 bytes.
>>>   */
>>>  __u64 base;
>>>  /*
>>>   * Length of gather in bytes.
>>>   * Must be aligned by 4.
>>>   */
>>>  __u64 length;
>>>  } gather_uptr;
>>
>> What about to inline the UPTR gather data and relocs into the
>> drm_tegra_submit_command[] buffer:
>>
>> struct drm_tegra_submit_command {
>> struct {
>>     u16 num_words;
>>     u16 num_relocs;
>>
>>     gather_data[];
>>     drm_tegra_submit_relocation relocs[];
>> } gather_uptr;
>> };
>>
>> struct drm_tegra_channel_submit {
>>  __u32 num_syncpt_incrs;
>>  __u32 syncpt_idx;
>>
>>  __u64 commands_ptr;
>> __u32 commands_size;
>> ...
>> };
>>
>> struct drm_tegra_submit_command example[] = {
>> cmd.gather_uptr{},
>> ...
>> gather_data[],
>> gather_relocs[],
>> cmd.wait_syncpt{},
>> ...
>> };
>>
>> This way we will have only a single copy_from_user() for the whole
>> cmdstream, which should be more efficient to do and nicer from both
>> userspace and kernel perspectives in regards to forming and parsing the
>> commands.
>>
> 
> I'm not sure I agree it being nicer with regard to forming and parsing
> - Can you actually place multiple unsized arrays in a struct without
> pointers?

No :) But there are no unsized arrays here. The parser will read first
command and then move pointer to the next command based on size of the
parsed command.

> - gather_data's length is a multiple of 4, and so since we have u64's in
> drm_tegra_submit_relocation, alignment needs to be taken care of
> manually, both when forming and kernel needs to validate it while
> parsing. Depending on number of words in the gather, padding would need
> to be inserted. We could swap the two around but it still feels more
> brittle.

Yes, there will be unaligned reads on ARM64, but I don't think it's a
problem. Isn't it?

> Also, if we read the gather from a separate address, userspace doesn't
> necessarily need to know the length of the gather (and number of relocs)
> upfront, so it's easier to have a builder pattern without needing to
> copy things later.

Usually there are 2-3 relocs per gather, so userspace could simply
maintain a fixed-sized static buffer for the relocs (say 32 relocs).
Once gather is sliced by userspace, the stashed relocs will be appended
to the comands buffer and next gather will be formed.

The reloc

Re: [RFC] Host1x/TegraDRM UAPI (drm_tegra_submit_command)

2020-06-28 Thread Dmitry Osipenko
23.06.2020 15:09, Mikko Perttunen пишет:
> /* Command is an opcode gather from a GEM handle */
> #define DRM_TEGRA_SUBMIT_COMMAND_GATHER 0
> /* Command is an opcode gather from a user pointer */
> #define DRM_TEGRA_SUBMIT_COMMAND_GATHER_UPTR    1
> /* Command is a wait for syncpt fence completion */
> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCPT    2
> /* Command is a wait for SYNC_FILE FD completion */
> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNC_FILE 3
> /* Command is a wait for DRM syncobj completion */
> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCOBJ   4
> 
> /*
>  * Allow driver to skip command execution if engine
>  * was not accessed by another channel between
>  * submissions.
>  */
> #define DRM_TEGRA_SUBMIT_CONTEXT_SETUP    (1<<0)
> 
> struct drm_tegra_submit_command {
>     __u16 type;
>     __u16 flags;

Shouldn't the "packed" attribute be needed if a non-32bit aligned fields
are used?

>     union {
>     struct {
>     /* GEM handle */
>     __u32 handle;
> 
>     /*
>  * Offset into GEM object in bytes.
>  * Must be aligned by 4.
>  */
>     __u64 offset;

64bits for a gather offset is a bit too much, in most cases gathers are
under 4K.

u32 should be more than enough (maybe even u16 if offset is given in a
dword granularity).

>     /*
>  * Length of gather in bytes.
>  * Must be aligned by 4.
>  */
>     __u64 length;

u32/16

>     } gather;

>     struct {
>     __u32 reserved[1];
> 
>     /*
>  * Pointer to gather data.
>  * Must be aligned by 4 bytes.
>  */
>     __u64 base;
>     /*
>  * Length of gather in bytes.
>  * Must be aligned by 4.
>  */
>     __u64 length;
>     } gather_uptr;

What about to inline the UPTR gather data and relocs into the
drm_tegra_submit_command[] buffer:

struct drm_tegra_submit_command {
struct {
u16 num_words;
u16 num_relocs;

gather_data[];
drm_tegra_submit_relocation relocs[];
} gather_uptr;
};

struct drm_tegra_channel_submit {
__u32 num_syncpt_incrs;
__u32 syncpt_idx;

__u64 commands_ptr;
__u32 commands_size;
...
};

struct drm_tegra_submit_command example[] = {
cmd.gather_uptr{},
...
gather_data[],
gather_relocs[],
cmd.wait_syncpt{},
...
};

This way we will have only a single copy_from_user() for the whole
cmdstream, which should be more efficient to do and nicer from both
userspace and kernel perspectives in regards to forming and parsing the
commands.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [RFC] Host1x/TegraDRM UAPI (drm_tegra_submit_command)

2020-06-28 Thread Mikko Perttunen

On 6/28/20 1:32 AM, Dmitry Osipenko wrote:

23.06.2020 15:09, Mikko Perttunen пишет:

/* Command is an opcode gather from a GEM handle */
#define DRM_TEGRA_SUBMIT_COMMAND_GATHER 0
/* Command is an opcode gather from a user pointer */
#define DRM_TEGRA_SUBMIT_COMMAND_GATHER_UPTR    1
/* Command is a wait for syncpt fence completion */
#define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCPT    2
/* Command is a wait for SYNC_FILE FD completion */
#define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNC_FILE 3
/* Command is a wait for DRM syncobj completion */
#define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCOBJ   4

/*
  * Allow driver to skip command execution if engine
  * was not accessed by another channel between
  * submissions.
  */
#define DRM_TEGRA_SUBMIT_CONTEXT_SETUP    (1<<0)

struct drm_tegra_submit_command {
     __u16 type;
     __u16 flags;


Shouldn't the "packed" attribute be needed if a non-32bit aligned fields
are used?


I guess we should change these to u32 for consistency.




     union {
     struct {
     /* GEM handle */
     __u32 handle;

     /*
  * Offset into GEM object in bytes.
  * Must be aligned by 4.
  */
     __u64 offset;


64bits for a gather offset is a bit too much, in most cases gathers are
under 4K.

u32 should be more than enough (maybe even u16 if offset is given in a
dword granularity).


I guess this can be changed to u32, though I don't think there is any 
particularly pressing reason not to use u64.


In any case, I think we concluded that we'll drop the GEM gather command 
for now.





     /*
  * Length of gather in bytes.
  * Must be aligned by 4.
  */
     __u64 length;


u32/16


     } gather;



     struct {
     __u32 reserved[1];

     /*
  * Pointer to gather data.
  * Must be aligned by 4 bytes.
  */
     __u64 base;
     /*
  * Length of gather in bytes.
  * Must be aligned by 4.
  */
     __u64 length;
     } gather_uptr;


What about to inline the UPTR gather data and relocs into the
drm_tegra_submit_command[] buffer:

struct drm_tegra_submit_command {
struct {
u16 num_words;
u16 num_relocs;

gather_data[];
drm_tegra_submit_relocation relocs[];
} gather_uptr;
};

struct drm_tegra_channel_submit {
 __u32 num_syncpt_incrs;
 __u32 syncpt_idx;

 __u64 commands_ptr;
__u32 commands_size;
...
};

struct drm_tegra_submit_command example[] = {
cmd.gather_uptr{},
...
gather_data[],
gather_relocs[],
cmd.wait_syncpt{},
...
};

This way we will have only a single copy_from_user() for the whole
cmdstream, which should be more efficient to do and nicer from both
userspace and kernel perspectives in regards to forming and parsing the
commands.



I'm not sure I agree it being nicer with regard to forming and parsing
- Can you actually place multiple unsized arrays in a struct without 
pointers?
- gather_data's length is a multiple of 4, and so since we have u64's in 
drm_tegra_submit_relocation, alignment needs to be taken care of 
manually, both when forming and kernel needs to validate it while 
parsing. Depending on number of words in the gather, padding would need 
to be inserted. We could swap the two around but it still feels more 
brittle.


Also, if we read the gather from a separate address, userspace doesn't 
necessarily need to know the length of the gather (and number of relocs) 
upfront, so it's easier to have a builder pattern without needing to 
copy things later.


If we allow direct page mappings for gathers later, a separate address 
would make things also a little bit easier. For direct page mappings 
userspace would need to keep the opcode data unchanged while the job is 
being executed, so userspace could keep an arena where the gathers are 
placed, and refer to segments of that, instead of having to keep the 
drm_tegra_submit_commands alive.


Mikko
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel