Re: [Intel-xe] [PATCH v5] Documentation/gpu: Add a VM_BIND async draft document

2023-08-11 Thread Thomas Hellström



On 8/10/23 22:51, Zanoni, Paulo R wrote:

On Thu, 2023-08-10 at 21:49 +0200, Thomas Hellström wrote:

Hi, Paulo.

Thanks for reviewing. I'd like to try to give some answers below.

On 8/4/23 23:30, Zanoni, Paulo R wrote:

On Fri, 2023-08-04 at 16:02 -0400, Rodrigo Vivi wrote:

On Wed, Jul 19, 2023 at 07:50:21PM +, Zanoni, Paulo R wrote:

On Sat, 2023-07-15 at 17:45 +0200, Thomas Hellström wrote:

Add a motivation for and description of asynchronous VM_BIND operation

I think I may have missed some other documentation, which would explain
some of my questions below, so please be patient with my
misunderstandings. But here's a review from the POV of a UMD person.



v2:
- Fix typos (Nirmoy Das)
- Improve the description of a memory fence (Oak Zeng)
- Add a reference to the document in the Xe RFC.
- Add pointers to sample uAPI suggestions
v3:
- Address review comments (Danilo Krummrich)
- Formatting fixes
v4:
- Address typos (Francois Dugast)
- Explain why in-fences are not allowed for VM_BIND operations for long-
running workloads (Matthew Brost)
v5:
- More typo- and style fixing
- Further clarify the implications of disallowing in-fences for VM_BIND
operations for long-running workloads (Matthew Brost)

Signed-off-by: Thomas Hellström 
Acked-by: Nirmoy Das 
---
   Documentation/gpu/drm-vm-bind-async.rst | 171 
   Documentation/gpu/rfc/xe.rst|   4 +-
   2 files changed, 173 insertions(+), 2 deletions(-)
   create mode 100644 Documentation/gpu/drm-vm-bind-async.rst

diff --git a/Documentation/gpu/drm-vm-bind-async.rst 
b/Documentation/gpu/drm-vm-bind-async.rst
new file mode 100644
index ..d2b02a38198a
--- /dev/null
+++ b/Documentation/gpu/drm-vm-bind-async.rst
@@ -0,0 +1,171 @@
+====================
+Asynchronous VM_BIND
+====================
+
+Nomenclature:
+=============
+
+* ``VRAM``: On-device memory. Sometimes referred to as device local memory.
+
+* ``gpu_vm``: A GPU address space. Typically per process, but can be shared by
+  multiple processes.
+
+* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
+  an IOCTL. The operations include mapping and unmapping system- or
+  VRAM memory.
+
+* ``syncobj``: A container that abstracts synchronization objects. The
+  synchronization objects can be either generic, like dma-fences or
+  driver specific. A syncobj typically indicates the type of the
+  underlying synchronization object.
+
+* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
+  for these before starting.
+
+* ``out-syncobj``: Argument to a VM_BIND_IOCTL, the VM_BIND operation
+  signals these when the bind operation is complete.
+
+* ``memory fence``: A synchronization object, different from a dma-fence.

Since you've mentioned it twice in this document already, for
completeness would you mind also giving a definition for dma-fence in
what it relates/contrasts to the rest of the text?

Maybe worth a link to the dma-fence doc itself?
(somehow making sphinx to point out to driver-api/dma-buf.html#dma-fences)

Will respin and add a link to this. This document indeed assumes the
reader is somewhat familiar with the dma-fence concept.


But the differences are written below Paulo:


+  A memory fence uses the value of a specified memory location to determine
+  signaled status. A memory fence can be awaited and signaled by both
+  the GPU and CPU. Memory fences are sometimes referred to as
+  user-fences, userspace-fences or gpu futexes and do not necessarily obey
+  the dma-fence rule of signaling within a "reasonable amount of time".
+  The kernel should thus avoid waiting for memory fences with locks held.

^


+
+* ``long-running workload``: A workload that may take more than the
+  current stipulated dma-fence maximum signal delay to complete and

Where is this delay defined? Is this the same as the gpuhang timer?

dma-fence defines it in a very "cool" way: "reasonable amount of time":
https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#dma-fences

so, in contrast, long-running workload is *anything* above that
"reasonable amount of time".

That answers it but doesn't exactly answer it either. In practice, how
much is that "reasonable amount of time"? This is documentation, it
should avoid using vague statements or hypothetical cases. The
documentation you posted suggested this may be the same as the GPU hang
timeout, but doesn't give a definitive answer (because multiple drivers
may define it differently, apparently). In practice, what do drivers
do? As a user-space developer, how long can I wait before things fail?
Is there a way to figure out, maybe query a parameter somewhere? Which
driver waits the least? Which driver waits the most?  Is 10 seconds
"reasonable amount of time"? My grandma thinks 2 weeks is a reasonable
amount of time when waiting for things.

We can't do much more than point to the dma-fence document here, since
we can't have a vague definition in the main document which is
overridden in another document describing something completely different.

Re: [Intel-xe] [PATCH v5] Documentation/gpu: Add a VM_BIND async draft document

2023-08-10 Thread Thomas Hellström

Hi, Paulo.

Thanks for reviewing. I'd like to try to give some answers below.

On 8/4/23 23:30, Zanoni, Paulo R wrote:

On Fri, 2023-08-04 at 16:02 -0400, Rodrigo Vivi wrote:

On Wed, Jul 19, 2023 at 07:50:21PM +, Zanoni, Paulo R wrote:

On Sat, 2023-07-15 at 17:45 +0200, Thomas Hellström wrote:

Add a motivation for and description of asynchronous VM_BIND operation

I think I may have missed some other documentation, which would explain
some of my questions below, so please be patient with my
misunderstandings. But here's a review from the POV of a UMD person.



v2:
- Fix typos (Nirmoy Das)
- Improve the description of a memory fence (Oak Zeng)
- Add a reference to the document in the Xe RFC.
- Add pointers to sample uAPI suggestions
v3:
- Address review comments (Danilo Krummrich)
- Formatting fixes
v4:
- Address typos (Francois Dugast)
- Explain why in-fences are not allowed for VM_BIND operations for long-
   running workloads (Matthew Brost)
v5:
- More typo- and style fixing
- Further clarify the implications of disallowing in-fences for VM_BIND
   operations for long-running workloads (Matthew Brost)

Signed-off-by: Thomas Hellström 
Acked-by: Nirmoy Das 
---
  Documentation/gpu/drm-vm-bind-async.rst | 171 
  Documentation/gpu/rfc/xe.rst|   4 +-
  2 files changed, 173 insertions(+), 2 deletions(-)
  create mode 100644 Documentation/gpu/drm-vm-bind-async.rst

diff --git a/Documentation/gpu/drm-vm-bind-async.rst 
b/Documentation/gpu/drm-vm-bind-async.rst
new file mode 100644
index ..d2b02a38198a
--- /dev/null
+++ b/Documentation/gpu/drm-vm-bind-async.rst
@@ -0,0 +1,171 @@
+====================
+Asynchronous VM_BIND
+====================
+
+Nomenclature:
+=============
+
+* ``VRAM``: On-device memory. Sometimes referred to as device local memory.
+
+* ``gpu_vm``: A GPU address space. Typically per process, but can be shared by
+  multiple processes.
+
+* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
+  an IOCTL. The operations include mapping and unmapping system- or
+  VRAM memory.
+
+* ``syncobj``: A container that abstracts synchronization objects. The
+  synchronization objects can be either generic, like dma-fences or
+  driver specific. A syncobj typically indicates the type of the
+  underlying synchronization object.
+
+* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
+  for these before starting.
+
+* ``out-syncobj``: Argument to a VM_BIND_IOCTL, the VM_BIND operation
+  signals these when the bind operation is complete.
+
+* ``memory fence``: A synchronization object, different from a dma-fence.

Since you've mentioned it twice in this document already, for
completeness would you mind also giving a definition for dma-fence in
what it relates/contrasts to the rest of the text?

Maybe worth a link to the dma-fence doc itself?
(somehow making sphinx to point out to driver-api/dma-buf.html#dma-fences)


Will respin and add a link to this. This document indeed assumes the 
reader is somewhat familiar with the dma-fence concept.




But the differences are written below Paulo:




+  A memory fence uses the value of a specified memory location to determine
+  signaled status. A memory fence can be awaited and signaled by both
+  the GPU and CPU. Memory fences are sometimes referred to as
+  user-fences, userspace-fences or gpu futexes and do not necessarily obey
+  the dma-fence rule of signaling within a "reasonable amount of time".
+  The kernel should thus avoid waiting for memory fences with locks held.

^


+
+* ``long-running workload``: A workload that may take more than the
+  current stipulated dma-fence maximum signal delay to complete and

Where is this delay defined? Is this the same as the gpuhang timer?

dma-fence defines it in a very "cool" way: "reasonable amount of time":
https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#dma-fences

so, in contrast, long-running workload is *anything* above that
"reasonable amount of time".

That answers it but doesn't exactly answer it either. In practice, how
much is that "reasonable amount of time"? This is documentation, it
should avoid using vague statements or hypothetical cases. The
documentation you posted suggested this may be the same as the GPU hang
timeout, but doesn't give a definitive answer (because multiple drivers
may define it differently, apparently). In practice, what do drivers
do? As a user-space developer, how long can I wait before things fail?
Is there a way to figure out, maybe query a parameter somewhere? Which
driver waits the least? Which driver waits the most?  Is 10 seconds
"reasonable amount of time"? My grandma thinks 2 weeks is a reasonable
amount of time when waiting for things.


We can't do much more than point to the dma-fence document here, since 
we can't have a vague definition in the main document which is 
overridden in another document describing something completely different.


FWIW a 

Re: [Intel-xe] [PATCH v5] Documentation/gpu: Add a VM_BIND async draft document

2023-08-04 Thread Rodrigo Vivi
On Wed, Jul 19, 2023 at 07:50:21PM +, Zanoni, Paulo R wrote:
> On Sat, 2023-07-15 at 17:45 +0200, Thomas Hellström wrote:
> > Add a motivation for and description of asynchronous VM_BIND operation
> 
> I think I may have missed some other documentation, which would explain
> some of my questions below, so please be patient with my
> misunderstandings. But here's a review from the POV of a UMD person.
> 
> 
> > 
> > v2:
> > - Fix typos (Nirmoy Das)
> > - Improve the description of a memory fence (Oak Zeng)
> > - Add a reference to the document in the Xe RFC.
> > - Add pointers to sample uAPI suggestions
> > v3:
> > - Address review comments (Danilo Krummrich)
> > - Formatting fixes
> > v4:
> > - Address typos (Francois Dugast)
> > - Explain why in-fences are not allowed for VM_BIND operations for long-
> >   running workloads (Matthew Brost)
> > v5:
> > - More typo- and style fixing
> > - Further clarify the implications of disallowing in-fences for VM_BIND
> >   operations for long-running workloads (Matthew Brost)
> > 
> > Signed-off-by: Thomas Hellström 
> > Acked-by: Nirmoy Das 
> > ---
> >  Documentation/gpu/drm-vm-bind-async.rst | 171 
> >  Documentation/gpu/rfc/xe.rst|   4 +-
> >  2 files changed, 173 insertions(+), 2 deletions(-)
> >  create mode 100644 Documentation/gpu/drm-vm-bind-async.rst
> > 
> > diff --git a/Documentation/gpu/drm-vm-bind-async.rst 
> > b/Documentation/gpu/drm-vm-bind-async.rst
> > new file mode 100644
> > index ..d2b02a38198a
> > --- /dev/null
> > +++ b/Documentation/gpu/drm-vm-bind-async.rst
> > @@ -0,0 +1,171 @@
> > +====================
> > +Asynchronous VM_BIND
> > +====================
> > +
> > +Nomenclature:
> > +=============
> > +
> > +* ``VRAM``: On-device memory. Sometimes referred to as device local memory.
> > +
> > +* ``gpu_vm``: A GPU address space. Typically per process, but can be 
> > shared by
> > +  multiple processes.
> > +
> > +* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm 
> > using
> > +  an IOCTL. The operations include mapping and unmapping system- or
> > +  VRAM memory.
> > +
> > +* ``syncobj``: A container that abstracts synchronization objects. The
> > +  synchronization objects can be either generic, like dma-fences or
> > +  driver specific. A syncobj typically indicates the type of the
> > +  underlying synchronization object.
> > +
> > +* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
> > +  for these before starting.
> > +
> > +* ``out-syncobj``: Argument to a VM_BIND_IOCTL, the VM_BIND operation
> > +  signals these when the bind operation is complete.
> > +
> > +* ``memory fence``: A synchronization object, different from a dma-fence.
> 
> Since you've mentioned it twice in this document already, for
> completeness would you mind also giving a definition for dma-fence in
> what it relates/contrasts to the rest of the text?

Maybe worth a link to the dma-fence doc itself?
(somehow making sphinx to point out to driver-api/dma-buf.html#dma-fences)

But the differences are written below Paulo:

> 
> 
> > +  A memory fence uses the value of a specified memory location to determine
> > +  signaled status. A memory fence can be awaited and signaled by both
> > +  the GPU and CPU. Memory fences are sometimes referred to as
> > +  user-fences, userspace-fences or gpu futexes and do not necessarily obey
> > +  the dma-fence rule of signaling within a "reasonable amount of time".
> > +  The kernel should thus avoid waiting for memory fences with locks held.

^

> > +
> > +* ``long-running workload``: A workload that may take more than the
> > +  current stipulated dma-fence maximum signal delay to complete and
> 
> Where is this delay defined? Is this the same as the gpuhang timer?

dma-fence defines it in a very "cool" way: "reasonable amount of time":
https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#dma-fences

so, in contrast, long-running workload is *anything* above that
"reasonable amount of time".

> 
> 
> > +  which therefore needs to set the gpu_vm or the GPU execution context in
> > +  a certain mode that disallows completion dma-fences.
> > +
> > +* ``exec function``: An exec function is a function that revalidates all
> > +  affected gpu_vmas, submits a GPU command batch and registers the
> > +  dma_fence representing the GPU command's activity with all affected
> > +  dma_resvs. For completeness, although not covered by this document,
> > +  it's worth mentioning that an exec function may also be the
> > +  revalidation worker that is used by some drivers in compute /
> > +  long-running mode.
> > +
> > +* ``bind context``: A context identifier used for the VM_BIND
> > +  operation. VM_BIND operations that use the same bind context can be
> > +  assumed, where it matters, to complete in order of submission. No such
> > +  assumptions can be made for VM_BIND operations using separate bind 
> > contexts.
> > +
> > +* ``UMD``: User-mode driver.

Re: [Intel-xe] [PATCH v5] Documentation/gpu: Add a VM_BIND async draft document

2023-07-19 Thread Zanoni, Paulo R
On Sat, 2023-07-15 at 17:45 +0200, Thomas Hellström wrote:
> Add a motivation for and description of asynchronous VM_BIND operation

I think I may have missed some other documentation, which would explain
some of my questions below, so please be patient with my
misunderstandings. But here's a review from the POV of a UMD person.


> 
> v2:
> - Fix typos (Nirmoy Das)
> - Improve the description of a memory fence (Oak Zeng)
> - Add a reference to the document in the Xe RFC.
> - Add pointers to sample uAPI suggestions
> v3:
> - Address review comments (Danilo Krummrich)
> - Formatting fixes
> v4:
> - Address typos (Francois Dugast)
> - Explain why in-fences are not allowed for VM_BIND operations for long-
>   running workloads (Matthew Brost)
> v5:
> - More typo- and style fixing
> - Further clarify the implications of disallowing in-fences for VM_BIND
>   operations for long-running workloads (Matthew Brost)
> 
> Signed-off-by: Thomas Hellström 
> Acked-by: Nirmoy Das 
> ---
>  Documentation/gpu/drm-vm-bind-async.rst | 171 
>  Documentation/gpu/rfc/xe.rst|   4 +-
>  2 files changed, 173 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/gpu/drm-vm-bind-async.rst
> 
> diff --git a/Documentation/gpu/drm-vm-bind-async.rst 
> b/Documentation/gpu/drm-vm-bind-async.rst
> new file mode 100644
> index ..d2b02a38198a
> --- /dev/null
> +++ b/Documentation/gpu/drm-vm-bind-async.rst
> @@ -0,0 +1,171 @@
> +====================
> +Asynchronous VM_BIND
> +====================
> +
> +Nomenclature:
> +=============
> +
> +* ``VRAM``: On-device memory. Sometimes referred to as device local memory.
> +
> +* ``gpu_vm``: A GPU address space. Typically per process, but can be shared 
> by
> +  multiple processes.
> +
> +* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
> +  an IOCTL. The operations include mapping and unmapping system- or
> +  VRAM memory.
> +
> +* ``syncobj``: A container that abstracts synchronization objects. The
> +  synchronization objects can be either generic, like dma-fences or
> +  driver specific. A syncobj typically indicates the type of the
> +  underlying synchronization object.
> +
> +* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
> +  for these before starting.
> +
> +* ``out-syncobj``: Argument to a VM_BIND_IOCTL, the VM_BIND operation
> +  signals these when the bind operation is complete.
> +
> +* ``memory fence``: A synchronization object, different from a dma-fence.

Since you've mentioned it twice in this document already, for
completeness would you mind also giving a definition for dma-fence in
what it relates/contrasts to the rest of the text?


> +  A memory fence uses the value of a specified memory location to determine
> +  signaled status. A memory fence can be awaited and signaled by both
> +  the GPU and CPU. Memory fences are sometimes referred to as
> +  user-fences, userspace-fences or gpu futexes and do not necessarily obey
> +  the dma-fence rule of signaling within a "reasonable amount of time".
> +  The kernel should thus avoid waiting for memory fences with locks held.
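
For illustration, a memory fence can be thought of as a 64-bit sequence
number in GPU-visible memory that the waiter compares against an expected
value. The CPU-side sketch below is hypothetical and not tied to any
driver's uAPI; the struct layout, the names and the bounded busy-wait are
assumptions made purely for this example.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical memory fence: a location both CPU and GPU can read and write.
 * The signaler (GPU or CPU) writes a value >= 'value' to '*addr'. */
struct mem_fence {
	_Atomic uint64_t *addr;	/* mapped, GPU-visible memory location */
	uint64_t value;		/* signaled once *addr >= value */
};

static bool mem_fence_signaled(const struct mem_fence *f)
{
	return atomic_load_explicit(f->addr, memory_order_acquire) >= f->value;
}

/* CPU-side wait. Unlike a dma-fence there is no "reasonable time" guarantee,
 * so a wait like this must be bounded or interruptible by the caller, and
 * must not be done with locks held. */
static bool mem_fence_wait(const struct mem_fence *f, uint64_t timeout_ns)
{
	struct timespec ts;
	uint64_t now, deadline;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	deadline = (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec + timeout_ns;

	while (!mem_fence_signaled(f)) {
		clock_gettime(CLOCK_MONOTONIC, &ts);
		now = (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
		if (now >= deadline)
			return false;	/* timed out, fence still unsignaled */
	}
	return true;
}

Signaling from the CPU would be the matching atomic store with release
ordering to the same location; the GPU would typically signal by having the
command streamer write the value once the workload completes.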
> +
> +* ``long-running workload``: A workload that may take more than the
> +  current stipulated dma-fence maximum signal delay to complete and

Where is this delay defined? Is this the same as the gpuhang timer?


> +  which therefore needs to set the gpu_vm or the GPU execution context in
> +  a certain mode that disallows completion dma-fences.
> +
> +* ``exec function``: An exec function is a function that revalidates all
> +  affected gpu_vmas, submits a GPU command batch and registers the
> +  dma_fence representing the GPU command's activity with all affected
> +  dma_resvs. For completeness, although not covered by this document,
> +  it's worth mentioning that an exec function may also be the
> +  revalidation worker that is used by some drivers in compute /
> +  long-running mode.
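
To make the three steps concrete, here is a driver-agnostic outline of an
exec function. Everything in it is illustration only: the types are stubs
rather than the real kernel definitions, and the helper names
(revalidate_gpu_vmas(), submit_batch(), add_fence_to_resvs()) are invented
for the example, not taken from an actual API.

/* Illustration-only stand-ins for driver objects. */
struct gpu_vm { int dummy; };
struct gpu_batch { int dummy; };
struct dma_fence { int dummy; };

/* Rebind any evicted gpu_vmas so the batch sees a fully valid address space. */
static int revalidate_gpu_vmas(struct gpu_vm *vm) { (void)vm; return 0; }

/* Queue the batch on the GPU and return a fence tracking its completion. */
static struct dma_fence *submit_batch(struct gpu_batch *batch)
{
	static struct dma_fence fence;
	(void)batch;
	return &fence;
}

/* Attach the fence to all affected dma_resv objects so that eviction and
 * other users wait for the GPU activity. */
static void add_fence_to_resvs(struct gpu_vm *vm, struct dma_fence *fence)
{
	(void)vm;
	(void)fence;
}

/* Outline of an exec function: revalidate, submit, publish the fence. */
static int gpu_exec(struct gpu_vm *vm, struct gpu_batch *batch)
{
	struct dma_fence *fence;
	int err;

	err = revalidate_gpu_vmas(vm);	/* step 1: revalidation */
	if (err)
		return err;

	fence = submit_batch(batch);	/* step 2: GPU submission */
	add_fence_to_resvs(vm, fence);	/* step 3: fence bookkeeping */
	return 0;
}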
> +
> +* ``bind context``: A context identifier used for the VM_BIND
> +  operation. VM_BIND operations that use the same bind context can be
> +  assumed, where it matters, to complete in order of submission. No such
> +  assumptions can be made for VM_BIND operations using separate bind 
> contexts.
> +
> +* ``UMD``: User-mode driver.
> +
> +* ``KMD``: Kernel-mode driver.
> +
> +
> +Synchronous / Asynchronous VM_BIND operation
> +============================================
> +
> +Synchronous VM_BIND
> +___________________
> +With Synchronous VM_BIND, the VM_BIND operations all complete before the
> +IOCTL returns. A synchronous VM_BIND takes neither in-fences nor
> +out-fences. Synchronous VM_BIND may block and wait for GPU operations;
> +for example swap-in or clearing, or even previous binds.
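
As a rough userspace-side illustration of the synchronous case (before
moving on to the asynchronous one below): the struct layout, field names and
ioctl number here are invented for the example and do not correspond to any
actual driver uAPI.

#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical request layout, loosely following the nomenclature above. */
struct example_vm_bind_op {
	uint64_t gpu_addr;	/* GPU virtual address of the mapping */
	uint64_t bo_handle;	/* buffer object to map, 0 for an unmap */
	uint64_t bo_offset;	/* offset into the buffer object */
	uint64_t range;		/* size of the mapping in bytes */
	uint32_t op;		/* e.g. MAP or UNMAP */
	uint32_t flags;
};

struct example_vm_bind {
	uint32_t vm_id;		/* which gpu_vm to modify */
	uint32_t bind_ctx_id;	/* bind context: ops using the same context
				 * complete in submission order */
	uint64_t ops_ptr;	/* user pointer to an array of ops */
	uint32_t num_ops;	/* number of operations in the array */
	uint32_t pad;
};

#define EXAMPLE_IOCTL_VM_BIND _IOWR('d', 0x40, struct example_vm_bind)

/* Synchronous bind: when ioctl() returns 0 the (un)mappings are in place;
 * no in- or out-fences are involved. */
static int example_vm_bind_sync(int drm_fd, struct example_vm_bind *args)
{
	return ioctl(drm_fd, EXAMPLE_IOCTL_VM_BIND, args);
}

An asynchronous version of the same request would additionally carry arrays
of in-syncobjs to wait for and out-syncobjs to signal, allowing the ioctl to
return before the operations have actually taken effect.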
> +
> +Asynchronous VM_BIND
> +____________________
> +Asynchronous VM_BIND accepts both in-syncobjs and out-syncobjs. While the
> +IOCTL may return immediately, the VM_BIND operations wait for the in-syncobjs
>