Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Jason Ekstrand
On Mon, Mar 16, 2020 at 6:39 PM Roman Gilg  wrote:
>
> On Wed, Mar 11, 2020 at 8:21 PM Jason Ekstrand  wrote:
> >
> > On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand wrote:
> > >
> > > All,
> > >
> > > Sorry for casting such a broad net with this one. I'm sure most people
> > > who reply will get at least one mailing list rejection.  However, this
> > > is an issue that affects a LOT of components and that's why it's
> > > thorny to begin with.  Please pardon the length of this e-mail as
> > > well; I promise there's a concrete point/proposal at the end.
> > >
> > >
> > > Explicit synchronization is the future of graphics and media.  At
> > > least, that seems to be the consensus among all the graphics people
> > > I've talked to.  I had a chat with one of the lead Android graphics
> > > engineers recently who told me that doing explicit sync from the start
> > > was one of the best engineering decisions Android ever made.  It's
> > > also the direction being taken by more modern APIs such as Vulkan.
> > >
> > >
> > > ## What are implicit and explicit synchronization?
> > >
> > > For those that aren't familiar with this space, GPUs, media encoders,
> > > etc. are massively parallel and synchronization of some form is
> > > required to ensure that everything happens in the right order and
> > > avoid data races.  Implicit synchronization is when bits of work (3D,
> > > compute, video encode, etc.) are implicitly based on the absolute
> > > CPU-time order in which API calls occur.  Explicit synchronization is
> > > when the client (whatever that means in any given context) provides
> > > the dependency graph explicitly via some sort of synchronization
> > > primitives.  If you're still confused, consider the following
> > > examples:
> > >
> > > With OpenGL and EGL, almost everything is implicit sync.  Say you have
> > > two OpenGL contexts sharing an image where one writes to it and the
> > > other textures from it.  The way the OpenGL spec works, the client has
> > > to make the API calls to render to the image before (in CPU time) it
> > > makes the API calls which texture from the image.  As long as it does
> > > this (and maybe inserts a glFlush?), the driver will ensure that the
> > > rendering completes before the texturing happens and you get correct
> > > contents.
> > >
> > > Implicit synchronization can also happen across processes.  Wayland,
> > > for instance, is currently built on implicit sync where the client
> > > does their rendering and then does a hand-off (via wl_surface::commit)
> > > to tell the compositor it's done at which point the compositor can now
> > > texture from the surface.  The hand-off ensures that the client's
> > > OpenGL API calls happen before the server's OpenGL API calls.
> > >
> > > A good example of explicit synchronization is the Vulkan API.  There,
> > > a client (or multiple clients) can simultaneously build command
> > > buffers in different threads where one of those command buffers
> > > renders to an image and the other textures from it and then submit
> > > both of them at the same time with instructions to the driver for
> > > which order to execute them in.  The execution order is described via
> > > the VkSemaphore primitive.  With the new VK_KHR_timeline_semaphore
> > > extension, you can even submit the work which does the texturing
> > > BEFORE the work which does the rendering and the driver will sort it
> > > out.
> > >
> > > The #1 problem with implicit synchronization (which explicit solves)
> > > is that it leads to a lot of over-synchronization both in client space
> > > and in driver/device space.  The client has to synchronize a lot more
> > > because it has to ensure that the API calls happen in a particular
> > > order.  The driver/device have to synchronize a lot more because they
> > > never know what is going to end up being a synchronization point as an
> > > API call on another thread/process may occur at any time.  As we move
> > > to more and more multi-threaded programming this synchronization (on
> > > the client-side especially) becomes more and more painful.
> > >
> > >
> > > ## Current status in Linux
> > >
> > > Implicit synchronization in Linux works via the kernel's internal
> > > dma_buf and dma_fence data structures.  A dma_fence is a tiny object
> > > which represents the "done" status for some bit of work.  Typically,
> > > dma_fences are created as a by-product of someone submitting some bit
> > > of work (say, 3D rendering) to the kernel.  The dma_buf object has a
> > > set of dma_fences on it representing shared (read) and exclusive
> > > (write) access to the object.  When work is submitted which, for
> > > instance, renders to the dma_buf, it's queued waiting on all the fences
> > > on the dma_buf, and a dma_fence is created representing the end of
> > > said rendering work and it's installed as the dma_buf's exclusive
> > > fence.  This way, the kernel can manage all its internal queues (3D
> > > 

Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Roman Gilg
On Wed, Mar 11, 2020 at 8:21 PM Jason Ekstrand  wrote:
>
> On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand  wrote:
> >
> > All,
> >
> > Sorry for casting such a broad net with this one. I'm sure most people
> > who reply will get at least one mailing list rejection.  However, this
> > is an issue that affects a LOT of components and that's why it's
> > thorny to begin with.  Please pardon the length of this e-mail as
> > well; I promise there's a concrete point/proposal at the end.
> >
> >
> > Explicit synchronization is the future of graphics and media.  At
> > least, that seems to be the consensus among all the graphics people
> > I've talked to.  I had a chat with one of the lead Android graphics
> > engineers recently who told me that doing explicit sync from the start
> > was one of the best engineering decisions Android ever made.  It's
> > also the direction being taken by more modern APIs such as Vulkan.
> >
> >
> > ## What are implicit and explicit synchronization?
> >
> > For those that aren't familiar with this space, GPUs, media encoders,
> > etc. are massively parallel and synchronization of some form is
> > required to ensure that everything happens in the right order and
> > avoid data races.  Implicit synchronization is when bits of work (3D,
> > compute, video encode, etc.) are implicitly based on the absolute
> > CPU-time order in which API calls occur.  Explicit synchronization is
> > when the client (whatever that means in any given context) provides
> > the dependency graph explicitly via some sort of synchronization
> > primitives.  If you're still confused, consider the following
> > examples:
> >
> > With OpenGL and EGL, almost everything is implicit sync.  Say you have
> > two OpenGL contexts sharing an image where one writes to it and the
> > other textures from it.  The way the OpenGL spec works, the client has
> > to make the API calls to render to the image before (in CPU time) it
> > makes the API calls which texture from the image.  As long as it does
> > this (and maybe inserts a glFlush?), the driver will ensure that the
> > rendering completes before the texturing happens and you get correct
> > contents.
> >
> > Implicit synchronization can also happen across processes.  Wayland,
> > for instance, is currently built on implicit sync where the client
> > does their rendering and then does a hand-off (via wl_surface::commit)
> > to tell the compositor it's done at which point the compositor can now
> > texture from the surface.  The hand-off ensures that the client's
> > OpenGL API calls happen before the server's OpenGL API calls.
> >
> > A good example of explicit synchronization is the Vulkan API.  There,
> > a client (or multiple clients) can simultaneously build command
> > buffers in different threads where one of those command buffers
> > renders to an image and the other textures from it and then submit
> > both of them at the same time with instructions to the driver for
> > which order to execute them in.  The execution order is described via
> > the VkSemaphore primitive.  With the new VK_KHR_timeline_semaphore
> > extension, you can even submit the work which does the texturing
> > BEFORE the work which does the rendering and the driver will sort it
> > out.
> >
> > The #1 problem with implicit synchronization (which explicit solves)
> > is that it leads to a lot of over-synchronization both in client space
> > and in driver/device space.  The client has to synchronize a lot more
> > because it has to ensure that the API calls happen in a particular
> > order.  The driver/device have to synchronize a lot more because they
> > never know what is going to end up being a synchronization point as an
> > API call on another thread/process may occur at any time.  As we move
> > to more and more multi-threaded programming this synchronization (on
> > the client-side especially) becomes more and more painful.
> >
> >
> > ## Current status in Linux
> >
> > Implicit synchronization in Linux works via the kernel's internal
> > dma_buf and dma_fence data structures.  A dma_fence is a tiny object
> > which represents the "done" status for some bit of work.  Typically,
> > dma_fences are created as a by-product of someone submitting some bit
> > of work (say, 3D rendering) to the kernel.  The dma_buf object has a
> > set of dma_fences on it representing shared (read) and exclusive
> > (write) access to the object.  When work is submitted which, for
> > instance, renders to the dma_buf, it's queued waiting on all the fences
> > on the dma_buf, and a dma_fence is created representing the end of
> > said rendering work and it's installed as the dma_buf's exclusive
> > fence.  This way, the kernel can manage all its internal queues (3D
> > rendering, display, video encode, etc.) and know which things to
> > submit in what order.
> >
> > For the last few years, we've had sync_file in the kernel and it's
> > plumbed into some drivers.  A sync_file is just a wrapper around a
> 

Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Jason Ekstrand
On Mon, Mar 16, 2020 at 4:15 PM Laurent Pinchart wrote:
>
> Hi Jason,
>
> On Mon, Mar 16, 2020 at 10:06:07AM -0500, Jason Ekstrand wrote:
> > On Mon, Mar 16, 2020 at 5:20 AM Laurent Pinchart wrote:
> > > On Wed, Mar 11, 2020 at 04:18:55PM -0400, Nicolas Dufresne wrote:
> > >> (I know I'm going to be spammed by so many mailing list ...)
> > >>
> > >> Le mercredi 11 mars 2020 à 14:21 -0500, Jason Ekstrand a écrit :
> > >>> On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand wrote:

Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Laurent Pinchart
Hi Jason,

On Mon, Mar 16, 2020 at 10:06:07AM -0500, Jason Ekstrand wrote:
> On Mon, Mar 16, 2020 at 5:20 AM Laurent Pinchart wrote:
> > On Wed, Mar 11, 2020 at 04:18:55PM -0400, Nicolas Dufresne wrote:
> >> (I know I'm going to be spammed by so many mailing list ...)
> >>
> >> Le mercredi 11 mars 2020 à 14:21 -0500, Jason Ekstrand a écrit :
> >>> On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand wrote:

Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Marek Olšák
On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer  wrote:

> On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> > The synchronization works because the Mesa driver waits for idle (drains
> > the GFX pipeline) at the end of command buffers and there is only 1
> > graphics queue, so everything is ordered.
> >
> > The GFX pipeline runs asynchronously to the command buffer, meaning the
> > command buffer only starts draws and doesn't wait for completion. If the
> > Mesa driver didn't wait at the end of the command buffer, the command
> > buffer would finish and a different process could start execution of its
> > own command buffer while shaders of the previous process are still
> running.
> >
> > If the Mesa driver submits a command buffer internally (because it's
> full),
> > it doesn't wait, so the GFX pipeline doesn't notice that a command buffer
> > ended and a new one started.
> >
> > The waiting at the end of command buffers happens only when the flush is
> > external (Swap buffers, glFlush).
> >
> > It's a performance problem, because the GFX queue is blocked until the
> GFX
> > pipeline is drained at the end of every frame at least.
> >
> > So explicit fences for SwapBuffers would help.
>
> Not sure what difference it would make, since the same thing needs to be
> done for explicit fences as well, doesn't it?
>

No. Explicit fences don't require userspace to wait for idle in the command
buffer. Fences are signalled when the last draw is complete and caches are
flushed. Before that happens, any command buffer that is not dependent on
the fence can start execution. There is never a need for the GPU to be idle
if there is enough independent work to do.
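
To make the difference concrete, here is a toy model in Python. It is not
driver code, and every name in it (Fence, Job, pick_next, run_ready) is made
up for illustration. Each job may depend on one fence; the scheduler is free
to start any job whose fence has signalled or which has no dependency at all,
so unrelated work does not have to wait for the pipeline to drain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fence:
    """Hypothetical stand-in for a fence: set once the producing work is done."""
    signalled: bool = False


@dataclass
class Job:
    """Hypothetical unit of GPU work that may wait on one fence."""
    name: str
    wait: Optional[Fence] = None
    started: bool = False


def pick_next(jobs: List[Job]) -> Optional[Job]:
    """Return the first unstarted job whose fence (if any) has signalled."""
    for job in jobs:
        if not job.started and (job.wait is None or job.wait.signalled):
            return job
    return None


def run_ready(jobs: List[Job]) -> List[str]:
    """Start every job that is currently ready; return names in start order."""
    order = []
    while (job := pick_next(jobs)) is not None:
        job.started = True
        order.append(job.name)
    return order
```

With a compositor job waiting on a client's render fence and a second,
unrelated job queued behind it, run_ready() starts the unrelated job first
and picks up the dependent job only after the fence is marked signalled.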

Marek
___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel


Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Tomek Bury
> That's not true; you can post back a sync token every time the client
> buffer is used by the compositor.
Technically yes, but it's very cumbersome and invasive, to the point
where it becomes impractical. Explicit sync is a much cleaner solution.

> For instance, Mesa adds the `wl_drm` extension, which is
> used for bidirectional communication between the EGL implementations
> in the client and compositor address spaces, without modifying either.
The Broadcom driver adds a "wl_nexus" extension, which serves a similar
purpose for both EGL and Vulkan WSI.

> OK. As it stands, everyone else has the kernel mechanism (e.g. via
> dmabuf resv), so in this case if you are reinventing the underlying
> platform in a proprietary stack, you get to solve the same problems
> yourselves.
That's an important point. In the explicit synchronisation scenario
the sync token is passed with the buffer. It becomes irrelevant where
the token originated from, as long as it's a commonly used type of
token, i.e. dma_fence in kernel space or sync_fd in user space. That
allows for greater flexibility and works with and without dma
reservation objects.
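
As a rough illustration of why the origin of the token stops mattering: a
sync_file fd can be waited on with poll(), which reports it readable once the
fence has signalled, so the consumer only needs the fd. The sketch below
assumes that behaviour and uses a pipe as a stand-in for a real fence fd; the
helper name wait_for_fd is invented for this example.

```python
import os
import select


def wait_for_fd(fd: int, timeout_ms: int) -> bool:
    """Wait until `fd` reports POLLIN or the timeout expires.

    Returns True if the fd became readable, which for a sync_file fd
    would mean the fence has signalled.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(revents & select.POLLIN for _, revents in events)


if __name__ == "__main__":
    # A pipe stands in for the fence fd: writing to it plays the role
    # of the producer finishing its work.
    read_end, write_end = os.pipe()
    print(wait_for_fd(read_end, 0))    # nothing written yet
    os.write(write_end, b"x")
    print(wait_for_fd(read_end, 100))  # now readable
    os.close(read_end)
    os.close(write_end)
```

A consumer would more likely hand the fd on to the GPU or display driver
than block on the CPU like this; the point is only that the fd itself
carries everything the consumer needs.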

Cheers,
Tomek


Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Daniel Stone
Hi,

On Mon, 16 Mar 2020 at 15:33, Tomek Bury  wrote:
> > GL and GLES are not relevant. What is relevant is EGL, which defines
> > interfaces to make things work on the native platform.
> Yes and no. This is what EGL spec says about sharing a texture between 
> contexts:

Contexts are different though ...

> There are similar statements with regards to the lack of
> synchronisation guarantees for EGL images or between GL and native
> rendering, etc.

This also isn't about native rendering.

> But the main thing here is that EGL and Vulkan differ
> significantly.

Sure, I totally agree.

> The eglSwapBuffers() is expected to post an unspecified
> "back buffer" to the display system using some internal driver magic.
> EGL driver is then expected to obtain another back buffer at some
> unspecified point in the future.

Yes, this is rather the point: EGL doesn't specify platform-related
'black magic' to make things just work, because that's part of the
platform implementation details. And, as things stand, on Linux one of
those things is implicit synchronisation, unless the desired end state
of your driver is no synchronisation.

This thread is a discussion about changing that.

> > If you are using EGL_WL_bind_wayland_display, then one of the things
> > it is explicitly allowed/expected to do is to create a Wayland
> > protocol interface between client and compositor, which can be used to
> > pass buffer handles and metadata in a platform-specific way. Adding
> > synchronisation is also possible.
> Only one-way synchronisation is possible with this mechanism. There's
> a standard protocol for recycling buffers - wl_buffer_release() so
> buffer hand-over from the compositor to client remains unsynchronised
> - see below.

That's not true; you can post back a sync token every time the client
buffer is used by the compositor.

> > > The most troublesome part was Wayland buffer release mechanism, as it 
> > > only involves a CPU signalling over Wayland IPC, without any 3D driver 
> > > involvement. The choices were: explicit synchronisation extension or a 
> > > buffer copy in the compositor (i.e. compositor textures from the copy, so 
> > > the client can re-write the original), or some implicit synchronisation 
> > > in kernel space (but that wasn't an option in Broadcom driver).
> >
> > You can add your own explicit synchronisation extension.
> I could, but that requires implementing it in the driver and in a
> number of compositors; therefore a standard extension
> zwp_linux_explicit_synchronization_v1 is a much better choice here than
> a custom one.

EGL_WL_bind_wayland_display is explicitly designed to allow each
driver to implement its own private extensions without modifying
compositors. For instance, Mesa adds the `wl_drm` extension, which is
used for bidirectional communication between the EGL implementations
in the client and compositor address spaces, without modifying either.

> > In every cross-process and cross-subsystem usecase, synchronisation is
> > obviously required. The two options for this are to implement kernel
> > support for implicit synchronisation (as everyone else has done),
> That would require major changes in driver architecture or a 2nd
> mechanism doing the same thing but in kernel space - both are
> non-starters.

OK. As it stands, everyone else has the kernel mechanism (e.g. via
dmabuf resv), so in this case if you are reinventing the underlying
platform in a proprietary stack, you get to solve the same problems
yourselves.

Cheers,
Daniel


Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Jason Ekstrand
On Mon, Mar 16, 2020 at 10:33 AM Tomek Bury  wrote:
>
> > GL and GLES are not relevant. What is relevant is EGL, which defines
> > interfaces to make things work on the native platform.
> Yes and no. This is what EGL spec says about sharing a texture between 
> contexts:
>
> "OpenGL and OpenGL ES makes no attempt to synchronize access to
> texture objects. If a texture object is bound to more than one
> context, then it is up to the programmer to ensure that the contents
> of the object are not being changed via one context while another
> context is using the texture object for rendering. The results of
> changing a texture object while another context is using it are
> undefined."
>
> There are similar statements with regards to the lack of
> synchronisation guarantees for EGL images or between GL and native
> rendering, etc. But the main thing here is that EGL and Vulkan differ
> significantly. The eglSwapBuffers() is expected to post an unspecified
> "back buffer" to the display system using some internal driver magic.
> EGL driver is then expected to obtain another back buffer at some
> unspecified point in the future. Vulkan on the other hand is very
> specific and explicit. The vkQueuePresentKHR() is expected to post a
> specific VkImage with an explicit set of semaphores. Another
> image is obtained through vkAcquireNextImageKHR() and it's the
> application's decision whether it wants a fence, a semaphore, both or
> none with the acquired buffer. The implicit synchronisation doesn't
> mix well with Vulkan drivers and requires a lot of extra plumbing in
> the WSI code.

Yes, and that (the Vulkan issues in particular) is what I'm trying to
fix. :-)  (among other things...)  Assuming the kernel patch I linked
to, your usermode driver could stuff fences in the dma-buf without
having that be part of your kernel driver.  This assumes, of course,
that your kernel driver supports sync_file.

> > If you are using EGL_WL_bind_wayland_display, then one of the things
> > it is explicitly allowed/expected to do is to create a Wayland
> > protocol interface between client and compositor, which can be used to
> > pass buffer handles and metadata in a platform-specific way. Adding
> > synchronisation is also possible.
> Only one-way synchronisation is possible with this mechanism. There's
> a standard protocol for recycling buffers - wl_buffer_release() so
> buffer hand-over from the compositor to client remains unsynchronised
>
> - see below.
>
> > > The most troublesome part was Wayland buffer release mechanism, as it 
> > > only involves a CPU signalling over Wayland IPC, without any 3D driver 
> > > involvement. The choices were: explicit synchronisation extension or a 
> > > buffer copy in the compositor (i.e. compositor textures from the copy, so 
> > > the client can re-write the original), or some implicit synchronisation 
> > > in kernel space (but that wasn't an option in Broadcom driver).
> >
> > You can add your own explicit synchronisation extension.
> I could, but that requires implementing it in the driver and in a
> number of compositors; therefore a standard extension
> zwp_linux_explicit_synchronization_v1 is a much better choice here than
> a custom one.

I think you may be missing what Daniel is saying.  Wayland allows you
to do basically anything you want within your client and server-side
EGL implementations.  That could include the server-side EGL sending
an event with a fence every single time a flush operation happens in
the server-side GL/GLES implementation. (Could be glFlush, glFinish,
eglSwapBuffers, or other things).  Since wayland protocol events are
ordered, the client-side EGL implementation would get the most recent
flush event before it got the wl_buffer::release.  I fully agree that
it's rather cumbersome though.
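
The ordering argument can be sketched with a toy model in Python. The event
shapes here are hypothetical: ("flush", fence_id) stands for an event from a
driver-private extension, and ("release", buffer_id) stands in for
wl_buffer::release. Because events arrive in order, the client side only has
to remember the most recent flush fence when a release shows up.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical event shapes: ("flush", fence_id) or ("release", buffer_id).
Event = Tuple[str, int]


def fences_for_releases(events: Iterable[Event]) -> Dict[int, Optional[int]]:
    """Map each released buffer to the latest flush fence seen before it.

    A buffer released before any flush event maps to None.
    """
    latest_fence: Optional[int] = None
    result: Dict[int, Optional[int]] = {}
    for kind, value in events:
        if kind == "flush":
            latest_fence = value
        elif kind == "release":
            result[value] = latest_fence
    return result
```

The cumbersome part is visible even in the toy: every flush on the server
side produces an event, whether or not it touched the buffer that is later
released.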

> > In every cross-process and cross-subsystem usecase, synchronisation is
> > obviously required. The two options for this are to implement kernel
> > support for implicit synchronisation (as everyone else has done),
> That would require major changes in driver architecture or a 2nd
> mechanism doing the same thing but in kernel space - both are
> non-starters.
>
> > or implement generic support for explicit synchronisation (as we have
> > been working on with implementations inside Weston and Exosphere at
> > least),
> The zwp_linux_explicit_synchronization_v1 is a good step forward. I'm
> using this extension as the main synchronisation mechanism in the EGL and
> Vulkan drivers whenever available. I remember that Gustavo Padovan was
> working on explicit sync support in the display system some time ago.
> I hope it got merged into the kernel by now, but I don't know to what
> extent it's actually being used.

It is supported by KMS/atomic.  Legacy KMS, however, does not support it.

> > or implement private support for explicit synchronisation,
> If everything else fails, that would be the last resort scenario, but
> far from ideal and very costly in terms of 

Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Tomek Bury
> vkAcquireNextImageKHR() [...] it's the application's decision whether it 
> wants a fence, a semaphore, both or none
Correction: "or none" is not allowed
___
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel


Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Tomek Bury
> GL and GLES are not relevant. What is relevant is EGL, which defines
> interfaces to make things work on the native platform.
Yes and no. This is what the EGL spec says about sharing a texture between contexts:

"OpenGL and OpenGL ES makes no attempt to synchronize access to
texture objects. If a texture object is bound to more than one
context, then it is up to the programmer to ensure that the contents
of the object are not being changed via one context while another
context is using the texture object for rendering. The results of
changing a texture object while another context is using it are
undefined."

There are similar statements with regards to the lack of
synchronisation guarantees for EGL images or between GL and native
rendering, etc. But the main thing here is that EGL and Vulkan differ
significantly. eglSwapBuffers() is expected to post an unspecified
"back buffer" to the display system using some internal driver magic.
The EGL driver is then expected to obtain another back buffer at some
unspecified point in the future. Vulkan, on the other hand, is very
specific and explicit. vkQueuePresentKHR() is expected to post a
specific VkImage with an explicit set of semaphores. Another
image is obtained through vkAcquireNextImageKHR() and it's the
application's decision whether it wants a fence, a semaphore, both or
none with the acquired buffer. Implicit synchronisation doesn't
mix well with Vulkan drivers and requires a lot of extra plumbing in
the WSI code.

> If you are using EGL_WL_bind_wayland_display, then one of the things
> it is explicitly allowed/expected to do is to create a Wayland
> protocol interface between client and compositor, which can be used to
> pass buffer handles and metadata in a platform-specific way. Adding
> synchronisation is also possible.
Only one-way synchronisation is possible with this mechanism. There's
a standard protocol for recycling buffers, the wl_buffer::release
event, so buffer hand-over from the compositor to the client remains
unsynchronised - see below.

> > The most troublesome part was Wayland buffer release mechanism, as it only 
> > involves a CPU signalling over Wayland IPC, without any 3D driver 
> > involvement. The choices were: explicit synchronisation extension or a 
> > buffer copy in the compositor (i.e. compositor textures from the copy, so 
> > the client can re-write the original), or some implicit synchronisation in 
> > kernel space (but that wasn't an option in Broadcom driver).
>
> You can add your own explicit synchronisation extension.
I could, but that requires implementing it in the driver and in a
number of compositors; therefore a standard extension,
zwp_linux_explicit_synchronization_v1, is a much better choice here
than a custom one.

> In every cross-process and cross-subsystem usecase, synchronisation is
> obviously required. The two options for this are to implement kernel
> support for implicit synchronisation (as everyone else has done),
That would require major changes in driver architecture or a second
mechanism doing the same thing but in kernel space - both are
non-starters.

> or implement generic support for explicit synchronisation (as we have
> been working on with implementations inside Weston and Exosphere at
> least),
zwp_linux_explicit_synchronization_v1 is a good step forward. I'm
using this extension as the main synchronisation mechanism in the EGL
and Vulkan drivers whenever available. I remember that Gustavo Padovan
was working on explicit sync support in the display system some time
ago. I hope it got merged into the kernel by now, but I don't know to
what extent it's actually being used.

> or implement private support for explicit synchronisation,
If everything else fails, that would be the last resort scenario, but
far from ideal and very costly in terms of implementation and
maintenance as it would require maintaining custom patches for various
3rd party components or littering them with multiple custom explicit
synchronisation schemes.

> or do nothing and then be surprised at the lack of synchronisation.
Thank you, but no, thank you :)

Cheers,
Tomek


Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Jason Ekstrand
On Mon, Mar 16, 2020 at 5:20 AM Laurent Pinchart
 wrote:
>
> On Wed, Mar 11, 2020 at 04:18:55PM -0400, Nicolas Dufresne wrote:
> > (I know I'm going to be spammed by so many mailing list ...)
> >
> > Le mercredi 11 mars 2020 à 14:21 -0500, Jason Ekstrand a écrit :
> > > On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand  
> > > wrote:
> > > > All,
> > > >
> > > > Sorry for casting such a broad net with this one. I'm sure most people
> > > > who reply will get at least one mailing list rejection.  However, this
> > > > is an issue that affects a LOT of components and that's why it's
> > > > thorny to begin with.  Please pardon the length of this e-mail as
> > > > well; I promise there's a concrete point/proposal at the end.
> > > >
> > > >
> > > > Explicit synchronization is the future of graphics and media.  At
> > > > least, that seems to be the consensus among all the graphics people
> > > > I've talked to.  I had a chat with one of the lead Android graphics
> > > > engineers recently who told me that doing explicit sync from the start
> > > > was one of the best engineering decisions Android ever made.  It's
> > > > also the direction being taken by more modern APIs such as Vulkan.
> > > >
> > > >
> > > > ## What are implicit and explicit synchronization?
> > > >
> > > > For those that aren't familiar with this space, GPUs, media encoders,
> > > > etc. are massively parallel and synchronization of some form is
> > > > required to ensure that everything happens in the right order and
> > > > avoid data races.  Implicit synchronization is when bits of work (3D,
> > > > compute, video encode, etc.) are implicitly based on the absolute
> > > > CPU-time order in which API calls occur.  Explicit synchronization is
> > > > when the client (whatever that means in any given context) provides
> > > > the dependency graph explicitly via some sort of synchronization
> > > > primitives.  If you're still confused, consider the following
> > > > examples:
> > > >
> > > > With OpenGL and EGL, almost everything is implicit sync.  Say you have
> > > > two OpenGL contexts sharing an image where one writes to it and the
> > > > other textures from it.  The way the OpenGL spec works, the client has
> > > > to make the API calls to render to the image before (in CPU time) it
> > > > makes the API calls which texture from the image.  As long as it does
> > > > this (and maybe inserts a glFlush?), the driver will ensure that the
> > > > rendering completes before the texturing happens and you get correct
> > > > contents.
> > > >
> > > > Implicit synchronization can also happen across processes.  Wayland,
> > > > for instance, is currently built on implicit sync where the client
> > > > does their rendering and then does a hand-off (via wl_surface::commit)
> > > > to tell the compositor it's done at which point the compositor can now
> > > > texture from the surface.  The hand-off ensures that the client's
> > > > OpenGL API calls happen before the server's OpenGL API calls.
> > > >
> > > > A good example of explicit synchronization is the Vulkan API.  There,
> > > > a client (or multiple clients) can simultaneously build command
> > > > buffers in different threads where one of those command buffers
> > > > renders to an image and the other textures from it and then submit
> > > > both of them at the same time with instructions to the driver for
> > > > which order to execute them in.  The execution order is described via
> > > > the VkSemaphore primitive.  With the new VK_KHR_timeline_semaphore
> > > > extension, you can even submit the work which does the texturing
> > > > BEFORE the work which does the rendering and the driver will sort it
> > > > out.
> > > >
> > > > The #1 problem with implicit synchronization (which explicit solves)
> > > > is that it leads to a lot of over-synchronization both in client space
> > > > and in driver/device space.  The client has to synchronize a lot more
> > > > because it has to ensure that the API calls happen in a particular
> > > > order.  The driver/device have to synchronize a lot more because they
> > > > never know what is going to end up being a synchronization point as an
> > > > API call on another thread/process may occur at any time.  As we move
> > > > to more and more multi-threaded programming this synchronization (on
> > > > the client-side especially) becomes more and more painful.
> > > >
> > > >
> > > > ## Current status in Linux
> > > >
> > > > Implicit synchronization in Linux works via the kernel's internal
> > > > dma_buf and dma_fence data structures.  A dma_fence is a tiny object
> > > > which represents the "done" status for some bit of work.  Typically,
> > > > dma_fences are created as a by-product of someone submitting some bit
> > > > of work (say, 3D rendering) to the kernel.  The dma_buf object has a
> > > > set of dma_fences on it representing shared (read) and exclusive
> > > > (write) access to the object.  When work is submitted which, for
> > > > 

Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Daniel Stone
Hi Tomek,

On Mon, 16 Mar 2020 at 12:55, Tomek Bury  wrote:
> I've been wrestling with the sync problems in Wayland some time ago, but only 
> with regards to 3D drivers.
>
> The guarantee given by the GL/GLES spec is limited to a single graphics 
> context. If the same buffer is accessed by 2 contexts the outcome is 
> unspecified. The cross-context and cross-process synchronisation is not 
> guaranteed. It happens to work on Mesa, because the read/write locking is 
> implemented in the kernel space, but it didn't work on Broadcom driver, which 
> has read-write interlocks in user space.

GL and GLES are not relevant. What is relevant is EGL, which defines
interfaces to make things work on the native platform. EGL doesn't
define any kind of synchronisation model for the Wayland, X11, or
GBM/KMS platforms - but it's one of the things which has to work. It
doesn't say that the implementation must make sure that the requested
format is displayable, but you sort of take it for granted that if you
ask EGL to display something it will do so.

Synchronisation is one of those mechanisms which is left to the
platform to implement under the hood. In the absence of platform
support for explicit synchronisation, the synchronisation must be
implicit.

>  A Vulkan client makes it even worse because of conflicting requirements: 
> Vulkan's vkQueuePresentKHR() passes in a number of semaphores but disallows 
> waiting. Wayland WSI requires wl_surface_commit() to be called from 
> vkQueuePresentKHR() which does require a wait, unless a synchronisation 
> primitive representing Vulkan semaphores is passed between Vulkan client and 
> the compositor.

If you are using EGL_WL_bind_wayland_display, then one of the things
it is explicitly allowed/expected to do is to create a Wayland
protocol interface between client and compositor, which can be used to
pass buffer handles and metadata in a platform-specific way. Adding
synchronisation is also possible.

> The most troublesome part was Wayland buffer release mechanism, as it only 
> involves a CPU signalling over Wayland IPC, without any 3D driver 
> involvement. The choices were: explicit synchronisation extension or a buffer 
> copy in the compositor (i.e. compositor textures from the copy, so the client 
> can re-write the original), or some implicit synchronisation in kernel space 
> (but that wasn't an option in Broadcom driver).

You can add your own explicit synchronisation extension.

In every cross-process and cross-subsystem usecase, synchronisation is
obviously required. The options for this are to implement kernel
support for implicit synchronisation (as everyone else has done), or
implement generic support for explicit synchronisation (as we have
been working on with implementations inside Weston and Exosphere at
least), or implement private support for explicit synchronisation, or
do nothing and then be surprised at the lack of synchronisation.

Cheers,
Daniel


Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Tomek Bury
>  As long as we can fall back to not using fences then we should be fine.
Buffers written by the camera are trivial because you control what
happens - just don't attach a fence, so that the capture can be used
immediately. For recycled buffers there's an extra bit of work to do
because it won't be up to the camera driver to decide whether the
buffer comes back with or without a fence.


Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Laurent Pinchart
Hi Tomek,

On Mon, Mar 16, 2020 at 12:55:27PM +, Tomek Bury wrote:
> Hi Jason,
> 
> I've been wrestling with the sync problems in Wayland some time ago, but only
> with regards to 3D drivers.
> 
> The guarantee given by the GL/GLES spec is limited to a single graphics
> context. If the same buffer is accessed by 2 contexts the outcome is
> unspecified. The cross-context and cross-process synchronisation is not
> guaranteed. It happens to work on Mesa, because the read/write locking is
> implemented in the kernel space, but it didn't work on Broadcom driver, which
> has read-write interlocks in user space.
> 
>  A Vulkan client makes it even worse because of conflicting requirements:
> Vulkan's vkQueuePresentKHR() passes in a number of semaphores but disallows
> waiting. Wayland WSI requires wl_surface_commit() to be called from
> vkQueuePresentKHR() which does require a wait, unless a synchronisation
> primitive representing Vulkan semaphores is passed between Vulkan client and
> the compositor.
> 
> The most troublesome part was Wayland buffer release mechanism, as it only
> involves a CPU signalling over Wayland IPC, without any 3D driver involvement.
> The choices were: explicit synchronisation extension or a buffer copy in the
> compositor (i.e. compositor textures from the copy, so the client can re-write
> the original), or some implicit synchronisation in kernel space (but that
> wasn't an option in Broadcom driver).
> 
> With regards to V4L2, I believe it could easily work the same way as 3D
> drivers, i.e. pass a buffer+fence pair to the next stage. The encode always
> succeeds, but for capture or decode, the main problem is the uncertain
> outcome, I believe? If we're fine with rendering or displaying an occasional
> broken frame, then buffer+fence pair would work too. The broken frame will go
> into the pipeline, but application can drain the pipeline and start over once
> the capture works again.
> 
> To answer some points raised by Laurent (although I'm unfamiliar with the
> camera drivers):
> 
> > you don't know until capture complete in which buffer the frame has
> > been captured
>
> Surely you do, you only don't know in advance if the capture will be
> successful

You do in kernelspace, but not in userspace at the moment, due to buffer
recycling.

> > but if an error occurs during capture, they can be recycled internally and
> > put to the back of the queue.
>
> That would have to change in order to use explicit synchronisation. Every
> started capture becomes immediately available as a buffer+fence pair. Fence is
> signalled once the capture is finished (successfully or otherwise). The buffer
> must not be reused until it's released, possibly with another fence - in that
> case the buffer must not be reused until the release fence is signalled.

We could certainly change this at least in some cases, but it would
break existing userspace that doesn't expect incorrect frames.

I'm however not sure we could change this behaviour in every case, there
may be hardware that can't provide a guarantee on the order in which
buffers will be used. I'm aware this wouldn't be compatible with
explicit synchronization, and that's my point: camera hardware may not
always support explicit synchronization. As long as we can fall back to
not using fences then we should be fine.

-- 
Regards,

Laurent Pinchart


Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Tomek Bury
Hi Jason,

I've been wrestling with the sync problems in Wayland some time ago, but
only with regards to 3D drivers.

The guarantee given by the GL/GLES spec is limited to a single graphics
context. If the same buffer is accessed by 2 contexts the outcome is
unspecified. The cross-context and cross-process synchronisation is not
guaranteed. It happens to work on Mesa, because the read/write locking is
implemented in the kernel space, but it didn't work on Broadcom driver,
which has read-write interlocks in user space.

A Vulkan client makes it even worse because of conflicting requirements:
Vulkan's vkQueuePresentKHR() passes in a number of semaphores but disallows
waiting. Wayland WSI requires wl_surface_commit() to be called from
vkQueuePresentKHR() which does require a wait, unless a synchronisation
primitive representing Vulkan semaphores is passed between Vulkan client
and the compositor.
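
The two ways out of that conflict can be sketched roughly as follows.
This is a toy Python model, not real WSI code: the present() function
is hypothetical, a threading.Event stands in for the fence that
represents the present semaphores, and the request tuples are just
labels loosely modelled on set_acquire_fence from
zwp_linux_explicit_synchronization_v1 and on wl_surface attach/commit.
No Vulkan or Wayland calls are made.

```python
import threading


def present(image, render_done, explicit_sync):
    """Return the requests a toy WSI layer would send for one present.

    render_done is a threading.Event standing in for the fence that
    represents the wait semaphores passed to the present call.
    """
    requests = []
    if explicit_sync:
        # Hand the fence to the compositor; no CPU wait in the present path.
        requests.append(("set_acquire_fence", render_done))
    else:
        # Fallback: block until rendering is done, then commit.
        render_done.wait()
    requests.append(("attach", image))
    requests.append(("commit",))
    return requests


# Explicit path: returns immediately even though rendering is unfinished.
pending = threading.Event()
explicit = present("image-0", pending, explicit_sync=True)

# Fallback path: a timer stands in for the GPU finishing 10 ms later.
done = threading.Event()
threading.Timer(0.01, done.set).start()
fallback = present("image-1", done, explicit_sync=False)

print([r[0] for r in explicit])
print([r[0] for r in fallback])
```

In the explicit path the commit goes out while the fence is still
unsignalled; in the fallback path the present call itself has to block,
which is exactly the wait that vkQueuePresentKHR() is not supposed to
do.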

The most troublesome part was Wayland buffer release mechanism, as it only
involves a CPU signalling over Wayland IPC, without any 3D driver
involvement. The choices were: explicit synchronisation extension or a
buffer copy in the compositor (i.e. compositor textures from the copy, so
the client can re-write the original), or some implicit synchronisation in
kernel space (but that wasn't an option in Broadcom driver).

With regards to V4L2, I believe it could easily work the same way as 3D
drivers, i.e. pass a buffer+fence pair to the next stage. The encode always
succeeds, but for capture or decode, the main problem is the uncertain
outcome, I believe? If we're fine with rendering or displaying an
occasional broken frame, then buffer+fence pair would work too. The broken
frame will go into the pipeline, but application can drain the pipeline and
start over once the capture works again.

To answer some points raised by Laurent (although I'm unfamiliar with the
camera drivers):

> you don't know until capture complete in which buffer the frame has
> been captured
Surely you do, you only don't know in advance if the capture will be
successful

> but if an error occurs during capture, they can be recycled internally
> and put to the back of the queue.
That would have to change in order to use explicit synchronisation. Every
started capture becomes immediately available as a buffer+fence pair. Fence
is signalled once the capture is finished (successfully or otherwise). The
buffer must not be reused until it's released, possibly with another fence
- in that case the buffer must not be reused until the release fence is
signalled.
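
As a rough illustration of that buffer life cycle, here is a small
Python sketch. Everything in it is an assumption made for the example:
the Fence and CaptureQueue classes are hypothetical, threading.Event
stands in for a kernel fence, and nothing here corresponds to an
existing V4L2 interface.

```python
import threading


class Fence:
    """Toy fence: signalled once, carrying a success flag."""

    def __init__(self):
        self._event = threading.Event()
        self.ok = None

    def signal(self, ok):
        self.ok = ok
        self._event.set()

    def is_signalled(self):
        return self._event.is_set()


class CaptureQueue:
    def __init__(self, buffers):
        self.free = list(buffers)
        self.pending_release = {}  # buffer -> release fence, or None

    def start_capture(self):
        # The buffer is handed out at once, paired with an unsignalled fence.
        return self.free.pop(0), Fence()

    def release(self, buf, release_fence=None):
        self.pending_release[buf] = release_fence

    def reclaim(self):
        # Recycle only buffers whose release fence (if any) has signalled.
        for buf, fence in list(self.pending_release.items()):
            if fence is None or fence.is_signalled():
                del self.pending_release[buf]
                self.free.append(buf)


q = CaptureQueue(["buf0", "buf1"])
buf, capture_fence = q.start_capture()
capture_fence.signal(ok=False)      # capture finished, but it failed

release_fence = Fence()
q.release(buf, release_fence)       # consumer may still be reading
q.reclaim()
free_before = list(q.free)

release_fence.signal(ok=True)       # consumer is done with the buffer
q.reclaim()
free_after = list(q.free)
print(free_before, free_after)
```

The failed capture still signals its fence, so the consumer finds out
about the broken frame through the fence status rather than by the
buffer silently going back to the queue.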

Cheers,
Tomek


Re: Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Laurent Pinchart
On Wed, Mar 11, 2020 at 04:18:55PM -0400, Nicolas Dufresne wrote:
> (I know I'm going to be spammed by so many mailing list ...)
> 
> Le mercredi 11 mars 2020 à 14:21 -0500, Jason Ekstrand a écrit :
> > On Wed, Mar 11, 2020 at 12:31 PM Jason Ekstrand  
> > wrote:
> > > All,
> > > 
> > > Sorry for casting such a broad net with this one. I'm sure most people
> > > who reply will get at least one mailing list rejection.  However, this
> > > is an issue that affects a LOT of components and that's why it's
> > > thorny to begin with.  Please pardon the length of this e-mail as
> > > well; I promise there's a concrete point/proposal at the end.
> > > 
> > > 
> > > Explicit synchronization is the future of graphics and media.  At
> > > least, that seems to be the consensus among all the graphics people
> > > I've talked to.  I had a chat with one of the lead Android graphics
> > > engineers recently who told me that doing explicit sync from the start
> > > was one of the best engineering decisions Android ever made.  It's
> > > also the direction being taken by more modern APIs such as Vulkan.
> > > 
> > > 
> > > ## What are implicit and explicit synchronization?
> > > 
> > > For those that aren't familiar with this space, GPUs, media encoders,
> > > etc. are massively parallel and synchronization of some form is
> > > required to ensure that everything happens in the right order and
> > > avoid data races.  Implicit synchronization is when bits of work (3D,
> > > compute, video encode, etc.) are implicitly based on the absolute
> > > CPU-time order in which API calls occur.  Explicit synchronization is
> > > when the client (whatever that means in any given context) provides
> > > the dependency graph explicitly via some sort of synchronization
> > > primitives.  If you're still confused, consider the following
> > > examples:
> > > 
> > > With OpenGL and EGL, almost everything is implicit sync.  Say you have
> > > two OpenGL contexts sharing an image where one writes to it and the
> > > other textures from it.  The way the OpenGL spec works, the client has
> > > to make the API calls to render to the image before (in CPU time) it
> > > makes the API calls which texture from the image.  As long as it does
> > > this (and maybe inserts a glFlush?), the driver will ensure that the
> > > rendering completes before the texturing happens and you get correct
> > > contents.
> > > 
> > > Implicit synchronization can also happen across processes.  Wayland,
> > > for instance, is currently built on implicit sync where the client
> > > does their rendering and then does a hand-off (via wl_surface::commit)
> > > to tell the compositor it's done at which point the compositor can now
> > > texture from the surface.  The hand-off ensures that the client's
> > > OpenGL API calls happen before the server's OpenGL API calls.
> > > 
> > > A good example of explicit synchronization is the Vulkan API.  There,
> > > a client (or multiple clients) can simultaneously build command
> > > buffers in different threads where one of those command buffers
> > > renders to an image and the other textures from it and then submit
> > > both of them at the same time with instructions to the driver for
> > > which order to execute them in.  The execution order is described via
> > > the VkSemaphore primitive.  With the new VK_KHR_timeline_semaphore
> > > extension, you can even submit the work which does the texturing
> > > BEFORE the work which does the rendering and the driver will sort it
> > > out.
> > > 
> > > The #1 problem with implicit synchronization (which explicit solves)
> > > is that it leads to a lot of over-synchronization both in client space
> > > and in driver/device space.  The client has to synchronize a lot more
> > > because it has to ensure that the API calls happen in a particular
> > > order.  The driver/device have to synchronize a lot more because they
> > > never know what is going to end up being a synchronization point as an
> > > API call on another thread/process may occur at any time.  As we move
> > > to more and more multi-threaded programming this synchronization (on
> > > the client-side especially) becomes more and more painful.
> > > 
> > > 
> > > ## Current status in Linux
> > > 
> > > Implicit synchronization in Linux works via the kernel's internal
> > > dma_buf and dma_fence data structures.  A dma_fence is a tiny object
> > > which represents the "done" status for some bit of work.  Typically,
> > > dma_fences are created as a by-product of someone submitting some bit
> > > of work (say, 3D rendering) to the kernel.  The dma_buf object has a
> > > set of dma_fences on it representing shared (read) and exclusive
> > > (write) access to the object.  When work is submitted which, for
> > > instance renders to the dma_buf, it's queued waiting on all the fences
> > > on the dma_buf and a dma_fence is created representing the end of
> > > said rendering work and it's installed as the dma_buf's 

Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Michel Dänzer
On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> The synchronization works because the Mesa driver waits for idle (drains
> the GFX pipeline) at the end of command buffers and there is only 1
> graphics queue, so everything is ordered.
> 
> The GFX pipeline runs asynchronously to the command buffer, meaning the
> command buffer only starts draws and doesn't wait for completion. If the
> Mesa driver didn't wait at the end of the command buffer, the command
> buffer would finish and a different process could start execution of its
> own command buffer while shaders of the previous process are still running.
> 
> If the Mesa driver submits a command buffer internally (because it's full),
> it doesn't wait, so the GFX pipeline doesn't notice that a command buffer
> ended and a new one started.
> 
> The waiting at the end of command buffers happens only when the flush is
> external (Swap buffers, glFlush).
> 
> It's a performance problem, because the GFX queue is blocked until the GFX
> pipeline is drained at the end of every frame at least.
> 
> So explicit fences for SwapBuffers would help.

Not sure what difference it would make, since the same thing needs to be
done for explicit fences as well, doesn't it?


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer