Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Jacob Lifshay
On Tue, Mar 17, 2020 at 7:08 PM Jason Ekstrand  wrote:
>
> On Tue, Mar 17, 2020 at 7:16 PM Jacob Lifshay  
> wrote:
> >
> > On Tue, Mar 17, 2020 at 11:14 AM Lucas Stach  wrote:
> > >
> > > On Tuesday, 17.03.2020, 10:59 -0700, Jacob Lifshay wrote:
> > > > I think I found a userspace-accessible way to create sync_files and
> > > > dma_fences that would fulfill the requirements:
> > > > https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
> > > >
> > > > I'm just not sure if that's a good interface to use, since it appears
> > > > to be designed only for debugging. Will have to check for additional
> > > > requirements of signalling an error when the process that created the
> > > > fence is killed.
>
> It is expressly only for debugging and testing.  Exposing such an API
> to userspace would break the finite time guarantees that are relied
> upon to keep sync_file a secure API.

Ok, I was figuring that was probably the case.

> > > Something like that can certainly be lifted for general use if it makes
> > > sense. But then with a software renderer I don't really see how fences
> > > help you at all. With a software renderer you know exactly when the
> > > frame is finished and you can just defer pushing it over to the next
> > > pipeline element until that time. You won't gain any parallelism by
> > > using fences as the CPU is busy doing the rendering and will not run
> > > other stuff concurrently, right?
> >
> > There definitely may be other hardware and/or processes that can
> > process some stuff concurrently with the main application, such as the
> > compositor and/or video encoding processes (for video capture).
> > Additionally, from what I understand, sync_file is the standard way to
> > export and import explicit synchronization between processes and
> > between drivers on Linux, so it seems like a good idea to support it
> > from an interoperability standpoint even if it turns out that there
> > aren't any scheduling/timing benefits.
>
> There are different ways that one can handle interoperability,
> however.  One way is to try and make the software rasterizer look as
> much like a GPU as possible:  lots of threads to make things as
> asynchronous as possible, "real" implementations of semaphores and
> fences, etc.

This is basically the route I've picked, though rather than spawning
lots of native threads, I'm planning on having just one thread per
core and letting a work-stealing scheduler (inspired by Rust's rayon
crate) schedule all the individual render/compute jobs, since that
makes it practical to create many more jobs and balance the load more
finely.

> Another is to let a SW rasterizer be a SW rasterizer: do
> everything immediately, thread only so you can exercise all the CPU
> cores, and minimally implement semaphores and fences well enough to
> maintain compatibility.  If you take the first approach, then we have
> to solve all these problems with letting userspace create unsignaled
> sync_files which it will signal later and figure out how to make it
> safe.  If you take the second approach, you'll only ever have to
> return already signaled sync_files and there's no problem with the
> sync_file finite time guarantees.

The main issue with doing everything immediately is that a lot of the
function calls that games expect to take a very short time (e.g.
vkQueueSubmit) would instead take a much longer time, potentially
causing problems.

One idea for a safe userspace-backed sync_file is to have a step
counter that counts down until the sync_file is ready; if userspace
doesn't tell it to count any steps within a certain amount of time,
the sync_file switches to the error state. This way it errors out
shortly after a process deadlocks for whatever reason, while still
preserving the finite-time guarantee.

When the sync_file is created, the step counter would be set to the
number of jobs that the fence is waiting on.

It can also be set to pause the timeout to wait until another
sync_file signals, to handle cases where a sync_file is waiting on a
userspace process that is waiting on another sync_file.

The main issue is that the kernel would have to make sure that the
sync_file graph doesn't have loops, maybe by erroring all sync_files
that it finds in the loop.
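
To make that concrete, here is a purely hypothetical sketch of what
the uAPI could look like; none of these ioctls, structs, or names
exist today, they're made up only to illustrate the step-counter idea:

/* Hypothetical uAPI sketch for a userspace-backed sync_file with a
 * step-counter timeout; nothing here exists in the kernel today.
 * Userspace opens the (hypothetical) device, creates a sync_file with
 * a step count and a per-step timeout, and counts one step as each job
 * finishes.  If no step is counted within step_timeout_ms, the kernel
 * moves the fence to the error state, preserving the finite-time
 * guarantee. */
#include <stdint.h>
#include <sys/ioctl.h>

struct usync_create {
	uint32_t steps;           /* jobs the fence waits on; signals at 0 */
	uint32_t step_timeout_ms; /* max time allowed between steps        */
	int32_t  sync_file_fd;    /* out: the new sync_file                */
};

#define USYNC_IOC_MAGIC    'U'
#define USYNC_IOC_CREATE   _IOWR(USYNC_IOC_MAGIC, 0, struct usync_create)
#define USYNC_IOC_STEP     _IO(USYNC_IOC_MAGIC, 1)  /* count one step */
/* Pause the timeout until the given sync_file fd signals: */
#define USYNC_IOC_PAUSE_ON _IOW(USYNC_IOC_MAGIC, 2, int32_t)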

Does that sound like a good idea?

Jacob


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Jacob Lifshay
On Tue, Mar 17, 2020 at 10:21 AM Lucas Stach  wrote:
>
> On Tuesday, 17.03.2020, 10:12 -0700, Jacob Lifshay wrote:
> > One related issue with explicit sync using sync_file: for combined
> > CPUs/GPUs (where the CPU cores *are* the GPU cores) that do all the
> > rendering in userspace (like llvmpipe, but for Vulkan and with extra
> > instructions for GPU tasks) yet need to synchronize with other
> > drivers/processes, there should be some way to create an explicit
> > fence/semaphore from userspace and later signal it. This seems to
> > conflict with the requirement for a sync_file to complete in finite
> > time, since the user process could be stopped or killed.
> >
> > Any ideas?
>
> Finite just means "not infinite". If you stop the process that's doing
> part of the pipeline processing, you block the pipeline; you get to
> keep the pieces in that case.

Seems reasonable.

> That's one of the issues with implicit sync
> that explicit may solve: a single client taking way too much time to
> render something can block the whole pipeline up until the display
> flip. With explicit sync the compositor can just decide to use the last
> client buffer if the latest buffer isn't ready by some deadline.
>
> With regard to the process getting killed: whatever your sync primitive
> is, you need to make sure to signal the fence (possibly with an error
> condition set) when you are not going to make progress anymore. So
> whatever your means of creating the sync_fd from your software renderer
> is, it needs to signal any outstanding fences on the sync_fd when the
> fd is closed.

I think I found a userspace-accessible way to create sync_files and
dma_fences that would fulfill the requirements:
https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c

I'm just not sure if that's a good interface to use, since it appears
to be designed only for debugging. Will have to check for additional
requirements of signalling an error when the process that created the
fence is killed.
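
For reference, here's roughly how that debugfs interface is driven
from userspace (a minimal sketch; the struct layout, ioctl numbers,
and debugfs path mirror my reading of sw_sync.c at the time of
writing, so treat them as illustrative rather than as a stable uAPI):

/* Minimal sketch of driving the sw_sync debugfs interface; definitions
 * mirror drivers/dma-buf/sw_sync.c and assume debugfs is mounted at
 * /sys/kernel/debug. */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct sw_sync_create_fence_data {
	uint32_t value;    /* timeline point the fence signals at */
	char     name[32];
	int32_t  fence;    /* out: sync_file fd */
};

#define SW_SYNC_IOC_MAGIC        'W'
#define SW_SYNC_IOC_CREATE_FENCE _IOWR(SW_SYNC_IOC_MAGIC, 0, \
                                       struct sw_sync_create_fence_data)
#define SW_SYNC_IOC_INC          _IOW(SW_SYNC_IOC_MAGIC, 1, uint32_t)

int main(void)
{
	/* Opening sw_sync creates a new timeline. */
	int timeline = open("/sys/kernel/debug/sync/sw_sync", O_RDWR);
	if (timeline < 0)
		return 1;

	/* Create a sync_file that signals when the timeline reaches 1. */
	struct sw_sync_create_fence_data data;
	memset(&data, 0, sizeof(data));
	data.value = 1;
	strcpy(data.name, "sw-renderer-frame");
	if (ioctl(timeline, SW_SYNC_IOC_CREATE_FENCE, &data) < 0)
		return 1;

	/* ... hand data.fence to the compositor, do the rendering ... */

	/* Advance the timeline by 1, signalling the fence. */
	uint32_t inc = 1;
	ioctl(timeline, SW_SYNC_IOC_INC, &inc);

	close(data.fence);
	close(timeline);
	return 0;
}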

Jacob

>
> Regards,
> Lucas
>


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Jacob Lifshay
On Tue, Mar 17, 2020 at 11:35 PM Jason Ekstrand  wrote:
>
> On Wed, Mar 18, 2020 at 12:20 AM Jacob Lifshay  
> wrote:
> >
> > The main issue with doing everything immediately is that a lot of the
> > function calls that games expect to take a very short time (e.g.
> > vkQueueSubmit) would instead take a much longer time, potentially
> > causing problems.
>
> Do you have any evidence that it will cause problems?  What I said
> above is what SwiftShader is doing and they're running real apps and
> I've not heard of it causing any problems.  It's also worth noting
> that you would only really have to stall at sync_file export.  You can
> async as much as you want internally.

Ok, seems worth trying out.
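
For context, the export path on the Vulkan side stays simple with the
immediate approach. A rough sketch, assuming VK_KHR_external_fence_fd
and a fence created with VkExportFenceCreateInfo requesting the
SYNC_FD handle type:

#include <vulkan/vulkan.h>

/* Sketch: export a VkFence as a sync_file fd via
 * VK_KHR_external_fence_fd.  Assumes vkGetFenceFdKHR was loaded via
 * vkGetDeviceProcAddr and the fence was created with
 * VkExportFenceCreateInfo listing
 * VK_EXTERNAL_FENCE_HANDLE_TYPE_SYNC_FD_BIT.  With an immediate-mode
 * software renderer the fence is already signaled by the time this
 * runs, so the exported sync_file is signaled too and the finite-time
 * guarantee is trivially met. */
static int export_fence_as_sync_fd(VkDevice device, VkFence fence,
                                   PFN_vkGetFenceFdKHR get_fence_fd)
{
	VkFenceGetFdInfoKHR info = {
		.sType = VK_STRUCTURE_TYPE_FENCE_GET_FD_INFO_KHR,
		.fence = fence,
		.handleType = VK_EXTERNAL_FENCE_HANDLE_TYPE_SYNC_FD_BIT,
	};
	int fd = -1;

	if (get_fence_fd(device, &info, &fd) != VK_SUCCESS)
		return -1;
	return fd; /* hand this to the compositor, video encoder, etc. */
}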

> > One idea for a safe userspace-backed sync_file is to have a step
> > counter that counts down until the sync_file is ready; if userspace
> > doesn't tell it to count any steps within a certain amount of time,
> > the sync_file switches to the error state. This way it errors out
> > shortly after a process deadlocks for whatever reason, while still
> > preserving the finite-time guarantee.
> >
> > When the sync_file is created, the step counter would be set to the
> > number of jobs that the fence is waiting on.
> >
> > It can also be set to pause the timeout to wait until another
> > sync_file signals, to handle cases where a sync_file is waiting on a
> > userspace process that is waiting on another sync_file.
> >
> > The main issue is that the kernel would have to make sure that the
> > sync_file graph doesn't have loops, maybe by erroring all sync_files
> > that it finds in the loop.
> >
> > Does that sound like a good idea?
>
> Honestly, I don't think you'll ever be able to sell that to the kernel
> community.  All of the deadlock detection would add massive complexity
> to the already non-trivial dma_fence infrastructure and for what
> benefit?  So that a software rasterizer can try to pretend to be more
> like a GPU?  You're going to need some very serious perf numbers
> and/or other proof of necessity if you want to convince the kernel
> people to accept that level of complexity/risk.  "I designed my
> software to work this way" isn't going to convince anyone of anything
> especially when literally every other software rasterizer I'm aware of
> is immediate and they work just fine.

After some further research, it turns out that it will work to have
all of the sync_files that a sync_file depends on be specified at
creation time. That forces the dependency graph to be a DAG: you
can't depend on a sync_file that doesn't exist yet, so loops are
impossible by design.

Since kernel deadlock detection isn't actually required (just timeouts
for the case of halted userspace), does this seem feasible?

I'd guess that it'd require maybe 200-300 lines of code in a
self-contained driver similar to the sync_file debugging driver
mentioned previously but with the additional timeout code for safety.
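
For concreteness, here's a purely hypothetical sketch of what the
creation ioctl for such a driver could look like; none of these names
exist, they're made up for illustration:

/* Hypothetical uAPI sketch: all dependencies are passed at creation
 * time, so the fence graph is a DAG by construction and no deadlock
 * detection is needed, only the timeout for halted userspace. */
#include <stdint.h>
#include <sys/ioctl.h>

struct usw_fence_create {
	uint64_t deps;       /* userspace pointer to existing sync_file fds */
	uint32_t num_deps;
	uint32_t timeout_ms; /* error the fence if userspace hasn't signaled
	                      * it this long after all dependencies signal  */
	int32_t  fence_fd;   /* out: the new sync_file                      */
	int32_t  signal_fd;  /* out: fd used to signal (or drop) the fence  */
};

#define USW_FENCE_IOC_MAGIC  'F'
#define USW_FENCE_IOC_CREATE _IOWR(USW_FENCE_IOC_MAGIC, 0, \
                                   struct usw_fence_create)
#define USW_FENCE_IOC_SIGNAL _IO(USW_FENCE_IOC_MAGIC, 1)  /* on signal_fd */

In this sketch, closing signal_fd without signalling would move the
fence to the error state, matching the earlier point about handling
killed processes.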

Jacob


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Jacob Lifshay
One related issue with explicit sync using sync_file: for combined
CPUs/GPUs (where the CPU cores *are* the GPU cores) that do all the
rendering in userspace (like llvmpipe, but for Vulkan and with extra
instructions for GPU tasks) yet need to synchronize with other
drivers/processes, there should be some way to create an explicit
fence/semaphore from userspace and later signal it. This seems to
conflict with the requirement for a sync_file to complete in finite
time, since the user process could be stopped or killed.

Any ideas?

Jacob Lifshay


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-18 Thread Jacob Lifshay
On Tue, Mar 17, 2020 at 11:14 AM Lucas Stach  wrote:
>
> On Tuesday, 17.03.2020, 10:59 -0700, Jacob Lifshay wrote:
> > I think I found a userspace-accessible way to create sync_files and
> > dma_fences that would fulfill the requirements:
> > https://github.com/torvalds/linux/blob/master/drivers/dma-buf/sw_sync.c
> >
> > I'm just not sure if that's a good interface to use, since it appears
> > to be designed only for debugging. Will have to check for additional
> > requirements of signalling an error when the process that created the
> > fence is killed.
>
> Something like that can certainly be lifted for general use if it makes
> sense. But then with a software renderer I don't really see how fences
> help you at all. With a software renderer you know exactly when the
> frame is finished and you can just defer pushing it over to the next
> pipeline element until that time. You won't gain any parallelism by
> using fences as the CPU is busy doing the rendering and will not run
> other stuff concurrently, right?

There definitely may be other hardware and/or processes that can
process some stuff concurrently with the main application, such as the
compositor and/or video encoding processes (for video capture).
Additionally, from what I understand, sync_file is the standard way to
export and import explicit synchronization between processes and
between drivers on Linux, so it seems like a good idea to support it
from an interoperability standpoint even if it turns out that there
aren't any scheduling/timing benefits.

Jacob


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-03-02 Thread Jacob Lifshay
One idea for Marge-bot (don't know if you already do this):
the Rust project has its bot (bors) automatically group a few merge
requests together into a single merge commit, which it then tests;
when the tests pass, it merges. This could help reduce CI runs to once
a day (or some other rate). If the tests fail, it could automatically
deduce which merge request failed, by recursive subdivision or
similar. There's also a mechanism to adjust priority and grouping
behavior when the defaults aren't sufficient.
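
As an illustration of the subdivision idea only (run_ci() below is a
stand-in for kicking off a real pipeline on a candidate batch, not an
existing API):

#include <stdbool.h>
#include <stdio.h>

/* Find failing merge requests in a batch by recursive subdivision;
 * run_ci() stands in for a full pipeline run on the given subset of
 * MRs and returns true if it passes. */
static void find_failing(const int *mrs, int n,
                         bool (*run_ci)(const int *, int))
{
	if (n == 0 || run_ci(mrs, n))
		return; /* empty or passing batch: nothing to report */
	if (n == 1) {
		printf("MR %d fails CI on its own\n", mrs[0]);
		return;
	}
	/* Split the failing batch in half and test each half separately. */
	find_failing(mrs, n / 2, run_ci);
	find_failing(mrs + n / 2, n - n / 2, run_ci);
}

(Real bors-style batching has more subtlety, e.g. failures that only
show up when two merge requests are combined, but the idea is the
same.)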

Jacob


Re: [RFC] DeepColor Visual Class Extension

2017-09-20 Thread Jacob Lifshay
>
> If we left them out of the connection setup block and relied on a separate
> request to get a list of DeepColor visuals it would alleviate that issue, but
> then we run into the same issue Aaron pointed out with XGetWindowAttributes.
> Non-HDR clients would get what appears to be a bogus visual when querying the
> visual of an HDR window because xlib would be unaware of it.
>
If you keep track of whether clients support HDR per-connection, you could
fake the HDR windows' visual id for the non-HDR clients, returning a
TrueColor visual id instead. This will avoid the problem of the client
thinking that the visual id is bogus.
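
A rough sketch of that substitution on the server side; the types and
helper functions below are stand-ins, not real X server APIs, only the
idea of swapping the reported visual id per-client is what matters:

/* Hypothetical sketch of per-client visual substitution; the typedefs
 * and helpers stand in for X server internals and are not real server
 * APIs. */
typedef unsigned long VisualID;
typedef struct Client *ClientPtr;  /* stand-in for the server's ClientPtr */
typedef struct Window *WindowPtr;  /* stand-in for the server's WindowPtr */

VisualID window_visual_id(WindowPtr window);           /* hypothetical */
int      client_supports_deepcolor(ClientPtr client);  /* hypothetical */
int      visual_is_deepcolor(VisualID visual);         /* hypothetical */
VisualID fallback_truecolor_visual(WindowPtr window);  /* hypothetical */

/* When a client that never opted into DeepColor queries a window whose
 * real visual is a DeepColor one, report a compatible TrueColor visual
 * id instead, so XGetWindowAttributes keeps returning something xlib
 * understands. */
VisualID visible_visual_for_client(ClientPtr client, WindowPtr window)
{
	VisualID real = window_visual_id(window);

	if (!client_supports_deepcolor(client) && visual_is_deepcolor(real))
		return fallback_truecolor_visual(window);
	return real;
}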

Jacob Lifshay