Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-12 Thread Jason Ekstrand
On Wed, Jul 12, 2017 at 9:45 AM, Christian König 
wrote:

> Am 12.07.2017 um 17:53 schrieb Jason Ekstrand:
>
> [SNIP]
>
>
>> Is that easier than just waiting in the kernel? I'm not sure how
>> optimised we need this path to be.
>>
>
> I don't think so.  I think it's more-or-less the same code regardless of
> how it's done.  The advantage of doing it in the kernel is that it's
> standardized (we don't have to both go write that userspace code) and it
> doesn't have the problems stated above.
>
>
> Ok, I'm convinced. The next key question is then: how do we do it?
>
> I mean writing an IOCTL to wait for a fence to appear is simple, but do we
> also need to wait for a fence to change?
>

I don't think so.  Taking a page from the Vulkan book, I think the "right"
thing to do would be to have a reset ioctl that simply deletes the
dma_fence and replaces it with NULL.  Then the only behavior you care about
is "wait for a fence to appear".  The usage pattern becomes:

 1) Submit work to signal the syncobj
 2) Wait on the syncobj (this may happen before the submit)
 3) Once everyone is done waiting, reset the syncobj

If someone does a wait on a stale syncobj that has completed and not yet
been reset, they return immediately.  Yes, there is a possible race between
2 and 3, especially if 2 is waiting on multiple syncobjs.  Ideally, we'd
make it so that a reset in the middle of someone else's wait wouldn't cause
them to wait until that syncobj gets triggered a second time.  However, I
don't think it's a deal-breaker as any client that sets the "wait for
submit" flag should know that it's liable to wait forever and set a timeout.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-12 Thread Christian König

Am 12.07.2017 um 17:53 schrieb Jason Ekstrand:

[SNIP]

Is that easier than just waiting in the kernel? I'm not sure how
optimised we need this path to be.


I don't think so.  I think it's more-or-less the same code regardless 
of how it's done.  The advantage of doing it in the kernel is that 
it's standardized (we don't have to both go write that userspace code) 
and it doesn't have the problems stated above.


Ok, I'm convinced. The next key question is then: how do we do it?

I mean writing an IOCTL to wait for a fence to appear is simple, but do 
we also need to wait for a fence to change?


As long as waits don't consume fences, that might be rather tricky.

Christian.


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-12 Thread Jason Ekstrand
On Wed, Jul 12, 2017 at 1:39 AM, Dave Airlie  wrote:

> On 12 July 2017 at 17:39, Christian König  wrote:
> > Am 11.07.2017 um 17:43 schrieb Jason Ekstrand:
> >
> > On Tue, Jul 11, 2017 at 12:17 AM, Christian König <
> deathsim...@vodafone.de>
> > wrote:
> >>
> >> [SNIP]
> >
> >  If we ever want to share fences across processes (which we do),
> >  then this needs to be sorted in the kernel.
> 
>   That would clearly get a NAK from my side, even Microsoft forbids
>   wait before signal because you can easily end up in deadlock
>  situations.
> 
>  Please don't NAK things that are required by the API specification and
>  CTS tests.
> >>>
> >>> There is no requirement for every aspect of the Vulkan API
> specification
> >>> to be mirrored 1:1 in the kernel <-> userspace API. We have to work out
> >>> what makes sense at each level.
> >>
> >>
> >> Exactly, if we have a synchronization problem between two processes, that
> >> should be solved in userspace.
> >>
> >> E.g. if process A hasn't submitted its work to the kernel it should
> >> flush its commands before sending a flip event to the compositor.
> >
> >
> > Ok, I think there is some confusion here on what is being proposed.  Here
> > are some things that are *not* being proposed:
> >
> >  1. This does *not* allow a client to block another client's GPU work
> > indefinitely.  This is entirely for a CPU wait API to allow for a "wait
> for
> > submit" as well as a "wait for finish".
> >
> > Yeah, that is a rather good point.
> >
> >  2. This is *not* for system compositors that need to be robust against
> > malicious clients.
> >
> > I can see the point, but I think the kernel interface should still be
> > idiot proof even in the unlikely case the universe suddenly stops
> > producing idiots.
>

Fair enough.  Maybe I've spent too much time in the Vulkan world where
being an idiot is, theoretically, disallowed.  And, by "disallowed", I mean
that you're free to be one with the understanding that your process may get
straight-up killed on *any* API violation.


> > The expected use for the OPAQUE_FD is two very tightly integrated
> processes
> > which trust each other but need to be able to share synchronization
> > primitives.
> >
> > Well, that raises a really important question: What is the actual use
> case
> > for this if not communication between client and compositor?
>
> VR clients and compositors.
>

Yeah, that.  Wouldn't they want the same security guarantees as your OS
compositor?  One would think, but none of the people making VR compositors
seem to care all that much about the case of malicious clients.  In
general, they run a fairly closed platform where they aren't really running
"arbitrary" apps on their compositor.  Also, they tend to write both the
client and server sides of the VR compositor protocol and the only thing
the app touches is their API.  I'm not going to try too hard to justify
their lack of concern about deadlock, but there it is.


> > Could we do this "in userspace"?  Yes, with added kernel API.  We would
> need
> > some way of strapping a second FD onto a syncobj or combining two FDs
> into
> > one to send across the wire or something like that, then add a shared
> memory
> > segment, and then pile on a bunch of code to do cross-process condition
> > variables and state tracking.  I really don't see how that's a better
> > solution than adding a flag to the kernel API to just do what we want.
> >
> > Way too complicated.
> >
> > My thinking was rather to optionally allow a single page to be mmap()ed
> into
> > the process address space from the fd and then put a futex, pthread_cond
> or
> > X shared memory fence or anything like that into it.
>

A single page + fence sounds a lot like a DRM BO.  One of my original plans
for implementing the feature was to just use a single-page BO and do some
userspace stuff in the mapped page.  There are two problems here:

1) It could be very easy for a malicious client to map the page and then
mess up whatever CPU data structure I use for the semaphore.  I could
probably make it robust but there is an attack vector there that's going to
be tricky.
2) I have no way, on import, to tell the difference between a 4K memory
object and a fence.

Then syncobj came along and promised to solve all my problems...


> Is that easier than just waiting in the kernel? I'm not sure how
> optimised we need this path to be.
>

I don't think so.  I think it's more-or-less the same code regardless of
how it's done.  The advantage of doing it in the kernel is that it's
standardized (we don't have to both go write that userspace code) and it
doesn't have the problems stated above.


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-12 Thread Dave Airlie
On 12 July 2017 at 17:39, Christian König  wrote:
> Am 11.07.2017 um 17:43 schrieb Jason Ekstrand:
>
> On Tue, Jul 11, 2017 at 12:17 AM, Christian König 
> wrote:
>>
>> [SNIP]
>
>  If we ever want to share fences across processes (which we do),
>  then this needs to be sorted in the kernel.

  That would clearly get a NAK from my side, even Microsoft forbids
  wait before signal because you can easily end up in deadlock
 situations.

 Please don't NAK things that are required by the API specification and
 CTS tests.
>>>
>>> There is no requirement for every aspect of the Vulkan API specification
>>> to be mirrored 1:1 in the kernel <-> userspace API. We have to work out
>>> what makes sense at each level.
>>
>>
>> Exactly, if we have a synchronization problem between two processes, that
>> should be solved in userspace.
>>
>> E.g. if process A hasn't submitted its work to the kernel it should flush
>> its commands before sending a flip event to the compositor.
>
>
> Ok, I think there is some confusion here on what is being proposed.  Here
> are some things that are *not* being proposed:
>
>  1. This does *not* allow a client to block another client's GPU work
> indefinitely.  This is entirely for a CPU wait API to allow for a "wait for
> submit" as well as a "wait for finish".
>
> Yeah, that is a rather good point.
>
>  2. This is *not* for system compositors that need to be robust against
> malicious clients.
>
> I can see the point, but I think the kernel interface should still be idiot
> proof even in the unlikely case the universe suddenly stops producing
> idiots.
>
> The expected use for the OPAQUE_FD is two very tightly integrated processes
> which trust each other but need to be able to share synchronization
> primitives.
>
> Well, that raises a really important question: What is the actual use case
> for this if not communication between client and compositor?

VR clients and compositors.

> Could we do this "in userspace"?  Yes, with added kernel API.  We would need
> some way of strapping a second FD onto a syncobj or combining two FDs into
> one to send across the wire or something like that, then add a shared memory
> segment, and then pile on a bunch of code to do cross-process condition
> variables and state tracking.  I really don't see how that's a better
> solution than adding a flag to the kernel API to just do what we want.
>
> Way too complicated.
>
> My thinking was rather to optionally allow a single page to be mmap()ed into
> the process address space from the fd and then put a futex, pthread_cond or
> X shared memory fence or anything like that into it.
>

Is that easier than just waiting in the kernel? I'm not sure how
optimised we need this path to be.

Dave.


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-12 Thread Christian König

Am 11.07.2017 um 17:43 schrieb Jason Ekstrand:
On Tue, Jul 11, 2017 at 12:17 AM, Christian König 
> wrote:


[SNIP]

 If we ever want to share fences across processes
(which we do),
 then this needs to be sorted in the kernel.

 That would clearly get a NAK from my side, even
Microsoft forbids
 wait before signal because you can easily end up in
deadlock situations.

Please don't NAK things that are required by the API
specification and
CTS tests.

There is no requirement for every aspect of the Vulkan API
specification
to be mirrored 1:1 in the kernel <-> userspace API. We have to
work out
what makes sense at each level.


Exactly, if we have a synchronization problem between two
processes, that should be solved in userspace.

E.g. if process A hasn't submitted its work to the kernel it
should flush its commands before sending a flip event to the
compositor.


Ok, I think there is some confusion here on what is being proposed.  
Here are some things that are *not* being proposed:


 1. This does *not* allow a client to block another client's GPU work 
indefinitely.  This is entirely for a CPU wait API to allow for a 
"wait for submit" as well as a "wait for finish".

Yeah, that is a rather good point.

 2. This is *not* for system compositors that need to be robust 
against malicious clients.
I can see the point, but I think the kernel interface should still be
idiot proof even in the unlikely case the universe suddenly stops
producing idiots.


The expected use for the OPAQUE_FD is two very tightly integrated 
processes which trust each other but need to be able to share 
synchronization primitives.
Well, that raises a really important question: What is the actual use 
case for this if not communication between client and compositor?


Could we do this "in userspace"?  Yes, with added kernel API.  We
would need some way of strapping a second FD onto a syncobj or 
combining two FDs into one to send across the wire or something like 
that, then add a shared memory segment, and then pile on a bunch of 
code to do cross-process condition variables and state tracking.  I 
really don't see how that's a better solution than adding a flag to 
the kernel API to just do what we want.

Way too complicated.

My thinking was rather to optionally allow a single page to be mmap()ed 
into the process address space from the fd and then put a futex, 
pthread_cond or X shared memory fence or anything like that into it.


Regards,
Christian.


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-11 Thread Jason Ekstrand
On Tue, Jul 11, 2017 at 12:22 AM, Daniel Vetter  wrote:

> On Mon, Jul 10, 2017 at 02:09:42PM -0700, Jason Ekstrand wrote:
> > On Mon, Jul 10, 2017 at 9:15 AM, Christian König <
> deathsim...@vodafone.de>
> > wrote:
> >
> > > Am 10.07.2017 um 17:52 schrieb Jason Ekstrand:
> > >
> > > On Mon, Jul 10, 2017 at 8:45 AM, Christian König <
> deathsim...@vodafone.de>
> > > wrote:
> > >
> > >> Am 10.07.2017 um 17:28 schrieb Jason Ekstrand:
> > >>
> > >> On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie 
> wrote:
> > >> [SNIP]
> > >> So, reading some CTS tests again, and I think we have a problem here.
> > >> The Vulkan spec allows you to wait on a fence that is in the
> unsignaled
> > >> state.
> > >>
> > >>
> > >> At least on the closed source driver that would be illegal as far as I
> > >> know.
> > >>
> > >
> > > Then they are doing workarounds in userspace.  There are definitely CTS
> > > tests for this:
> > >
> > > https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/vulkancts/modules/vulkan/synchronization/vktSynchronizationBasicFenceTests.cpp#L74
> > >
> > >
> > >> You can't wait on a semaphore before the signal operation is sent
> > >> down to the kernel.
> > >>
> > >
> > > We (Intel) deal with this today by tracking whether or not the fence
> has
> > > been submitted and using a condition variable in userspace to sort it
> all
> > > out.
> > >
> > >
> > > Which sounds exactly like what AMD is doing in its drivers as well.
> > >
> >
> > Which doesn't work cross-process so...
> >
> > > If we ever want to share fences across processes (which we do), then
> this
> > > needs to be sorted in the kernel.
> > >
> > >
> > > That would clearly get a NAK from my side, even Microsoft forbids wait
> > > before signal because you can easily end up in deadlock situations.
> > >
> >
> > Please don't NAK things that are required by the API specification and
> CTS
> > tests.  That makes it very hard for people like me to get their jobs
> done.
> > :-)
> >
> > Now, as for whether or not it's a good idea.  First off, we do have
> > timeouts and a status-querying mechanism so an application can just set a
> > timeout of 1s and do something if it times out.  Second, if the
> application
> > is a compositor or something else that doesn't trust its client, it
> > shouldn't be using the OPAQUE_FD mechanism of Vulkan semaphore/fence
> > sharing anyway.  For those scenarios, they can require the untrusted
> client
> > to use FENCE_FD (sync file) and they have all of the usual guarantees
> about
> > when the work got submitted, etc.
> >
> > Also, I'm more than happy to put this all behind a flag so it's not the
> > default behavior.
>
> Android had a similar requirement to have a fence fd before the fence
> existed in hwc1, before they fixed that in hwc2. But it's probably still
> useful for deeply pipelined renderers with little memory, aka tiled
> renderers on phones.
>
> The idea we've tossed around is to create a so-called future fence. In the
> kernel if you try to deref a future fence, the usual thing that happens is
> you'll block (interruptibly, which we can because fence lookup might
> fail), _until_ a real fence shows up and can be returned. That implements
> the uapi expectations without risking deadlocks in the kernel, albeit with
> a bit much blocking. Still better than doing the same in userspace (since
> in userspace you probably need to do that when importing the fence, not at
> execbuf time).
>

Yes, I'm aware of the future fence idea.  However, that's not really all
that related.  We're not talking about blocking GPU work here.  We're
talking about the CPU wait API having support for "wait for submit and
signal" behavior instead of just "wait for signal".


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-11 Thread Jason Ekstrand
On Tue, Jul 11, 2017 at 12:17 AM, Christian König 
wrote:

> Am 11.07.2017 um 04:36 schrieb Michel Dänzer:
>
>> On 11/07/17 06:09 AM, Jason Ekstrand wrote:
>>
>>> On Mon, Jul 10, 2017 at 9:15 AM, Christian König
>>> > wrote:
>>>
>>>  Am 10.07.2017 um 17:52 schrieb Jason Ekstrand:
>>>
  On Mon, Jul 10, 2017 at 8:45 AM, Christian König
  > wrote:

  Am 10.07.2017 um 17:28 schrieb Jason Ekstrand:

>  On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie
>  > wrote:
>  [SNIP]
>  So, reading some CTS tests again, and I think we have a
>  problem here.  The Vulkan spec allows you to wait on a fence
>  that is in the unsignaled state.
>
  At least on the closed source driver that would be illegal as
  far as I know.


  Then they are doing workarounds in userspace.  There are
  definitely CTS tests for this:

  https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/vulkancts/modules/vulkan/synchronization/vktSynchronizationBasicFenceTests.cpp#L74

  You can't wait on a semaphore before the signal operation is
  sent down to the kernel.


  We (Intel) deal with this today by tracking whether or not the
  fence has been submitted and using a condition variable in
  userspace to sort it all out.

>>>  Which sounds exactly like what AMD is doing in its drivers as well.
>>>
>>>
>>> Which doesn't work cross-process so...
>>>
>> Surely it can be made to work by providing suitable kernel APIs to
>> userspace?
>>
>
> Well, that's exactly what Jason proposed to do, but I'm not very keen on
> that.
>
>  If we ever want to share fences across processes (which we do),
  then this needs to be sorted in the kernel.

>>>  That would clearly get a NAK from my side, even Microsoft forbids
>>>  wait before signal because you can easily end up in deadlock
>>> situations.
>>>
>>> Please don't NAK things that are required by the API specification and
>>> CTS tests.
>>>
>> There is no requirement for every aspect of the Vulkan API specification
>> to be mirrored 1:1 in the kernel <-> userspace API. We have to work out
>> what makes sense at each level.
>>
>
> Exactly, if we have a synchronization problem between two processes that
> should be solved in userspace.
>
> E.g. if process A hasn't submitted its work to the kernel it should flush
> its commands before sending a flip event to the compositor.
>

Ok, I think there is some confusion here on what is being proposed.  Here
are some things that are *not* being proposed:

 1. This does *not* allow a client to block another client's GPU work
indefinitely.  This is entirely for a CPU wait API to allow for a "wait for
submit" as well as a "wait for finish".
 2. This is *not* for system compositors that need to be robust against
malicious clients.

The expected use for the OPAQUE_FD is two very tightly integrated processes
which trust each other but need to be able to share synchronization
primitives.  One of the things they need to be able to do (as per the
Vulkan API) with those synchronization primitives is a "wait for submit and
finish" operation.  I'm happy for the kernel to have separate APIs for
"wait for submit" and "wait for finish" if that's more palatable but i
don't really see why there is such a strong reaction to the "wait for
submit and finish" behavior.

Could we do this "in userspace"?  Yes, with added kernel API.  We would
need some way of strapping a second FD onto a syncobj or combining two FDs
into one to send across the wire or something like that, then add a shared
memory segment, and then pile on a bunch of code to do cross-process
condition variables and state tracking.  I really don't see how that's a
better solution than adding a flag to the kernel API to just do what we
want.


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-11 Thread Daniel Vetter
On Mon, Jul 10, 2017 at 02:09:42PM -0700, Jason Ekstrand wrote:
> On Mon, Jul 10, 2017 at 9:15 AM, Christian König 
> wrote:
> 
> > Am 10.07.2017 um 17:52 schrieb Jason Ekstrand:
> >
> > On Mon, Jul 10, 2017 at 8:45 AM, Christian König 
> > wrote:
> >
> >> Am 10.07.2017 um 17:28 schrieb Jason Ekstrand:
> >>
> >> On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie  wrote:
> >> [SNIP]
> >> So, reading some CTS tests again, and I think we have a problem here.
> >> The Vulkan spec allows you to wait on a fence that is in the unsignaled
> >> state.
> >>
> >>
> >> At least on the closed source driver that would be illegal as far as I
> >> know.
> >>
> >
> > Then they are doing workarounds in userspace.  There are definitely CTS
> > tests for this:
> >
> > https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/vulkancts/modules/vulkan/synchronization/vktSynchronizationBasicFenceTests.cpp#L74
> >
> >
> >> You can't wait on a semaphore before the signal operation is sent down to
> >> the kernel.
> >>
> >
> > We (Intel) deal with this today by tracking whether or not the fence has
> > been submitted and using a condition variable in userspace to sort it all
> > out.
> >
> >
> > Which sounds exactly like what AMD is doing in its drivers as well.
> >
> 
> Which doesn't work cross-process so...
> 
> > If we ever want to share fences across processes (which we do), then this
> > needs to be sorted in the kernel.
> >
> >
> > That would clearly get a NAK from my side, even Microsoft forbids wait
> > before signal because you can easily end up in deadlock situations.
> >
> 
> Please don't NAK things that are required by the API specification and CTS
> tests.  That makes it very hard for people like me to get their jobs done.
> :-)
> 
> Now, as for whether or not it's a good idea.  First off, we do have
> timeouts and a status-querying mechanism so an application can just set a
> timeout of 1s and do something if it times out.  Second, if the application
> is a compositor or something else that doesn't trust its client, it
> shouldn't be using the OPAQUE_FD mechanism of Vulkan semaphore/fence
> sharing anyway.  For those scenarios, they can require the untrusted client
> to use FENCE_FD (sync file) and they have all of the usual guarantees about
> when the work got submitted, etc.
> 
> Also, I'm more than happy to put this all behind a flag so it's not the
> default behavior.

Android had a similar requirement to have a fence fd before the fence
existed in hwc1, before they fixed that in hwc2. But it's probably still
useful for deeply pipelined renderers with little memory, aka tiled
renderers on phones.

The idea we've tossed around is to create a so-called future fence. In the
kernel if you try to deref a future fence, the usual thing that happens is
you'll block (interruptibly, which we can because fence lookup might
fail), _until_ a real fence shows up and can be returned. That implements
the uapi expectations without risking deadlocks in the kernel, albeit with
a bit much blocking. Still better than doing the same in userspace (since
in userspace you probably need to do that when importing the fence, not at
execbuf time).

2nd step would then be to give drivers with a robust hand recover logic a
special interface to be able to instantiate hw waits on such a future
fence before the signalling part is queued up. As long as any waiters have
robust hang recovery we still don't have a problem. Everyone else (e.g.
drm display-only drivers, v4l nodes, camera ip, whatever else participates
in the shared buffers and fences stuff) would still block until the real
fence shows up.

Similar idea should work for semaphores too. Gustavo did look into the
future fence stuff iirc; I think there was even an RFC some time ago.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-11 Thread Christian König

Am 11.07.2017 um 04:36 schrieb Michel Dänzer:

On 11/07/17 06:09 AM, Jason Ekstrand wrote:

On Mon, Jul 10, 2017 at 9:15 AM, Christian König
> wrote:

 Am 10.07.2017 um 17:52 schrieb Jason Ekstrand:

 On Mon, Jul 10, 2017 at 8:45 AM, Christian König
 > wrote:

 Am 10.07.2017 um 17:28 schrieb Jason Ekstrand:

 On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie
 > wrote:
 [SNIP]
 So, reading some CTS tests again, and I think we have a
 problem here.  The Vulkan spec allows you to wait on a fence
 that is in the unsignaled state.

 At least on the closed source driver that would be illegal as
 far as I know.


 Then they are doing workarounds in userspace.  There are
 definitely CTS tests for this:

 
https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/vulkancts/modules/vulkan/synchronization/vktSynchronizationBasicFenceTests.cpp#L74


 You can't wait on a semaphore before the signal operation is
 sent down to the kernel.


 We (Intel) deal with this today by tracking whether or not the
 fence has been submitted and using a condition variable in
 userspace to sort it all out.

 Which sounds exactly like what AMD is doing in its drivers as well.


Which doesn't work cross-process so...

Surely it can be made to work by providing suitable kernel APIs to
userspace?


Well, that's exactly what Jason proposed to do, but I'm not very keen on
that.



 If we ever want to share fences across processes (which we do),
 then this needs to be sorted in the kernel.

 That would clearly get a NAK from my side, even Microsoft forbids
 wait before signal because you can easily end up in deadlock situations.

Please don't NAK things that are required by the API specification and
CTS tests.

There is no requirement for every aspect of the Vulkan API specification
to be mirrored 1:1 in the kernel <-> userspace API. We have to work out
what makes sense at each level.


Exactly, if we have a synchronization problem between two processes, that
should be solved in userspace.


E.g. if process A hasn't submitted its work to the kernel it should
flush its commands before sending a flip event to the compositor.


We can attach something to the fd making it possible for an X shared 
memory fence to be transported with it if that makes life easier.


This way the waiter implementation can still choose what to do and/or
wait asynchronously for the client to have its flushes completed, etc.


Regards,
Christian.


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Michel Dänzer
On 11/07/17 06:09 AM, Jason Ekstrand wrote:
> On Mon, Jul 10, 2017 at 9:15 AM, Christian König
> > wrote:
> 
> Am 10.07.2017 um 17:52 schrieb Jason Ekstrand:
>> On Mon, Jul 10, 2017 at 8:45 AM, Christian König
>> > wrote:
>>
>> Am 10.07.2017 um 17:28 schrieb Jason Ekstrand:
>>> On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie
>>> > wrote:
>>> [SNIP]
>>> So, reading some CTS tests again, and I think we have a
>>> problem here.  The Vulkan spec allows you to wait on a fence
>>> that is in the unsignaled state.
>>
>> At least on the closed source driver that would be illegal as
>> far as I know.
>>
>>
>> Then they are doing workarounds in userspace.  There are
>> definitely CTS tests for this:
>>
>> 
>> https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/vulkancts/modules/vulkan/synchronization/vktSynchronizationBasicFenceTests.cpp#L74
>>
>> You can't wait on a semaphore before the signal operation is
>> sent down to the kernel.
>>
>>
>> We (Intel) deal with this today by tracking whether or not the
>> fence has been submitted and using a condition variable in
>> userspace to sort it all out.
> 
> Which sounds exactly like what AMD is doing in its drivers as well.
> 
> 
> Which doesn't work cross-process so...

Surely it can be made to work by providing suitable kernel APIs to
userspace?


>> If we ever want to share fences across processes (which we do),
>> then this needs to be sorted in the kernel.
> 
> That would clearly get a NAK from my side, even Microsoft forbids
> wait before signal because you can easily end up in deadlock situations.
> 
> Please don't NAK things that are required by the API specification and
> CTS tests.

There is no requirement for every aspect of the Vulkan API specification
to be mirrored 1:1 in the kernel <-> userspace API. We have to work out
what makes sense at each level.


> That makes it very hard for people like me to get their jobs done. :-)

Jason, that's uncalled for. Christian is also just doing his job here.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Jason Ekstrand
On Mon, Jul 10, 2017 at 9:13 AM, Christian König <deathsim...@vodafone.de>
wrote:

> Am 10.07.2017 um 17:58 schrieb Xie, AlexBin:
>
> I understand this discussion from closes source driver terminology.
>
>
>
> If a process is killed before it sends out the signaling command, will
> some part of the GPU be in a waiting situation forever?
>
>
> Yes, exactly that's the problem here and the reason why that even
> Microsoft forbids that under windows.
>

To be clear, we are only discussing wait-before-submit for client CPU
waits.  The GPU will not be waiting, only some userspace process.  If that
process is a compositor, it should take measures to avoid getting hung up
by bad clients.  What this will *not* cause is the GPU (or kernel GPU
scheduling) to get hung up on waiting for some unsubmitted thing.


> Christian.
>
>
>
>
> Alex Bin Xie
>
> *From:* amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] *On Behalf
> Of *Jason Ekstrand
> *Sent:* Monday, July 10, 2017 11:53 AM
> *To:* Christian König <deathsim...@vodafone.de>
> *Cc:* Dave Airlie <airl...@gmail.com>; Maling list - DRI developers
> <dri-devel@lists.freedesktop.org>; amd-gfx mailing list
> <amd-...@lists.freedesktop.org>
> *Subject:* Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)
>
>
>
> On Mon, Jul 10, 2017 at 8:45 AM, Christian König <deathsim...@vodafone.de>
> wrote:
>
> Am 10.07.2017 um 17:28 schrieb Jason Ekstrand:
>
> On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie <airl...@gmail.com> wrote:
>
> From: Dave Airlie <airl...@redhat.com>
>
> This interface will allow sync object to be used to back
> Vulkan fences. This API is pretty much the vulkan fence waiting
> API, and I've ported the code from amdgpu.
>
> v2: accept relative timeout, pass remaining time back
> to userspace.
> v3: return to absolute timeouts.
> v4: absolute zero = poll,
> rewrite any/all code to have same operation for arrays
> return -EINVAL for 0 fences.
> v4.1: fixup fences allocation check, use u64_to_user_ptr
> v5: move to sec/nsec, and use timespec64 for calcs.
> v6: use -ETIME and drop the out status flag. (-ETIME
> is suggested by ickle, I can feel a shed painting)
>
> Signed-off-by: Dave Airlie <airl...@redhat.com>
> ---
>  drivers/gpu/drm/drm_internal.h |   2 +
>  drivers/gpu/drm/drm_ioctl.c|   2 +
>  drivers/gpu/drm/drm_syncobj.c  | 142 ++
> +++
>  include/uapi/drm/drm.h |  13 
>  4 files changed, 159 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_
> internal.h
> index 5cecc97..d71b50d 100644
> --- a/drivers/gpu/drm/drm_internal.h
> +++ b/drivers/gpu/drm/drm_internal.h
> @@ -157,3 +157,5 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device
> *dev, void *data,
>struct drm_file *file_private);
>  int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
>struct drm_file *file_private);
> +int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
> +  struct drm_file *file_private);
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index f1e5681..385ce74 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -657,6 +657,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
>   DRM_UNLOCKED|DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE,
> drm_syncobj_fd_to_handle_ioctl,
>   DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
> + DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  };
>
>  #define DRM_CORE_IOCTL_COUNT   ARRAY_SIZE( drm_ioctls )
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 89441bc..2d5a7a1 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1,5 +1,7 @@
>  /*
>   * Copyright 2017 Red Hat
> + * Parts ported from amdgpu (fence wait code).
> + * Copyright 2016 Advanced Micro Devices, Inc.
>   *
>   * Permission is hereby granted, free of charge, to any person obtaining a
>   * copy of this software and associated documentation files (the
> "Software"),
> @@ -31,6 +33,9 @@
>   * that contain an optional fence. The fence can be updated with a new
>   * fence, or be NULL.
>   *
> + * syncobj's can be waited upon, 

Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Jason Ekstrand
On Mon, Jul 10, 2017 at 9:15 AM, Christian König 
wrote:

> On 10.07.2017 at 17:52, Jason Ekstrand wrote:
>
> On Mon, Jul 10, 2017 at 8:45 AM, Christian König 
> wrote:
>
>> On 10.07.2017 at 17:28, Jason Ekstrand wrote:
>>
>> On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie  wrote:
>> [SNIP]
>> So, reading some CTS tests again, and I think we have a problem here.
>> The Vulkan spec allows you to wait on a fence that is in the unsignaled
>> state.
>>
>>
>> At least on the closed source driver that would be illegal as far as I
>> know.
>>
>
> Then they are doing workarounds in userspace.  There are definitely CTS
> tests for this:
>
> https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/vulkancts/
> modules/vulkan/synchronization/vktSynchronizationBasicFenceTests.cpp#L74
>
>
>> You can't wait on a semaphore before the signal operation is sent down to
>> the kernel.
>>
>
> We (Intel) deal with this today by tracking whether or not the fence has
> been submitted and using a condition variable in userspace to sort it all
> out.
>
>
> Which sounds exactly like what AMD is doing in its drivers as well.
>

Which doesn't work cross-process so...
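As a rough illustration of the userspace tracking described above, such a scheme amounts to a condition variable guarding a "submitted" flag. The `vk_fence` type and function names below are hypothetical sketches, not actual Intel (or AMD) driver code:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-fence bookkeeping; names are illustrative only. */
struct vk_fence {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            submitted;  /* set once the signal op reaches the kernel */
    int             syncobj;    /* DRM syncobj handle (placeholder) */
};

/* Called at submit time: mark the fence submitted and wake any waiters. */
static void vk_fence_mark_submitted(struct vk_fence *f)
{
    pthread_mutex_lock(&f->lock);
    f->submitted = true;
    pthread_cond_broadcast(&f->cond);
    pthread_mutex_unlock(&f->lock);
}

/* Called at wait time: block until the signal op has been submitted,
 * after which the caller can issue the kernel wait on f->syncobj. */
static void vk_fence_wait_submitted(struct vk_fence *f)
{
    pthread_mutex_lock(&f->lock);
    while (!f->submitted)
        pthread_cond_wait(&f->cond, &f->lock);
    pthread_mutex_unlock(&f->lock);
}
```

Because the mutex and condition variable live in the process's own memory, this scheme cannot cover a fence imported from another process, which is exactly the cross-process gap being discussed.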

> If we ever want to share fences across processes (which we do), then this
> needs to be sorted in the kernel.
>
>
> That would clearly get a NAK from my side, even Microsoft forbids wait
> before signal because you can easily end up in deadlock situations.
>

Please don't NAK things that are required by the API specification and CTS
tests.  That makes it very hard for people like me to get their jobs done.
:-)

Now, as for whether or not it's a good idea.  First off, we do have
timeouts and a status-querying mechanism, so an application can just set a
timeout of 1s and do something if it times out.  Second, if the application
is a compositor or something else that doesn't trust its client, it
shouldn't be using the OPAQUE_FD mechanism of Vulkan semaphore/fence
sharing anyway.  For those scenarios, they can require the untrusted client
to use FENCE_FD (sync file) and they have all of the usual guarantees about
when the work got submitted, etc.

Also, I'm more than happy to put this all behind a flag so it's not the
default behavior.

--Jason


RE: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Xie, AlexBin
I understand this discussion from closed-source driver terminology.

If a process is killed before it sends out the signaling command, will some 
part of the GPU be in a waiting situation forever?

Alex Bin Xie
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Jason 
Ekstrand
Sent: Monday, July 10, 2017 11:53 AM
To: Christian König <deathsim...@vodafone.de>
Cc: Dave Airlie <airl...@gmail.com>; Maling list - DRI developers 
<dri-devel@lists.freedesktop.org>; amd-gfx mailing list 
<amd-...@lists.freedesktop.org>
Subject: Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

On Mon, Jul 10, 2017 at 8:45 AM, Christian König 
<deathsim...@vodafone.de<mailto:deathsim...@vodafone.de>> wrote:
On 10.07.2017 at 17:28, Jason Ekstrand wrote:
On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie 
<airl...@gmail.com<mailto:airl...@gmail.com>> wrote:
From: Dave Airlie <airl...@redhat.com<mailto:airl...@redhat.com>>

This interface will allow sync object to be used to back
Vulkan fences. This API is pretty much the vulkan fence waiting
API, and I've ported the code from amdgpu.

v2: accept relative timeout, pass remaining time back
to userspace.
v3: return to absolute timeouts.
v4: absolute zero = poll,
rewrite any/all code to have same operation for arrays
return -EINVAL for 0 fences.
v4.1: fixup fences allocation check, use u64_to_user_ptr
v5: move to sec/nsec, and use timespec64 for calcs.
v6: use -ETIME and drop the out status flag. (-ETIME
is suggested by ickle, I can feel a shed painting)

Signed-off-by: Dave Airlie <airl...@redhat.com<mailto:airl...@redhat.com>>
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 142 +
 include/uapi/drm/drm.h |  13 
 4 files changed, 159 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 5cecc97..d71b50d 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -157,3 +157,5 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
   struct drm_file *file_private);
 int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private);
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f1e5681..385ce74 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -657,6 +657,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, 
drm_syncobj_fd_to_handle_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };

 #define DRM_CORE_IOCTL_COUNT   ARRAY_SIZE( drm_ioctls )
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 89441bc..2d5a7a1 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1,5 +1,7 @@
 /*
  * Copyright 2017 Red Hat
+ * Parts ported from amdgpu (fence wait code).
+ * Copyright 2016 Advanced Micro Devices, Inc.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the "Software"),
@@ -31,6 +33,9 @@
  * that contain an optional fence. The fence can be updated with a new
  * fence, or be NULL.
  *
+ * syncobj's can be waited upon, where it will wait for the underlying
+ * fence.
+ *
>   * syncobj's can be exported to fd's and back, these fd's are opaque and
  * have no other use case, except passing the syncobj between processes.
  *
@@ -451,3 +456,140 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, 
void *data,
return drm_syncobj_fd_to_handle(file_private, args->fd,
   &args->handle);
 }
+
+/**
+ * drm_timeout_abs_to_jiffies - calculate jiffies timeout from absolute value
+ *
+ * @timeout_sec: timeout sec component, 0 for poll
+ * @timeout_nsec: timeout nsec component in ns, 0 for poll
+ * both must be 0 for poll.
+ *
+ * Calculate the timeout in jiffies from an absolute time in sec/nsec.
+ */
+static unsigned long drm_timeout_abs_to_jiffies(int64_t timeout_sec, uint64_t 
timeout_nsec)
+{
+   struct timespec64 abs_timeout, timeout, max_jiffy_timespec;
+   unsigned long timeout_jiffies;
+
+   /* make 0 timeout means poll - absolute 0 doesn't seem valid */
+   if (timeout_sec == 0 && timeout_nsec == 0)
+   return 0;
+
+   abs_timeout.tv_sec = timeout_sec;
+   abs_timeout.tv_nsec = timeout_nsec;
+
+   /* clamp ti

Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Christian König

On 10.07.2017 at 17:52, Jason Ekstrand wrote:
On Mon, Jul 10, 2017 at 8:45 AM, Christian König 
> wrote:


On 10.07.2017 at 17:28, Jason Ekstrand wrote:

On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie > wrote:
[SNIP]
So, reading some CTS tests again, and I think we have a problem
here.  The Vulkan spec allows you to wait on a fence that is in
the unsignaled state.


At least on the closed source driver that would be illegal as far
as I know.


Then they are doing workarounds in userspace.  There are definitely 
CTS tests for this:


https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/vulkancts/modules/vulkan/synchronization/vktSynchronizationBasicFenceTests.cpp#L74

You can't wait on a semaphore before the signal operation is sent
down to the kernel.


We (Intel) deal with this today by tracking whether or not the fence 
has been submitted and using a condition variable in userspace to sort 
it all out.


Which sounds exactly like what AMD is doing in its drivers as well.

If we ever want to share fences across processes (which we do), then 
this needs to be sorted in the kernel.


That would clearly get a NAK from my side, even Microsoft forbids wait 
before signal because you can easily end up in deadlock situations.


Regards,
Christian.


Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Christian König

On 10.07.2017 at 17:58, Xie, AlexBin wrote:


I understand this discussion from closed-source driver terminology.

If a process is killed before it sends out the signaling command, will 
some part of the GPU be in a waiting situation forever?




Yes, exactly that's the problem here and the reason why even 
Microsoft forbids that under Windows.


Christian.


Alex Bin Xie

*From:*amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] *On 
Behalf Of *Jason Ekstrand

*Sent:* Monday, July 10, 2017 11:53 AM
*To:* Christian König <deathsim...@vodafone.de>
*Cc:* Dave Airlie <airl...@gmail.com>; Maling list - DRI developers 
<dri-devel@lists.freedesktop.org>; amd-gfx mailing list 
<amd-...@lists.freedesktop.org>

*Subject:* Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

On Mon, Jul 10, 2017 at 8:45 AM, Christian König 
<deathsim...@vodafone.de <mailto:deathsim...@vodafone.de>> wrote:


On 10.07.2017 at 17:28, Jason Ekstrand wrote:

On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie <airl...@gmail.com
<mailto:airl...@gmail.com>> wrote:

From: Dave Airlie <airl...@redhat.com
<mailto:airl...@redhat.com>>

This interface will allow sync object to be used to back
Vulkan fences. This API is pretty much the vulkan fence
waiting
API, and I've ported the code from amdgpu.

v2: accept relative timeout, pass remaining time back
to userspace.
v3: return to absolute timeouts.
v4: absolute zero = poll,
rewrite any/all code to have same operation for arrays
return -EINVAL for 0 fences.
v4.1: fixup fences allocation check, use u64_to_user_ptr
v5: move to sec/nsec, and use timespec64 for calcs.
v6: use -ETIME and drop the out status flag. (-ETIME
is suggested by ickle, I can feel a shed painting)

Signed-off-by: Dave Airlie <airl...@redhat.com
<mailto:airl...@redhat.com>>
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 142
+
 include/uapi/drm/drm.h |  13 
 4 files changed, 159 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h
b/drivers/gpu/drm/drm_internal.h
index 5cecc97..d71b50d 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -157,3 +157,5 @@ int
drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev,
void *data,
 struct drm_file *file_private);
 int drm_syncobj_fd_to_handle_ioctl(struct drm_device
*dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_wait_ioctl(struct drm_device *dev, void
*data,
+  struct drm_file *file_private);
diff --git a/drivers/gpu/drm/drm_ioctl.c
b/drivers/gpu/drm/drm_ioctl.c
index f1e5681..385ce74 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -657,6 +657,8 @@ static const struct drm_ioctl_desc
drm_ioctls[] = {
DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE,
drm_syncobj_fd_to_handle_ioctl,
DRM_UNLOCKED|DRM_RENDER_ALLOW),
+  DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT,
drm_syncobj_wait_ioctl,
+  DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };

 #define DRM_CORE_IOCTL_COUNT  ARRAY_SIZE( drm_ioctls )
diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index 89441bc..2d5a7a1 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1,5 +1,7 @@
 /*
  * Copyright 2017 Red Hat
+ * Parts ported from amdgpu (fence wait code).
+ * Copyright 2016 Advanced Micro Devices, Inc.
  *
  * Permission is hereby granted, free of charge, to any
person obtaining a
  * copy of this software and associated documentation
files (the "Software"),
@@ -31,6 +33,9 @@
  * that contain an optional fence. The fence can be
updated with a new
  * fence, or be NULL.
  *
+ * syncobj's can be waited upon, where it will wait for
the underlying
+ * fence.
+ *
  * syncobj's can be exported to fd's and back, these fd's
are opaque and
  

Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Jason Ekstrand
On Mon, Jul 10, 2017 at 8:45 AM, Christian König 
wrote:

> Am 10.07.2017 um 17:28 schrieb Jason Ekstrand:
>
> On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie  wrote:
>
>> From: Dave Airlie 
>>
>> This interface will allow sync object to be used to back
>> Vulkan fences. This API is pretty much the vulkan fence waiting
>> API, and I've ported the code from amdgpu.
>>
>> v2: accept relative timeout, pass remaining time back
>> to userspace.
>> v3: return to absolute timeouts.
>> v4: absolute zero = poll,
>> rewrite any/all code to have same operation for arrays
>> return -EINVAL for 0 fences.
>> v4.1: fixup fences allocation check, use u64_to_user_ptr
>> v5: move to sec/nsec, and use timespec64 for calcs.
>> v6: use -ETIME and drop the out status flag. (-ETIME
>> is suggested by ickle, I can feel a shed painting)
>>
>> Signed-off-by: Dave Airlie 
>> ---
>>  drivers/gpu/drm/drm_internal.h |   2 +
>>  drivers/gpu/drm/drm_ioctl.c|   2 +
>>  drivers/gpu/drm/drm_syncobj.c  | 142 ++
>> +++
>>  include/uapi/drm/drm.h |  13 
>>  4 files changed, 159 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_internal.h
>> b/drivers/gpu/drm/drm_internal.h
>> index 5cecc97..d71b50d 100644
>> --- a/drivers/gpu/drm/drm_internal.h
>> +++ b/drivers/gpu/drm/drm_internal.h
>> @@ -157,3 +157,5 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device
>> *dev, void *data,
>>struct drm_file *file_private);
>>  int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
>>struct drm_file *file_private);
>> +int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>> +  struct drm_file *file_private);
>> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
>> index f1e5681..385ce74 100644
>> --- a/drivers/gpu/drm/drm_ioctl.c
>> +++ b/drivers/gpu/drm/drm_ioctl.c
>> @@ -657,6 +657,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
>>   DRM_UNLOCKED|DRM_RENDER_ALLOW),
>> DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE,
>> drm_syncobj_fd_to_handle_ioctl,
>>   DRM_UNLOCKED|DRM_RENDER_ALLOW),
>> +   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
>> + DRM_UNLOCKED|DRM_RENDER_ALLOW),
>>  };
>>
>>  #define DRM_CORE_IOCTL_COUNT   ARRAY_SIZE( drm_ioctls )
>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.
>> c
>> index 89441bc..2d5a7a1 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -1,5 +1,7 @@
>>  /*
>>   * Copyright 2017 Red Hat
>> + * Parts ported from amdgpu (fence wait code).
>> + * Copyright 2016 Advanced Micro Devices, Inc.
>>   *
>>   * Permission is hereby granted, free of charge, to any person obtaining
>> a
>>   * copy of this software and associated documentation files (the
>> "Software"),
>> @@ -31,6 +33,9 @@
>>   * that contain an optional fence. The fence can be updated with a new
>>   * fence, or be NULL.
>>   *
>> + * syncobj's can be waited upon, where it will wait for the underlying
>> + * fence.
>> + *
>>   * syncobj's can be exported to fd's and back, these fd's are opaque and
>>   * have no other use case, except passing the syncobj between processes.
>>   *
>> @@ -451,3 +456,140 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device
>> *dev, void *data,
>> return drm_syncobj_fd_to_handle(file_private, args->fd,
>> &args->handle);
>>  }
>> +
>> +/**
>> + * drm_timeout_abs_to_jiffies - calculate jiffies timeout from absolute
>> value
>> + *
>> + * @timeout_sec: timeout sec component, 0 for poll
>> + * @timeout_nsec: timeout nsec component in ns, 0 for poll
>> + * both must be 0 for poll.
>> + *
>> + * Calculate the timeout in jiffies from an absolute time in sec/nsec.
>> + */
>> +static unsigned long drm_timeout_abs_to_jiffies(int64_t timeout_sec,
>> uint64_t timeout_nsec)
>> +{
>> +   struct timespec64 abs_timeout, timeout, max_jiffy_timespec;
>> +   unsigned long timeout_jiffies;
>> +
>> +   /* make 0 timeout means poll - absolute 0 doesn't seem valid */
>> +   if (timeout_sec == 0 && timeout_nsec == 0)
>> +   return 0;
>> +
>> +   abs_timeout.tv_sec = timeout_sec;
>> +   abs_timeout.tv_nsec = timeout_nsec;
>> +
>> +   /* clamp timeout if it's too large */
>> +   if (!timespec64_valid_strict(&abs_timeout))
>> +   return MAX_SCHEDULE_TIMEOUT - 1;
>> +
>> +   timeout = timespec64_sub(abs_timeout,
>> ktime_to_timespec64(ktime_get()));
>> +   if (!timespec64_valid(&timeout))
>> +   return 0;
>> +
>> +   jiffies_to_timespec64(MAX_JIFFY_OFFSET, &max_jiffy_timespec);
>> +   if (timespec64_compare(&timeout, &max_jiffy_timespec) >= 0)
>> +   return MAX_SCHEDULE_TIMEOUT - 1;
>> +
>> +   timeout_jiffies = 

Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Christian König

On 10.07.2017 at 17:28, Jason Ekstrand wrote:
On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie > wrote:


From: Dave Airlie >

This interface will allow sync object to be used to back
Vulkan fences. This API is pretty much the vulkan fence waiting
API, and I've ported the code from amdgpu.

v2: accept relative timeout, pass remaining time back
to userspace.
v3: return to absolute timeouts.
v4: absolute zero = poll,
rewrite any/all code to have same operation for arrays
return -EINVAL for 0 fences.
v4.1: fixup fences allocation check, use u64_to_user_ptr
v5: move to sec/nsec, and use timespec64 for calcs.
v6: use -ETIME and drop the out status flag. (-ETIME
is suggested by ickle, I can feel a shed painting)

Signed-off-by: Dave Airlie >
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 142
+
 include/uapi/drm/drm.h |  13 
 4 files changed, 159 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h
b/drivers/gpu/drm/drm_internal.h
index 5cecc97..d71b50d 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -157,3 +157,5 @@ int drm_syncobj_handle_to_fd_ioctl(struct
drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void
*data,
   struct drm_file *file_private);
+int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private);
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f1e5681..385ce74 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -657,6 +657,8 @@ static const struct drm_ioctl_desc
drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE,
drm_syncobj_fd_to_handle_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };

 #define DRM_CORE_IOCTL_COUNT   ARRAY_SIZE( drm_ioctls )
diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index 89441bc..2d5a7a1 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1,5 +1,7 @@
 /*
  * Copyright 2017 Red Hat
+ * Parts ported from amdgpu (fence wait code).
+ * Copyright 2016 Advanced Micro Devices, Inc.
  *
  * Permission is hereby granted, free of charge, to any person
obtaining a
  * copy of this software and associated documentation files (the
"Software"),
@@ -31,6 +33,9 @@
  * that contain an optional fence. The fence can be updated with
a new
  * fence, or be NULL.
  *
+ * syncobj's can be waited upon, where it will wait for the
underlying
+ * fence.
+ *
  * syncobj's can be exported to fd's and back, these fd's are
opaque and
  * have no other use case, except passing the syncobj between
processes.
  *
@@ -451,3 +456,140 @@ drm_syncobj_fd_to_handle_ioctl(struct
drm_device *dev, void *data,
return drm_syncobj_fd_to_handle(file_private, args->fd,
   &args->handle);
 }
+
+/**
+ * drm_timeout_abs_to_jiffies - calculate jiffies timeout from
absolute value
+ *
+ * @timeout_sec: timeout sec component, 0 for poll
+ * @timeout_nsec: timeout nsec component in ns, 0 for poll
+ * both must be 0 for poll.
+ *
+ * Calculate the timeout in jiffies from an absolute time in
sec/nsec.
+ */
+static unsigned long drm_timeout_abs_to_jiffies(int64_t
timeout_sec, uint64_t timeout_nsec)
+{
+   struct timespec64 abs_timeout, timeout, max_jiffy_timespec;
+   unsigned long timeout_jiffies;
+
+   /* make 0 timeout means poll - absolute 0 doesn't seem
valid */
+   if (timeout_sec == 0 && timeout_nsec == 0)
+   return 0;
+
+   abs_timeout.tv_sec = timeout_sec;
+   abs_timeout.tv_nsec = timeout_nsec;
+
+   /* clamp timeout if it's too large */
+   if (!timespec64_valid_strict(&abs_timeout))
+   return MAX_SCHEDULE_TIMEOUT - 1;
+
+   timeout = timespec64_sub(abs_timeout,
ktime_to_timespec64(ktime_get()));
+   if (!timespec64_valid(&timeout))
+   return 0;
+
+   jiffies_to_timespec64(MAX_JIFFY_OFFSET, &max_jiffy_timespec);
+   if (timespec64_compare(&timeout, &max_jiffy_timespec) >= 0)

Re: [PATCH] drm/syncobj: add sync obj wait interface. (v6)

2017-07-10 Thread Jason Ekstrand
On Wed, Jul 5, 2017 at 6:04 PM, Dave Airlie  wrote:

> From: Dave Airlie 
>
> This interface will allow sync object to be used to back
> Vulkan fences. This API is pretty much the vulkan fence waiting
> API, and I've ported the code from amdgpu.
>
> v2: accept relative timeout, pass remaining time back
> to userspace.
> v3: return to absolute timeouts.
> v4: absolute zero = poll,
> rewrite any/all code to have same operation for arrays
> return -EINVAL for 0 fences.
> v4.1: fixup fences allocation check, use u64_to_user_ptr
> v5: move to sec/nsec, and use timespec64 for calcs.
> v6: use -ETIME and drop the out status flag. (-ETIME
> is suggested by ickle, I can feel a shed painting)
>
> Signed-off-by: Dave Airlie 
> ---
>  drivers/gpu/drm/drm_internal.h |   2 +
>  drivers/gpu/drm/drm_ioctl.c|   2 +
>  drivers/gpu/drm/drm_syncobj.c  | 142 ++
> +++
>  include/uapi/drm/drm.h |  13 
>  4 files changed, 159 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_
> internal.h
> index 5cecc97..d71b50d 100644
> --- a/drivers/gpu/drm/drm_internal.h
> +++ b/drivers/gpu/drm/drm_internal.h
> @@ -157,3 +157,5 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device
> *dev, void *data,
>struct drm_file *file_private);
>  int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
>struct drm_file *file_private);
> +int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
> +  struct drm_file *file_private);
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index f1e5681..385ce74 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -657,6 +657,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
>   DRM_UNLOCKED|DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE,
> drm_syncobj_fd_to_handle_ioctl,
>   DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
> + DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  };
>
>  #define DRM_CORE_IOCTL_COUNT   ARRAY_SIZE( drm_ioctls )
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 89441bc..2d5a7a1 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1,5 +1,7 @@
>  /*
>   * Copyright 2017 Red Hat
> + * Parts ported from amdgpu (fence wait code).
> + * Copyright 2016 Advanced Micro Devices, Inc.
>   *
>   * Permission is hereby granted, free of charge, to any person obtaining a
>   * copy of this software and associated documentation files (the
> "Software"),
> @@ -31,6 +33,9 @@
>   * that contain an optional fence. The fence can be updated with a new
>   * fence, or be NULL.
>   *
> + * syncobj's can be waited upon, where it will wait for the underlying
> + * fence.
> + *
>   * syncobj's can be exported to fd's and back, these fd's are opaque and
>   * have no other use case, except passing the syncobj between processes.
>   *
> @@ -451,3 +456,140 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device
> *dev, void *data,
> return drm_syncobj_fd_to_handle(file_private, args->fd,
> &args->handle);
>  }
> +
> +/**
> + * drm_timeout_abs_to_jiffies - calculate jiffies timeout from absolute
> value
> + *
> + * @timeout_sec: timeout sec component, 0 for poll
> + * @timeout_nsec: timeout nsec component in ns, 0 for poll
> + * both must be 0 for poll.
> + *
> + * Calculate the timeout in jiffies from an absolute time in sec/nsec.
> + */
> +static unsigned long drm_timeout_abs_to_jiffies(int64_t timeout_sec,
> uint64_t timeout_nsec)
> +{
> +   struct timespec64 abs_timeout, timeout, max_jiffy_timespec;
> +   unsigned long timeout_jiffies;
> +
> +   /* make 0 timeout means poll - absolute 0 doesn't seem valid */
> +   if (timeout_sec == 0 && timeout_nsec == 0)
> +   return 0;
> +
> +   abs_timeout.tv_sec = timeout_sec;
> +   abs_timeout.tv_nsec = timeout_nsec;
> +
> +   /* clamp timeout if it's too large */
> +   if (!timespec64_valid_strict(&abs_timeout))
> +   return MAX_SCHEDULE_TIMEOUT - 1;
> +
> +   timeout = timespec64_sub(abs_timeout,
> ktime_to_timespec64(ktime_get()));
> +   if (!timespec64_valid(&timeout))
> +   return 0;
> +
> +   jiffies_to_timespec64(MAX_JIFFY_OFFSET, &max_jiffy_timespec);
> +   if (timespec64_compare(&timeout, &max_jiffy_timespec) >= 0)
> +   return MAX_SCHEDULE_TIMEOUT - 1;
> +
> +   timeout_jiffies = timespec64_to_jiffies(&timeout);
> +   /*  clamp timeout to avoid infinite timeout */
> +   if (timeout_jiffies >= MAX_SCHEDULE_TIMEOUT)
> +   return MAX_SCHEDULE_TIMEOUT - 1;
> +
> +   return timeout_jiffies + 1;
> +}
> +
> +static int