[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-12 Thread Terje Bergström
On 11.06.2013 15:09, Daniel Vetter wrote:
> Maybe it wasn't clear, but -EAGAIN does _not_ resubmit work. -EAGAIN
> is used to restart the ioctl if we had to kick a thread (to make sure
> it doesn't hold any locks), e.g. for a blocking wait on oustanding
> rendering. The codepaths taken work exactly as if the thread is
> interrupt with a signal.

You did make it clear that there's no resubmission, but other parts
confused me.

So this is used so that a legacy driver which does not do fine-grained
locking can interrupt all waits for completion for a wedged submit. This
way a driver-wide lock get unlocked, cleanup code acquires locks, does
the magic to unwedge GPU, and unlocks. Then user space can re-submit the
waits as it got -EAGAIN.

Fortunately the error code doesn't really matter as long as it's not
EAGAIN, so we can just use ETIMEDOUT as Thierry suggested in his follow-up.

Terje


[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-12 Thread Daniel Vetter
On Wed, Jun 12, 2013 at 12:28 PM, Terje Bergstr?m  
wrote:
> On 11.06.2013 15:09, Daniel Vetter wrote:
>> Maybe it wasn't clear, but -EAGAIN does _not_ resubmit work. -EAGAIN
>> is used to restart the ioctl if we had to kick a thread (to make sure
>> it doesn't hold any locks), e.g. for a blocking wait on oustanding
>> rendering. The codepaths taken work exactly as if the thread is
>> interrupt with a signal.
>
> You did make it clear that there's no resubmission, but other parts
> confused me.
>
> So this is used so that a legacy driver which does not do fine-grained
> locking can interrupt all waits for completion for a wedged submit. This
> way a driver-wide lock get unlocked, cleanup code acquires locks, does
> the magic to unwedge GPU, and unlocks. Then user space can re-submit the
> waits as it got -EAGAIN.

I think this is not just for drivers without fine-grained locking, at
least I expect that we'll keep the same mechanism when switching over
to per-object locking - we simply have too many places where a thread
could arbitrarily block while holding locks that the gpu reset handler
also needs to grab. You could of course restructure the code massively
and drop all locks while waiting, but that means adding tons of
special-purpose code which is only really exercises when a gpu hang
occurs. Our approach with the ioctl restart otoh reuses codepaths
which are all heavily used by X (due to X constantly getting interrupt
by timers and input events).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-12 Thread Terje Bergström
On 11.06.2013 15:09, Daniel Vetter wrote:
 Maybe it wasn't clear, but -EAGAIN does _not_ resubmit work. -EAGAIN
 is used to restart the ioctl if we had to kick a thread (to make sure
 it doesn't hold any locks), e.g. for a blocking wait on oustanding
 rendering. The codepaths taken work exactly as if the thread is
 interrupt with a signal.

You did make it clear that there's no resubmission, but other parts
confused me.

So this is used so that a legacy driver which does not do fine-grained
locking can interrupt all waits for completion for a wedged submit. This
way a driver-wide lock get unlocked, cleanup code acquires locks, does
the magic to unwedge GPU, and unlocks. Then user space can re-submit the
waits as it got -EAGAIN.

Fortunately the error code doesn't really matter as long as it's not
EAGAIN, so we can just use ETIMEDOUT as Thierry suggested in his follow-up.

Terje
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-12 Thread Daniel Vetter
On Wed, Jun 12, 2013 at 12:28 PM, Terje Bergström tbergst...@nvidia.com wrote:
 On 11.06.2013 15:09, Daniel Vetter wrote:
 Maybe it wasn't clear, but -EAGAIN does _not_ resubmit work. -EAGAIN
 is used to restart the ioctl if we had to kick a thread (to make sure
 it doesn't hold any locks), e.g. for a blocking wait on oustanding
 rendering. The codepaths taken work exactly as if the thread is
 interrupt with a signal.

 You did make it clear that there's no resubmission, but other parts
 confused me.

 So this is used so that a legacy driver which does not do fine-grained
 locking can interrupt all waits for completion for a wedged submit. This
 way a driver-wide lock get unlocked, cleanup code acquires locks, does
 the magic to unwedge GPU, and unlocks. Then user space can re-submit the
 waits as it got -EAGAIN.

I think this is not just for drivers without fine-grained locking, at
least I expect that we'll keep the same mechanism when switching over
to per-object locking - we simply have too many places where a thread
could arbitrarily block while holding locks that the gpu reset handler
also needs to grab. You could of course restructure the code massively
and drop all locks while waiting, but that means adding tons of
special-purpose code which is only really exercises when a gpu hang
occurs. Our approach with the ioctl restart otoh reuses codepaths
which are all heavily used by X (due to X constantly getting interrupt
by timers and input events).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Thierry Reding
On Tue, Jun 11, 2013 at 02:09:31PM +0200, Daniel Vetter wrote:
> On Tue, Jun 11, 2013 at 1:43 PM, Terje Bergstr?m  
> wrote:
> > On 11.06.2013 14:00, Daniel Vetter wrote:
> >> We don't use the EAGAIN ioctl restarting to resubmit the batchbuffer
> >> which blew up the gpu (that one has been submitted already in a
> >> different ioctl call), but to be able to restart the ioctl after the
> >> reset has completed: We need to kick every thread which is potentially
> >> holding GEM locks and make sure that we block them (at a point where
> >> they don't hold any locks) until the reset handler completed. To avoid
> >> a validation nightmare we use the same codepaths as we use for signal
> >> interrupts, so ioctl restarting is a very natural fit for this.
> >>
> >> Resubmitting victim workloads when a gpu crash happened is something
> >> the reset handler would do (kernel work item currently), not any
> >> userspace process doing an ioctl. But atm we don't resubmit victimized
> >> workloads.
> >
> > I don't understand the end-to-end of how resubmit is supposed to work.
> > User space is not supposed to resubmit, but still EAGAIN is returned to
> > user space, and drmIoctl() in user space just calls the came ioctl
> > again. Sounds like drmIoctl() is completely wrong.
> 
> Maybe it wasn't clear, but -EAGAIN does _not_ resubmit work. -EAGAIN
> is used to restart the ioctl if we had to kick a thread (to make sure
> it doesn't hold any locks), e.g. for a blocking wait on oustanding
> rendering. The codepaths taken work exactly as if the thread is
> interrupt with a signal.

Okay. So if I understand correctly that reserves EAGAIN for a very
specific purpose DRM-wide and therefore we can't really use it for
something Tegra-specific. Even if we wanted to side-step the issue
by not using drmIoctl(), it might be a bad idea (or even break in
some other path) if we use EAGAIN. I guess we'll have to find some
other error-code for the case where a zero timeout is given to the
syncpoint wait ioctl.

Maybe it's best to use ETIMEDOUT after, just like for the case of a
non-zero timeout and the syncpoint not being incremented in time.
Userspace should be able to differentiate between the two because it
knows what timeout it did pass to the ioctl.

Thierry
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: 



[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Terje Bergström
On 11.06.2013 14:00, Daniel Vetter wrote:
> We don't use the EAGAIN ioctl restarting to resubmit the batchbuffer
> which blew up the gpu (that one has been submitted already in a
> different ioctl call), but to be able to restart the ioctl after the
> reset has completed: We need to kick every thread which is potentially
> holding GEM locks and make sure that we block them (at a point where
> they don't hold any locks) until the reset handler completed. To avoid
> a validation nightmare we use the same codepaths as we use for signal
> interrupts, so ioctl restarting is a very natural fit for this.
> 
> Resubmitting victim workloads when a gpu crash happened is something
> the reset handler would do (kernel work item currently), not any
> userspace process doing an ioctl. But atm we don't resubmit victimized
> workloads.

I don't understand the end-to-end of how resubmit is supposed to work.
User space is not supposed to resubmit, but still EAGAIN is returned to
user space, and drmIoctl() in user space just calls the came ioctl
again. Sounds like drmIoctl() is completely wrong.

In Tegra, when a job blows up, we reset the involved units, and set the
pushbuffer pointer of host1x to point to the next job, and re-enable
units. There's no need for anybody to resubmit anything, as kernel
already has them.

Terje


[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Daniel Vetter
On Tue, Jun 11, 2013 at 1:43 PM, Terje Bergstr?m  
wrote:
> On 11.06.2013 14:00, Daniel Vetter wrote:
>> We don't use the EAGAIN ioctl restarting to resubmit the batchbuffer
>> which blew up the gpu (that one has been submitted already in a
>> different ioctl call), but to be able to restart the ioctl after the
>> reset has completed: We need to kick every thread which is potentially
>> holding GEM locks and make sure that we block them (at a point where
>> they don't hold any locks) until the reset handler completed. To avoid
>> a validation nightmare we use the same codepaths as we use for signal
>> interrupts, so ioctl restarting is a very natural fit for this.
>>
>> Resubmitting victim workloads when a gpu crash happened is something
>> the reset handler would do (kernel work item currently), not any
>> userspace process doing an ioctl. But atm we don't resubmit victimized
>> workloads.
>
> I don't understand the end-to-end of how resubmit is supposed to work.
> User space is not supposed to resubmit, but still EAGAIN is returned to
> user space, and drmIoctl() in user space just calls the came ioctl
> again. Sounds like drmIoctl() is completely wrong.

Maybe it wasn't clear, but -EAGAIN does _not_ resubmit work. -EAGAIN
is used to restart the ioctl if we had to kick a thread (to make sure
it doesn't hold any locks), e.g. for a blocking wait on oustanding
rendering. The codepaths taken work exactly as if the thread is
interrupt with a signal.

> In Tegra, when a job blows up, we reset the involved units, and set the
> pushbuffer pointer of host1x to point to the next job, and re-enable
> units. There's no need for anybody to resubmit anything, as kernel
> already has them.

Yeah, that's how it works in i915.ko, too.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Daniel Vetter
On Tue, Jun 11, 2013 at 12:48 PM, Thierry Reding
 wrote:
> On Tue, May 28, 2013 at 01:12:12PM -0600, Keith Packard wrote:
>> Thierry Reding  writes:
>>
>>
>> > That doesn't sound right. Maybe drmIoctl() needs fixing instead. Looking
>> > at the history, drmIoctl() was introduced to automatically loop if a
>> > signal was received (commit 8b9ab108ec1f2ba2b503f713769c4946849b3cb2).
>> > However the ioctl(3p) manpage doesn't mention that ioctl() returns
>> > EAGAIN in case it is interrupted by a signal.
>>
>> EAGAIN is being returned when the GPU is wedged to ask the application
>> to re-submit the request, which will presumably be held until the  GPU
>> is un-wedged.
>
> Isn't that a bit risky? What if something special needs to be done to
> unwedge the GPU other than re-submit the request, or if it just can't
> be reasonably unwedged. In that case drmIoctl() will keep looping
> indefinitely.
>
> If the above is indeed the expected behaviour for drivers, then we need
> a different error code for the SYNCPT_WAIT ioctl. EAGAIN is the best fit
> and anything else doesn't quite match the use-case. A different option
> might be not to use drmIoctl() on Tegra.

We don't use the EAGAIN ioctl restarting to resubmit the batchbuffer
which blew up the gpu (that one has been submitted already in a
different ioctl call), but to be able to restart the ioctl after the
reset has completed: We need to kick every thread which is potentially
holding GEM locks and make sure that we block them (at a point where
they don't hold any locks) until the reset handler completed. To avoid
a validation nightmare we use the same codepaths as we use for signal
interrupts, so ioctl restarting is a very natural fit for this.

Resubmitting victim workloads when a gpu crash happened is something
the reset handler would do (kernel work item currently), not any
userspace process doing an ioctl. But atm we don't resubmit victimized
workloads.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Thierry Reding
On Tue, May 28, 2013 at 01:12:12PM -0600, Keith Packard wrote:
> Thierry Reding  writes:
> 
> 
> > That doesn't sound right. Maybe drmIoctl() needs fixing instead. Looking
> > at the history, drmIoctl() was introduced to automatically loop if a
> > signal was received (commit 8b9ab108ec1f2ba2b503f713769c4946849b3cb2).
> > However the ioctl(3p) manpage doesn't mention that ioctl() returns
> > EAGAIN in case it is interrupted by a signal.
> 
> EAGAIN is being returned when the GPU is wedged to ask the application
> to re-submit the request, which will presumably be held until the  GPU
> is un-wedged.

Isn't that a bit risky? What if something special needs to be done to
unwedge the GPU other than re-submit the request, or if it just can't
be reasonably unwedged. In that case drmIoctl() will keep looping
indefinitely.

If the above is indeed the expected behaviour for drivers, then we need
a different error code for the SYNCPT_WAIT ioctl. EAGAIN is the best fit
and anything else doesn't quite match the use-case. A different option
might be not to use drmIoctl() on Tegra.

Thierry
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: 



Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Thierry Reding
On Tue, May 28, 2013 at 01:12:12PM -0600, Keith Packard wrote:
 Thierry Reding thierry.red...@gmail.com writes:
 
 
  That doesn't sound right. Maybe drmIoctl() needs fixing instead. Looking
  at the history, drmIoctl() was introduced to automatically loop if a
  signal was received (commit 8b9ab108ec1f2ba2b503f713769c4946849b3cb2).
  However the ioctl(3p) manpage doesn't mention that ioctl() returns
  EAGAIN in case it is interrupted by a signal.
 
 EAGAIN is being returned when the GPU is wedged to ask the application
 to re-submit the request, which will presumably be held until the  GPU
 is un-wedged.

Isn't that a bit risky? What if something special needs to be done to
unwedge the GPU other than re-submit the request, or if it just can't
be reasonably unwedged. In that case drmIoctl() will keep looping
indefinitely.

If the above is indeed the expected behaviour for drivers, then we need
a different error code for the SYNCPT_WAIT ioctl. EAGAIN is the best fit
and anything else doesn't quite match the use-case. A different option
might be not to use drmIoctl() on Tegra.

Thierry


pgpB1aPispHf1.pgp
Description: PGP signature
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Daniel Vetter
On Tue, Jun 11, 2013 at 12:48 PM, Thierry Reding
thierry.red...@gmail.com wrote:
 On Tue, May 28, 2013 at 01:12:12PM -0600, Keith Packard wrote:
 Thierry Reding thierry.red...@gmail.com writes:


  That doesn't sound right. Maybe drmIoctl() needs fixing instead. Looking
  at the history, drmIoctl() was introduced to automatically loop if a
  signal was received (commit 8b9ab108ec1f2ba2b503f713769c4946849b3cb2).
  However the ioctl(3p) manpage doesn't mention that ioctl() returns
  EAGAIN in case it is interrupted by a signal.

 EAGAIN is being returned when the GPU is wedged to ask the application
 to re-submit the request, which will presumably be held until the  GPU
 is un-wedged.

 Isn't that a bit risky? What if something special needs to be done to
 unwedge the GPU other than re-submit the request, or if it just can't
 be reasonably unwedged. In that case drmIoctl() will keep looping
 indefinitely.

 If the above is indeed the expected behaviour for drivers, then we need
 a different error code for the SYNCPT_WAIT ioctl. EAGAIN is the best fit
 and anything else doesn't quite match the use-case. A different option
 might be not to use drmIoctl() on Tegra.

We don't use the EAGAIN ioctl restarting to resubmit the batchbuffer
which blew up the gpu (that one has been submitted already in a
different ioctl call), but to be able to restart the ioctl after the
reset has completed: We need to kick every thread which is potentially
holding GEM locks and make sure that we block them (at a point where
they don't hold any locks) until the reset handler completed. To avoid
a validation nightmare we use the same codepaths as we use for signal
interrupts, so ioctl restarting is a very natural fit for this.

Resubmitting victim workloads when a gpu crash happened is something
the reset handler would do (kernel work item currently), not any
userspace process doing an ioctl. But atm we don't resubmit victimized
workloads.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Terje Bergström
On 11.06.2013 14:00, Daniel Vetter wrote:
 We don't use the EAGAIN ioctl restarting to resubmit the batchbuffer
 which blew up the gpu (that one has been submitted already in a
 different ioctl call), but to be able to restart the ioctl after the
 reset has completed: We need to kick every thread which is potentially
 holding GEM locks and make sure that we block them (at a point where
 they don't hold any locks) until the reset handler completed. To avoid
 a validation nightmare we use the same codepaths as we use for signal
 interrupts, so ioctl restarting is a very natural fit for this.
 
 Resubmitting victim workloads when a gpu crash happened is something
 the reset handler would do (kernel work item currently), not any
 userspace process doing an ioctl. But atm we don't resubmit victimized
 workloads.

I don't understand the end-to-end of how resubmit is supposed to work.
User space is not supposed to resubmit, but still EAGAIN is returned to
user space, and drmIoctl() in user space just calls the came ioctl
again. Sounds like drmIoctl() is completely wrong.

In Tegra, when a job blows up, we reset the involved units, and set the
pushbuffer pointer of host1x to point to the next job, and re-enable
units. There's no need for anybody to resubmit anything, as kernel
already has them.

Terje
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Daniel Vetter
On Tue, Jun 11, 2013 at 1:43 PM, Terje Bergström tbergst...@nvidia.com wrote:
 On 11.06.2013 14:00, Daniel Vetter wrote:
 We don't use the EAGAIN ioctl restarting to resubmit the batchbuffer
 which blew up the gpu (that one has been submitted already in a
 different ioctl call), but to be able to restart the ioctl after the
 reset has completed: We need to kick every thread which is potentially
 holding GEM locks and make sure that we block them (at a point where
 they don't hold any locks) until the reset handler completed. To avoid
 a validation nightmare we use the same codepaths as we use for signal
 interrupts, so ioctl restarting is a very natural fit for this.

 Resubmitting victim workloads when a gpu crash happened is something
 the reset handler would do (kernel work item currently), not any
 userspace process doing an ioctl. But atm we don't resubmit victimized
 workloads.

 I don't understand the end-to-end of how resubmit is supposed to work.
 User space is not supposed to resubmit, but still EAGAIN is returned to
 user space, and drmIoctl() in user space just calls the came ioctl
 again. Sounds like drmIoctl() is completely wrong.

Maybe it wasn't clear, but -EAGAIN does _not_ resubmit work. -EAGAIN
is used to restart the ioctl if we had to kick a thread (to make sure
it doesn't hold any locks), e.g. for a blocking wait on oustanding
rendering. The codepaths taken work exactly as if the thread is
interrupt with a signal.

 In Tegra, when a job blows up, we reset the involved units, and set the
 pushbuffer pointer of host1x to point to the next job, and re-enable
 units. There's no need for anybody to resubmit anything, as kernel
 already has them.

Yeah, that's how it works in i915.ko, too.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-06-11 Thread Thierry Reding
On Tue, Jun 11, 2013 at 02:09:31PM +0200, Daniel Vetter wrote:
 On Tue, Jun 11, 2013 at 1:43 PM, Terje Bergström tbergst...@nvidia.com 
 wrote:
  On 11.06.2013 14:00, Daniel Vetter wrote:
  We don't use the EAGAIN ioctl restarting to resubmit the batchbuffer
  which blew up the gpu (that one has been submitted already in a
  different ioctl call), but to be able to restart the ioctl after the
  reset has completed: We need to kick every thread which is potentially
  holding GEM locks and make sure that we block them (at a point where
  they don't hold any locks) until the reset handler completed. To avoid
  a validation nightmare we use the same codepaths as we use for signal
  interrupts, so ioctl restarting is a very natural fit for this.
 
  Resubmitting victim workloads when a gpu crash happened is something
  the reset handler would do (kernel work item currently), not any
  userspace process doing an ioctl. But atm we don't resubmit victimized
  workloads.
 
  I don't understand the end-to-end of how resubmit is supposed to work.
  User space is not supposed to resubmit, but still EAGAIN is returned to
  user space, and drmIoctl() in user space just calls the came ioctl
  again. Sounds like drmIoctl() is completely wrong.
 
 Maybe it wasn't clear, but -EAGAIN does _not_ resubmit work. -EAGAIN
 is used to restart the ioctl if we had to kick a thread (to make sure
 it doesn't hold any locks), e.g. for a blocking wait on oustanding
 rendering. The codepaths taken work exactly as if the thread is
 interrupt with a signal.

Okay. So if I understand correctly that reserves EAGAIN for a very
specific purpose DRM-wide and therefore we can't really use it for
something Tegra-specific. Even if we wanted to side-step the issue
by not using drmIoctl(), it might be a bad idea (or even break in
some other path) if we use EAGAIN. I guess we'll have to find some
other error-code for the case where a zero timeout is given to the
syncpoint wait ioctl.

Maybe it's best to use ETIMEDOUT after, just like for the case of a
non-zero timeout and the syncpoint not being incremented in time.
Userspace should be able to differentiate between the two because it
knows what timeout it did pass to the ioctl.

Thierry


pgpF2HJgG5Npy.pgp
Description: PGP signature
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-28 Thread Keith Packard
Thierry Reding  writes:


> That doesn't sound right. Maybe drmIoctl() needs fixing instead. Looking
> at the history, drmIoctl() was introduced to automatically loop if a
> signal was received (commit 8b9ab108ec1f2ba2b503f713769c4946849b3cb2).
> However the ioctl(3p) manpage doesn't mention that ioctl() returns
> EAGAIN in case it is interrupted by a signal.

EAGAIN is being returned when the GPU is wedged to ask the application
to re-submit the request, which will presumably be held until the  GPU
is un-wedged.

-- 
keith.packard at intel.com
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 827 bytes
Desc: not available
URL: 



[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-28 Thread Thierry Reding
On Mon, May 27, 2013 at 09:55:46AM +0300, Arto Merilainen wrote:
> On 05/26/2013 01:12 PM, Thierry Reding wrote:
> >* PGP Signed by an unknown key
> >
> >On Fri, May 17, 2013 at 02:49:44PM +0300, Arto Merilainen wrote:
[...]
> >Thinking about it, maybe it would be good to have two separate error
> >codes. Keeping -EAGAIN for the case where a zero timeout was passed
> >doesn't sound too bad to differentiate it from the case where a non-
> >zero timeout was passed and it actually timed out. What do you think?
> 
> I agree, in this case it would not look bad at all. However, user
> space libraries may loop until the ioctl return code is something
> else than -EAGAIN or -EINTR. Especially function drmIoctl() in
> libdrm does this which is why I noted this isssue in the first
> place.
> 
> If user space uses zero timeout to just check if a syncpoint value
> has already passed the library continues looping until the syncpoint
> value actually passes. Of course, we could just modify the ioctl
> interface to "cast" this return code to something else but that does
> not seem correct.

That doesn't sound right. Maybe drmIoctl() needs fixing instead. Looking
at the history, drmIoctl() was introduced to automatically loop if a
signal was received (commit 8b9ab108ec1f2ba2b503f713769c4946849b3cb2).
However the ioctl(3p) manpage doesn't mention that ioctl() returns
EAGAIN in case it is interrupted by a signal.

I'm adding Keith as author of that commit and the xorg-devel mailing
list on Cc to get some more eyes on this.

Thierry
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: 



Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-28 Thread Thierry Reding
On Mon, May 27, 2013 at 09:55:46AM +0300, Arto Merilainen wrote:
 On 05/26/2013 01:12 PM, Thierry Reding wrote:
 * PGP Signed by an unknown key
 
 On Fri, May 17, 2013 at 02:49:44PM +0300, Arto Merilainen wrote:
[...]
 Thinking about it, maybe it would be good to have two separate error
 codes. Keeping -EAGAIN for the case where a zero timeout was passed
 doesn't sound too bad to differentiate it from the case where a non-
 zero timeout was passed and it actually timed out. What do you think?
 
 I agree, in this case it would not look bad at all. However, user
 space libraries may loop until the ioctl return code is something
 else than -EAGAIN or -EINTR. Especially function drmIoctl() in
 libdrm does this which is why I noted this isssue in the first
 place.
 
 If user space uses zero timeout to just check if a syncpoint value
 has already passed the library continues looping until the syncpoint
 value actually passes. Of course, we could just modify the ioctl
 interface to cast this return code to something else but that does
 not seem correct.

That doesn't sound right. Maybe drmIoctl() needs fixing instead. Looking
at the history, drmIoctl() was introduced to automatically loop if a
signal was received (commit 8b9ab108ec1f2ba2b503f713769c4946849b3cb2).
However the ioctl(3p) manpage doesn't mention that ioctl() returns
EAGAIN in case it is interrupted by a signal.

I'm adding Keith as author of that commit and the xorg-devel mailing
list on Cc to get some more eyes on this.

Thierry


pgpyyffoUJ8YI.pgp
Description: PGP signature
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-28 Thread Keith Packard
Thierry Reding thierry.red...@gmail.com writes:


 That doesn't sound right. Maybe drmIoctl() needs fixing instead. Looking
 at the history, drmIoctl() was introduced to automatically loop if a
 signal was received (commit 8b9ab108ec1f2ba2b503f713769c4946849b3cb2).
 However the ioctl(3p) manpage doesn't mention that ioctl() returns
 EAGAIN in case it is interrupted by a signal.

EAGAIN is being returned when the GPU is wedged to ask the application
to re-submit the request, which will presumably be held until the  GPU
is un-wedged.

-- 
keith.pack...@intel.com


pgpuDOHJ5RMjV.pgp
Description: PGP signature
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-27 Thread Arto Merilainen
On 05/26/2013 01:12 PM, Thierry Reding wrote:
> * PGP Signed by an unknown key
>
> On Fri, May 17, 2013 at 02:49:44PM +0300, Arto Merilainen wrote:
>> Syncpoint wait returned EAGAIN if it was called with zero timeout.
>> This patch modifies the function to return ETIMEDOUT.
>
> This description is a bit redundant, because it repeats in prose what
> the code does. I'd rather see a description of why the change is
> necessary.

True. Will fix.

>
> Thinking about it, maybe it would be good to have two separate error
> codes. Keeping -EAGAIN for the case where a zero timeout was passed
> doesn't sound too bad to differentiate it from the case where a non-
> zero timeout was passed and it actually timed out. What do you think?

I agree, in this case it would not look bad at all. However, user space 
libraries may loop until the ioctl return code is something else than 
-EAGAIN or -EINTR. Especially function drmIoctl() in libdrm does this 
which is why I noted this isssue in the first place.

If user space uses zero timeout to just check if a syncpoint value has 
already passed the library continues looping until the syncpoint value 
actually passes. Of course, we could just modify the ioctl interface to 
"cast" this return code to something else but that does not seem correct.

- Arto


Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-27 Thread Arto Merilainen

On 05/26/2013 01:12 PM, Thierry Reding wrote:

* PGP Signed by an unknown key

On Fri, May 17, 2013 at 02:49:44PM +0300, Arto Merilainen wrote:

Syncpoint wait returned EAGAIN if it was called with zero timeout.
This patch modifies the function to return ETIMEDOUT.


This description is a bit redundant, because it repeats in prose what
the code does. I'd rather see a description of why the change is
necessary.


True. Will fix.



Thinking about it, maybe it would be good to have two separate error
codes. Keeping -EAGAIN for the case where a zero timeout was passed
doesn't sound too bad to differentiate it from the case where a non-
zero timeout was passed and it actually timed out. What do you think?


I agree, in this case it would not look bad at all. However, user space 
libraries may loop until the ioctl return code is something else than 
-EAGAIN or -EINTR. Especially function drmIoctl() in libdrm does this 
which is why I noted this isssue in the first place.


If user space uses zero timeout to just check if a syncpoint value has 
already passed the library continues looping until the syncpoint value 
actually passes. Of course, we could just modify the ioctl interface to 
cast this return code to something else but that does not seem correct.


- Arto
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-26 Thread Thierry Reding
On Fri, May 17, 2013 at 02:49:44PM +0300, Arto Merilainen wrote:
> Syncpoint wait returned EAGAIN if it was called with zero timeout.
> This patch modifies the function to return ETIMEDOUT.

This description is a bit redundant, because it repeats in prose what
the code does. I'd rather see a description of why the change is
necessary.

Thinking about it, maybe it would be good to have two separate error
codes. Keeping -EAGAIN for the case where a zero timeout was passed
doesn't sound too bad to differentiate it from the case where a non-
zero timeout was passed and it actually timed out. What do you think?

Thierry
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: 



Re: [PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-26 Thread Thierry Reding
On Fri, May 17, 2013 at 02:49:44PM +0300, Arto Merilainen wrote:
 Syncpoint wait returned EAGAIN if it was called with zero timeout.
 This patch modifies the function to return ETIMEDOUT.

This description is a bit redundant, because it repeats in prose what
the code does. I'd rather see a description of why the change is
necessary.

Thinking about it, maybe it would be good to have two separate error
codes. Keeping -EAGAIN for the case where a zero timeout was passed
doesn't sound too bad to differentiate it from the case where a non-
zero timeout was passed and it actually timed out. What do you think?

Thierry


pgpVRG7ijonV2.pgp
Description: PGP signature
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-17 Thread Arto Merilainen
Syncpoint wait returned EAGAIN if it was called with zero timeout.
This patch modifies the function to return ETIMEDOUT.

Signed-off-by: Arto Merilainen 
---
 drivers/gpu/host1x/syncpt.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 4b49345..5bf5366 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -187,7 +187,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 
thresh, long timeout,
}

if (!timeout) {
-   err = -EAGAIN;
+   err = -ETIMEDOUT;
goto done;
}

@@ -205,7 +205,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 
thresh, long timeout,
if (err)
goto done;

-   err = -EAGAIN;
+   err = -ETIMEDOUT;
/* Caller-specified timeout may be impractically low */
if (timeout < 0)
timeout = LONG_MAX;
-- 
1.7.9.5



[PATCH 2/6] gpu: host1x: Fix syncpoint wait return value

2013-05-17 Thread Arto Merilainen
Syncpoint wait returned EAGAIN if it was called with zero timeout.
This patch modifies the function to return ETIMEDOUT.

Signed-off-by: Arto Merilainen amerilai...@nvidia.com
---
 drivers/gpu/host1x/syncpt.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 4b49345..5bf5366 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -187,7 +187,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 
thresh, long timeout,
}
 
if (!timeout) {
-   err = -EAGAIN;
+   err = -ETIMEDOUT;
goto done;
}
 
@@ -205,7 +205,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 
thresh, long timeout,
if (err)
goto done;
 
-   err = -EAGAIN;
+   err = -ETIMEDOUT;
/* Caller-specified timeout may be impractically low */
if (timeout  0)
timeout = LONG_MAX;
-- 
1.7.9.5

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel