[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2011-01-10 Thread Andrew Lutomirski
On Mon, Dec 27, 2010 at 7:58 AM, Mario Kleiner
 wrote:
> On Dec 26, 2010, at 3:53 PM, Andrew Lutomirski wrote:
>
>> On Wed, Dec 22, 2010 at 4:06 PM, Mario Kleiner
>>  wrote:
>>>
>>> There's a new drm module parameter for selecting the timeout: echo 50 >
>>> /sys/module/drm/parameters/vblankoffdelay
>>> would set the timeout to 50 msecs. A setting of zero will disable the
>>> timer,
>>> so vblank irq's would stay on all the time.
>>>
>>> The default setting is still 5000 msecs as before, but reducing this to
>>> 100
>>> msecs wouldn't be a real problem imho. At least i didn't observe any
>>> miscounting during extensive testing with 100 msecs.
>>>
>>> The patches in drm-next fix a couple of races that i observed on intel
>>> and
>>> radeon during testing and a few that i didn't see but that i could
>>> imagine
>>> happening. It tries to make sure that the saved final count at vblank irq
>>> disable of the software vblank_count and the gpu counter are consistent -
>>> no
>>> off by one errors. They also try to detect and filter out spurious vblank
>>> interrupts at vblank enable time, e.g., on the radeon.
>>>
>>> There's still one possible race in the disable path which i will try to
>>> fix:
>>> We don't know when exactly the hardware counter increments wrt. the
>>> processing of the vblank interrupt - it could increment a few
>>> (dozen/hundred) microseconds before or after the irq handler runs, so if
>>> you
>>> happen to query the hardware counter while the gpu is inside the vblank
>>> you
>>> can't be sure if you picked up the old count or the new count for that
>>> vblank.
>>
>> That's disgusting. ?Does this affect many GPUs? ?(I can't imagine why
>> any sensible implementation wouldn't guarantee that the counter
>> increments just before the IRQ.)
>
> ;-). I don't know, but at least on the tested R500 and R600 class Radeon's,
> this was the case, so i assume it's at least this way on many radeon gpu's
> (probably all avivo parts?) out there. We don't have any evergreen gpu's yet
> in our lab so i don't know how the more recent parts behave. Also it doesn't
> matter
>
> I guess it's also a matter of definition when a new video frame starts?
> Leading edge / trailing edge of vblank? Start of vsync? Something else?
>
>>>
>>> This only matters during vblank disable. For that reason it's not such a
>>> good idea to disable vblank irq's from within the vblank irq handler. I
>>> tried that and it didn't work well --> When doing it from within irq you
>>> basically maximize the chance of hitting the time window when the race
>>> can
>>> happen. Delaying within the irq handler by a millisecond would fix that,
>>> but
>>> that's not what we want.
>>>
>>> Having the disable in a function triggered by a timer like now is the
>>> most
>>> simple solution i could come up with. There we can burn a few dozen
>>> microseconds if neccessary to remove this race.
>>
>> Maybe I'm missing something, but other than the race above (which
>> seems like it shouldn't exist on sane hardware), the new code seems
>> more complicated than necessary.
>
> I don't think it's more complicated than necessary for what it tries to
> achieve, but of course i'm a bit biased. It also started off more simple and
> grew a bit when i found new issues with the tested gpu's.
>
> The aim is to fix a couple of real races and to make vblank counts and
> timestamps as trustworthy and oml_sync_control spec-conformant (see
> http://www.opengl.org/registry/specs/OML/glx_sync_control.txt) as possible.
> It only consumes a few additional bytes of memory (approx. 40 bytes) per
> crtc, doesn't use excessive time inside the irq handler and tries to avoid
> taking locks ?that are shared between irq and non-irq context to avoid
> delays in irq execution, also if used with a kernel with the preempt_rt
> patch applied (important for my use case and other hard realtime apps). It's
> pretty self-contained and because most of it is driver-independent it can
> handle similar issues on different gpu's and kms drivers without the need
> for us to check each single gpu + driver combo if it has such issues or not.
>

OK, having read the glx_sync_control spec this makes a lot more sense.
 I reread the code and I have two questions right off the bat:

1. What's the point of vblank_disable_and_save?  AFAICT all that it
does (other than disabling vblanks) is to possibly increment
_vblank_count, ensuring that (barring races) it's correct as soon as
the call returns.  But there are only three ways that
vblank_disable_and_save gets called: the disable timer, and two paths
in i915 that turn off vblanks when disabling crtcs.  The latter two
shouldn't really care because the crtc is about to turn off, and the
former, being a timer, doesn't return into anything that cares about
the counter.

2. Why are the timestamps coming from do_gettimeofday instead of
ktime_get?  Currently, if someone changes the system time, then weird
things could happen, I think.

--Andy


Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2011-01-09 Thread Andrew Lutomirski
On Mon, Dec 27, 2010 at 7:58 AM, Mario Kleiner
mario.klei...@tuebingen.mpg.de wrote:
 On Dec 26, 2010, at 3:53 PM, Andrew Lutomirski wrote:

 On Wed, Dec 22, 2010 at 4:06 PM, Mario Kleiner
 mario.klei...@tuebingen.mpg.de wrote:

 There's a new drm module parameter for selecting the timeout: echo 50 
 /sys/module/drm/parameters/vblankoffdelay
 would set the timeout to 50 msecs. A setting of zero will disable the
 timer,
 so vblank irq's would stay on all the time.

 The default setting is still 5000 msecs as before, but reducing this to
 100
 msecs wouldn't be a real problem imho. At least i didn't observe any
 miscounting during extensive testing with 100 msecs.

 The patches in drm-next fix a couple of races that i observed on intel
 and
 radeon during testing and a few that i didn't see but that i could
 imagine
 happening. It tries to make sure that the saved final count at vblank irq
 disable of the software vblank_count and the gpu counter are consistent -
 no
 off by one errors. They also try to detect and filter out spurious vblank
 interrupts at vblank enable time, e.g., on the radeon.

 There's still one possible race in the disable path which i will try to
 fix:
 We don't know when exactly the hardware counter increments wrt. the
 processing of the vblank interrupt - it could increment a few
 (dozen/hundred) microseconds before or after the irq handler runs, so if
 you
 happen to query the hardware counter while the gpu is inside the vblank
 you
 can't be sure if you picked up the old count or the new count for that
 vblank.

 That's disgusting.  Does this affect many GPUs?  (I can't imagine why
 any sensible implementation wouldn't guarantee that the counter
 increments just before the IRQ.)

 ;-). I don't know, but at least on the tested R500 and R600 class Radeon's,
 this was the case, so i assume it's at least this way on many radeon gpu's
 (probably all avivo parts?) out there. We don't have any evergreen gpu's yet
 in our lab so i don't know how the more recent parts behave. Also it doesn't
 matter

 I guess it's also a matter of definition when a new video frame starts?
 Leading edge / trailing edge of vblank? Start of vsync? Something else?


 This only matters during vblank disable. For that reason it's not such a
 good idea to disable vblank irq's from within the vblank irq handler. I
 tried that and it didn't work well -- When doing it from within irq you
 basically maximize the chance of hitting the time window when the race
 can
 happen. Delaying within the irq handler by a millisecond would fix that,
 but
 that's not what we want.

 Having the disable in a function triggered by a timer like now is the
 most
 simple solution i could come up with. There we can burn a few dozen
 microseconds if neccessary to remove this race.

 Maybe I'm missing something, but other than the race above (which
 seems like it shouldn't exist on sane hardware), the new code seems
 more complicated than necessary.

 I don't think it's more complicated than necessary for what it tries to
 achieve, but of course i'm a bit biased. It also started off more simple and
 grew a bit when i found new issues with the tested gpu's.

 The aim is to fix a couple of real races and to make vblank counts and
 timestamps as trustworthy and oml_sync_control spec-conformant (see
 http://www.opengl.org/registry/specs/OML/glx_sync_control.txt) as possible.
 It only consumes a few additional bytes of memory (approx. 40 bytes) per
 crtc, doesn't use excessive time inside the irq handler and tries to avoid
 taking locks  that are shared between irq and non-irq context to avoid
 delays in irq execution, also if used with a kernel with the preempt_rt
 patch applied (important for my use case and other hard realtime apps). It's
 pretty self-contained and because most of it is driver-independent it can
 handle similar issues on different gpu's and kms drivers without the need
 for us to check each single gpu + driver combo if it has such issues or not.


OK, having read the glx_sync_control spec this makes a lot more sense.
 I reread the code and I have two questions right off the bat:

1. What's the point of vblank_disable_and_save?  AFAICT all that it
does (other than disabling vblanks) is to possibly increment
_vblank_count, ensuring that (barring races) it's correct as soon as
the call returns.  But there are only three ways that
vblank_disable_and_save gets called: the disable timer, and two paths
in i915 that turn off vblanks when disabling crtcs.  The latter two
shouldn't really care because the crtc is about to turn off, and the
former, being a timer, doesn't return into anything that cares about
the counter.

2. Why are the timestamps coming from do_gettimeofday instead of
ktime_get?  Currently, if someone changes the system time, then weird
things could happen, I think.

--Andy
___
dri-devel mailing list
dri-devel@lists.freedesktop.org

[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-27 Thread Mario Kleiner

On Dec 27, 2010, at 12:16 PM, Ville Syrj?l? wrote:

> On Mon, Dec 27, 2010 at 12:58:10AM +0100, Mario Kleiner wrote:
>
>> 2. There are gpu's firing spurious vblank irq's as soon as you enable
>> irq's
>
> You're sure this isn't simply a matter of the driver forgetting to ack
> the irq just before enabling it?
>

Good point. This was on radeon. I can't remember for certain if it  
happened always, or only frequently. I can check that later this week  
when i'm back at the test machine.

Anyway, it's good to be robust against such problems, regardless if  
it is gpu quirks or driver bugs. The current implementation would  
filter the redundant vblank irq and DRM_DEBUG a message if the  
drm.debug parameter is set.

thanks,
-mario

> -- 
> Ville Syrj?l?
> syrjala at sci.fi
> http://www.sci.fi/~syrjala/

*
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.kleiner at tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:+49 (0)7071/601-616
www:http://www.kyb.tuebingen.mpg.de/~kleinerm
*
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)



[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-27 Thread Ville Syrjälä
On Mon, Dec 27, 2010 at 12:58:10AM +0100, Mario Kleiner wrote:

> 2. There are gpu's firing spurious vblank irq's as soon as you enable  
> irq's

You're sure this isn't simply a matter of the driver forgetting to ack
the irq just before enabling it?

-- 
Ville Syrj?l?
syrjala at sci.fi
http://www.sci.fi/~syrjala/


[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-27 Thread Mario Kleiner
On Dec 26, 2010, at 3:53 PM, Andrew Lutomirski wrote:

> On Wed, Dec 22, 2010 at 4:06 PM, Mario Kleiner
>  wrote:
>>
>> There's a new drm module parameter for selecting the timeout: echo  
>> 50 >
>> /sys/module/drm/parameters/vblankoffdelay
>> would set the timeout to 50 msecs. A setting of zero will disable  
>> the timer,
>> so vblank irq's would stay on all the time.
>>
>> The default setting is still 5000 msecs as before, but reducing  
>> this to 100
>> msecs wouldn't be a real problem imho. At least i didn't observe any
>> miscounting during extensive testing with 100 msecs.
>>
>> The patches in drm-next fix a couple of races that i observed on  
>> intel and
>> radeon during testing and a few that i didn't see but that i could  
>> imagine
>> happening. It tries to make sure that the saved final count at  
>> vblank irq
>> disable of the software vblank_count and the gpu counter are  
>> consistent - no
>> off by one errors. They also try to detect and filter out spurious  
>> vblank
>> interrupts at vblank enable time, e.g., on the radeon.
>>
>> There's still one possible race in the disable path which i will  
>> try to fix:
>> We don't know when exactly the hardware counter increments wrt. the
>> processing of the vblank interrupt - it could increment a few
>> (dozen/hundred) microseconds before or after the irq handler runs,  
>> so if you
>> happen to query the hardware counter while the gpu is inside the  
>> vblank you
>> can't be sure if you picked up the old count or the new count for  
>> that
>> vblank.
>
> That's disgusting.  Does this affect many GPUs?  (I can't imagine why
> any sensible implementation wouldn't guarantee that the counter
> increments just before the IRQ.)

;-). I don't know, but at least on the tested R500 and R600 class  
Radeon's, this was the case, so i assume it's at least this way on  
many radeon gpu's (probably all avivo parts?) out there. We don't  
have any evergreen gpu's yet in our lab so i don't know how the more  
recent parts behave. Also it doesn't matter

I guess it's also a matter of definition when a new video frame  
starts? Leading edge / trailing edge of vblank? Start of vsync?  
Something else?

>>
>> This only matters during vblank disable. For that reason it's not  
>> such a
>> good idea to disable vblank irq's from within the vblank irq  
>> handler. I
>> tried that and it didn't work well --> When doing it from within  
>> irq you
>> basically maximize the chance of hitting the time window when the  
>> race can
>> happen. Delaying within the irq handler by a millisecond would fix  
>> that, but
>> that's not what we want.
>>
>> Having the disable in a function triggered by a timer like now is  
>> the most
>> simple solution i could come up with. There we can burn a few dozen
>> microseconds if neccessary to remove this race.
>
> Maybe I'm missing something, but other than the race above (which
> seems like it shouldn't exist on sane hardware), the new code seems
> more complicated than necessary.

I don't think it's more complicated than necessary for what it tries  
to achieve, but of course i'm a bit biased. It also started off more  
simple and grew a bit when i found new issues with the tested gpu's.

The aim is to fix a couple of real races and to make vblank counts  
and timestamps as trustworthy and oml_sync_control spec-conformant  
(see http://www.opengl.org/registry/specs/OML/glx_sync_control.txt)  
as possible. It only consumes a few additional bytes of memory  
(approx. 40 bytes) per crtc, doesn't use excessive time inside the  
irq handler and tries to avoid taking locks  that are shared between  
irq and non-irq context to avoid delays in irq execution, also if  
used with a kernel with the preempt_rt patch applied (important for  
my use case and other hard realtime apps). It's pretty self-contained  
and because most of it is driver-independent it can handle similar  
issues on different gpu's and kms drivers without the need for us to  
check each single gpu + driver combo if it has such issues or not.

1. There's the hardware vblank counter race, and it's there on lots  
of existing hardware, regardless if this is sane or not, so it needs  
to be handled.

2. There are gpu's firing spurious vblank irq's as soon as you enable  
irq's -- you need to filter those out, otherwise counts and  
timestamps will be wrong for at least one refresh cycle after vblank  
irq enable. You don't know when these redundant ones get delivered or  
if they get delivered at all because a real vblank irq enable gets  
triggered by some userspace calls, not locked to the video refresh  
cycle and because the enable code itself holds spin_lock_irqsave  
locks and may or may not (depending on number of cores and irq  
routing) prevent delivery of the vblank irq's concurrent to its own  
execution -> a possible race between the drm_handle_vblank() routine  
and the drm_update_vblank_count() routine each time you call  
drm_vblank_get().

3. 

Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-27 Thread Ville Syrjälä
On Mon, Dec 27, 2010 at 12:58:10AM +0100, Mario Kleiner wrote:
 
 2. There are gpu's firing spurious vblank irq's as soon as you enable  
 irq's

You're sure this isn't simply a matter of the driver forgetting to ack
the irq just before enabling it?

-- 
Ville Syrjälä
syrj...@sci.fi
http://www.sci.fi/~syrjala/
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-27 Thread Mario Kleiner


On Dec 27, 2010, at 12:16 PM, Ville Syrjälä wrote:


On Mon, Dec 27, 2010 at 12:58:10AM +0100, Mario Kleiner wrote:


2. There are gpu's firing spurious vblank irq's as soon as you enable
irq's


You're sure this isn't simply a matter of the driver forgetting to ack
the irq just before enabling it?



Good point. This was on radeon. I can't remember for certain if it  
happened always, or only frequently. I can check that later this week  
when i'm back at the test machine.


Anyway, it's good to be robust against such problems, regardless if  
it is gpu quirks or driver bugs. The current implementation would  
filter the redundant vblank irq and DRM_DEBUG a message if the  
drm.debug parameter is set.


thanks,
-mario


--
Ville Syrjälä
syrj...@sci.fi
http://www.sci.fi/~syrjala/


*
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.klei...@tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:+49 (0)7071/601-616
www:http://www.kyb.tuebingen.mpg.de/~kleinerm
*
For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled.
(Richard Feynman)

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-26 Thread Andrew Lutomirski
On Wed, Dec 22, 2010 at 4:06 PM, Mario Kleiner
 wrote:
>
> There's a new drm module parameter for selecting the timeout: echo 50 >
> /sys/module/drm/parameters/vblankoffdelay
> would set the timeout to 50 msecs. A setting of zero will disable the timer,
> so vblank irq's would stay on all the time.
>
> The default setting is still 5000 msecs as before, but reducing this to 100
> msecs wouldn't be a real problem imho. At least i didn't observe any
> miscounting during extensive testing with 100 msecs.
>
> The patches in drm-next fix a couple of races that i observed on intel and
> radeon during testing and a few that i didn't see but that i could imagine
> happening. It tries to make sure that the saved final count at vblank irq
> disable of the software vblank_count and the gpu counter are consistent - no
> off by one errors. They also try to detect and filter out spurious vblank
> interrupts at vblank enable time, e.g., on the radeon.
>
> There's still one possible race in the disable path which i will try to fix:
> We don't know when exactly the hardware counter increments wrt. the
> processing of the vblank interrupt - it could increment a few
> (dozen/hundred) microseconds before or after the irq handler runs, so if you
> happen to query the hardware counter while the gpu is inside the vblank you
> can't be sure if you picked up the old count or the new count for that
> vblank.

That's disgusting.  Does this affect many GPUs?  (I can't imagine why
any sensible implementation wouldn't guarantee that the counter
increments just before the IRQ.)

>
> This only matters during vblank disable. For that reason it's not such a
> good idea to disable vblank irq's from within the vblank irq handler. I
> tried that and it didn't work well --> When doing it from within irq you
> basically maximize the chance of hitting the time window when the race can
> happen. Delaying within the irq handler by a millisecond would fix that, but
> that's not what we want.
>
> Having the disable in a function triggered by a timer like now is the most
> simple solution i could come up with. There we can burn a few dozen
> microseconds if neccessary to remove this race.

Maybe I'm missing something, but other than the race above (which
seems like it shouldn't exist on sane hardware), the new code seems
more complicated than necessary.

Why not do nothing at all on vblank disable and, on enable, do this:

Call get_vblank_counter and declare the return value to be correct.
On each vblank IRQ (or maybe just the first one after enable), read
the counter again and if it matches the cached value, then assume we
raced on enable (i.e. we enabled right at the beginning of the vblank
interval, and this interrupt is redundant).  In that case, do nothing.
 Otherwise increment the cached value by one.

On hardware with the silly race where get_vblank_counter in the IRQ
can return a number one too low, use the scanout pos to fix it (i.e.
if the scanout position is at the end of the frame, assume we hit that
race and increment by 1).

This means that it would be safe to disable vblanks from any context
at all, because it has no effect other than turning off the interupt.

--Andy

>
> There could be other races that i couldn't think of and that i didn't see
> during my testing with my 2 radeons and 1 intel gpu. Therefore i think we
> should keep vblank irq's enabled for at least 2 or maybe 3 refresh cycles if
> they aren't used by anyone. Apps that want to schedule swaps very precisely
> and the ddx/drm/kms-driver itself do multiple queries in quick succession
> for a typical swapbuffers call, i.e., drm_vblank_get() -> query ->
> drm_vblank_put(), so on an otherwise idle graphics system the refcount will
> toggle between zero and one multiple times during a swap, usually within a
> few milliseconds. If the timeout is big enough so that irq's don't get
> disabled within such a sequence of toggles, even if there's a bit of
> scheduling delay for the x-server or client, then a client will see at least
> consistent vblank count during a swap, even if there are still some races
> hiding somewhere that we don't handle properly. That should be good enough,
> and paranoid clients can always increase the timeout value or set it to
> infinite.
>
> E.g., my toolkit schedules a swap for a specific system time like this:
>
> 1. glXGetSyncValuesOML(... _msc, _ust);
> 2. calculate target_msc based on user provided swap deadline t and
> (base_msc, base_ust) as a baseline.
> 3. glXSwapBuffersMscOML(, target_msc,...);
> 4. glXWaitForSbcOML() or use Intel_swap_events for retrieving the true msc
> and ust of swap completion.
>
> => Doesn't matter if there would be an off-by-one error in vblank counting
> due to an unknown race, as long as it doesn't happen between 1. and 4. As
> long as there aren't any client/x-server scheduling delays between step 1
> and 3 of more than /sys/module/drm/parameters/vblankoffdelay msecs, nothing
> can go wrong even if there are 

Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-26 Thread Andrew Lutomirski
On Wed, Dec 22, 2010 at 4:06 PM, Mario Kleiner
mario.klei...@tuebingen.mpg.de wrote:

 There's a new drm module parameter for selecting the timeout: echo 50 
 /sys/module/drm/parameters/vblankoffdelay
 would set the timeout to 50 msecs. A setting of zero will disable the timer,
 so vblank irq's would stay on all the time.

 The default setting is still 5000 msecs as before, but reducing this to 100
 msecs wouldn't be a real problem imho. At least i didn't observe any
 miscounting during extensive testing with 100 msecs.

 The patches in drm-next fix a couple of races that i observed on intel and
 radeon during testing and a few that i didn't see but that i could imagine
 happening. It tries to make sure that the saved final count at vblank irq
 disable of the software vblank_count and the gpu counter are consistent - no
 off by one errors. They also try to detect and filter out spurious vblank
 interrupts at vblank enable time, e.g., on the radeon.

 There's still one possible race in the disable path which i will try to fix:
 We don't know when exactly the hardware counter increments wrt. the
 processing of the vblank interrupt - it could increment a few
 (dozen/hundred) microseconds before or after the irq handler runs, so if you
 happen to query the hardware counter while the gpu is inside the vblank you
 can't be sure if you picked up the old count or the new count for that
 vblank.

That's disgusting.  Does this affect many GPUs?  (I can't imagine why
any sensible implementation wouldn't guarantee that the counter
increments just before the IRQ.)


 This only matters during vblank disable. For that reason it's not such a
 good idea to disable vblank irq's from within the vblank irq handler. I
 tried that and it didn't work well -- When doing it from within irq you
 basically maximize the chance of hitting the time window when the race can
 happen. Delaying within the irq handler by a millisecond would fix that, but
 that's not what we want.

 Having the disable in a function triggered by a timer like now is the most
 simple solution i could come up with. There we can burn a few dozen
 microseconds if neccessary to remove this race.

Maybe I'm missing something, but other than the race above (which
seems like it shouldn't exist on sane hardware), the new code seems
more complicated than necessary.

Why not do nothing at all on vblank disable and, on enable, do this:

Call get_vblank_counter and declare the return value to be correct.
On each vblank IRQ (or maybe just the first one after enable), read
the counter again and if it matches the cached value, then assume we
raced on enable (i.e. we enabled right at the beginning of the vblank
interval, and this interrupt is redundant).  In that case, do nothing.
 Otherwise increment the cached value by one.

On hardware with the silly race where get_vblank_counter in the IRQ
can return a number one too low, use the scanout pos to fix it (i.e.
if the scanout position is at the end of the frame, assume we hit that
race and increment by 1).

This means that it would be safe to disable vblanks from any context
at all, because it has no effect other than turning off the interupt.

--Andy


 There could be other races that i couldn't think of and that i didn't see
 during my testing with my 2 radeons and 1 intel gpu. Therefore i think we
 should keep vblank irq's enabled for at least 2 or maybe 3 refresh cycles if
 they aren't used by anyone. Apps that want to schedule swaps very precisely
 and the ddx/drm/kms-driver itself do multiple queries in quick succession
 for a typical swapbuffers call, i.e., drm_vblank_get() - query -
 drm_vblank_put(), so on an otherwise idle graphics system the refcount will
 toggle between zero and one multiple times during a swap, usually within a
 few milliseconds. If the timeout is big enough so that irq's don't get
 disabled within such a sequence of toggles, even if there's a bit of
 scheduling delay for the x-server or client, then a client will see at least
 consistent vblank count during a swap, even if there are still some races
 hiding somewhere that we don't handle properly. That should be good enough,
 and paranoid clients can always increase the timeout value or set it to
 infinite.

 E.g., my toolkit schedules a swap for a specific system time like this:

 1. glXGetSyncValuesOML(... base_msc, base_ust);
 2. calculate target_msc based on user provided swap deadline t and
 (base_msc, base_ust) as a baseline.
 3. glXSwapBuffersMscOML(, target_msc,...);
 4. glXWaitForSbcOML() or use Intel_swap_events for retrieving the true msc
 and ust of swap completion.

 = Doesn't matter if there would be an off-by-one error in vblank counting
 due to an unknown race, as long as it doesn't happen between 1. and 4. As
 long as there aren't any client/x-server scheduling delays between step 1
 and 3 of more than /sys/module/drm/parameters/vblankoffdelay msecs, nothing
 can go wrong even if there are race conditions left in that area.


Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-26 Thread Mario Kleiner

On Dec 26, 2010, at 3:53 PM, Andrew Lutomirski wrote:


On Wed, Dec 22, 2010 at 4:06 PM, Mario Kleiner
mario.klei...@tuebingen.mpg.de wrote:


There's a new drm module parameter for selecting the timeout: echo  
50 

/sys/module/drm/parameters/vblankoffdelay
would set the timeout to 50 msecs. A setting of zero will disable  
the timer,

so vblank irq's would stay on all the time.

The default setting is still 5000 msecs as before, but reducing  
this to 100

msecs wouldn't be a real problem imho. At least i didn't observe any
miscounting during extensive testing with 100 msecs.

The patches in drm-next fix a couple of races that i observed on  
intel and
radeon during testing and a few that i didn't see but that i could  
imagine
happening. It tries to make sure that the saved final count at  
vblank irq
disable of the software vblank_count and the gpu counter are  
consistent - no
off by one errors. They also try to detect and filter out spurious  
vblank

interrupts at vblank enable time, e.g., on the radeon.

There's still one possible race in the disable path which i will  
try to fix:

We don't know when exactly the hardware counter increments wrt. the
processing of the vblank interrupt - it could increment a few
(dozen/hundred) microseconds before or after the irq handler runs,  
so if you
happen to query the hardware counter while the gpu is inside the  
vblank you
can't be sure if you picked up the old count or the new count for  
that

vblank.


That's disgusting.  Does this affect many GPUs?  (I can't imagine why
any sensible implementation wouldn't guarantee that the counter
increments just before the IRQ.)


;-). I don't know, but at least on the tested R500 and R600 class  
Radeon's, this was the case, so i assume it's at least this way on  
many radeon gpu's (probably all avivo parts?) out there. We don't  
have any evergreen gpu's yet in our lab so i don't know how the more  
recent parts behave. Also it doesn't matter


I guess it's also a matter of definition when a new video frame  
starts? Leading edge / trailing edge of vblank? Start of vsync?  
Something else?




This only matters during vblank disable. For that reason it's not  
such a
good idea to disable vblank irq's from within the vblank irq  
handler. I
tried that and it didn't work well -- When doing it from within  
irq you
basically maximize the chance of hitting the time window when the  
race can
happen. Delaying within the irq handler by a millisecond would fix  
that, but

that's not what we want.

Having the disable in a function triggered by a timer like now is  
the most

simple solution i could come up with. There we can burn a few dozen
microseconds if neccessary to remove this race.


Maybe I'm missing something, but other than the race above (which
seems like it shouldn't exist on sane hardware), the new code seems
more complicated than necessary.


I don't think it's more complicated than necessary for what it tries  
to achieve, but of course i'm a bit biased. It also started off more  
simple and grew a bit when i found new issues with the tested gpu's.


The aim is to fix a couple of real races and to make vblank counts  
and timestamps as trustworthy and oml_sync_control spec-conformant  
(see http://www.opengl.org/registry/specs/OML/glx_sync_control.txt)  
as possible. It only consumes a few additional bytes of memory  
(approx. 40 bytes) per crtc, doesn't use excessive time inside the  
irq handler and tries to avoid taking locks  that are shared between  
irq and non-irq context to avoid delays in irq execution, also if  
used with a kernel with the preempt_rt patch applied (important for  
my use case and other hard realtime apps). It's pretty self-contained  
and because most of it is driver-independent it can handle similar  
issues on different gpu's and kms drivers without the need for us to  
check each single gpu + driver combo if it has such issues or not.


1. There's the hardware vblank counter race, and it's there on lots  
of existing hardware, regardless if this is sane or not, so it needs  
to be handled.


2. There are gpu's firing spurious vblank irq's as soon as you enable  
irq's -- you need to filter those out, otherwise counts and  
timestamps will be wrong for at least one refresh cycle after vblank  
irq enable. You don't know when these redundant ones get delivered or  
if they get delivered at all because a real vblank irq enable gets  
triggered by some userspace calls, not locked to the video refresh  
cycle and because the enable code itself holds spin_lock_irqsave  
locks and may or may not (depending on number of cores and irq  
routing) prevent delivery of the vblank irq's concurrent to its own  
execution - a possible race between the drm_handle_vblank() routine  
and the drm_update_vblank_count() routine each time you call  
drm_vblank_get().


3. There's gpu's sometimes, but not on other times, firing the irq  
too early (1-3 scanlines observed on radeon's) so you get your 

[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-22 Thread Mario Kleiner
On Dec 22, 2010, at 6:23 PM, Jesse Barnes wrote:

> On Wed, 22 Dec 2010 05:36:13 +0100
> Mario Kleiner  wrote:
>
>>>  
>>> --
>>>
>>> Message: 1
>>> Date: Mon, 20 Dec 2010 19:23:40 -0800
>>> From: Keith Packard 
>>> Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
>>> To: Andy Lutomirski , Jesse Barnes
>>> ,  Chris Wilson >> wilson.co.uk>,
>>> David Airlie 
>>> Cc: intel-gfx at lists.freedesktop.org, dri-devel at lists.freedesktop.org
>>> Message-ID: 
>>> Content-Type: text/plain; charset="us-ascii"
>>>
>>> On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski 
>>> wrote:
>>>
>>>> Enabling and disabling the vblank interrupt (on devices that
>>>> support it) is cheap.  So disable it quickly after each
>>>> interrupt.
>>>
>>> So, the concern (and reason for the original design) was that you
>>> might
>>> lose count of vblank interrupts as vblanks are counted differently
>>> while
>>> off than while on. If vblank interrupts get enabled near the  
>>> interrupt
>>> time, is it possible that you'll end up off by one somehow?
>>>
>>> Leaving them enabled for 'a while' was supposed to reduce the
>>> impact of
>>> this potential error.
>>>
>>> -- 
>>> keith.packard at intel.com
>>> -- next part --
>>> A non-text attachment was scrubbed...
>>> Name: not available
>>> Type: application/pgp-signature
>>> Size: 189 bytes
>>> Desc: not available
>>> URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/
>>> 20101220/105a9fb6/attachment-0001.pgp>
>>>
>>> --
>>>
>>> Message: 2
>>> Date: Mon, 20 Dec 2010 22:34:12 -0500
>>> From: Andrew Lutomirski 
>>> Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
>>> To: Keith Packard 
>>> Cc: intel-gfx at lists.freedesktop.org, dri-devel at lists.freedesktop.org
>>> Message-ID:
>>> <AANLkTik-1zni1DdS6i+1ARoenx6p=9jQAAiA6t5uGg_K at mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> On Mon, Dec 20, 2010 at 10:23 PM, Keith Packard 
>>> wrote:
>>>> On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski 
>>>> wrote:
>>>>
>>>>> Enabling and disabling the vblank interrupt (on devices that
>>>>> support it) is cheap. ?So disable it quickly after each
>>>>> interrupt.
>>>>
>>>> So, the concern (and reason for the original design) was that you
>>>> might
>>>> lose count of vblank interrupts as vblanks are counted differently
>>>> while
>>>> off than while on. If vblank interrupts get enabled near the
>>>> interrupt
>>>> time, is it possible that you'll end up off by one somehow?
>>>
>>> So the race is:
>>>
>>> 1. Vblank happens.
>>> 2. vblank_get runs, reads hardware counter, and enables the  
>>> interrupt
>>> (and maybe not quite in that order)
>>> 3. Interrupt fires and increments the counter.  Now the counter is
>>> one too high.
>>>
>>> What if we read the hardware counter from the IRQ the first time  
>>> that
>>> it fires after being enabled?  That way, if the hardware and  
>>> software
>>> counters match (mod 2^23 or whatever the magic number is), we don't
>>> increment the counter.
>>>
>>>>
>>>> Leaving them enabled for 'a while' was supposed to reduce the
>>>> impact of
>>>> this potential error.
>>>>
>>>
>>> Fair enough,
>>>
>>> But five seconds is a *long* time, and anything short enough that  
>>> the
>>> interrupt actually gets turned off in normal use risks the same  
>>> race.
>>>
>>> --Andy
>>>
>>>
>>> --
>>>
>>> Message: 3
>>> Date: Mon, 20 Dec 2010 19:47:11 -0800
>>> From: Keith Packard 
>>> Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
>>> To: Andrew Lutomirski 
>>> Cc: intel-gfx at lists.freedesktop.org, dri-devel at lists.freedesktop.org
>>> Message-ID: 
>>> Content-Type: text/plain; charset="us

[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-22 Thread Jesse Barnes
On Wed, 22 Dec 2010 05:36:13 +0100
Mario Kleiner  wrote:

> > --
> >
> > Message: 1
> > Date: Mon, 20 Dec 2010 19:23:40 -0800
> > From: Keith Packard 
> > Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
> > To: Andy Lutomirski , Jesse Barnes
> > ,  Chris Wilson  > chris-wilson.co.uk>,
> > David Airlie 
> > Cc: intel-gfx at lists.freedesktop.org, dri-devel at lists.freedesktop.org
> > Message-ID: 
> > Content-Type: text/plain; charset="us-ascii"
> >
> > On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski   
> > wrote:
> >
> >> Enabling and disabling the vblank interrupt (on devices that
> >> support it) is cheap.  So disable it quickly after each
> >> interrupt.
> >
> > So, the concern (and reason for the original design) was that you  
> > might
> > lose count of vblank interrupts as vblanks are counted differently  
> > while
> > off than while on. If vblank interrupts get enabled near the interrupt
> > time, is it possible that you'll end up off by one somehow?
> >
> > Leaving them enabled for 'a while' was supposed to reduce the  
> > impact of
> > this potential error.
> >
> > -- 
> > keith.packard at intel.com
> > -- next part --
> > A non-text attachment was scrubbed...
> > Name: not available
> > Type: application/pgp-signature
> > Size: 189 bytes
> > Desc: not available
> > URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/ 
> > 20101220/105a9fb6/attachment-0001.pgp>
> >
> > --
> >
> > Message: 2
> > Date: Mon, 20 Dec 2010 22:34:12 -0500
> > From: Andrew Lutomirski 
> > Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
> > To: Keith Packard 
> > Cc: intel-gfx at lists.freedesktop.org, dri-devel at lists.freedesktop.org
> > Message-ID:
> > <AANLkTik-1zni1DdS6i+1ARoenx6p=9jQAAiA6t5uGg_K at mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > On Mon, Dec 20, 2010 at 10:23 PM, Keith Packard   
> > wrote:
> >> On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski   
> >> wrote:
> >>
> >>> Enabling and disabling the vblank interrupt (on devices that
> >>> support it) is cheap. ?So disable it quickly after each
> >>> interrupt.
> >>
> >> So, the concern (and reason for the original design) was that you  
> >> might
> >> lose count of vblank interrupts as vblanks are counted differently  
> >> while
> >> off than while on. If vblank interrupts get enabled near the  
> >> interrupt
> >> time, is it possible that you'll end up off by one somehow?
> >
> > So the race is:
> >
> > 1. Vblank happens.
> > 2. vblank_get runs, reads hardware counter, and enables the interrupt
> > (and maybe not quite in that order)
> > 3. Interrupt fires and increments the counter.  Now the counter is  
> > one too high.
> >
> > What if we read the hardware counter from the IRQ the first time that
> > it fires after being enabled?  That way, if the hardware and software
> > counters match (mod 2^23 or whatever the magic number is), we don't
> > increment the counter.
> >
> >>
> >> Leaving them enabled for 'a while' was supposed to reduce the  
> >> impact of
> >> this potential error.
> >>
> >
> > Fair enough,
> >
> > But five seconds is a *long* time, and anything short enough that the
> > interrupt actually gets turned off in normal use risks the same race.
> >
> > --Andy
> >
> >
> > --
> >
> > Message: 3
> > Date: Mon, 20 Dec 2010 19:47:11 -0800
> > From: Keith Packard 
> > Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
> > To: Andrew Lutomirski 
> > Cc: intel-gfx at lists.freedesktop.org, dri-devel at lists.freedesktop.org
> > Message-ID: 
> > Content-Type: text/plain; charset="us-ascii"
> >
> > On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski  
> >  wrote:
> >
> >> But five seconds is a *long* time, and anything short enough that the
> >> interrupt actually gets turned off in normal use risks the same race.
> >
> > Right, so eliminating any race seems like the basic requirement. Can
> > that be done?
> >
> > -- 
> > keith.packard at intel.com
> > 

Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-22 Thread Jesse Barnes
On Wed, 22 Dec 2010 05:36:13 +0100
Mario Kleiner mario.klei...@tuebingen.mpg.de wrote:

  --
 
  Message: 1
  Date: Mon, 20 Dec 2010 19:23:40 -0800
  From: Keith Packard kei...@keithp.com
  Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
  To: Andy Lutomirski l...@mit.edu, Jesse Barnes
  jbar...@virtuousgeek.org, Chris Wilson ch...@chris-wilson.co.uk,
  David Airlie airl...@linux.ie
  Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
  Message-ID: yunfwtrrepv@aiko.keithp.com
  Content-Type: text/plain; charset=us-ascii
 
  On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski l...@mit.edu  
  wrote:
 
  Enabling and disabling the vblank interrupt (on devices that
  support it) is cheap.  So disable it quickly after each
  interrupt.
 
  So, the concern (and reason for the original design) was that you  
  might
  lose count of vblank interrupts as vblanks are counted differently  
  while
  off than while on. If vblank interrupts get enabled near the interrupt
  time, is it possible that you'll end up off by one somehow?
 
  Leaving them enabled for 'a while' was supposed to reduce the  
  impact of
  this potential error.
 
  -- 
  keith.pack...@intel.com
  -- next part --
  A non-text attachment was scrubbed...
  Name: not available
  Type: application/pgp-signature
  Size: 189 bytes
  Desc: not available
  URL: http://lists.freedesktop.org/archives/dri-devel/attachments/ 
  20101220/105a9fb6/attachment-0001.pgp
 
  --
 
  Message: 2
  Date: Mon, 20 Dec 2010 22:34:12 -0500
  From: Andrew Lutomirski l...@mit.edu
  Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
  To: Keith Packard kei...@keithp.com
  Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
  Message-ID:
  aanlktik-1zni1dds6i+1aroenx6p=9jqaaia6t5ug...@mail.gmail.com
  Content-Type: text/plain; charset=ISO-8859-1
 
  On Mon, Dec 20, 2010 at 10:23 PM, Keith Packard kei...@keithp.com  
  wrote:
  On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski l...@mit.edu  
  wrote:
 
  Enabling and disabling the vblank interrupt (on devices that
  support it) is cheap. ?So disable it quickly after each
  interrupt.
 
  So, the concern (and reason for the original design) was that you  
  might
  lose count of vblank interrupts as vblanks are counted differently  
  while
  off than while on. If vblank interrupts get enabled near the  
  interrupt
  time, is it possible that you'll end up off by one somehow?
 
  So the race is:
 
  1. Vblank happens.
  2. vblank_get runs, reads hardware counter, and enables the interrupt
  (and maybe not quite in that order)
  3. Interrupt fires and increments the counter.  Now the counter is  
  one too high.
 
  What if we read the hardware counter from the IRQ the first time that
  it fires after being enabled?  That way, if the hardware and software
  counters match (mod 2^23 or whatever the magic number is), we don't
  increment the counter.
 
 
  Leaving them enabled for 'a while' was supposed to reduce the  
  impact of
  this potential error.
 
 
  Fair enough,
 
  But five seconds is a *long* time, and anything short enough that the
  interrupt actually gets turned off in normal use risks the same race.
 
  --Andy
 
 
  --
 
  Message: 3
  Date: Mon, 20 Dec 2010 19:47:11 -0800
  From: Keith Packard kei...@keithp.com
  Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
  To: Andrew Lutomirski l...@mit.edu
  Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
  Message-ID: yunbp4frdmo@aiko.keithp.com
  Content-Type: text/plain; charset=us-ascii
 
  On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski  
  l...@mit.edu wrote:
 
  But five seconds is a *long* time, and anything short enough that the
  interrupt actually gets turned off in normal use risks the same race.
 
  Right, so eliminating any race seems like the basic requirement. Can
  that be done?
 
  -- 
  keith.pack...@intel.com
  -- next part --
  A non-text attachment was scrubbed...
  Name: not available
  Type: application/pgp-signature
  Size: 189 bytes
  Desc: not available
  URL: http://lists.freedesktop.org/archives/dri-devel/attachments/ 
  20101220/5ca3b674/attachment-0001.pgp
 
  --
 
  Message: 4
  Date: Mon, 20 Dec 2010 22:55:46 -0500
  From: Andrew Lutomirski l...@mit.edu
  Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
  To: Keith Packard kei...@keithp.com
  Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
  Message-ID:
  aanlktimk6rlkr8lt76cr8nclw_3kax6dqa+=id9_g...@mail.gmail.com
  Content-Type: text/plain; charset=ISO-8859-1
 
  On Mon, Dec 20, 2010 at 10:47 PM, Keith Packard kei...@keithp.com  
  wrote:
  On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski  
  l

Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-22 Thread Mario Kleiner

On Dec 22, 2010, at 6:23 PM, Jesse Barnes wrote:


On Wed, 22 Dec 2010 05:36:13 +0100
Mario Kleiner mario.klei...@tuebingen.mpg.de wrote:

 
--


Message: 1
Date: Mon, 20 Dec 2010 19:23:40 -0800
From: Keith Packard kei...@keithp.com
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Andy Lutomirski l...@mit.edu, Jesse Barnes
	jbar...@virtuousgeek.org,	Chris Wilson ch...@chris- 
wilson.co.uk,

David Airlie airl...@linux.ie
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID: yunfwtrrepv@aiko.keithp.com
Content-Type: text/plain; charset=us-ascii

On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski l...@mit.edu
wrote:


Enabling and disabling the vblank interrupt (on devices that
support it) is cheap.  So disable it quickly after each
interrupt.


So, the concern (and reason for the original design) was that you
might
lose count of vblank interrupts as vblanks are counted differently
while
off than while on. If vblank interrupts get enabled near the  
interrupt

time, is it possible that you'll end up off by one somehow?

Leaving them enabled for 'a while' was supposed to reduce the
impact of
this potential error.

--
keith.pack...@intel.com
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: http://lists.freedesktop.org/archives/dri-devel/attachments/
20101220/105a9fb6/attachment-0001.pgp

--

Message: 2
Date: Mon, 20 Dec 2010 22:34:12 -0500
From: Andrew Lutomirski l...@mit.edu
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Keith Packard kei...@keithp.com
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID:
aanlktik-1zni1dds6i+1aroenx6p=9jqaaia6t5ug...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Dec 20, 2010 at 10:23 PM, Keith Packard kei...@keithp.com
wrote:

On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski l...@mit.edu
wrote:


Enabling and disabling the vblank interrupt (on devices that
support it) is cheap. ?So disable it quickly after each
interrupt.


So, the concern (and reason for the original design) was that you
might
lose count of vblank interrupts as vblanks are counted differently
while
off than while on. If vblank interrupts get enabled near the
interrupt
time, is it possible that you'll end up off by one somehow?


So the race is:

1. Vblank happens.
2. vblank_get runs, reads hardware counter, and enables the  
interrupt

(and maybe not quite in that order)
3. Interrupt fires and increments the counter.  Now the counter is
one too high.

What if we read the hardware counter from the IRQ the first time  
that
it fires after being enabled?  That way, if the hardware and  
software

counters match (mod 2^23 or whatever the magic number is), we don't
increment the counter.



Leaving them enabled for 'a while' was supposed to reduce the
impact of
this potential error.



Fair enough,

But five seconds is a *long* time, and anything short enough that  
the
interrupt actually gets turned off in normal use risks the same  
race.


--Andy


--

Message: 3
Date: Mon, 20 Dec 2010 19:47:11 -0800
From: Keith Packard kei...@keithp.com
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Andrew Lutomirski l...@mit.edu
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID: yunbp4frdmo@aiko.keithp.com
Content-Type: text/plain; charset=us-ascii

On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski
l...@mit.edu wrote:

But five seconds is a *long* time, and anything short enough  
that the
interrupt actually gets turned off in normal use risks the same  
race.


Right, so eliminating any race seems like the basic requirement. Can
that be done?

--
keith.pack...@intel.com
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: http://lists.freedesktop.org/archives/dri-devel/attachments/
20101220/5ca3b674/attachment-0001.pgp

--

Message: 4
Date: Mon, 20 Dec 2010 22:55:46 -0500
From: Andrew Lutomirski l...@mit.edu
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Keith Packard kei...@keithp.com
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID:
aanlktimk6rlkr8lt76cr8nclw_3kax6dqa+=id9_g...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Dec 20, 2010 at 10:47 PM, Keith Packard kei...@keithp.com
wrote:

On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski
l...@mit.edu wrote:


But five seconds is a *long* time, and anything short enough that
the
interrupt actually gets turned off in normal use risks the same
race.


Right, so eliminating any race seems like

Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-21 Thread Mario Kleiner

--

Message: 1
Date: Mon, 20 Dec 2010 19:23:40 -0800
From: Keith Packard kei...@keithp.com
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Andy Lutomirski l...@mit.edu, Jesse Barnes
jbar...@virtuousgeek.org,   Chris Wilson 
ch...@chris-wilson.co.uk,
David Airlie airl...@linux.ie
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID: yunfwtrrepv@aiko.keithp.com
Content-Type: text/plain; charset=us-ascii

On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski l...@mit.edu  
wrote:



Enabling and disabling the vblank interrupt (on devices that
support it) is cheap.  So disable it quickly after each
interrupt.


So, the concern (and reason for the original design) was that you  
might
lose count of vblank interrupts as vblanks are counted differently  
while

off than while on. If vblank interrupts get enabled near the interrupt
time, is it possible that you'll end up off by one somehow?

Leaving them enabled for 'a while' was supposed to reduce the  
impact of

this potential error.

--
keith.pack...@intel.com
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: http://lists.freedesktop.org/archives/dri-devel/attachments/ 
20101220/105a9fb6/attachment-0001.pgp


--

Message: 2
Date: Mon, 20 Dec 2010 22:34:12 -0500
From: Andrew Lutomirski l...@mit.edu
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Keith Packard kei...@keithp.com
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID:
aanlktik-1zni1dds6i+1aroenx6p=9jqaaia6t5ug...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Dec 20, 2010 at 10:23 PM, Keith Packard kei...@keithp.com  
wrote:
On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski l...@mit.edu  
wrote:



Enabling and disabling the vblank interrupt (on devices that
support it) is cheap. ?So disable it quickly after each
interrupt.


So, the concern (and reason for the original design) was that you  
might
lose count of vblank interrupts as vblanks are counted differently  
while
off than while on. If vblank interrupts get enabled near the  
interrupt

time, is it possible that you'll end up off by one somehow?


So the race is:

1. Vblank happens.
2. vblank_get runs, reads hardware counter, and enables the interrupt
(and maybe not quite in that order)
3. Interrupt fires and increments the counter.  Now the counter is  
one too high.


What if we read the hardware counter from the IRQ the first time that
it fires after being enabled?  That way, if the hardware and software
counters match (mod 2^23 or whatever the magic number is), we don't
increment the counter.



Leaving them enabled for 'a while' was supposed to reduce the  
impact of

this potential error.



Fair enough,

But five seconds is a *long* time, and anything short enough that the
interrupt actually gets turned off in normal use risks the same race.

--Andy


--

Message: 3
Date: Mon, 20 Dec 2010 19:47:11 -0800
From: Keith Packard kei...@keithp.com
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Andrew Lutomirski l...@mit.edu
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID: yunbp4frdmo@aiko.keithp.com
Content-Type: text/plain; charset=us-ascii

On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski  
l...@mit.edu wrote:



But five seconds is a *long* time, and anything short enough that the
interrupt actually gets turned off in normal use risks the same race.


Right, so eliminating any race seems like the basic requirement. Can
that be done?

--
keith.pack...@intel.com
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: http://lists.freedesktop.org/archives/dri-devel/attachments/ 
20101220/5ca3b674/attachment-0001.pgp


--

Message: 4
Date: Mon, 20 Dec 2010 22:55:46 -0500
From: Andrew Lutomirski l...@mit.edu
Subject: Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks
To: Keith Packard kei...@keithp.com
Cc: intel-...@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Message-ID:
aanlktimk6rlkr8lt76cr8nclw_3kax6dqa+=id9_g...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Dec 20, 2010 at 10:47 PM, Keith Packard kei...@keithp.com  
wrote:
On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski  
l...@mit.edu wrote:


But five seconds is a *long* time, and anything short enough that  
the
interrupt actually gets turned off in normal use risks the same  
race.


Right, so eliminating any race seems like the basic requirement. Can
that be done?


I'll give it a shot.

Do you know if there's an existing tool to call

[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-20 Thread Andrew Lutomirski
On Mon, Dec 20, 2010 at 10:23 PM, Keith Packard  wrote:
> On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski  wrote:
>
>> Enabling and disabling the vblank interrupt (on devices that
>> support it) is cheap. ?So disable it quickly after each
>> interrupt.
>
> So, the concern (and reason for the original design) was that you might
> lose count of vblank interrupts as vblanks are counted differently while
> off than while on. If vblank interrupts get enabled near the interrupt
> time, is it possible that you'll end up off by one somehow?

So the race is:

1. Vblank happens.
2. vblank_get runs, reads hardware counter, and enables the interrupt
(and maybe not quite in that order)
3. Interrupt fires and increments the counter.  Now the counter is one too high.

What if we read the hardware counter from the IRQ the first time that
it fires after being enabled?  That way, if the hardware and software
counters match (mod 2^23 or whatever the magic number is), we don't
increment the counter.

>
> Leaving them enabled for 'a while' was supposed to reduce the impact of
> this potential error.
>

Fair enough,

But five seconds is a *long* time, and anything short enough that the
interrupt actually gets turned off in normal use risks the same race.

--Andy


[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-20 Thread Jesse Barnes
On Mon, 20 Dec 2010 22:55:46 -0500
Andrew Lutomirski  wrote:

> On Mon, Dec 20, 2010 at 10:47 PM, Keith Packard  wrote:
> > On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski  
> > wrote:
> >
> >> But five seconds is a *long* time, and anything short enough that the
> >> interrupt actually gets turned off in normal use risks the same race.
> >
> > Right, so eliminating any race seems like the basic requirement. Can
> > that be done?
> 
> I'll give it a shot.
> 
> Do you know if there's an existing tool to call drm_wait_vblank from
> userspace for testing?  I know approximately nothing about libdrm or
> any userspace graphics stuff whatsoever.

Yeah, libdrm has a test program called vbltest; it should do what you
need.

-- 
Jesse Barnes, Intel Open Source Technology Center


[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-20 Thread Keith Packard
On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski  wrote:

> But five seconds is a *long* time, and anything short enough that the
> interrupt actually gets turned off in normal use risks the same race.

Right, so eliminating any race seems like the basic requirement. Can
that be done?

-- 
keith.packard at intel.com
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: 



[Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-20 Thread Keith Packard
On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski  wrote:

> Enabling and disabling the vblank interrupt (on devices that
> support it) is cheap.  So disable it quickly after each
> interrupt.

So, the concern (and reason for the original design) was that you might
lose count of vblank interrupts as vblanks are counted differently while
off than while on. If vblank interrupts get enabled near the interrupt
time, is it possible that you'll end up off by one somehow?

Leaving them enabled for 'a while' was supposed to reduce the impact of
this potential error.

-- 
keith.packard at intel.com
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: 



Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-20 Thread Keith Packard
On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski l...@mit.edu wrote:

 Enabling and disabling the vblank interrupt (on devices that
 support it) is cheap.  So disable it quickly after each
 interrupt.

So, the concern (and reason for the original design) was that you might
lose count of vblank interrupts as vblanks are counted differently while
off than while on. If vblank interrupts get enabled near the interrupt
time, is it possible that you'll end up off by one somehow?

Leaving them enabled for 'a while' was supposed to reduce the impact of
this potential error.

-- 
keith.pack...@intel.com


pgp9DrqzaacH3.pgp
Description: PGP signature
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-20 Thread Andrew Lutomirski
On Mon, Dec 20, 2010 at 10:23 PM, Keith Packard kei...@keithp.com wrote:
 On Mon, 20 Dec 2010 14:00:54 -0500, Andy Lutomirski l...@mit.edu wrote:

 Enabling and disabling the vblank interrupt (on devices that
 support it) is cheap.  So disable it quickly after each
 interrupt.

 So, the concern (and reason for the original design) was that you might
 lose count of vblank interrupts as vblanks are counted differently while
 off than while on. If vblank interrupts get enabled near the interrupt
 time, is it possible that you'll end up off by one somehow?

So the race is:

1. Vblank happens.
2. vblank_get runs, reads hardware counter, and enables the interrupt
(and maybe not quite in that order)
3. Interrupt fires and increments the counter.  Now the counter is one too high.

What if we read the hardware counter from the IRQ the first time that
it fires after being enabled?  That way, if the hardware and software
counters match (mod 2^23 or whatever the magic number is), we don't
increment the counter.


 Leaving them enabled for 'a while' was supposed to reduce the impact of
 this potential error.


Fair enough,

But five seconds is a *long* time, and anything short enough that the
interrupt actually gets turned off in normal use risks the same race.

--Andy
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-20 Thread Keith Packard
On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski l...@mit.edu wrote:

 But five seconds is a *long* time, and anything short enough that the
 interrupt actually gets turned off in normal use risks the same race.

Right, so eliminating any race seems like the basic requirement. Can
that be done?

-- 
keith.pack...@intel.com


pgpsXEi7dtYOv.pgp
Description: PGP signature
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-20 Thread Andrew Lutomirski
On Mon, Dec 20, 2010 at 10:47 PM, Keith Packard kei...@keithp.com wrote:
 On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski l...@mit.edu wrote:

 But five seconds is a *long* time, and anything short enough that the
 interrupt actually gets turned off in normal use risks the same race.

 Right, so eliminating any race seems like the basic requirement. Can
 that be done?

I'll give it a shot.

Do you know if there's an existing tool to call drm_wait_vblank from
userspace for testing?  I know approximately nothing about libdrm or
any userspace graphics stuff whatsoever.

--Andy


 --
 keith.pack...@intel.com

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [PATCH] drm: Aggressively disable vblanks

2010-12-20 Thread Jesse Barnes
On Mon, 20 Dec 2010 22:55:46 -0500
Andrew Lutomirski l...@mit.edu wrote:

 On Mon, Dec 20, 2010 at 10:47 PM, Keith Packard kei...@keithp.com wrote:
  On Mon, 20 Dec 2010 22:34:12 -0500, Andrew Lutomirski l...@mit.edu wrote:
 
  But five seconds is a *long* time, and anything short enough that the
  interrupt actually gets turned off in normal use risks the same race.
 
  Right, so eliminating any race seems like the basic requirement. Can
  that be done?
 
 I'll give it a shot.
 
 Do you know if there's an existing tool to call drm_wait_vblank from
 userspace for testing?  I know approximately nothing about libdrm or
 any userspace graphics stuff whatsoever.

Yeah, libdrm has a test program called vbltest; it should do what you
need.

-- 
Jesse Barnes, Intel Open Source Technology Center
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel