drm vblank regression fixes for Linux 4.4+

2016-02-08 Thread Mario Kleiner
Here is the series of patches with fixes for regressions in vblank
counting/timestamping caused by the rewrite of drm_update_vblank_count
in Linux 4.4. These are all meant for stable 4.4 and later.

I have tested them on radeon-kms and nouveau-kms by unplugging/replugging
displays, manual dpms off/on, dpms off/on due to screen blanking, system
suspend/resume, and mode setting to different resolutions and refresh rates,
checking the drm.debug logs to confirm that the large vblank counter jumps
no longer happen and the behavior of the vblank counter/ts around dpms and
modesetting is somewhat reasonable.

-mario
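
For anyone repeating the log check described above, here is a minimal sketch of a scanner for suspicious counter jumps. It assumes the drm.debug line format quoted later in this thread; the function name and the threshold of 1000 are hypothetical choices, not part of the patches, and a legitimately long period with vblank irqs disabled can also produce a large diff.

```python
import re

# Matches the drm.debug line format quoted in this thread, e.g.
# "updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916"
# \s* after the colon tolerates a line break inserted by a mail client.
VBLANK_RE = re.compile(
    r"updating vblank count on crtc (\d+):\s*"
    r"current=(\d+), diff=(\d+), hw=(\d+) hw_last=(\d+)"
)

# Hypothetical threshold: 1000 vblanks is roughly 16 seconds at 60 Hz.
SUSPICIOUS_DIFF = 1000


def find_vblank_jumps(log_text, threshold=SUSPICIOUS_DIFF):
    """Return (crtc, current, diff, hw, hw_last) tuples whose diff exceeds threshold."""
    jumps = []
    for match in VBLANK_RE.finditer(log_text):
        crtc, current, diff, hw, hw_last = (int(g) for g in match.groups())
        if diff > threshold:
            jumps.append((crtc, current, diff, hw, hw_last))
    return jumps
```

Feeding it the output of `dmesg` captured with drm.debug enabled should return an empty list if the large jumps are gone.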



[PATCH 1/2] drm/radeon: Use drm_vblank_off/on to fix vblank counter trouble.

2016-02-07 Thread Mario Kleiner
I have a few simple patches which after testing seem to work well
enough and fix additional similar problems with nouveau. Got
distracted with other stuff last week. I'll try to send them out later
today when i'm at the machine.

-mario


On Sun, Feb 7, 2016 at 12:05 PM, Vlastimil Babka  wrote:
> On 01/22/2016 06:08 PM, Mario Kleiner wrote:
>> Anyway, some more hours of thinking and code browsing later, now i think
>> i have a simple and safe solution which should hopefully restore the
>> drm_vblank_pre/post_modeset behaviour with only a few lines of core
>> code. At the same time it should fix up another bug in that new
>> drm_update_vblank_count code that i just realized, in a way simple
>> enough for a stable fix.
>>
>> Now i just need to actually code and test it first.
>
> Ping, any news? :)
>


linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner


On 01/25/2016 09:32 PM, Daniel Vetter wrote:
> On Mon, Jan 25, 2016 at 08:30:14PM +0100, Mario Kleiner wrote:
>>
>>
>> On 01/25/2016 07:51 PM, Daniel Vetter wrote:
>>> On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:
>>>> Readding Daniel, which somehow got dropped from the cc.
>>>>
>>>> On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
>>>>> On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>>
>>>>>> On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
>>>>>>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
>>>>>>>>> On 23.01.2016 00:18, Ville Syrjälä wrote:
>>>>>>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>>>>>>>>>
>>>>>>>>>>> [ Trimming KDE folks from Cc ]
>>>>>>>>>>>
>>>>>>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>>>>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>>>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>>>>>>>>>
>>>>>>>>>>>>> AFAIR I originally reported it in response to
>>>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>>>>>>>>>> , but I can't find that in the archives, so maybe that was just 
>>>>>>>>>>>>> on IRC.
>>>>>>>>>>>>> See
>>>>>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>>>>>>>>>> . Basically, I ran into the bug fixed by your patch because the 
>>>>>>>>>>>>> counter
>>>>>>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary 
>>>>>>>>>>>>> after
>>>>>>>>>>>>> just a few days.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, so just uncovered the overflow bug.
>>>>>>>>>>>
>>>>>>>>>>> Not sure what you mean by "just", but to be clear: The 
>>>>>>>>>>> drm_vblank_on/off
>>>>>>>>>>> counter jumping bug (similar to the bug this thread is about), which
>>>>>>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It 
>>>>>>>>>>> seems
>>>>>>>>>>> to happen when turning off the CRTC:
>>>>>>>>>>>
>>>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>>>>>>>>>>> current=218104694, diff=0, hw=916 hw_last=916
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 
>>>>>>>>>>> p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>>>>>>>>>>> current=218104694, diff=16776301, hw=1 hw_last=916
>>>>>>>>>>
>>>>>>>>>> Not sure what bug we're talking about here, but here the hw counter
>>>>>>>>>> clearly jumps backwards.
>>>>>>>>>>
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: 
>>>>>>>>>>> current=0, diff=0, hw=0 hw_last=


linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner
Re-adding Daniel, who somehow got dropped from the Cc.

On 01/25/2016 03:53 PM, Ville Syrjälä wrote:
> On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:
>>
>>
>> On 01/25/2016 02:23 PM, Ville Syrjälä wrote:
>>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
>>>>
>>>>
>>>> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
>>>>> On 23.01.2016 00:18, Ville Syrjälä wrote:
>>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>>>>>
>>>>>>> [ Trimming KDE folks from Cc ]
>>>>>>>
>>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>>>>>
>>>>>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>>>>>
>>>>>>>>> AFAIR I originally reported it in response to
>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>>>>>> , but I can't find that in the archives, so maybe that was just on 
>>>>>>>>> IRC.
>>>>>>>>> See
>>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>>>>>> . Basically, I ran into the bug fixed by your patch because the 
>>>>>>>>> counter
>>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>>>>>> just a few days.
>>>>>>>>
>>>>>>>> Ok, so just uncovered the overflow bug.
>>>>>>>
>>>>>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>>>>>> counter jumping bug (similar to the bug this thread is about), which
>>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>>>>>> to happen when turning off the CRTC:
>>>>>>>
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>>>>>>> current=218104694, diff=0, hw=916 hw_last=916
>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
>>>>>>> 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>>>>>>> current=218104694, diff=16776301, hw=1 hw_last=916
>>>>>>
>>>>>> Not sure what bug we're talking about here, but here the hw counter
>>>>>> clearly jumps backwards.
>>>>>>
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: 
>>>>>>> current=0, diff=0, hw=0 hw_last=0
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: 
>>>>>>> current=0, diff=0, hw=0 hw_last=0
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: 
>>>>>>> current=0, diff=0, hw=0 hw_last=0
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 
>>>>>>> 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>>>>>>> current=234880995, diff=16777215, hw=0 hw_last=1
>>>>>>
>>>>>> Same here.
>>>>>
>>>>> At least one of the jumps is 


linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner


On 01/25/2016 05:15 AM, Michel Dänzer wrote:
> On 23.01.2016 00:18, Ville Syrjälä wrote:
>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>
>>> [ Trimming KDE folks from Cc ]
>>>
>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>
>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>
>>>>> AFAIR I originally reported it in response to
>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>>> See
>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>> just a few days.
>>>>
>>>> Ok, so just uncovered the overflow bug.
>>>
>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>> counter jumping bug (similar to the bug this thread is about), which
>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>> to happen when turning off the CRTC:
>>>
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>>> current=218104694, diff=0, hw=916 hw_last=916
>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
>>> 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>>> current=218104694, diff=16776301, hw=1 hw_last=916
>>
>> Not sure what bug we're talking about here, but here the hw counter
>> clearly jumps backwards.
>>
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
>>> diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
>>> diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
>>> diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 
>>> 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>>> current=234880995, diff=16777215, hw=0 hw_last=1
>>
>> Same here.
>
> At least one of the jumps is expected, because this is around turning
> off the CRTC for DPMS off. Don't know yet why there are two jumps back
> though.
>
>
>> These things just don't happen on i915 because drm_vblank_off() and
>> drm_vblank_on() are always called around the times when the hw counter
>> might get reset. Or at least that's how it should be.
>
> Which is of course the idea of Daniel's patch (which is what I'm getting
> the above with) or Mario's patch as well, but clearly something's still
> wrong. It's certainly possible that it's something in the driver, but
> since calling drm_vblank_pre/post_modeset from the same places seems to
> work fine (ignoring the regression discussed in this thread)... Do
> drm_vblank_on/off require something else to handle this correctly?
>
>

I suspect it is because vblank_disable_and_save calls 
drm_update_vblank_count() unconditionally, even if vblank irqs are 
already off.

So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> 
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes 
final count.

Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> 
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> 
drm_vblank_off -> vblank_disable_and_save -> A pointless 
drm_update_vblank_count() while the hw counter is already reset to zero 
--> Unwanted counter jump.
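
The numbers in the quoted log are consistent with this explanation. As a worked sketch, assuming a 24-bit hardware counter (an inference from the logged diff values) and hypothetical helper names, treating the reset as a wraparound reproduces the logged diffs exactly:

```python
HW_COUNTER_BITS = 24  # assumption: inferred from the diff values in the log
HW_COUNTER_MASK = (1 << HW_COUNTER_BITS) - 1


def wrapped_diff(hw, hw_last, mask=HW_COUNTER_MASK):
    """Difference between two hw counter reads, interpreted as a wraparound."""
    return (hw - hw_last) & mask


def updated_count(current, hw, hw_last):
    """Software count after adding the wrapped difference, kept to 32 bits."""
    return (current + wrapped_diff(hw, hw_last)) & 0xFFFFFFFF


print(wrapped_diff(1, 916))              # 16776301, as logged for hw=1 hw_last=916
print(wrapped_diff(0, 1))                # 16777215, as logged for hw=0 hw_last=1
print(updated_count(218104694, 1, 916))  # 234880995, the next logged "current"
```

Each such jump adds close to 2^24 to the software count, so about 256 of them are enough to wrap a 32-bit counter.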


The problem doesn't happen on a pure modeset to a different video 
resolution/refresh rate, as then we only have one call into 
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to change vblank_disable_and_save() so that it only
calls drm_update_vblank_count() if vblank irqs actually get disabled, not
on no-op calls. I will try that now.
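
The proposed guard can be illustrated with a toy model. This is only a sketch of the idea, not kernel code: the class, its methods and the 24-bit counter width are all assumptions made for illustration.

```python
class Crtc:
    """Toy model of one CRTC's vblank bookkeeping (all names hypothetical)."""

    MASK = (1 << 24) - 1  # assumed 24-bit hardware counter

    def __init__(self, count, hw):
        self.count = count        # software vblank count
        self.hw = hw              # current hardware counter value
        self.last = hw            # hardware value seen at the last update
        self.irq_enabled = True

    def _update_count(self):
        diff = (self.hw - self.last) & self.MASK
        self.count = (self.count + diff) & 0xFFFFFFFF
        self.last = self.hw

    def vblank_off(self, guarded):
        if guarded and not self.irq_enabled:
            return  # no-op call: irqs are already off, keep the saved count
        self.irq_enabled = False
        self._update_count()

    def power_down(self):
        self.hw = 0  # the hardware counter resets when the CRTC shuts down


def disable_then_reenable(guarded):
    """Replay the disable -> reenable sequence and return the final count."""
    crtc = Crtc(count=218104694, hw=916)
    crtc.vblank_off(guarded)  # dpms off: final count is computed here
    crtc.power_down()
    crtc.vblank_off(guarded)  # modeset prepare: second DPMS_OFF
    return crtc.count


print(disable_then_reenable(guarded=True))   # 218104694, count preserved
print(disable_then_reenable(guarded=False))  # 234880994, unwanted jump
```

In the unguarded case the second call sees the reset hardware counter and misreads it as a wraparound; with the guard the saved count survives the redundant call.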

Otherwise kms drivers would have to be careful to never call 
drm_vblank_off multiple times before calling drm_vblank_on, but the help 
text to drm_vblank_on() claims that unbalanced calls to 

linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-23 Thread Mario Kleiner
On 01/22/2016 07:29 PM, Mario Kleiner wrote:
>
>
> On 01/22/2016 04:18 PM, Ville Syrjälä wrote:
>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>>
>>> [ Trimming KDE folks from Cc ]
>>>
>>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>>
>>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>>
>>>>> AFAIR I originally reported it in response to
>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>>>
>>>>> , but I can't find that in the archives, so maybe that was just on
>>>>> IRC.
>>>>> See
>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>>>
>>>>> . Basically, I ran into the bug fixed by your patch because the
>>>>> counter
>>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>>> just a few days.
>>>>
>>>> Ok, so just uncovered the overflow bug.
>>>
>>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>>> counter jumping bug (similar to the bug this thread is about), which
>>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>>> to happen when turning off the CRTC:
>>>
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0:
>>> current=218104694, diff=0, hw=916 hw_last=916
>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7
>>> p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0:
>>> current=218104694, diff=16776301, hw=1 hw_last=916
>>
>> Not sure what bug we're talking about here, but here the hw counter
>> clearly jumps backwards.
>>
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1:
>>> current=0, diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2:
>>> current=0, diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3:
>>> current=0, diff=0, hw=0 hw_last=0
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@
>>> 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0:
>>> current=234880995, diff=16777215, hw=0 hw_last=1
>>
>> Same here.
>>
>> These things just don't happen on i915 because drm_vblank_off() and
>> drm_vblank_on() are always called around the times when the hw counter
>> might get reset. Or at least that's how it should be.
>>
>
> Fwiw, testing the HD-57570 single display with my patch that uses
> drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show
> hardware counter reset to zero as expected, but no jumps of software
> vblank counter. So with that vblank_off/on placement it seems to work
> nicely here.
>
> -mario
>

I spoke too early. The jump doesn't happen when i change video modes - 
video resolution / refresh rate etc, despite hw counter reset. But if i 
just disable and then reenable a display, the software counter jumps.

-mario


linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-22 Thread Mario Kleiner


On 01/22/2016 04:18 PM, Ville Syrjälä wrote:
> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
>>
>> [ Trimming KDE folks from Cc ]
>>
>> On 21.01.2016 19:09, Daniel Vetter wrote:
>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
>>>>>
>>>>> Can you please point me at the vblank on/off jump bug please?
>>>>
>>>> AFAIR I originally reported it in response to
>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
>>>> , but I can't find that in the archives, so maybe that was just on IRC.
>>>> See
>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
>>>> . Basically, I ran into the bug fixed by your patch because the counter
>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
>>>> just a few days.
>>>
>>> Ok, so just uncovered the overflow bug.
>>
>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
>> counter jumping bug (similar to the bug this thread is about), which
>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
>> to happen when turning off the CRTC:
>>
>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>> current=218104694, diff=0, hw=916 hw_last=916
>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
>> 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>> current=218104694, diff=16776301, hw=1 hw_last=916
>
> Not sure what bug we're talking about here, but here the hw counter
> clearly jumps backwards.
>
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
>> diff=0, hw=0 hw_last=0
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
>> diff=0, hw=0 hw_last=0
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
>> diff=0, hw=0 hw_last=0
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 
>> 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: 
>> current=234880995, diff=16777215, hw=0 hw_last=1
>
> Same here.
>
> These things just don't happen on i915 because drm_vblank_off() and
> drm_vblank_on() are always called around the times when the hw counter
> might get reset. Or at least that's how it should be.
>

Fwiw, testing the HD-57570 single display with my patch that uses 
drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show 
hardware counter reset to zero as expected, but no jumps of software 
vblank counter. So with that vblank_off/on placement it seems to work 
nicely here.

-mario

>> dev->max_vblank_count = 0x, which makes the wraparound code in
>> drm_update_vblank_count a no-op. Maybe you can reproduce it if you
>> artificially set a lower max_vblank_count in the driver.
>>
>>
>> --
>> Earthling Michel Dänzer   |   http://www.amd.com
>> Libre software enthusiast | Mesa and X developer
>


[PATCH 1/2] drm/radeon: Use drm_vblank_off/on to fix vblank counter trouble.

2016-01-22 Thread Mario Kleiner
On 01/22/2016 04:17 AM, Michel Dänzer wrote:
> On 21.01.2016 18:16, Mario Kleiner wrote:
>> On 01/21/2016 09:25 AM, Michel Dänzer wrote:
>>> On 21.01.2016 17:16, Mario Kleiner wrote:
>>>>
>>>> This patch replaces calls to drm_vblank_pre/post_modeset in the
>>>> drivers dpms code with calls to drm_vblank_off/on, as recommended
>>>> for drivers with hw counters that reset to zero during modeset.
>>>
>>> Sounds like you fell for the drm_vblank_on/off propaganda. :(
>>>
>>> This was working fine with drm_vblank_pre/post_modeset, that it broke
>>> is simply a regression.
>>
>> I agree with you that pre/post modeset breakage is a regression. It's
>> just that i stumbled over the on/off stuff while searching for a
>> solution and the other sort of hacks i could think of looked similar or
>> more convoluted/hacky/fragile to me.
>
> Finding and fixing the cause of a regression isn't a hack, it's
> established procedure.
>

That's not what i meant. I meant i couldn't find something less 
complicated/risky/without new regression potential, so this looked like 
a better solution. Of course i would have tested my own patches against 
at least a couple bits of userspace (ati ddx, modesetting ddx, weston), 
i just didn't have access to the machine yesterday.

Anyway, some more hours of thinking and code browsing later, now i think 
i have a simple and safe solution which should hopefully restore the 
drm_vblank_pre/post_modeset behaviour with only a few lines of core 
code. At the same time it should fix up another bug in that new 
drm_update_vblank_count code that i just realized, in a way simple 
enough for a stable fix.

Now i just need to actually code and test it first.

>
>> And they probably wouldn't solve that other small race i found as easily
>> - I don't think it's likely to happen (often/at all?) in practice, but i
>> have trouble "forgetting" about its existence now.
>
> That's something which should be addressed independently from the
> regression fix.
>
> Please split up the PM fixes from your patch into one or two separate
> patches (which may be appropriate for 4.5 / stable trees), and leave the
> switch to drm_vblank_on/off to Daniel's patch for 4.6.
>

I will look into that once i'm done with the above, and have probably
gotten some sleep again.

Fixing this race regression without switching to vblank_off/on might 
need a small bit of extra band aid there.

Btw. wrt. the radeon_pm.c fix: It's certainly good to fix that potential 
drm_vblank_get/put imbalance. I wonder if that "might glitch" DEBUG 
message makes much sense though. Can that code run during a modeset at 
all? And if so, i'd almost expect that there won't be any vblank irqs 
available at that point anyway - once the crtc's are off they don't 
trigger vblank irqs anymore - so that code might glitch due to lack of 
vblank sync regardless if drm_vblank_get is successful or not?
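
To make the get/put imbalance concrete, here is a toy sketch of the balanced pattern: drop a reference only if taking it succeeded. The class and function names are hypothetical stand-ins, not the radeon_pm.c code, and the failure condition is simplified to "CRTC is off".

```python
import errno


class VblankRef:
    """Toy stand-in for per-CRTC vblank reference counting (hypothetical)."""

    def __init__(self, crtc_active):
        self.crtc_active = crtc_active
        self.refcount = 0

    def get(self):
        """Take a reference and return 0, or return a negative errno on failure."""
        if not self.crtc_active:
            return -errno.EINVAL
        self.refcount += 1
        return 0

    def put(self):
        """Drop a reference; calling this without a successful get is a bug."""
        assert self.refcount > 0, "put without matching get"
        self.refcount -= 1


def reclock(vbl):
    """Run the clock change, dropping the vblank reference only if one was taken."""
    got_ref = vbl.get() == 0
    # ... the clock change would happen here, with or without vblank sync ...
    if got_ref:
        vbl.put()
    return got_ref
```

With the CRTC off, `reclock()` proceeds without vblank sync and leaves the reference count untouched, which is the case the question above is about.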

The other thing is my placement of the radeon_pm_compute_clocks() call in
the DPMS_ON path. I moved it to fix the potential extra race i described.
But thinking about it, wouldn't the better place be at the beginning of
the DPMS on path, before the atombios calls reenable the crtcs? I don't
know the driver well enough, but it looked a bit suspicious to me that
the memory clocks, linebuffer watermarks etc. get updated for the new
video mode after the crtc has been enabled. Won't it then potentially
start running for a moment with wrong memory bandwidth etc.? That's
probably something for you to check - no idea, just something that
struck me as slightly odd.

Also moving it up might avoid collisions with Daniel's patch, if that 
move doesn't hurt.

>
>>> I'm not against switching to drm_vblank_on/off for 4.6, but it's not a
>>> solution for older kernels.
>>
>> Linux 4.4 is an especially important stable kernel for me because it's
>> supposed to be the standard distro kernel for Ubuntu 16.04-LTS and
>> siblings/derivatives (Linux Mint) for up to the next 5 years. Having
>> many of my neuroscience users ending on that kernel as their very first
>> impression of Linux with something potentially broken in vblank land
>> scares me. The reliability of timing/timestamping stuff is
>> super-important for them, at the same time hand-holding many of them
>> through non-standard kernel upgrades would be so much not fun.
>
> But fast-tracking the switch to drm_vblank_on/off, which haven't been
> widely tested with this driver, all the way to 4.4 seems less risky to
> you? Seriously?
>

It made a lot of sense after 12 hours of browsing code, thinking about
all kinds of race conditions and other new personal horrors ;) - Luckily
i don't have to decide.

>> Just to say 

[PATCH 1/2] drm/radeon: Use drm_vblank_off/on to fix vblank counter trouble.

2016-01-21 Thread Mario Kleiner
On 01/21/2016 09:25 AM, Michel Dänzer wrote:
> On 21.01.2016 17:16, Mario Kleiner wrote:
>>
>> This patch replaces calls to drm_vblank_pre/post_modeset in the
>> drivers dpms code with calls to drm_vblank_off/on, as recommended
>> for drivers with hw counters that reset to zero during modeset.
>
> Sounds like you fell for the drm_vblank_on/off propaganda. :( This was
> working fine with drm_vblank_pre/post_modeset, that it broke is simply a
> regression.
>

I agree with you that pre/post modeset breakage is a regression. It's 
just that i stumbled over the on/off stuff while searching for a 
solution and the other sort of hacks i could think of looked similar or 
more convoluted/hacky/fragile to me. And they probably wouldn't solve 
that other small race i found as easily - I don't think it's likely to 
happen (often/at all?) in practice, but i have trouble "forgetting" 
about its existence now.

>
> I'm not against switching to drm_vblank_on/off for 4.6, but it's not a
> solution for older kernels.
>
>

Linux 4.4 is an especially important stable kernel for me because it's 
supposed to be the standard distro kernel for Ubuntu 16.04-LTS and 
siblings/derivatives (Linux Mint) for up to the next 5 years. Having 
many of my neuroscience users ending on that kernel as their very first 
impression of Linux with something potentially broken in vblank land 
scares me. The reliability of timing/timestamping stuff is 
super-important for them, at the same time hand-holding many of them 
through non-standard kernel upgrades would be so much not fun. Just to 
say i'm probably way too biased wrt. what solution for this should get 
backported into an older kernel.

Anyway, urgently need to sleep.
-mario


linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-21 Thread Mario Kleiner
On 01/21/2016 07:38 AM, Michel Dänzer wrote:
> On 21.01.2016 14:31, Mario Kleiner wrote:
>> On 01/21/2016 04:43 AM, Michel Dänzer wrote:
>>> On 21.01.2016 05:32, Mario Kleiner wrote:
>>>>
>>>> So the problem is that AMDs hardware frame counters reset to
>>>> zero during a modeset. The old DRM code dealt with drivers doing that by
>>>> keeping vblank irqs enabled during modesets and incrementing vblank
>>>> count by one during each vblank irq, i think that's what
>>>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>>>
>>> Right, looks like there's been a regression breaking this. I suspect the
>>> problem is that vblank->last isn't getting updated from
>>> drm_vblank_post_modeset. Not sure which change broke that though, or how
>>> to fix it. Ville?
>>>
>>
>> The whole logic has changed and the software counter updates are now
>> driven all the time by the hw counter.
>>
>>>
>>> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
>>> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
>>> vblank counters"). I've been meaning to track that down since then; one
>>> of these days hopefully, but if anybody has any ideas offhand...
>>
>> I spent the last few hours reading through the drm and radeon code and i
>> think what should probably work is to replace the
>> drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
>> calls. These are apparently meant for drivers whose hw counters reset
>> during modeset, [...]
>
> ... just like drm_vblank_pre/post_modeset. That those were broken is a
> regression which needs to be fixed anyway. I don't think switching to
> drm_vblank_on/off is suitable for stable trees.
>
> Looking at Vlastimil's original post again, I'd say the most likely
> culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many
> vblanks were missed").
>

Yes, i think reverting that one alone would likely fix it by reverting 
to the old vblank update logic.

>
>> Once drm_vblank_off is called, drm_vblank_get will no-op and return an
>> error, so clients can't enable vblank irqs during the modeset - pageflip
>> ioctl and waitvblank ioctl would fail while a modeset happens -
>> hopefully userspace handles this correctly everywhere.
>
> We've fixed xf86-video-ati for this.
>
>
>> I'll hack up a patch for demonstration now.
>
> You're a bit late to that party. :)
>
> http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html
> http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html
>
>

Oops. Just sent out my little (so far untested) creations. Yes, they are 
essentially the same as Daniel's patches. The only addition is to also 
fix that other potential small race i describe by slightly moving the 
xxx_pm_compute_clocks() calls around. And a fix for drm_vblank_get/put 
imbalance in radeon_pm if vblank_on/off would be used.

-mario



[PATCH 1/2] drm/radeon: Use drm_vblank_off/on to fix vblank counter trouble.

2016-01-21 Thread Mario Kleiner
The hardware vblank counter of AMD gpu's resets to zero during a
modeset. The new implementation of drm_update_vblank_count() from
commit 4dfd6486 "drm: Use vblank timestamps to guesstimate how
many vblanks were missed", introduced in Linux 4.4, treats that
as a counter wraparound and causes the software vblank counter
to jump forward by a large distance of up to 2^24 counts. This
interacts badly with 32-bit wraparound handling in
drm_handle_vblank_events(), causing that function to no longer
deliver pending vblank events to clients.

This leads to client hangs especially if clients perform OpenGL
or DRI3/Present animations while a modeset happens and triggers
the hw vblank counter reset. One prominent example is a hang of
KDE Plasma 5's startup progress splash screen during login, making
the KDE session unusable.

Another small potential race exists when executing a modeset while
vblank interrupts are enabled or just get enabled: The modeset updates
radeon_crtc->lb_vblank_lead_lines during radeon_display_bandwidth_update,
so if vblank interrupt handling or enable would try to access that variable
multiple times at the wrong moment as part of drm_update_vblank_count,
while the scanout happens to be within lb_vblank_lead_lines before the
start of vblank, it could cause inconsistent vblank counting and again
trigger a jump of the software vblank counter, causing similar client
hangs. The easiest way to avoid this small race is to not allow
vblank enable or vblank irq's during modeset.

This patch replaces calls to drm_vblank_pre/post_modeset in the
drivers dpms code with calls to drm_vblank_off/on, as recommended
for drivers with hw counters that reset to zero during modeset.
Those calls disable vblank interrupts during the modeset sequence
and reinitialize vblank counts and timestamps after the modeset
properly, taking hw counter reset into account, thereby fixing
the problem of forward jumping counters.

During a modeset, calls to drm_vblank_get() will no-op/intentionally
fail, so no vblank events or pageflips can be queued during modesetting.

Radeon's static and dynpm power management uses drm_vblank_get to enable
vblank irqs to synchronize reclocking to start of vblank. If a modeset
would happen in parallel with such a power management action, drm_vblank_get
would be suppressed, sync to vblank wouldn't work and a visual glitch could
happen. However that glitch would hopefully be hidden by the blanking of
the crtc during modeset. A small fix to power management makes sure to
check for this and prevent unbalanced vblank reference counts due to
mismatched drm_vblank_get/put.

Reported-by: Vlastimil Babka 
Signed-off-by: Mario Kleiner 
Cc: michel at daenzer.net
Cc: vbabka at suse.cz
Cc: ville.syrjala at linux.intel.com
Cc: daniel.vetter at ffwll.ch
Cc: dri-devel at lists.freedesktop.org
Cc: alexander.deucher at amd.com
Cc: christian.koenig at amd.com
---
 drivers/gpu/drm/radeon/atombios_crtc.c  | 10 ++
 drivers/gpu/drm/radeon/radeon_legacy_crtc.c |  4 ++--
 drivers/gpu/drm/radeon/radeon_pm.c  |  8 ++--
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c
index 801dd60..1c853e0 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -275,23 +275,25 @@ void atombios_crtc_dpms(struct drm_crtc *crtc, int mode)
if (ASIC_IS_DCE3(rdev) && !ASIC_IS_DCE6(rdev))
atombios_enable_crtc_memreq(crtc, ATOM_ENABLE);
atombios_blank_crtc(crtc, ATOM_DISABLE);
-   drm_vblank_post_modeset(dev, radeon_crtc->crtc_id);
+   /* adjust pm to dpms *before* drm_vblank_on */
+   radeon_pm_compute_clocks(rdev);
+   drm_vblank_on(dev, radeon_crtc->crtc_id);
radeon_crtc_load_lut(crtc);
break;
case DRM_MODE_DPMS_STANDBY:
case DRM_MODE_DPMS_SUSPEND:
case DRM_MODE_DPMS_OFF:
-   drm_vblank_pre_modeset(dev, radeon_crtc->crtc_id);
+   drm_vblank_off(dev, radeon_crtc->crtc_id);
if (radeon_crtc->enabled)
atombios_blank_crtc(crtc, ATOM_ENABLE);
if (ASIC_IS_DCE3(rdev) && !ASIC_IS_DCE6(rdev))
atombios_enable_crtc_memreq(crtc, ATOM_DISABLE);
atombios_enable_crtc(crtc, ATOM_DISABLE);
radeon_crtc->enabled = false;
+   /* adjust pm to dpms *after* drm_vblank_off */
+   radeon_pm_compute_clocks(rdev);
break;
}
-   /* adjust pm to dpms */
-   radeon_pm_compute_clocks(rdev);
 }

 static void
diff --git a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c b/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
index 32b338f..24152df 100644
--- a/drivers/gpu/drm/radeon/radeon_legacy_crtc.c
+++ b/drivers/gpu/drm/radeon/ra

linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-21 Thread Mario Kleiner
On 01/21/2016 04:43 AM, Michel Dänzer wrote:
> On 21.01.2016 05:32, Mario Kleiner wrote:
>>
>> So the problem is that AMDs hardware frame counters reset to
>> zero during a modeset. The old DRM code dealt with drivers doing that by
>> keeping vblank irqs enabled during modesets and incrementing vblank
>> count by one during each vblank irq, i think that's what
>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>
> Right, looks like there's been a regression breaking this. I suspect the
> problem is that vblank->last isn't getting updated from
> drm_vblank_post_modeset. Not sure which change broke that though, or how
> to fix it. Ville?
>

The whole logic has changed and the software counter updates are now 
driven all the time by the hw counter.

>
> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
> vblank counters"). I've been meaning to track that down since then; one
> of these days hopefully, but if anybody has any ideas offhand...
>
>

I spent the last few hours reading through the drm and radeon code and i 
think what should probably work is to replace the 
drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on 
calls. These are apparently meant for drivers whose hw counters reset 
during modeset, and seem to reinitialize stuff properly and release
clients' queued vblank events to avoid blocking - not tested so far, just
looked at the code.

Once drm_vblank_off is called, drm_vblank_get will no-op and return an 
error, so clients can't enable vblank irqs during the modeset - pageflip 
ioctl and waitvblank ioctl would fail while a modeset happens - 
hopefully userspace handles this correctly everywhere.

It would also cause radeon's power management to not sync its actions to
vblank if it would get invoked during a modeset, but that seems to be 
handled by a 200 msec timeout and hopefully only cause visual glitches - 
or invisible glitches while the crtc is blanked during modeset?
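
A minimal sketch of the get/put balancing that power management would
need once drm_vblank_get can fail during a modeset. This is a toy model,
not the radeon_pm.c code: struct toy_vblank, toy_vblank_get/put and
toy_reclock are hypothetical names, and the error code is just some
negative value standing in for "an error":

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for per-crtc vblank state; not the kernel's struct. */
struct toy_vblank {
	bool disabled;  /* true between vblank off and vblank on */
	int refcount;   /* outstanding vblank references */
};

/* Fails while vblank is off, as described above for drm_vblank_get. */
static int toy_vblank_get(struct toy_vblank *vbl)
{
	if (vbl->disabled)
		return -EINVAL;
	vbl->refcount++;
	return 0;
}

static void toy_vblank_put(struct toy_vblank *vbl)
{
	assert(vbl->refcount > 0); /* a put without a matching get is a bug */
	vbl->refcount--;
}

/*
 * Reclock, synchronized to vblank when possible. Returns true if the
 * reclock was synced, false if it had to run unsynced (might glitch).
 */
static bool toy_reclock(struct toy_vblank *vbl)
{
	bool synced = toy_vblank_get(vbl) == 0;

	/* ... wait for vblank if synced, then reprogram the clocks ... */

	if (synced)
		toy_vblank_put(vbl); /* only drop a reference we actually hold */
	return synced;
}
```

The point is only that the put is conditional on the get having
succeeded, so the reference count stays balanced in both cases.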

There could be another tiny race with the new "vblank counter bumping"
logic from commit 5b5561b ("drm/radeon: Fixup hw vblank counters/ts
...") if drm_update_vblank_count() would be called multiple times in
quick succession within the "radeon_crtc->lb_vblank_lead_lines"
scanlines before start of real vblank iff at the same time a modeset
would happen and set radeon_crtc->lb_vblank_lead_lines to a smaller
value due to a change in horizontal mode resolution. That needs a
modeset to a higher horizontal resolution to happen exactly when
the scanout is in exactly the right 5 or so scanlines and some client is
calling drm_vblank_get() to enable vblank irqs at the same time, but it
would cause the same hang if it happened - not that likely to happen
often, but still not nice, also Murphy's law... If we could switch to
drm_vblank_off/on instead of drm_vblank_pre/post_modeset we could remove
that race as well by forbidding any vblank irq related activity during
a modeset.
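
To illustrate the lead lines race, here is a heavily simplified model of
the counter bumping. sampled_count() is a hypothetical helper, not the
driver function, and it assumes the scanline lies before the start of
vsync, so the hw counter itself has not incremented yet:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model: the returned count is bumped by one as soon as the
 * scanout is within lead_lines scanlines before the real vblank start.
 */
static uint32_t sampled_count(uint32_t hw_count, int scanline,
			      int vblank_start, int lead_lines)
{
	assert(lead_lines >= 0 && lead_lines <= vblank_start);

	int fudged_start = vblank_start - lead_lines;

	return scanline >= fudged_start ? hw_count + 1 : hw_count;
}
```

With vblank_start = 1080 and scanout at line 1077, a sample taken with
lead_lines = 5 reports N+1, while a second sample taken after a modeset
reduced lead_lines to 2 reports N again, so the count appears to step
backwards between the two calls.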

I'll hack up a patch for demonstration now.


linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-20 Thread Mario Kleiner
On 01/18/2016 11:49 AM, Vlastimil Babka wrote:
> On 01/16/2016 05:24 AM, Mario Kleiner wrote:
>>
>>
>> On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
>>> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>>
>> I'm currently running...
>>
>> while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done
>>
>> ... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770  and so far i
>> can't trigger a hang after hundreds of runs.
>>
>> Does this also hang for you?
>
> No, test mode seems to be fine.
>
>> I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
>> should probably give useful info around the time of the hang.
>
> Attached. Captured by having kdm running, switching to console, running
> "dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see
> frozen splashscreen, switch back, terminate dmesg. So somewhere around
> the middle there should be where ksplashscreen starts...
>
>> Maybe also check XOrg.0.log for (WW) warnings related to flip.
>
> No such warnings there.
>
>> thanks,
>> -mario
>>
>>
>>>> Thanks,
>>>> Vlastimil
>>>
>

Thanks. So the problem is that AMD's hardware frame counters reset to
zero during a modeset. The old DRM code dealt with drivers doing that by 
keeping vblank irqs enabled during modesets and incrementing vblank 
count by one during each vblank irq, i think that's what 
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.

The new code in drm_update_vblank_count() breaks this. The reset of the 
counter to zero is treated as counter wraparound, so our software vblank 
counter jumps forward by up to 2^24 counts in response (in case of AMD's 
24 bit hw counters), and then the vblank event handling code in 
drm_handle_vblank_events() and other places detects the counter being 
more than 2^23 counts ahead of queued vblank events and as part of its 
own wraparound handling for the 32-Bit software counter doesn't deliver 
these queued events for a long time -> no vblank swap trigger event -> 
no swap -> client hangs waiting for swap completion.
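
The arithmetic behind this can be sketched in a few lines of standalone
C. This is a model under assumptions, not the kernel code: vblank_diff()
and event_is_delivered() are hypothetical names that only mirror the
masked subtraction and the 2^23 check described above:

```c
#include <assert.h>
#include <stdint.h>

#define HW_COUNTER_MASK 0xffffffu /* 24 bit hw frame counter */

/* Increment applied to the software counter from two hw counter samples. */
static uint32_t vblank_diff(uint32_t cur_hw, uint32_t last_hw)
{
	assert(cur_hw <= HW_COUNTER_MASK && last_hw <= HW_COUNTER_MASK);
	return (cur_hw - last_hw) & HW_COUNTER_MASK;
}

/*
 * An event is delivered once the software counter has reached its target
 * sequence number, unless the counter appears to be more than 2^23 counts
 * ahead, which the 32-bit wraparound handling treats as "not yet due".
 */
static int event_is_delivered(uint32_t sw_count, uint32_t event_seq)
{
	return (sw_count - event_seq) <= (1u << 23);
}
```

With a last sampled hw count of 1000, a normal step to 1001 gives a diff
of 1 and an event queued for the next vblank is delivered. A reset to 0
gives a diff of 2^24 - 1000 = 16776216, the software counter lands far
more than 2^23 counts past the event's target, and the event is skipped.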

I think i remember seeing the ksplash progress screen occasionally 
blanking half way through login, i guess that's when kwin triggers a 
modeset in parallel to ksplash doing its OpenGL animations. So depending 
on the hw vblank count at the time of login ksplash would or wouldn't 
hang, apparently i got "lucky" with my counts at login.

-mario


linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-16 Thread Mario Kleiner


On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>> Hi,
>>
>> since kernel 4.4 I'm unable to login to kde5 desktop (on openSUSE
>> Tumbleweed). There's a screen with progressbar showing the startup,
>> which normally fades away after reaching 100%. But with kernel 4.4, the
>> progress gets stuck somewhere between 1/2 and 3/4 (not always the same).
>> Top shows that kwin is using few % of CPU's but mostly sleeps in poll().
>> When I kill it from another console, I see that everything has actually
>> started up, just the progressbar screen was obscuring it. The windows
>> obviously don't have decorations etc. Starting kwin manually again shows
>> me again the progressbar screen at the same position.
>

Depressing. I was stress-testing those patches with Linux 4.4 for days
on 2 AMD gpu's (HD-4000 RV 730 and HD-5770) under KDE 5 Plasma 5.4.2
(KUbuntu 15.10, XOrg 1.17), and just retested Linux 4.4 on
nouveau/radeon/intel (also with XOrg 1.18 and XOrg master)
a few days ago, and never encountered such a hang or other vblank related
problem on KDE-5 or GNOME-3.

I'm currently running...

while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done

... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770  and so far i 
can't trigger a hang after hundreds of runs.

Does this also hang for you?

> Hmm. Sounds like it could then be waiting for a vblank in the distant
> future. There's that 1<<23 limit in the code though, but even with that
> we end up with a max wait of ~38 hours assuming a 60Hz refresh rate.
>
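
For reference, the ~38 hours figure follows directly from the 1<<23
limit. A small sketch of the arithmetic (vblanks_to_hours is a
hypothetical helper, integer division truncates to whole hours):

```c
#include <assert.h>
#include <stdint.h>

/* Whole hours needed for 'counts' vblanks at a refresh rate of 'refresh_hz'. */
static uint32_t vblanks_to_hours(uint32_t counts, uint32_t refresh_hz)
{
	assert(refresh_hz > 0);
	return counts / refresh_hz / 3600;
}
```

vblanks_to_hours(1u << 23, 60) gives 38, i.e. 8388608 vblanks at 60 Hz
take about 139810 seconds.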

xtrace suggests that ksplashqml seems to use classic OpenGL  + 
glXSwapBuffers under DRI2. So no clever swap scheduling based on vblank 
counter values.

> Stuff to try might include enabling drm.debug=0x2f, though that'll
> generate a lot of stuff. Another option would be to use the drm vblank
> tracepoints to try and catch what seq number it's waiting for and
> where we're at currently. Or I suppose you could just hack
> up drm_wait_vblank() to print an error message or something if the
> requested seq number is in the future by, say, more than a few seconds,
> and if that's the case then we could try to figure out why that happens.
>
>>
>> I have suspected that kwin is waiting for some event, but nevertheless
>> tried bisecting the kernel between 4.3 and 4.4, which lead to:
>>
>> # first bad commit: [4dfd64862ff852df7b1198d667dda778715ee88f] drm: Use
>> vblank timestamps to guesstimate how many vblanks were missed
>>
>> I can confirm that 4.4 works if I revert the following commits:
>> 63154ff230fc9255cc507af6277cd181943c50a1 "drm/amdgpu: Fixup hw vblank
>> counter/ts for new drm_update_vblank_count() (v3)"
>>
>> d1145ad1e41b6c33758a856163198cb53bb96a50 "drm/radeon: Fixup hw vblank
>> counter/ts for new drm_update_vblank_count() (v2)"
>
> The sha1s don't seem to match what I have, so not sure which kernel tree
> you have, but looking at the radeon commit at least one thing
> immediately caught my attention;
>
> +   /* Bump counter if we are at >= leading edge of vblank,
> +* but before vsync where vpos would turn negative and
> +* the hw counter really increments.
> +*/
> +   if (vpos >= 0)
> +   count++;
>
> It's rather hard to see what it's really doing since the custom flags to
> the get_scanout_position now cause it return non-standard things. But if
> I'm reading things correctly it should really say something like:
>
> if (vpos >= 0 && vpos < (vsync_start - vblank_start))
>   count++;
>
> Hmm. Actually even that might not be correct since it could be using the
> "fake" vblank start here, so might be it'd need to be something like:
>
> if (vpos >= 0 && vpos < (vsync_start - vblank_start + lb_vblank_lead_lines))
>   count++;
>

The current code should be correct. vpos here returns the distance of hw
vertical scanout position to the start of vblank. According to Alex and
Harry Wentland of AMD's display team, and my testing of my two cards, the
hw vertical scanout position resets to zero at start line of vsync,
therefore the "vpos" in that code becomes negative at start of vsync. At
the same time the hw frame counter increments by one, so the
"count++" bump of the returned count by +1 is no longer necessary.

If the reset of hw vertical scanout pos to zero and the increment of hw
frame counter wouldn't happen at exactly the same time at start of vsync,
i could see how two successive queries of
driver->get_vblank_counter() could report a count of N+1 and then N if
the timing of both calls would be just perfectly right. That would cause
the DRM code to falsely detect counter wraparound and jump the vblank
counter forward by 2^24.

My tested gpu's had DCE-3 or DCE-4 display engines, Caicos has DCE-5, so 
could this be some hw quirk for 

[PATCH] drm/nouveau: Fix pre-nv50 pageflip events

2015-12-01 Thread Mario Kleiner
While we are at it, the one with the title "[PATCH] drm/nouveau: Use
drm_vblank_on/off consistently" from Daniel, which has a Reviewed-by and
Tested-by from me, also never made it into nouveau.

Maybe pick that up as well?

-mario

On 12/01/2015 04:55 PM, Daniel Vetter wrote:
> On Tue, Dec 01, 2015 at 04:08:16PM +0100, poma wrote:
>> On Mon, Nov 16, 2015 at 4:11 PM, Daniel Vetter  wrote:
>>> On Mon, Nov 02, 2015 at 04:45:00PM +0900, Michel Dänzer wrote:
>>>> On 31.10.2015 06:55, Daniel Vetter wrote:
>>>>> Apparently pre-nv50 pageflip events happen before the actual vblank
>>>>> period. Therefore that functionality got semi-disabled in
>>>>>
>>>>> commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
>>>>> Author: Mario Kleiner 
>>>>> Date:   Tue May 13 00:42:08 2014 +0200
>>>>>
>>>>>  drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.
>>>>>
>>>>> Unfortunately that hack got uprooted in
>>>>>
>>>>> commit cc1ef118fc099295ae6aabbacc8af94d8d8885eb
>>>>> Author: Thierry Reding 
>>>>> Date:   Wed Aug 12 17:00:31 2015 +0200
>>>>>
>>>>>  drm/irq: Make pipe unsigned and name consistent
>>>>>
>>>>> Triggering a warning when trying to sample the vblank timestamp for a
>>>>> non-existing pipe. There's a few ways to fix this:
>>>>>
>>>>> - Open-code the old behaviour, which just enshrines this slight
>>>>>breakage of the userspace ABI.
>>>>>
>>>>> - Revert Mario's commit and again inflict broken timestamps, again not
>>>>>pretty.
>>>>>
>>>>> - Fix this for real by delaying the pageflip TS until the next vblank
>>>>>interrupt, thereby making it accurate.
>>>>>
>>>>> This patch implements the third option. Since having a page flip
>>>>> interrupt that happens when the pageflip gets armed and not when it
>>>>> completes in the next vblank seems to be fairly common (older i915 hw
>>>>> works very similarly) create a new helper to arm vblank events for
>>>>> such drivers.
>>>>
>>>> What happens when the page flip interrupt arrives during a vertical
>>>> blank period?  Presumably the userspace event will be deferred until the
>>>> next vertical blank period, but the flip might already take effect in
>>>> the current one.
>>>
>>> Hm yeah there's a tiny race if your update handler for the pageflip can
>>> race with your vblank handler. That's impossible here since it's all done
>>> from the same hw irq handler, and since that is single-threaded there
>>> shouldn't be a problem, as long as vblank handling and pageflip are
>>> ordered correctly.
>>>
>>> Might be worth a note in the kerneldoc though that this function isn't
>>> perfectly foolproof.
>>> -Daniel
>>
>>
>> Is there any updates in this respect?
>>
>> drm-nouveau-Fix-pre-nv50-pageflip-events-v4.patch
>> https://patchwork.kernel.org/patch/7591531
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=106431
>> Reported: 2015-10-21
>
> Ben Skeggs asleep probably. Dave, can you pls pick this up?
> -Daniel
>


Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-27 Thread Mario Kleiner
On 11/25/2015 08:38 PM, Alex Deucher wrote:
> On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner
>  wrote:
>> On 11/25/2015 06:58 PM, Ville Syrjälä wrote:
>>>
>>> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>>>>
>>>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
>>>>>
>>>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
>>>>>>>
>>>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>>>>>>
>>>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> ...
>>>>>>>> Ok, but why would that be a bad thing? I think we want it to think it is
>>>>>>>> in the previous frame if it is called outside the vblank irq context.
>>>>>>>> The only reason we fudge it to the next frame's vblank if in vblank irq is
>>>>>>>> because we know the vblank irq handler we are executing atm. was meant
>>>>>>>> to execute within the upcoming vblank for the next frame, so we fudge
>>>>>>>> the scanout positions and thereby timestamp to correspond to that new
>>>>>>>> frame. But if something called outside irq context it should get a
>>>>>>>> scanout position/timestamp that corresponds to "reality".
>>>>>>>
>>>>>>>
>>>>>>> It would be a bad thing since it would cause the timestamp to jump
>>>>>>> backwards, and that would also cause the frame count guesstimate to go
>>>>>>> backwards.
>>>>>>>
>>>>>>
>>>>>> But only if we don't use the dev->driver->get_vblank_counter() method,
>>>>>> which we try to use on AMD.
>>>>>
>>>>>
>>>>> Well, if you do it that way then you have the problem of the hw counter
>>>>> seeming to jump forward by one after crossing the start of vblank (when
>>>>> compared to the value you sampled when you processed the early vblank
>>>>> interrupt).
>>>>>
>>>>
>>>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>>>> current locking with the new vblank counting in the core. The vblank
>>>> enable path is safe due to locking and discounting of redundant
>>>> timestamps etc. But the disable path could go wrong:
>>>>
>>>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>>>> updates timestamps and counts "as if" in vblank -> incremented vblank
>>>> count and timestamp now set in the future.
>>>>
>>>> 2. After vblank irq finishes, but just before leading edge of vblank,
>>>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>>>> count because before vblank and not in vblank irq. Now
>>>> drm_update_vblank_count() would process a
>>>> "new" timestamp and count from the past and we'd have time and counts
>>>> going backwards, and bad things would happen.
>>>>
>>>> I haven't observed such a thing happening during testing so far,
>>>> probably because the time window in which it could happen is tiny, but
>>>> given how awfully bad it would be, it needs to be prevented.
>>>>
>>>> I had a look at the description of the Vblank irq in the "M76 Register
>>>> Reference Guide" for older asics and the description suggests that the
>>>> vblank irq fires when the crtc's line buffer is finished reading pixel
>>>> data from the scanout buffer in memory for a frame, ie., when the line
>>>> buffer read "enters" vblank.
>>>
>>>
>>> Hmm. Does that mean there's always at least one fullscreen plane enabled
>>> in the hw? As in you can't turn off the primary plane or make it smaller
>>> than the active video area? Otherwise it sounds like you could either
>>> not get it at all, or get it somewhere in the m

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-27 Thread Mario Kleiner
On 11/25/2015 08:36 PM, Ville Syrjälä wrote:
> On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote:
>> On 11/25/2015 06:46 PM, Ville Syrjälä wrote:

...

>> Attached is my current patch i wanted to submit for the drm core's
>> drm_update_vblank_count(). I think it's good to make the core somewhat
>> robust against potential kms driver bugs or glitches. But if you
>> wouldn't like that patch, there wouldn't be much of a point sending it
>> out at all.
>>
>> thanks,
>> -mario
>>
>
>> >From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001
>> From: Mario Kleiner 
>> Date: Wed, 25 Nov 2015 18:48:31 +0100
>> Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust.
>>
>> The changes to drm_update_vblank_count() for Linux 4.4-rc
>> made the function more fragile wrt. some hw quirks. E.g.,
>> at dev->driver->enable_vblank(), AMD gpu's fire a spurious
>> redundant vblank irq shortly after enabling vblank irqs, not
>> locked to vblank. This causes a redundant call which needs
>> to be suppressed to avoid miscounting.
>>
>> To increase robustness, shuffle things around a bit:
>>
>> On drivers with high precision vblank timestamping always
>> evaluate the timestamp difference between current timestamp
>> and previously recorded timestamp to detect such redundant
>> invocations and no-op in that case.
>>
>> Also detect and warn about timestamps going backwards to
>> catch potential kms driver bugs.
>>
>> This patch is meant for Linux 4.4-rc and later.
>>
>> Signed-off-by: Mario Kleiner 
>> ---
>>   drivers/gpu/drm/drm_irq.c | 53 
>> ---
>>   1 file changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 819b8c1..8728c3c 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>  unsigned long flags)
>>   {
>>  struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>> -u32 cur_vblank, diff;
>> +u32 cur_vblank, diff = 0;
>>  bool rc;
>>  struct timeval t_vblank;
>> +const struct timeval *t_old;
>> +u64 diff_ns;
>>  int count = DRM_TIMESTAMP_MAXRETRIES;
>>  int framedur_ns = vblank->framedur_ns;
>>
>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>  rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>>  } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>>
>> -if (dev->max_vblank_count != 0) {
>> -/* trust the hw counter when it's around */
>> -diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> -} else if (rc && framedur_ns) {
>> -const struct timeval *t_old;
>> -u64 diff_ns;
>> -
>> +/*
>> + * Always use vblank timestamping based method if supported to reject
>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>> + * due to some irq handling quirk.
>> + *
>> + * This also sets the diff value for use as fallback below in case the
>> + * hw does not support a suitable hw vblank counter.
>> + */
>> +if (rc && framedur_ns) {
>
> If you fudged everything properly why do you still need this? With
> working hw counter there should be no need to do this stuff.
>

As far as testing on one DCE4 card goes, i don't need it anymore with my 
fudged hw counters and timestamps. The fudging so far seems to work 
nicely. I just wanted to have a bit of extra robustness and a bit of 
extra available debug output against future or other broken drivers, or 
mistakes in fudging on the current driver, e.g., against things like 
timestamps going backwards. Especially since i can only test on two AMD 
cards atm., quite a limited sample. There are 3 display engine 
generations before and 5 generations after my test sample.
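
Roughly, the timestamp based check amounts to something like the
following standalone sketch. vblanks_elapsed() is a hypothetical name,
and the -1 return for timestamps going backwards is an assumption made
for this example; the patch itself only talks about detecting and
warning in that case:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of vblanks elapsed between two vblank timestamps, rounded to
 * the nearest frame. Returns 0 for a redundant invocation (same vblank)
 * and -1 if the new timestamp is older than the previous one.
 */
static int64_t vblanks_elapsed(int64_t t_new_ns, int64_t t_old_ns,
			       int64_t framedur_ns)
{
	assert(framedur_ns > 0);

	if (t_new_ns < t_old_ns)
		return -1; /* timestamps going backwards: likely a driver bug */

	return (t_new_ns - t_old_ns + framedur_ns / 2) / framedur_ns;
}
```

At 60 Hz (framedur_ns = 16666667), a spurious irq 20 usecs after the
previous timestamp yields 0 and can be ignored, a regular irq one frame
later yields 1, and two elapsed frames yield 2.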

-mario


>>  t_old = &vblanktimestamp(dev, pipe, vblank->count);
>>  diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>>
>> @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device 
>> *dev, unsigned int pipe,
>>   */
>>  diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>
>> -   

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Mario Kleiner
On 11/25/2015 06:46 PM, Ville Syrjälä wrote:
> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>>>
>>>>
>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>> ...
>>>>>> Ok, but why would that be a bad thing? I think we want it to think it is
>>>>>> in the previous frame if it is called outside the vblank irq context.
>>>>>> The only reason we fudge it to the next frame's vblank if in vblank irq is
>>>>>> because we know the vblank irq handler we are executing atm. was meant
>>>>>> to execute within the upcoming vblank for the next frame, so we fudge
>>>>>> the scanout positions and thereby timestamp to correspond to that new
>>>>>> frame. But if something called outside irq context it should get a
>>>>>> scanout position/timestamp that corresponds to "reality".
>>>>>
>>>>> It would be a bad thing since it would cause the timestamp to jump
>>>>> backwards, and that would also cause the frame count guesstimate to go
>>>>> backwards.
>>>>>
>>>>
>>>> But only if we don't use the dev->driver->get_vblank_counter() method,
>>>> which we try to use on AMD.
>>>
>>> Well, if you do it that way then you have the problem of the hw counter
>>> seeming to jump forward by one after crossing the start of vblank (when
>>> compared to the value you sampled when you processed the early vblank
>>> interrupt).
>>>
>>
>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>> current locking with the new vblank counting in the core. The vblank
>> enable path is safe due to locking and discounting of redundant
>> timestamps etc. But the disable path could go wrong:
>>
>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>> updates timestamps and counts "as if" in vblank -> incremented vblank
>> count and timestamp now set in the future.
>>
>> 2. After vblank irq finishes, but just before leading edge of vblank,
>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>> count because before vblank and not in vblank irq. Now
>> drm_update_vblank_count() would process a
>> "new" timestamp and count from the past and we'd have time and counts
>> going backwards, and bad things would happen.
>>
>> I haven't observed such a thing happening during testing so far,
>> probably because the time window in which it could happen is tiny, but
>> given how awfully bad it would be, it needs to be prevented.
>>
>> I had a look at the description of the Vblank irq in the "M76 Register
>> Reference Guide" for older asics and the description suggests that the
>> vblank irq fires when the crtc's line buffer is finished reading pixel
>> data from the scanout buffer in memory for a frame, ie., when the line
>> buffer read "enters" vblank. That would explain why the irq happens a
>> few scanlines before actual vblank, because line buffer refills must
>> obviously happen before the crtc can send pixel data from the line
>> buffer to the encoders, so it would lead a bit in time. That also means
>> we can't delay the vblank irq to actually happen at start of vblank and
>> have to deal with the early vblank irq.
>>
>>> I guess one silly idea would be to defer the vblank interrupt processing
>>> to a timer, and just schedule it a bit into the future from the actual
>>> interrupt handler.
>>>
>>
>> Timer would be bad because then we get problems with the pageflip
>> completion irq sometimes being processed before the vblank irq,
>
> You'd need to move page flip completion to happen from the vblank
> timer too I suppose.
>
>> and
>> because we want to be fast in vblank irq handling, delivering vblank
>> events etc. I wouldn't trust a timer to be reliable enough for such
>> short waits.
>
> hrtimers should be accurate. But maybe more expensive than the timer
> wheel.
>

Sounds all a bit complex and fraught with new possible complications. I 
can't spend much more time on t

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Mario Kleiner
On 11/25/2015 06:58 PM, Ville Syrjälä wrote:
> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>>>
>>>>
>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>> ...
>>>>>> Ok, but why would that be a bad thing? I think we want it to think it is
>>>>>> in the previous frame if it is called outside the vblank irq context.
>>>>>> The only reason we fudge it to the next frame's vblank if in vblank irq is
>>>>>> because we know the vblank irq handler we are executing atm. was meant
>>>>>> to execute within the upcoming vblank for the next frame, so we fudge
>>>>>> the scanout positions and thereby timestamp to correspond to that new
>>>>>> frame. But if something called outside irq context it should get a
>>>>>> scanout position/timestamp that corresponds to "reality".
>>>>>
>>>>> It would be a bad thing since it would cause the timestamp to jump
>>>>> backwards, and that would also cause the frame count guesstimate to go
>>>>> backwards.
>>>>>
>>>>
>>>> But only if we don't use the dev->driver->get_vblank_counter() method,
>>>> which we try to use on AMD.
>>>
>>> Well, if you do it that way then you have the problem of the hw counter
>>> seeming to jump forward by one after crossing the start of vblank (when
>>> compared to the value you sampled when you processed the early vblank
>>> interrupt).
>>>
>>
>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>> current locking with the new vblank counting in the core. The vblank
>> enable path is safe due to locking and discounting of redundant
>> timestamps etc. But the disable path could go wrong:
>>
>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>> updates timestamps and counts "as if" in vblank -> incremented vblank
>> count and timestamp now set in the future.
>>
>> 2. After vblank irq finishes, but just before leading edge of vblank,
>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>> count because before vblank and not in vblank irq. Now
>> drm_update_vblank_count() would process a
>> "new" timestamp and count from the past and we'd have time and counts
>> going backwards, and bad things would happen.
>>
>> I haven't observed such a thing happening during testing so far,
>> probably because the time window in which it could happen is tiny, but
>> given how awfully bad it would be, it needs to be prevented.
>>
>> I had a look at the description of the Vblank irq in the "M76 Register
>> Reference Guide" for older asics and the description suggests that the
>> vblank irq fires when the crtc's line buffer is finished reading pixel
>> data from the scanout buffer in memory for a frame, ie., when the line
>> buffer read "enters" vblank.
>
> Hmm. Does that mean there's always at least one fullscreen plane enabled
> in the hw? As in you can't turn off the primary plane or make it smaller
> than the active video area? Otherwise it sounds like you could either
> not get it at all, or get it somewhere in the middle of the screen.
>

It says "Interrupt that can be programmed to be generated by the
primary display controller's line buffer logic either when the
source image line counter is not requesting any active
display data (i.e. in the vertical blank) or the output CRTC
timing generator is within the vertical blanking region."

So my statements were my interpretation of this quote, so i can make 
some sense out of the vblank irq behaviour. I guess Alex or Harry would 
know? The M76 reference refers to some older asics, i just assume it is 
the same for the current ones, given that observed behaviour would be 
consistent with the line buffer causing this lead of a couple of 
scanlines. I see about 2 scanlines on DCE4 and about 3 scanlines on 
DCE3. I don't know how big the line buffer is, how quickly it refills 
etc., but it sounds reasonable.
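
To put the observed lead of 2 to 3 scanlines into time units, here is a small arithmetic sketch. It assumes the 2560x1440 at 60 Hz mode with vtotal = 1481 that is mentioned elsewhere in this thread; whether the DCE3/DCE4 observations were made in that exact mode is not stated, so treat the numbers as illustrative only. Function names are hypothetical.

```c
#include <stdint.h>

/* Duration of one scanline in nanoseconds (integer division, truncating). */
static int64_t scanline_duration_ns(int vtotal, int refresh_hz)
{
    return 1000000000LL / ((int64_t)vtotal * refresh_hz);
}

/* How far ahead of the true start of vblank an irq fires, in nanoseconds,
 * if it leads by lead_lines scanlines. */
static int64_t irq_lead_ns(int lead_lines, int vtotal, int refresh_hz)
{
    return lead_lines * scanline_duration_ns(vtotal, refresh_hz);
}
```

For that assumed mode one scanline lasts about 11.25 microseconds, so a 2-line lead is roughly 22.5 microseconds and a 3-line lead roughly 33.8 microseconds.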

-mario




Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Mario Kleiner
On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>
>>
>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>>
>>>> ...
>>>> Ok, but why would that be a bad thing? I think we want it to think it is
>>>> in the previous frame if it is called outside the vblank irq context.
>>>> The only reason we fudge it to the next frame's vblank if in vblank irq is
>>>> because we know the vblank irq handler we are executing atm. was meant
>>>> to execute within the upcoming vblank for the next frame, so we fudge
>>>> the scanout positions and thereby timestamp to correspond to that new
>>>> frame. But if something called outside irq context it should get a
>>>> scanout position/timestamp that corresponds to "reality".
>>>
>>> It would be a bad thing since it would cause the timestamp to jump
>>> backwards, and that would also cause the frame count guesstimate to go
>>> backwards.
>>>
>>
>> But only if we don't use the dev->driver->get_vblank_counter() method,
>> which we try to use on AMD.
>
> Well, if you do it that way then you have the problem of the hw counter
> seeming to jump forward by one after crossing the start of vblank (when
> compared to the value you sampled when you processed the early vblank
> interrupt).
>

Ok, finally i see the bad scenario that wouldn't get prevented by our 
current locking with the new vblank counting in the core. The vblank 
enable path is safe due to locking and discounting of redundant 
timestamps etc. But the disable path could go wrong:

1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
updates timestamps and counts "as if" in vblank -> incremented vblank 
count and timestamp now set in the future.

2. After vblank irq finishes, but just before leading edge of vblank, 
vblank_disable_and_save() executes, doesn't get bumped timestamp or 
count because before vblank and not in vblank irq. Now 
drm_update_vblank_count() would process a
"new" timestamp and count from the past and we'd have time and counts 
going backwards, and bad things would happen.

I haven't observed such a thing happening during testing so far, 
probably because the time window in which it could happen is tiny, but 
given how awfully bad it would be, it needs to be prevented.
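
The two-step failure above can be illustrated with a toy model of the stored count/timestamp pair. This is only a sketch of the failure mode and of a generic "never go backwards" guard; it is not the fix proposed in this mail, and the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy stand-in for the per-crtc vblank bookkeeping. */
struct vbl_state {
    uint32_t count;  /* last stored vblank count */
    int64_t ts_ns;   /* last stored vblank timestamp, nanoseconds */
};

/*
 * Apply a new (count, timestamp) sample, but refuse any sample that would
 * move the timestamp or the count backwards. Returns true if applied.
 */
static bool vbl_apply_sample(struct vbl_state *s, uint32_t count, int64_t ts_ns)
{
    /* Wrap-safe signed difference between the new and the stored count. */
    int32_t delta = (int32_t)(count - s->count);

    if (ts_ns < s->ts_ns || delta < 0)
        return false; /* sample is "from the past", drop it */

    s->count = count;
    s->ts_ns = ts_ns;
    return true;
}
```

In the scenario above, step 1 stores the bumped count and the future timestamp from irq context; the unbumped sample taken in step 2 by the disable path then compares as older and would be dropped by such a guard instead of being stored.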

I had a look at the description of the Vblank irq in the "M76 Register 
Reference Guide" for older asics and the description suggests that the 
vblank irq fires when the crtc's line buffer is finished reading pixel 
data from the scanout buffer in memory for a frame, ie., when the line 
buffer read "enters" vblank. That would explain why the irq happens a 
few scanlines before actual vblank, because line buffer refills must 
obviously happen before the crtc can send pixel data from the line 
buffer to the encoders, so it would lead a bit in time. That also means 
we can't delay the vblank irq to actually happen at start of vblank and 
have to deal with the early vblank irq.

> I guess one silly idea would be to defer the vblank interrupt processing
> to a timer, and just schedule it a bit into the future from the actual
> interrupt handler.
>

Timer would be bad because then we get problems with the pageflip 
completion irq sometimes being processed before the vblank irq, and 
because we want to be fast in vblank irq handling, delivering vblank 
events etc. I wouldn't trust a timer to be reliable enough for such 
short waits. Busy waiting wouldn't be great either in irq.

So what about this relatively simple one?

1. In radeon_get_crtc_scanoutpos() we artificially define the 
vblank_start line to be, e.g, 5 scanlines before the true start of 
vblank, so for the purpose of vblank counter queries and timestamping 
our "vblank" would start a bit earlier and the vblank irq would always 
execute in "vblank". Non-Irq invocations like vblank_disable_and_save() 
would also be treated to this early vblank start and so what the DRM 
core observes would always be consistent.

2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos() 
internally for "dynpm" dynamic power management/reclocking, and to 
implement pageflip completion detection on asics older than DCE3 which 
don't have pageflip interrupts. For those cases we need to use the true 
start of vblank, so for this internal use we pass in some special flag 
into radeon_get_crtc_scanoutpos() to tell it to not shift the vblank 
start around.

3. I've added another ch
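
Points 1 and 2 of the proposal can be sketched as a single "in vblank?" test. This is a minimal illustration under stated assumptions, not the radeon-kms code: the flag name and the function are hypothetical, the 5-line shift is the example value from point 1, and vblank is assumed to wrap through line 0 (vbl_end < vbl_start), as in the 2560x1440 example elsewhere in this thread.

```c
#include <stdbool.h>

#define EARLY_VBLANK_LINES 5            /* "e.g., 5 scanlines" from point 1 */
#define SCANOUT_TRUE_VBLANK_START 0x1u  /* hypothetical flag for point 2 */

/*
 * Decide whether scanline vpos counts as "inside vblank".
 * By default the start of vblank is moved EARLY_VBLANK_LINES earlier, so a
 * vblank irq that fires a few lines too early is still seen as in vblank.
 * Internal users that need the real boundary pass SCANOUT_TRUE_VBLANK_START.
 */
static bool scanout_in_vblank(int vpos, int vbl_start, int vbl_end,
                              unsigned int flags)
{
    if (!(flags & SCANOUT_TRUE_VBLANK_START))
        vbl_start -= EARLY_VBLANK_LINES;

    /* vblank spans [vbl_start, vtotal) and [0, vbl_end). */
    return vpos >= vbl_start || vpos < vbl_end;
}
```

Because irq and non-irq callers share the same shifted boundary, the DRM core would see a consistent answer regardless of calling context, which is the point of the proposal.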

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Mario Kleiner
On 11/23/2015 09:04 PM, Harry Wentland wrote:
> Hi Mario,
>
> when we've had issues with this on amdgpu Christian fixed it by enabling
> page flip irq all the time, rather than turning it on when usermode
> requests a flip and turning it back off after we handled it. I believe
> that fix exists on radeon already. Michel should have more info on that.
>
> See other comments inline.
>
> Thanks,
> Harry
>
>
> On 2015-11-23 11:02 AM, Mario Kleiner wrote:
>> On 11/20/2015 04:42 AM, Alex Deucher wrote:
>>> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
>>>  wrote:
>>>> Hi Alex and Michel and Ville,
>>>>
>>>> it's "fix vblank stuff" time again ;-)
>>>
>>> Adding Harry from our display team.  He might be able to fill in the
>>> blanks on some of this better than I can.  It might also be worth
>>> checking to see how our DAL (our new display code which is being
>>> developed directly by our display team) code handles this.  It could
>>> be that we are just missing register settings:
>>
>> Thanks Alex! And hello Harry :)
>>
>>> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip
>>
>> I'll have a look at this.
>>
>>> Additionally we've published full registers headers for the display
>>> block:
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce
>>>
>>> The DCE8 stuff should generally apply back to DCE4.  If you have
>>> questions about registers older asics not covered in the hw docs, let
>>> me know.  Note the new headers are dword aligned rather than byte
>>> aligned.
>>>
>>
>> I've tested now with two different progressive modes on DCE3 and one
>> progressive mode on DCE4, the only cards i have atm. So far it seems
>> that the framecounter indeed increments when the vpos from the scanout
>> position query jumps back to zero. Attached for reference is my
>> current patch to radeon-kms. This one seems to work reliably so far,
>> also if i enable the immediate vblank irq disable which we so far only
>> used on intel-kms.
>>
>> But according to this patch the framecounter increment happens
>> somewhere in the middle of vblank.
>>
>> Essentially from the vpos extracted from
>> EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start"
>> part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1
>> the framecounter stays on the count of the old previous scanout cycle.
>> Then when vpos wraps to zero the framecounter increments by 1. And
>> then we have another couple of dozen lines inside vblank until
>> reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and
>> entering active scanout for the new frame.
>>
>> So the position of observed framecounter increment seems to be not
>> close to the end of vblank ("start of frame" indicator?), but a couple
>> of scanlines after start of vblank.
>>
>> E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478,
>> vtotal is 1481, end of vblank is 38. So i enter the vblank and see the
>> old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and
>> the framecounter increments by 1, then 38 scanlines later the vblank
>> ends.
>>
>> So i seem to have something that seems to work in practice and this
>> "increment framecounter if vpos wraps back to zero" behavior makes
>> some sense. It just doesn't conform to what those descriptions for
>> start_line and "start of frame" indicator describe?
>>
> This is correct. Our HW doesn't really have a vblank counter but a frame
> counter. The framecounter increments at the start of vsync, which is
> when we wrap to zero which doesn't coincide with the start of vblank.
>
> What we're trying to do with get_vblank_counter isn't the same as what
> framecount gives us, but we could probably do something like:
>
> if (get_scanout_pos > vblank_start)
>         return frame_count + 1;
> else
>         return frame_count;
>

Great! That's what my current patch does and it seems to work well with 
different video modes on both DCE3 and DCE4. So theory agrees with 
practice again :) - thanks for clarifying this.
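
The "cooked" counter from Harry's pseudocode could look roughly like this as standalone C. It is a sketch, not the actual patch: the function name is hypothetical, the comparison uses >= on the assumption that the vblank_start line itself already belongs to vblank (as in the 1478 example), and the result is masked to the counter width so that the +1 cannot exceed what the hw counter could report.

```c
#include <stdint.h>

/*
 * Turn a hw frame counter that increments when vpos wraps to 0 into a
 * counter that appears to increment at the start of vblank.
 * max_count is the largest value the hw counter can hold (e.g. 0xFFFFFF
 * for a 24 bit counter).
 */
static uint32_t cooked_vblank_count(uint32_t frame_count, int vpos,
                                    int vbl_start, uint32_t max_count)
{
    if (vpos >= vbl_start)
        return (frame_count + 1) & max_count; /* in vblank, before the wrap */

    return frame_count & max_count;           /* hw counter already correct */
}
```

Sampling at vpos 1480 with a hw count of 100 and at vpos 0 with a hw count of 101 both return 101, so the cooked value no longer changes in the middle of vblank.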

So the other problem we have since forever is the vblank irq firing 
before the start of vblank. We are typically 1-2 scanlines before 
vblank_start when we sample the scanout position and framecounter. This 
needs a slightly ugly workaround. Is there a way, maybe some config 
register, to fire the irq at leading edge of vblank instead of a bit 

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Mario Kleiner


On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>
>> ...
>> Ok, but why would that be a bad thing? I think we want it to think it is
>> in the previous frame if it is called outside the vblank irq context.
>> The only reason we fudge it to the next frame's vblank if in vblank irq is
>> because we know the vblank irq handler we are executing atm. was meant
>> to execute within the upcoming vblank for the next frame, so we fudge
>> the scanout positions and thereby timestamp to correspond to that new
>> frame. But if something called outside irq context it should get a
>> scanout position/timestamp that corresponds to "reality".
>
> It would be a bad thing since it would cause the timestamp to jump
> backwards, and that would also cause the frame count guesstimate to go
> backwards.
>

But only if we don't use the dev->driver->get_vblank_counter() method, 
which we try to use on AMD. Also dev->vblank_time_lock should protect us 
from concurrent execution of the vblank counting/timestamping in irq 
context and non-irq context? At least that was one of the purposes of 
that lock in the past?

-mario

>>
>> It would maybe be a problem to get different answers for a query at the
>> same scanout position if we used .get_scanout_position() for something
>> else than for calculating vblank timestamps, but we don't do that atm.
>>
>> Maybe i'm overlooking something here related to the somewhat rewritten
>> code from the last year or so? But in the original design this would be
>> exactly what i intended?
>>
>> ...
>>
>>>> So it's good enough for typical desktop
>>>> applications/compositors/games/media players, and a nice improvement
>>>> over the previous state, but not quite sufficient for applications that
>>>> need long time consistent vblank counts for long waits of multiple
>>>> seconds or even minutes. So it's still good to have hw counters if we
>>>> can get some that are reliable enough.
>>>
>>> Ah, I didn't realize you care about small errors in the counter for
>>> such long periods of vblank off.
>>>
>>
>> Actually, you are right, i was stupid - not enough sleep last friday. I
>> do care about such small errors, but the long vblank off periods don't
>> matter at all the way my software works. I query the current count and
>> timestamp (glXGetSyncValuesOML), calculate a target count based on those
>> and then schedule a swap for that target count via glXSwapBuffersMscOML.
>> That swapbuffers call will keep vblank irqs on until the kms pageflip is
>> queued. So i only care about vblank counts/timestamps being consistent
>> for short amounts of time, typically 1 video refresh from vblank query
>> to queueing a vblank event, and then from reception of that
>> event/queuing the pageflip to pageflip completion event. So even if the
>> system would be heavily loaded and my code would have big preemption
>> delays i think counts that are consistent over a few seconds would be
>> enough to keep things working. Otherwise it wouldn't work now either
>> with a vblank off after 5 seconds and nouveau not having vblank hw counters.
>>
>> -mario
>>
>>
>>>>
>>>> -mario
>>>>
>>>>
>>>>>>
>>>>>> -mario
>>>>>>
>>>>>>>>
>>>>>>>> It almost sort of works on the rs600 code path, but i need a bit of 
>>>>>>>> info
>>>>>>>> from you:
>>>>>>>>
>>>>>>>> 1. There's this register from the old specs for m76.pdf, which is not
>>>>>>>> part of the current register defines for radeon-kms:
>>>>>>>>
>>>>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>>>>>>>
>>>>>>>> It contains the lower 16 bits of framecounter and the 13 bits of
>>>>>>>> vertical scanout position. It seems to give the same readings as the 24
>>>>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>>>>>>>> would come handy.
>>>>>>>>
>>>>>>>> Does Evergreen and later have a same/similar register and where is it?
>>>>>>>>
>

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Mario Kleiner
On 11/20/2015 04:42 AM, Alex Deucher wrote:
> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
>  wrote:
>> Hi Alex and Michel and Ville,
>>
>> it's "fix vblank stuff" time again ;-)
>
> Adding Harry from our display team.  He might be able to fill in the
> blanks on some of this better than I can.  It might also be worth
> checking to see how our DAL (our new display code which is being
> developed directly by our display team) code handles this.  It could
> be that we are just missing register settings:

Thanks Alex! And hello Harry :)

> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip

I'll have a look at this.

> Additionally we've published full registers headers for the display block:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce
> The DCE8 stuff should generally apply back to DCE4.  If you have
> questions about registers older asics not covered in the hw docs, let
> me know.  Note the new headers are dword aligned rather than byte
> aligned.
>

I've tested now with two different progressive modes on DCE3 and one 
progressive mode on DCE4, the only cards i have atm. So far it seems 
that the framecounter indeed increments when the vpos from the scanout 
position query jumps back to zero. Attached for reference is my current 
patch to radeon-kms. This one seems to work reliably so far, also if i 
enable the immediate vblank irq disable which we so far only used on 
intel-kms.

But according to this patch the framecounter increment happens somewhere 
in the middle of vblank.

Essentially from the vpos extracted from EVERGREEN_CRTC_STATUS_POSITION 
which defines start of vblank ("Start" part of 
EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1 the 
framecounter stays on the count of the old previous scanout cycle. Then 
when vpos wraps to zero the framecounter increments by 1. And then we 
have another couple of dozen lines inside vblank until reaching the 
"End" part of EVERGREEN_CRTC_V_BLANK_START_END and entering active 
scanout for the new frame.

So the position of observed framecounter increment seems to be not close 
to the end of vblank ("start of frame" indicator?), but a couple of 
scanlines after start of vblank.

E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478, 
vtotal is 1481, end of vblank is 38. So i enter the vblank and see the 
old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and the 
framecounter increments by 1, then 38 scanlines later the vblank ends.
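
The three regions described in this example can be written down as a small classifier. This is only an illustration of the observation, with hypothetical names; it assumes vpos runs from 0 to vtotal - 1 and that vblank wraps through line 0.

```c
/* Where a scanline sits relative to vblank and the hw framecounter wrap. */
enum scan_region {
    SCAN_ACTIVE,            /* active scanout, counter valid for this frame */
    SCAN_VBLANK_OLD_COUNT,  /* in vblank, hw counter still shows old frame */
    SCAN_VBLANK_NEW_COUNT   /* in vblank, hw counter already incremented */
};

static enum scan_region classify_vpos(int vpos, int vbl_start, int vbl_end)
{
    if (vpos >= vbl_start)
        return SCAN_VBLANK_OLD_COUNT; /* vbl_start .. vtotal - 1 */
    if (vpos < vbl_end)
        return SCAN_VBLANK_NEW_COUNT; /* 0 .. vbl_end - 1, after the wrap */
    return SCAN_ACTIVE;               /* vbl_end .. vbl_start - 1 */
}
```

With vbl_start = 1478 and vbl_end = 38, lines 1478 to 1480 fall in the old-count part of vblank, lines 0 to 37 in the new-count part, and lines 38 to 1477 are active scanout.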

So i seem to have something that seems to work in practice and this 
"increment framecounter if vpos wraps back to zero" behavior makes some 
sense. It just doesn't conform to what those descriptions for start_line 
and "start of frame" indicator describe?

I'll test with a few more video modes.

thanks,
-mario


>>
>> Ville's changes to the DRM's drm_handle_vblank() / drm_update_vblank_count()
>> code in Linux 4.4 not only made that code more elegant, but also removed the
>> robustness against the vblank irq quirks in AMD hw and similar hardware. So
>> now i get tons of off-by-one errors and
>>
>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>> completion event has impossible msc 24803 < target_msc 24804" XOrg messages
>> from that kernel.
>>
>> One of the reasons for trouble is that AMD hw quirk where the hw fires an
>> extra vblank irq shortly after vblank irq's get enabled, not synchronized to
>> vblank, but typically in the middle of active scanout, so we get a redundant
>> call to drm_handle_vblank in the middle of scanout.
>>
>> To fix that i have a minor patch to make drm_update_vblank_count() again
>> robust against such redundant calls, which i will send out later to the
>> mailing list. Diff attached for reference.
>>
>> The second quirk of AMD hw is that the vblank interrupt fires a few
>> scanlines before start of vblank, so drm_handle_vblank ->
>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets called
>> before the start of the vblank for which the new vblank count should be
>> queried.
>>
>> The third problem is that the DRM vblank handling always had the assumption
>> that hardware vblank counters would essentially increment at leading edge of
>> vblank - basically in sync with the firing of the vblank irq, so that a hw
>> counter readout from within the vblank irq handler would always deliver the
>> new incremented value. If this assumption is violated then the counting by
>> use of the hw counter gets unreliable, because depending on random small
>> delays in irq handling the code may end up sampling the hw 

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Mario Kleiner
On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:

...

>> What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we
>> are no more than 1% of the display height away from start of vblank we
>> fudge scanout position in a way so that the timestamp gets computed for
>> the soon-to-begin vblank, not the old one.
>
> The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is
> that if you call .get_scanout_position() from a non-irq context between
> the irq firing and start of vblank, you'll think you're still in the
> previous frame. Hence my suggestion to note down the frame counter when
> called from the irq, and then keep doing the fudging until you've truly
> crossed the start of vblank.
>

Ok, but why would that be a bad thing? I think we want it to think it is 
in the previous frame if it is called outside the vblank irq context. 
The only reason we fudge it to the next frame's vblank if in vblank irq is 
because we know the vblank irq handler we are executing atm. was meant 
to execute within the upcoming vblank for the next frame, so we fudge 
the scanout positions and thereby timestamp to correspond to that new 
frame. But if something called outside irq context it should get a 
scanout position/timestamp that corresponds to "reality".
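
The irq-context fudging described here can be sketched as follows. This is a simplified illustration, not the radeon-kms implementation: the function name is hypothetical, the 1% window is taken from the description quoted above, and the from_vblank_irq parameter stands in for the DRM_CALLED_FROM_VBLIRQ flag.

```c
#include <stdbool.h>

/*
 * Decide whether a scanout position should be attributed to the vblank of
 * the upcoming frame.
 * - Inside vblank proper: always yes.
 * - Shortly before vblank (within 1% of the display height): only when the
 *   query comes from the vblank irq handler, since that handler was meant
 *   to run inside the upcoming vblank.
 * - Otherwise: no, the caller gets "reality".
 */
static bool attribute_to_upcoming_vblank(int vpos, int vbl_start, int vbl_end,
                                         int vdisplay, bool from_vblank_irq)
{
    if (vpos >= vbl_start || vpos < vbl_end)
        return true;

    return from_vblank_irq && (vbl_start - vpos) <= vdisplay / 100;
}
```

Ville's objection in this thread is exactly the case where the last expression differs between callers: at the same vpos just before vblank, the irq handler gets true while a non-irq caller gets false.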

It would maybe be a problem to get different answers for a query at the 
same scanout position if we used .get_scanout_position() for something 
else than for calculating vblank timestamps, but we don't do that atm.

Maybe i'm overlooking something here related to the somewhat rewritten 
code from the last year or so? But in the original design this would be 
exactly what i intended?

...

>> So it's good enough for typical desktop
>> applications/compositors/games/media players, and a nice improvement
>> over the previous state, but not quite sufficient for applications that
>> need long time consistent vblank counts for long waits of multiple
>> seconds or even minutes. So it's still good to have hw counters if we
>> can get some that are reliable enough.
>
> Ah, I didn't realize you care about small errors in the counter for
> such long periods of vblank off.
>

Actually, you are right, i was stupid - not enough sleep last friday. I 
do care about such small errors, but the long vblank off periods don't 
matter at all the way my software works. I query the current count and 
timestamp (glXGetSyncValuesOML), calculate a target count based on those 
and then schedule a swap for that target count via glXSwapBuffersMscOML. 
That swapbuffers call will keep vblank irqs on until the kms pageflip is 
queued. So i only care about vblank counts/timestamps being consistent 
for short amounts of time, typically 1 video refresh from vblank query 
to queueing a vblank event, and then from reception of that 
event/queuing the pageflip to pageflip completion event. So even if the 
system would be heavily loaded and my code would have big preemption 
delays i think counts that are consistent over a few seconds would be 
enough to keep things working. Otherwise it wouldn't work now either 
with a vblank off after 5 seconds and nouveau not having vblank hw counters.
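
The query-then-schedule pattern described above boils down to a small computation between the two GLX calls. The sketch below shows only that arithmetic, with hypothetical names. It assumes the UST value returned by glXGetSyncValuesOML is in microseconds and that the frame duration is known to the application; neither is stated in this mail.

```c
#include <stdint.h>

/*
 * Given the MSC/UST pair from the last vblank query, pick the vblank count
 * (target_msc) of the first vblank at or after the desired presentation
 * time. The result would then be passed as target_msc to
 * glXSwapBuffersMscOML(dpy, drawable, target_msc, 0, 0).
 */
static int64_t target_msc_for_time(int64_t cur_msc, int64_t cur_ust_us,
                                   int64_t target_ust_us, int64_t frame_us)
{
    int64_t delta_us = target_ust_us - cur_ust_us;

    if (delta_us <= 0 || frame_us <= 0)
        return cur_msc + 1; /* already due: take the next vblank */

    /* Round up to whole frames so the swap is never scheduled too early. */
    return cur_msc + (delta_us + frame_us - 1) / frame_us;
}
```

Since the target is derived from a count queried only moments earlier, the scheme depends on counts being consistent over roughly one refresh cycle, not over long vblank-off periods, which matches the conclusion of the paragraph above.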

-mario


>>
>> -mario
>>
>>
>>>>
>>>> -mario
>>>>
>>>>>>
>>>>>> It almost sort of works on the rs600 code path, but i need a bit of info
>>>>>> from you:
>>>>>>
>>>>>> 1. There's this register from the old specs for m76.pdf, which is not
>>>>>> part of the current register defines for radeon-kms:
>>>>>>
>>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>>>>>
>>>>>> It contains the lower 16 bits of framecounter and the 13 bits of
>>>>>> vertical scanout position. It seems to give the same readings as the 24
>>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>>>>>> would come handy.
>>>>>>
>>>>>> Does Evergreen and later have a same/similar register and where is it?
>>>>>>
>>>>>> 2. The hw framecounter seems to increment when the vertical scanout
>>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
>>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter
>>>>>> increment happening exactly at the moment that vertical scanout position
>>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is
>>>>>> the framecounter increment just happening somewhere inside either

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-20 Thread Mario Kleiner


On 11/19/2015 08:45 PM, Ville Syrjälä wrote:
> On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote:
>> On 11/19/2015 07:20 PM, Ville Syrjälä wrote:
>>> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
>>>> Hi Alex and Michel and Ville,
>>>>
>>>> it's "fix vblank stuff" time again ;-)
>>>>
>>>> Ville's changes to the DRM's drm_handle_vblank() /
>>>> drm_update_vblank_count() code in Linux 4.4 not only made that code more
>>>> elegant, but also removed the robustness against the vblank irq quirks
>>>> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
>>>>
>>>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>>>> completion event has impossible msc 24803 < target_msc 24804" XOrg
>>>> messages from that kernel.
>>>
>>> Argh. Sorry about that.
>>>
>>
>> On the plus side, your "vblank timestamp deltas as fake vblank counters"
>> code seems to work nicely on nouveau-kms, as far as testing with three
>> Nvidia gpus has gone so far :). And both Intel gpus (HD Ironlake and
>> Ivybridge) i tested checked out nicely.
>>
>> And at least the recent nv50+ NVidia Tesla also have 16 bit vblank
>> counters which we could implement in nouveau, maybe with the same
>> trickery to allow long trouble-free vblank off periods, hopefully that
>> would also apply to the Tegra-4 and later Kepler based parts. Tegra-3
>> will probably also work. I think i read in the Tegra-3 PRM that the sync
>> points they use to implement vblank counters do increment at leading
>> edge of vblank.
>>
>> The only problem we may have is that some of the embedded gpus may not
>> have compliant vblank counters and they probably also lack vblank
>> timestamping, so it might be a good idea to rather not use vblank
>> counters at all in those drivers - patch their kms drivers to
>> max_vblank_count = 0;
>>
>>>>
>>>> One of the reasons for trouble is that AMD hw quirk where the hw fires
>>>> an extra vblank irq shortly after vblank irq's get enabled, not
>>>> synchronized to vblank, but typically in the middle of active scanout,
>>>> so we get a redundant call to drm_handle_vblank in the middle of scanout.
>>>
>>> I think that should be fine as such. The code should ignore redundant
>>> vbl irqs. Well, assuming you have a reliable hw counter or you use the
>>> timestamp guesstimate mechanism and your scanout position is reported
>>> accurately. But I guess you have a bit of problem with both.
>>>
>>
>> The problem is that i'd need to treat calls to the radeon kms
>> driver->get_vblank_counter differently depending on whether the function
>> gets called from the vblank irq or from regular code. Otherwise the hw
>> quirk that causes spontaneous misfiring of the vblank irq in the middle
>> of scanout would confuse my hw vblank counter cooking method into
>> producing a fake hw vblank counter increment. That's why i moved the
>> filtering for redundant irqs based on vblank timestamps in
>> drm_update_vblank_count() around so that it always applies. It makes us
>> robust against that type of hw quirk in general and makes life for the
>> vblank counter cooking so much easier.
>>
>> It's a beautiful collaboration of different hw bugs to make things
>> interesting :)
>>
>>>>
>>>> To fix that i have a minor patch to make drm_update_vblank_count() again
>>>> robust against such redundant calls, which i will send out later to the
>>>> mailing list. Diff attached for reference.
>>>>
>>>> The second quirk of AMD hw is that the vblank interrupt fires a few
>>>> scanlines before start of vblank, so drm_handle_vblank ->
>>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
>>>> called before the start of the vblank for which the new vblank count
>>>> should be queried.
>>>
>>> Does it fire too soon, or is the scanout position register value(s)
>>> just offset by a few lines perhaps?
>>>
>>> We have that with i915 and I simply fix up the value when reading it
>>> out. Fortunately for us the offset is constant (or at least seems to
>>> be) for a given platform/connector combo.
>>>
>>
>> I think they fire too soon, from all i've seen so far on a few cards.
>
> That's unfortunate. Firing a bit too late would be perfectly fine for

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-19 Thread Mario Kleiner
On 11/19/2015 07:20 PM, Ville Syrjälä wrote:
> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
>> Hi Alex and Michel and Ville,
>>
>> it's "fix vblank stuff" time again ;-)
>>
>> Ville's changes to the DRM's drm_handle_vblank() /
>> drm_update_vblank_count() code in Linux 4.4 not only made that code more
>> elegant, but also removed the robustness against the vblank irq quirks
>> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
>>
>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>> completion event has impossible msc 24803 < target_msc 24804" XOrg
>> messages from that kernel.
>
> Argh. Sorry about that.
>

On the plus side, your "vblank timestamp deltas as fake vblank counters"
code seems to work nicely on nouveau-kms, as far as testing with three
Nvidia gpus has gone so far :). And both Intel gpus (HD Ironlake and
Ivybridge) i tested checked out nicely.

And at least the recent nv50+ NVidia Tesla also have 16 bit vblank 
counters which we could implement in nouveau, maybe with the same 
trickery to allow long trouble-free vblank off periods, hopefully that 
would also apply to the Tegra-4 and later Kepler based parts. Tegra-3 
will probably also work. I think i read in the Tegra-3 PRM that the sync 
points they use to implement vblank counters do increment at leading 
edge of vblank.
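
As a rough illustration of why a narrow hw counter is still usable: a 16 bit
counter at 60 Hz wraps about every 18 minutes, but the number of elapsed
vblanks can still be recovered with a masked unsigned subtraction as long as
fewer than 2^16 vblanks passed between the two readings. The helper name and
parameters below are hypothetical, this is only a sketch and not nouveau or
DRM core code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of vblanks elapsed between two readings of a hardware counter
 * that is only `bits` wide (16 for the counters discussed above).
 * Hypothetical helper for illustration, not a nouveau or DRM core function.
 * The result is only correct if fewer than 2^bits vblanks elapsed.
 */
static uint32_t hw_vblank_delta(uint32_t last, uint32_t cur, unsigned int bits)
{
	uint32_t mask;

	assert(bits >= 1 && bits <= 32);
	mask = (bits == 32) ? UINT32_MAX : ((UINT32_C(1) << bits) - 1);

	/* Unsigned subtraction wraps modulo 2^32; the mask reduces it modulo 2^bits. */
	return (cur - last) & mask;
}
```

For longer vblank-off periods the wrap count is ambiguous, which is where the
timestamp based estimate would have to take over.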

The only problem we may have is that some of the embedded gpus may not 
have compliant vblank counters and they probably also lack vblank 
timestamping, so it might be a good idea to rather not use vblank 
counters at all in those drivers - patch their kms drivers to 
max_vblank_count = 0;

>>
>> One of the reasons for trouble is that AMD hw quirk where the hw fires
>> an extra vblank irq shortly after vblank irq's get enabled, not
>> synchronized to vblank, but typically in the middle of active scanout,
>> so we get a redundant call to drm_handle_vblank in the middle of scanout.
>
> I think that should be fine as such. The code should ignore redundant
> vbl irqs. Well, assuming you have a reliable hw counter or you use the
> timestamp guesstimate mechanism and your scanout position is reported
> accurately. But I guess you have a bit of problem with both.
>

The problem is that i'd need to treat calls to the radeon kms
driver->get_vblank_counter differently depending on whether the function
gets called from the vblank irq or from regular code. Otherwise the hw
quirk that causes spontaneous misfiring of the vblank irq in the middle of
scanout would confuse my hw vblank counter cooking method into producing a
fake hw vblank counter increment. That's why i moved the filtering for
redundant irqs based on vblank timestamps in drm_update_vblank_count()
around so that it always applies. It makes us robust against that type of
hw quirk in general and makes life for the vblank counter cooking so much
easier.

It's a beautiful collaboration of different hw bugs to make things 
interesting :)

>>
>> To fix that i have a minor patch to make drm_update_vblank_count() again
>> robust against such redundant calls, which i will send out later to the
>> mailing list. Diff attached for reference.
>>
>> The second quirk of AMD hw is that the vblank interrupt fires a few
>> scanlines before start of vblank, so drm_handle_vblank ->
>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
>> called before the start of the vblank for which the new vblank count
>> should be queried.
>
> Does it fire too soon, or is the scanout position register value(s)
> just offset by a few lines perhaps?
>
> We have that with i915 and I simply fix up the value when reading it
> out. Fortunately for us the offset is constant (or at least seems to
> be) for a given platform/connector combo.
>

I think they fire too soon, from all i've seen so far on a few cards.

>>
>> The third problem is that the DRM vblank handling always had the
>> assumption that hardware vblank counters would essentially increment at
>> leading edge of vblank - basically in sync with the firing of the vblank
>> irq, so that a hw counter readout from within the vblank irq handler
>> would always deliver the new incremented value. If this assumption is
>> violated then the counting by use of the hw counter gets unreliable,
>> because depending on random small delays in irq handling the code may
>> end up sampling the hw counter pre- or post-increment, leading to
>> inconsistent updating and funky bugs. It just so happens that AMD
>> hardware doesn't increment the hw counter at leading edge of vblank, so
>> stuff falls apart.
>>
>> So to fix those two problems i'm tinkering with cooking the hw vblank
>>

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-19 Thread Mario Kleiner
Hi Alex and Michel and Ville,

it's "fix vblank stuff" time again ;-)

Ville's changes to the DRM's drm_handle_vblank() / 
drm_update_vblank_count() code in Linux 4.4 not only made that code more 
elegant, but also removed the robustness against the vblank irq quirks 
in AMD hw and similar hardware. So now i get tons of off-by-one errors and

"[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip 
completion event has impossible msc 24803 < target_msc 24804" XOrg 
messages from that kernel.

One of the reasons for trouble is that AMD hw quirk where the hw fires 
an extra vblank irq shortly after vblank irq's get enabled, not 
synchronized to vblank, but typically in the middle of active scanout, 
so we get a redundant call to drm_handle_vblank in the middle of scanout.

To fix that i have a minor patch to make drm_update_vblank_count() again 
robust against such redundant calls, which i will send out later to the 
mailing list. Diff attached for reference.
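
A minimal sketch of what such a redundancy check could look like, assuming
the timestamps are vblank timestamps derived from the scanout position, so
that a spurious irq in the middle of scanout yields almost the same timestamp
as the previous vblank. All names are hypothetical and this is a standalone
model, not the contents of the attached diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Estimate how many frames elapsed between two vblank timestamps by
 * rounding the time difference to the nearest multiple of the frame
 * duration. Hypothetical helper, for illustration only.
 */
static uint32_t frames_elapsed(uint64_t last_ts_ns, uint64_t now_ts_ns,
			       uint64_t frame_dur_ns)
{
	assert(frame_dur_ns > 0 && now_ts_ns >= last_ts_ns);
	return (uint32_t)((now_ts_ns - last_ts_ns + frame_dur_ns / 2) /
			  frame_dur_ns);
}

/* A call from the irq handler that advances the count by zero frames is redundant. */
static bool vblank_irq_is_redundant(uint64_t last_ts_ns, uint64_t now_ts_ns,
				    uint64_t frame_dur_ns)
{
	return frames_elapsed(last_ts_ns, now_ts_ns, frame_dur_ns) == 0;
}
```

With a 60 Hz mode (frame duration about 16666667 ns), a second call whose
timestamp differs from the last one only by jitter is dropped, while a call
one full frame later counts as a real vblank.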

The second quirk of AMD hw is that the vblank interrupt fires a few 
scanlines before start of vblank, so drm_handle_vblank -> 
drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets 
called before the start of the vblank for which the new vblank count 
should be queried.

The third problem is that the DRM vblank handling always had the 
assumption that hardware vblank counters would essentially increment at 
leading edge of vblank - basically in sync with the firing of the vblank 
irq, so that a hw counter readout from within the vblank irq handler 
would always deliver the new incremented value. If this assumption is 
violated then the counting by use of the hw counter gets unreliable, 
because depending on random small delays in irq handling the code may 
end up sampling the hw counter pre- or post-increment, leading to 
inconsistent updating and funky bugs. It just so happens that AMD 
hardware doesn't increment the hw counter at leading edge of vblank, so 
stuff falls apart.

So to fix those two problems i'm tinkering with cooking the hw vblank 
counter value returned by radeon_get_vblank_counter_kms() to make it 
appear as if the counter incremented at leading edge of vblank in sync 
with vblank irq.
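
To make the idea concrete, here is a sketch of such cooking under the
assumption that the raw counter increments when the scanline wraps from
(VTOTAL - 1) to 0, and that the raw count and scanout position are sampled
as one consistent snapshot. The function, its parameters and the lead-line
fudge are hypothetical, this is not the actual radeon_get_vblank_counter_kms()
change.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Make a raw hardware frame counter look as if it incremented at the
 * leading edge of vblank. `irq_lead_lines` models the vblank interrupt
 * firing a few scanlines before the start of vblank. All names and
 * parameters are hypothetical.
 */
static uint32_t cooked_vblank_count(uint32_t raw_count, uint32_t vpos,
				    uint32_t vbl_start, uint32_t vtotal,
				    uint32_t irq_lead_lines)
{
	assert(vbl_start <= vtotal && vpos < vtotal &&
	       irq_lead_lines <= vbl_start);

	/*
	 * From slightly before vblank start until the wrap to scanline 0,
	 * report the value the hardware will show after the wrap.
	 */
	if (vpos >= vbl_start - irq_lead_lines)
		raw_count++;

	return raw_count & 0xFFFFFF; /* the hw counter discussed here is 24 bits wide */
}
```

Applying the lead-line fudge to callers outside the irq handler would report
the new count a few lines early, which is one reason irq and non-irq callers
may need different treatment.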

It almost sort of works on the rs600 code path, but i need a bit of info 
from you:

1. There's this register from the old specs for m76.pdf, which is not 
part of the current register defines for radeon-kms:

"D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"

It contains the lower 16 bits of framecounter and the 13 bits of 
vertical scanout position. It seems to give the same readings as the 24 
bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This 
would come in handy.

Do Evergreen and later have the same or a similar register, and where is it?

2. The hw framecounter seems to increment when the vertical scanout 
position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 
gpu i tested so far. Is this so on all asics? And is the hw counter 
increment happening exactly at the moment that vertical scanout position 
jumps back to zero, ie. both events are driven by the same signal? Or is 
the framecounter increment just happening somewhere inside either 
scanline VTOTAL-1 or scanline 0?


If we can fix this and get it into rc2 or rc3 then we could avoid a bad 
regression and with a bit of luck at the same time improve by being able 
to set dev->vblank_disable_immediate = true then and allow vblank irqs 
to get turned off more aggressively for a bit of extra power saving.

thanks,
-mario
-- next part --
A non-text attachment was scrubbed...
Name: fixupForDRM.patch
Type: text/x-patch
Size: 3373 bytes



[PATCH] drm/nouveau: Fix pre-nv50 pageflip events (v3)

2015-11-10 Thread Mario Kleiner
On 11/10/2015 05:00 PM, Thierry Reding wrote:
> On Tue, Nov 10, 2015 at 03:54:52PM +0100, Mario Kleiner wrote:
>> From: Daniel Vetter 
>>
>> Apparently pre-nv50 pageflip events happen before the actual vblank
>> period. Therefore that functionality got semi-disabled in
>>
>> commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
>> Author: Mario Kleiner 
>> Date:   Tue May 13 00:42:08 2014 +0200
>>
>>  drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.
>>
>> Unfortunately that hack got uprooted in
>>
>> commit cc1ef118fc099295ae6aabbacc8af94d8d8885eb
>> Author: Thierry Reding 
>> Date:   Wed Aug 12 17:00:31 2015 +0200
>>
>>  drm/irq: Make pipe unsigned and name consistent
>>
>> Triggering a warning when trying to sample the vblank timestamp for a
>> non-existing pipe. There's a few ways to fix this:
>>
>> - Open-code the old behaviour, which just enshrines this slight
>>   breakage of the userspace ABI.
>>
>> - Revert Mario's commit and again inflict broken timestamps, again not
>>   pretty.
>>
>> - Fix this for real by delaying the pageflip TS until the next vblank
>>   interrupt, thereby making it accurate.
>>
>> This patch implements the third option. Since having a page flip
>> interrupt that happens when the pageflip gets armed and not when it
>> completes in the next vblank seems to be fairly common (older i915 hw
>> works very similarly) create a new helper to arm vblank events for
>> such drivers.
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106431
>> Cc: Thierry Reding 
>> Cc: Mario Kleiner 
>> Cc: Ben Skeggs 
>> Cc: Ilia Mirkin 
>>
>> v2 (mario): Integrate my own review comments into Daniel's patch.
>> - Fix function prototypes in drmP.h
>> - Add missing vblank_put() for pageflip completion without
>>   pageflip event.
>> - Initialize sequence number for queued pageflip event to avoid
>>   trouble in drm_handle_vblank_events().
>> - Remove dead code and spelling fix.
>>
>> v3 (mario): Add a signed-off-by and cc stable tag per Ilia's advice.
>>
>> Signed-off-by: Daniel Vetter 
>> (v1) Reviewed-by: Mario Kleiner 
>> (v2/v3) Signed-off-by: Mario Kleiner 
>>
>> Cc: stable at vger.kernel.org # v4.3
>> ---
>>   drivers/gpu/drm/drm_irq.c                 | 54 ++-
>>   drivers/gpu/drm/nouveau/nouveau_display.c | 19 ++-
>>   include/drm/drmP.h                        |  4 +++
>>   3 files changed, 68 insertions(+), 9 deletions(-)
>
> This looks good to me. Let me clean this up a little and submit it to
> Dave.
>
> Thierry
>

Btw., if somebody has a functional old card for testing this, it should 
be easy to verify if it works on pre-nv50. If it would not work it would 
deliver the pageflip event 1 frame delayed, so at least on standard 
nouveau + default DRI2 + default double-buffering the rate for a tight 
loop of page-flipped swaps should go down to 30 fps on a 60 Hz display, 
quite noticeable. Afaik we also have Piglit tests for OML_sync_control 
which would likely fail if this would be broken.
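
The expected rate follows from simple arithmetic: if every swap has to wait
one extra refresh cycle for its completion event, each swap occupies two
frames instead of one. A tiny sketch, with a hypothetical helper name:

```c
#include <assert.h>

/* Swap rate when each page flip occupies `frames_per_swap` refresh cycles. */
static double swap_rate_hz(double refresh_hz, unsigned int frames_per_swap)
{
	assert(frames_per_swap > 0);
	return refresh_hz / frames_per_swap;
}
```

So a 60 Hz display gives 60 swaps per second in the working case and 30 in
the broken case.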

Oh and if someone has tips on how to resurrect an old nv-40 PC (booted 
with BIOS only) graphics card in a MacPro (EFI boot), i wouldn't mind 
hearing them. It would be nice to still be able to use that card for 
testing.

thanks,
-mario


[PATCH] drm/nouveau: Fix pre-nv50 pageflip events (v3)

2015-11-10 Thread Mario Kleiner
From: Daniel Vetter <daniel.vet...@ffwll.ch>

Apparently pre-nv50 pageflip events happen before the actual vblank
period. Therefore that functionality got semi-disabled in

commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
Author: Mario Kleiner 
Date:   Tue May 13 00:42:08 2014 +0200

drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.

Unfortunately that hack got uprooted in

commit cc1ef118fc099295ae6aabbacc8af94d8d8885eb
Author: Thierry Reding 
Date:   Wed Aug 12 17:00:31 2015 +0200

drm/irq: Make pipe unsigned and name consistent

Triggering a warning when trying to sample the vblank timestamp for a
non-existing pipe. There's a few ways to fix this:

- Open-code the old behaviour, which just enshrines this slight
  breakage of the userspace ABI.

- Revert Mario's commit and again inflict broken timestamps, again not
  pretty.

- Fix this for real by delaying the pageflip TS until the next vblank
  interrupt, thereby making it accurate.

This patch implements the third option. Since having a page flip
interrupt that happens when the pageflip gets armed and not when it
completes in the next vblank seems to be fairly common (older i915 hw
works very similarly) create a new helper to arm vblank events for
such drivers.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106431
Cc: Thierry Reding 
Cc: Mario Kleiner 
Cc: Ben Skeggs 
Cc: Ilia Mirkin 

v2 (mario): Integrate my own review comments into Daniel's patch.
   - Fix function prototypes in drmP.h
   - Add missing vblank_put() for pageflip completion without
     pageflip event.
   - Initialize sequence number for queued pageflip event to avoid
     trouble in drm_handle_vblank_events().
   - Remove dead code and spelling fix.

v3 (mario): Add a signed-off-by and cc stable tag per Ilia's advice.

Signed-off-by: Daniel Vetter 
(v1) Reviewed-by: Mario Kleiner 
(v2/v3) Signed-off-by: Mario Kleiner 

Cc: stable at vger.kernel.org # v4.3
---
 drivers/gpu/drm/drm_irq.c                 | 54 ++-
 drivers/gpu/drm/nouveau/nouveau_display.c | 19 ++-
 include/drm/drmP.h                        |  4 +++
 3 files changed, 68 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index eba6337..819b8c1 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -972,7 +972,8 @@ static void send_vblank_event(struct drm_device *dev,
 		struct drm_pending_vblank_event *e,
 		unsigned long seq, struct timeval *now)
 {
-	WARN_ON_SMP(!spin_is_locked(&dev->event_lock));
+	assert_spin_locked(&dev->event_lock);
+
 	e->event.sequence = seq;
 	e->event.tv_sec = now->tv_sec;
 	e->event.tv_usec = now->tv_usec;
@@ -985,6 +986,57 @@ static void send_vblank_event(struct drm_device *dev,
 }

 /**
+ * drm_arm_vblank_event - arm vblank event after pageflip
+ * @dev: DRM device
+ * @pipe: CRTC index
+ * @e: the event to prepare to send
+ *
+ * A lot of drivers need to generate vblank events for the very next vblank
+ * interrupt. For example when the page flip interrupt happens when the page
+ * flip gets armed, but not when it actually executes within the next vblank
+ * period. This helper function implements exactly the required vblank arming
+ * behaviour.
+ *
+ * Caller must hold event lock. Caller must also hold a vblank reference for the
+ * event @e, which will be dropped when the next vblank arrives.
+ *
+ * This is the legacy version of drm_crtc_arm_vblank_event().
+ */
+void drm_arm_vblank_event(struct drm_device *dev, unsigned int pipe,
+			  struct drm_pending_vblank_event *e)
+{
+	assert_spin_locked(&dev->event_lock);
+
+	e->pipe = pipe;
+	e->event.sequence = drm_vblank_count(dev, pipe);
+	list_add_tail(&e->base.link, &dev->vblank_event_list);
+}
+EXPORT_SYMBOL(drm_arm_vblank_event);
+
+/**
+ * drm_crtc_arm_vblank_event - arm vblank event after pageflip
+ * @crtc: the source CRTC of the vblank event
+ * @e: the event to send
+ *
+ * A lot of drivers need to generate vblank events for the very next vblank
+ * interrupt. For example when the page flip interrupt happens when the page
+ * flip gets armed, but not when it actually executes within the next vblank
+ * period. This helper function implements exactly the required vblank arming
+ * behaviour.
+ *
+ * Caller must hold event lock. Caller must also hold a vblank reference for the
+ * event @e, which will be dropped when the next vblank arrives.
+ *
+ * This is the native KMS version of drm_arm_vblank_event().
+ */
+void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
+  struct drm_pending_vblank_event *e)
+{
+   drm_arm_vblank_event(crtc->dev, drm_crtc_index(crtc), e);
+}
+EXPORT_SYMBOL(drm_crtc_arm_vblank_event);
+
+/**
  * drm_send_vblank_event - helper to send vblank event after pageflip
  * @dev: DRM device
  * @pipe: CRTC index
diff --git a/drivers/gpu

[PATCH] drm/nouveau: Fix pre-nv50 pageflip events (v2)

2015-11-09 Thread Mario Kleiner
On 11/09/2015 02:02 PM, Ilia Mirkin wrote:
> On Mon, Nov 9, 2015 at 7:57 AM, Mario Kleiner
>  wrote:
>> From: Daniel Vetter 
>>
>> Apparently pre-nv50 pageflip events happen before the actual vblank
>> period. Therefore that functionality got semi-disabled in
>>
>> commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
>> Author: Mario Kleiner 
>> Date:   Tue May 13 00:42:08 2014 +0200
>>
>>  drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.
>>
>> Unfortunately that hack got uprooted in
>>
>> commit cc1ef118fc099295ae6aabbacc8af94d8d8885eb
>> Author: Thierry Reding 
>> Date:   Wed Aug 12 17:00:31 2015 +0200
>>
>>  drm/irq: Make pipe unsigned and name consistent
>>
>> Triggering a warning when trying to sample the vblank timestamp for a
>> non-existing pipe. There's a few ways to fix this:
>>
>> - Open-code the old behaviour, which just enshrines this slight
>>   breakage of the userspace ABI.
>>
>> - Revert Mario's commit and again inflict broken timestamps, again not
>>   pretty.
>>
>> - Fix this for real by delaying the pageflip TS until the next vblank
>>   interrupt, thereby making it accurate.
>>
>> This patch implements the third option. Since having a page flip
>> interrupt that happens when the pageflip gets armed and not when it
>> completes in the next vblank seems to be fairly common (older i915 hw
>> works very similarly) create a new helper to arm vblank events for
>> such drivers.
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106431
>> Cc: Thierry Reding 
>> Cc: Mario Kleiner 
>> Cc: Ben Skeggs 
>> Cc: Ilia Mirkin 
>>
>> v2 (mario): Integrate my own review comments into Daniel's patch.
>> - Fix function prototypes in drmP.h
>> - Add missing vblank_put() for pageflip completion without
>>   pageflip event.
>> - Initialize sequence number for queued pageflip event to avoid
>>   trouble in drm_handle_vblank_events().
>> - Remove dead code and spelling fix.
>>
>> Signed-off-by: Daniel Vetter 
>> Reviewed-by: Mario Kleiner 
>
> Without commenting on the actual patch, a few points of procedure:
>
> (a) If you're sending the patch, you're supposed to add your
> Signed-off-by. So you'd keep Daniel's and add yours.

I thought my tiny fixes didn't warrant adding a signed off by myself, 
but if that was wrong, consider it added:

v2: Signed-off-by: Mario Kleiner 

> (b) Since this is triggering warns for real people in real situations,
> tack on a
>
> Cc: stable at vger.kernel.org # v4.3
>

Ah, sorry, this is already a problem in a released kernel? I thought 
this was something new and so far only for drm-next.

ciao,
-mario

> Cheers,
>
>-ilia
>


[PATCH] drm/nouveau: Fix pre-nv50 pageflip events

2015-11-09 Thread Mario Kleiner
Hi,

i just sent out a (v2) of Daniel's patch, with my review comments already
applied to the code and my reviewed-by added, for convenience. The review
comments for a few small bugs are interspersed in the patch below.

Both this and Daniel's original patch are only compile-tested. I still have 
that GeForce 7800 GTX, but unfortunately i don't have the original PC 
anymore for testing it. Today i tried to put the card as a 2nd non-boot 
card into a MacPro for testing, but the EFI based Mac apparently didn't 
like that old PC card that much, so testing was a no go. Bootup ended 
with some nouveau MMIO read and write faults and then lockup. Usually 
more recent NVidia PC cards do work in Macs under Linux with nouveau as 
non-boot gpus, but for some reason this one doesn't.

Anyway, after digging through my old e-mail conversation with Ben from a 
year ago, i think Daniel's patch should work and solve the problem quite 
elegantly:

iirc Ben explained to me that on pre-nv50, nouveau_flip_complete() 
(which calls nouveau_finish_page_flip()), is not triggered by an actual 
pageflip interrupt, but by a fifo software interrupt programmed to fire 
shortly before the vblank. On my test card it fired in the last scanline 
before vblank, probably at the end of active scanout. 
nouveau_flip_complete() would first call nouveau_finish_page_flip() to 
send the pageflip event, and then manually flip to the new framebuffer 
by calling  nv_set_crtc_base(). I think/assume nv_set_crtc_base() is not 
itself synchronized to vblank, so we should get the correct behaviour:

1. Shortly before start of vblank: fifo sw interrupt -> 
nouveau_flip_complete() -> nouveau_finish_page_flip() queues pageflip 
event for later delivery by vblank irq handler -> nv_set_crtc_base() 
flips to the new fb. Return from irq.

2. A few scanlines later, vblank irq fires -> drm_handle_vblank() 
updates vblank count and timestamps -> drm_handle_vblank_events() 
dispatches queued pageflip completion event from 1), now tagged with 
proper vblank count and timestamp of flip completion.

thanks,
-mario


On 11/06/2015 06:19 PM, Thierry Reding wrote:
> Cc += Mario Kleiner, Mario, can you take a look whether this proposed
> solution makes sense and fixes the issues you were seeing back when you
> posted the patch in commit:
>
> commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
> Author: Mario Kleiner 
> Date:   Tue May 13 00:42:08 2014 +0200
>
>  drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.
>
>  Cards with nv04 display engine can't reliably use vblank
>  counts and timestamps computed via drm_handle_vblank(), as
>  the function gets invoked after sending the pageflip events.
>
>  Fix this by defaulting to the old crtcid = -1 fallback path
>  on < NV-50 cards, and only using the precise path on NV-50
>  and later.
>
>  Signed-off-by: Mario Kleiner 
>  Signed-off-by: Ben Skeggs 
>  Cc:  # 3.13+
>
> Do you happen to still have the setup around where you saw this?
>
> Thierry
>
> On Fri, Oct 30, 2015 at 10:55:40PM +0100, Daniel Vetter wrote:
>> Apparently pre-nv50 pageflip events happen before the actual vblank
>> period. Therefore that functionality got semi-disabled in
>>
>> commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
>> Author: Mario Kleiner 
>> Date:   Tue May 13 00:42:08 2014 +0200
>>
>>  drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.
>>
>> Unfortunately that hack got uprooted in
>>
>> commit cc1ef118fc099295ae6aabbacc8af94d8d8885eb
>> Author: Thierry Reding 
>> Date:   Wed Aug 12 17:00:31 2015 +0200
>>
>>  drm/irq: Make pipe unsigned and name consistent
>>
>> Triggering a warning when trying to sample the vblank timestamp for a
>> non-existing pipe. There's a few ways to fix this:
>>
>> - Open-code the old behaviour, which just enshrines this slight
>>   breakage of the userspace ABI.
>>
>> - Revert Mario's commit and again inflict broken timestamps, again not
>>   pretty.
>>
>> - Fix this for real by delaying the pageflip TS until the next vblank
>>   interrupt, thereby making it accurate.
>>
>> This patch implements the third option. Since having a page flip
>> interrupt that happens when the pageflip gets armed and not when it
>> completes in the next vblank seems to be fairly common (older i915 hw
>> works very similarly) create a new helper to arm vblank events for
>> such drivers.
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106431
>> Cc: Thierry Reding 
>> Cc: Mario Kleiner 
>> Cc: Ben Skeggs 
>> Cc: Ilia Mirkin 
>> Signed-off-by: Daniel Vetter 
>> ---
>>
>> Note that due to lack of hw this is c

[PATCH] drm/nouveau: Fix pre-nv50 pageflip events (v2)

2015-11-09 Thread Mario Kleiner
From: Daniel Vetter <daniel.vet...@ffwll.ch>

Apparently pre-nv50 pageflip events happen before the actual vblank
period. Therefore that functionality got semi-disabled in

commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
Author: Mario Kleiner 
Date:   Tue May 13 00:42:08 2014 +0200

drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.

Unfortunately that hack got uprooted in

commit cc1ef118fc099295ae6aabbacc8af94d8d8885eb
Author: Thierry Reding 
Date:   Wed Aug 12 17:00:31 2015 +0200

drm/irq: Make pipe unsigned and name consistent

Triggering a warning when trying to sample the vblank timestamp for a
non-existing pipe. There's a few ways to fix this:

- Open-code the old behaviour, which just enshrines this slight
  breakage of the userspace ABI.

- Revert Mario's commit and again inflict broken timestamps, again not
  pretty.

- Fix this for real by delaying the pageflip TS until the next vblank
  interrupt, thereby making it accurate.

This patch implements the third option. Since having a page flip
interrupt that happens when the pageflip gets armed and not when it
completes in the next vblank seems to be fairly common (older i915 hw
works very similarly) create a new helper to arm vblank events for
such drivers.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106431
Cc: Thierry Reding 
Cc: Mario Kleiner 
Cc: Ben Skeggs 
Cc: Ilia Mirkin 

v2 (mario): Integrate my own review comments into Daniel's patch.
   - Fix function prototypes in drmP.h
   - Add missing vblank_put() for pageflip completion without
     pageflip event.
   - Initialize sequence number for queued pageflip event to avoid
     trouble in drm_handle_vblank_events().
   - Remove dead code and spelling fix.

Signed-off-by: Daniel Vetter 
Reviewed-by: Mario Kleiner 
---
 drivers/gpu/drm/drm_irq.c                 | 54 ++-
 drivers/gpu/drm/nouveau/nouveau_display.c | 19 ++-
 include/drm/drmP.h                        |  4 +++
 3 files changed, 68 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index eba6337..819b8c1 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -972,7 +972,8 @@ static void send_vblank_event(struct drm_device *dev,
 		struct drm_pending_vblank_event *e,
 		unsigned long seq, struct timeval *now)
 {
-	WARN_ON_SMP(!spin_is_locked(&dev->event_lock));
+	assert_spin_locked(&dev->event_lock);
+
 	e->event.sequence = seq;
 	e->event.tv_sec = now->tv_sec;
 	e->event.tv_usec = now->tv_usec;
@@ -985,6 +986,57 @@ static void send_vblank_event(struct drm_device *dev,
 }

 /**
+ * drm_arm_vblank_event - arm vblank event after pageflip
+ * @dev: DRM device
+ * @pipe: CRTC index
+ * @e: the event to prepare to send
+ *
+ * A lot of drivers need to generate vblank events for the very next vblank
+ * interrupt. For example when the page flip interrupt happens when the page
+ * flip gets armed, but not when it actually executes within the next vblank
+ * period. This helper function implements exactly the required vblank arming
+ * behaviour.
+ *
+ * Caller must hold event lock. Caller must also hold a vblank reference for the
+ * event @e, which will be dropped when the next vblank arrives.
+ *
+ * This is the legacy version of drm_crtc_arm_vblank_event().
+ */
+void drm_arm_vblank_event(struct drm_device *dev, unsigned int pipe,
+			  struct drm_pending_vblank_event *e)
+{
+	assert_spin_locked(&dev->event_lock);
+
+	e->pipe = pipe;
+	e->event.sequence = drm_vblank_count(dev, pipe);
+	list_add_tail(&e->base.link, &dev->vblank_event_list);
+}
+EXPORT_SYMBOL(drm_arm_vblank_event);
+
+/**
+ * drm_crtc_arm_vblank_event - arm vblank event after pageflip
+ * @crtc: the source CRTC of the vblank event
+ * @e: the event to send
+ *
+ * A lot of drivers need to generate vblank events for the very next vblank
+ * interrupt. For example when the page flip interrupt happens when the page
+ * flip gets armed, but not when it actually executes within the next vblank
+ * period. This helper function implements exactly the required vblank arming
+ * behaviour.
+ *
+ * Caller must hold event lock. Caller must also hold a vblank reference for the
+ * event @e, which will be dropped when the next vblank arrives.
+ *
+ * This is the native KMS version of drm_arm_vblank_event().
+ */
+void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
+  struct drm_pending_vblank_event *e)
+{
+   drm_arm_vblank_event(crtc->dev, drm_crtc_index(crtc), e);
+}
+EXPORT_SYMBOL(drm_crtc_arm_vblank_event);
+
+/**
  * drm_send_vblank_event - helper to send vblank event after pageflip
  * @dev: DRM device
  * @pipe: CRTC index
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index db6bc67..64c8d93 100644
--- a/drivers/gpu/drm/nouveau/no
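
To illustrate the arming behaviour that the kerneldoc in this patch describes, here is a rough userspace sketch in plain C. It is not the kernel implementation: `fake_dev`, `pending_event`, `arm_vblank_event()` and `handle_vblank()` are hypothetical stand-ins, the model tracks a single pipe only, and the spinlock is reduced to a boolean so that the "caller must hold event lock" rule can be asserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical stand-ins for the DRM structures; not the kernel definitions. */
struct pending_event {
	unsigned int pipe;
	unsigned int sequence;	/* vblank count stamped on the event */
	bool sent;
};

struct fake_dev {
	bool event_lock_held;		/* models dev->event_lock being held */
	unsigned int vblank_count;	/* models the vblank counter of one pipe */
	struct pending_event *pending[MAX_PENDING];
	size_t num_pending;
};

/* Arm: record the current count and queue the event; nothing is sent yet. */
static void arm_vblank_event(struct fake_dev *dev, unsigned int pipe,
			     struct pending_event *e)
{
	assert(dev->event_lock_held);
	assert(dev->num_pending < MAX_PENDING);

	e->pipe = pipe;
	e->sequence = dev->vblank_count;
	e->sent = false;
	dev->pending[dev->num_pending++] = e;
}

/* Next vblank interrupt: bump the counter, then complete everything queued. */
static void handle_vblank(struct fake_dev *dev)
{
	assert(dev->event_lock_held);

	dev->vblank_count++;
	for (size_t i = 0; i < dev->num_pending; i++) {
		dev->pending[i]->sequence = dev->vblank_count;
		dev->pending[i]->sent = true;
	}
	dev->num_pending = 0;
}
```

In this simplified model an event armed while the counter reads N is only completed by the following `handle_vblank()` call and then carries N + 1, which is the point of deferring the pageflip event to the next vblank interrupt.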

[PATCH] drm/nouveau: Fix pre-nv50 pageflip events

2015-11-08 Thread Mario Kleiner
Sorry for the late reply! Looking into it...
-mario

On 11/06/2015 06:19 PM, Thierry Reding wrote:
> Cc += Mario Kleiner, Mario, can you take a look whether this proposed
> solution makes sense and fixes the issues you were seeing back when you
> posted the patch in commit:
>
> commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
> Author: Mario Kleiner 
> Date:   Tue May 13 00:42:08 2014 +0200
>
>  drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.
>
>  Cards with nv04 display engine can't reliably use vblank
>  counts and timestamps computed via drm_handle_vblank(), as
>  the function gets invoked after sending the pageflip events.
>
>  Fix this by defaulting to the old crtcid = -1 fallback path
>  on <= NV-50 cards, and only using the precise path on NV-50
>  and later.
>
>  Signed-off-by: Mario Kleiner 
>  Signed-off-by: Ben Skeggs 
>  Cc:  # 3.13+
>
> Do you happen to still have the setup around where you saw this?
>
> Thierry
>
> On Fri, Oct 30, 2015 at 10:55:40PM +0100, Daniel Vetter wrote:
>> Apparently pre-nv50 pageflip events happen before the actual vblank
>> period. Therefore that functionality got semi-disabled in
>>
>> commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
>> Author: Mario Kleiner 
>> Date:   Tue May 13 00:42:08 2014 +0200
>>
>>  drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.
>>
>> Unfortunately that hack got uprooted in
>>
>> commit cc1ef118fc099295ae6aabbacc8af94d8d8885eb
>> Author: Thierry Reding 
>> Date:   Wed Aug 12 17:00:31 2015 +0200
>>
>>  drm/irq: Make pipe unsigned and name consistent
>>
>> Triggering a warning when trying to sample the vblank timestamp for a
>> non-existing pipe. There's a few ways to fix this:
>>
>> - Open-code the old behaviour, which just enshrines this slight
>>breakage of the userspace ABI.
>>
>> - Revert Mario's commit and again inflict broken timestamps, again not
>>pretty.
>>
>> - Fix this for real by delaying the pageflip TS until the next vblank
>>interrupt, thereby making it accurate.
>>
>> This patch implements the third option. Since having a page flip
>> interrupt that happens when the pageflip gets armed and not when it
>> completes in the next vblank seems to be fairly common (older i915 hw
>> works very similarly) create a new helper to arm vblank events for
>> such drivers.
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106431
>> Cc: Thierry Reding 
>> Cc: Mario Kleiner 
>> Cc: Ben Skeggs 
>> Cc: Ilia Mirkin 
>> Signed-off-by: Daniel Vetter 
>> ---
>>
>> Note that due to lack of hw this is completely untested. But I think
>> it's the right way to fix this.
>> -Daniel
>> ---
>>   drivers/gpu/drm/drm_irq.c                 | 56 ++-
>>   drivers/gpu/drm/nouveau/nouveau_display.c | 16 -
>>   include/drm/drmP.h                        |  4 +++
>>   3 files changed, 66 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 46dbc34b81ba..b3e1f58666a6 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -972,7 +972,8 @@ static void send_vblank_event(struct drm_device *dev,
>>  struct drm_pending_vblank_event *e,
>>  unsigned long seq, struct timeval *now)
>>   {
>> -WARN_ON_SMP(!spin_is_locked(&dev->event_lock));
>> +assert_spin_locked(&dev->event_lock);
>> +
>>  e->event.sequence = seq;
>>  e->event.tv_sec = now->tv_sec;
>>  e->event.tv_usec = now->tv_usec;
>> @@ -985,6 +986,59 @@ static void send_vblank_event(struct drm_device *dev,
>>   }
>>
>>   /**
>> + * drm_arm_vblank_event - arm vblank event after pageflip
>> + * @dev: DRM device
>> + * @pipe: CRTC index
>> + * @e: the event to prepare to send
>> + *
>> + * A lot of drivers need to generate vblank events for the very next vblank
>> + * interrupt. For example when the page flip interrupt happens when the page
>> + * flip gets armed, but not when it actually executes within the next vblank
>> + * period. This helper function implements exactly the required vblank arming
>> + * behaviour.
>> + *
>> + * Caller must hold event lock. Caller must also hold a vblank reference for the
>> + * event @e, which will be dropped when the next vblank arrives.
>> + *
>> + * This is the legacy version of drm_crtc_a

[PATCH] drm/i915: Only dither on 6bpc panels

2015-08-13 Thread Mario Kleiner
Thanks for the quick fix! Comments below...

On 08/12/2015 11:43 AM, Daniel Vetter wrote:
> In
>
> commit d328c9d78d64ca11e744fe227096990430a88477
> Author: Daniel Vetter 
> Date:   Fri Apr 10 16:22:37 2015 +0200
>
>  drm/i915: Select starting pipe bpp irrespective or the primary plane
>
> we started to select the pipe bpp from sink capabilities and not from
> the primary framebuffer - that one might change (and we don't want to
> incur a modeset) and sprites might contain higher bpp content too.
>
> Problem is that now if you have a 10bpc screen and display 24bpp rgb
> primary then we select dithering, and apparently that mangles even the
> high 8 bits (even though you'd expect dithering only to affect how
> 12bpc gets mapped into 10bpc). And that mangling upsets certain users.
>

Probably doesn't matter, but your explanation of the former problem here 
is slightly off. We also selected dithering on an 8 bpc screen displaying 
a 24bpp rgb primary, because pipe_bpp is 24 for such a typical 8 bpc 
sink, but since the commit mentioned above, base_bpp is always the 
absolute maximum supported by the hardware, e.g., 36 bpp on my Ironlake 
chip. Iow. the only way to not get dithering would have been to connect 
a deep color 12 bpc display, so pipe_bpp == 36 == base_bpp.
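
To make the difference concrete, here is a small standalone sketch in plain userspace C (not driver code; the function names are made up for illustration). It contrasts the rule being removed by the diff quoted below, `pipe_bpp != base_bpp`, with the rule the patch introduces, `pipe_bpp == 6*3`, using the bpp values from my setup.

```c
#include <assert.h>
#include <stdbool.h>

/* Rule removed by the patch: dither whenever pipe bpp differs from base bpp. */
static bool dither_old_rule(int pipe_bpp, int base_bpp)
{
	return pipe_bpp != base_bpp;
}

/* Rule added by the patch: dither only on 6 bpc (18 bpp) pipes. */
static bool dither_new_rule(int pipe_bpp)
{
	/* Pipe bpp covers three colour channels, so it is a multiple of 3. */
	assert(pipe_bpp > 0 && pipe_bpp % 3 == 0);
	return pipe_bpp == 6 * 3;
}
```

With base_bpp pinned at 36 on Ironlake, `dither_old_rule(24, 36)` is true for an ordinary 8 bpc sink and only `dither_old_rule(36, 36)` is false, while `dither_new_rule()` is true for 18 bpp and false for 24, 30 and 36 bpp.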

> Hence only enable dithering on 6bpc screens where we definitely and
> always want it.
>

Other than that, i tested the patch on both 8 bpc output with my 
measurement equipment and on the internal laptop 6 bpc panel, and 
everything is fine now - No banding on the 6 bpc panel, no banding or 
equipment failure on the external 8 bpc output. Life is good again :)

Reviewed-and-tested-by: Mario Kleiner 

thanks,
-mario

> Cc: Mario Kleiner 
> Reported-by: Mario Kleiner 
> Signed-off-by: Daniel Vetter 
> ---
>   drivers/gpu/drm/i915/intel_display.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 9a2f229a1c3a..128462e0a0b5 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -12186,7 +12186,9 @@ encoder_retry:
>   goto encoder_retry;
>   }
>
> - pipe_config->dither = pipe_config->pipe_bpp != base_bpp;
> + /* Dithering seems to not pass-through bits correctly when it should, so
> +  * only enable it on 6bpc panels. */
> + pipe_config->dither = pipe_config->pipe_bpp == 6*3;
>   DRM_DEBUG_KMS("plane bpp: %i, pipe bpp: %i, dithering: %i\n",
> base_bpp, pipe_config->pipe_bpp, pipe_config->dither);
>
>


Intel-kms in Linux-4.2rc causes regression due to dithering always on.

2015-08-12 Thread Mario Kleiner
On 08/07/2015 09:14 AM, Daniel Vetter wrote:
> On Fri, Aug 07, 2015 at 12:45:52AM +0200, Mario Kleiner wrote:
>> On 08/07/2015 12:12 AM, Daniel Vetter wrote:
>>> On Thu, Aug 6, 2015 at 11:56 PM, Mario Kleiner
>>>  wrote:
>>>> Hi Daniel and all,
>>>>
>>>> since Linux 4.2 (tested with rc4), i think this commit
>>>> d328c9d78d64ca11e744fe227096990430a88477
>>>> "drm/i915: Select starting pipe bpp irrespective or the primary plane"
>>>>
>>>> causes trouble for me and my users, as tested on Intel HD Ironlake and Ivy
>>>> Bridge with MiniDP->Singlelink-DVI adapter -> Measurement device.
>>>>
>>>> Afaics it causes dithering to always be enabled on a regular 8bpc
>>>> framebuffer, even when outputting to an 8 bpc DVI-D output, and that
>>>> dithering causes my display measurement equipment and other special display
>>>> devices used for neuro-science and medical applications to fail. This
>>>> equipment requires an identity passthrough of 8 bpc framebuffer pixels to
>>>> the digital outputs, iow. dithering off.
>>>>
>>>> Log output on Linux 4.1 (good):
>>>>
>>>> Aug  1 06:39:26 twisty kernel: [  154.175394]
>>>> [drm:connected_sink_compute_bpp] [CONNECTOR:35:HDMI-A-1] checking for sink
>>>> bpp constrains
>>>> Aug  1 06:39:26 twisty kernel: [  154.175396]
>>>> [drm:intel_hdmi_compute_config] picking bpc to 8 for HDMI output
>>>> Aug  1 06:39:26 twisty kernel: [  154.175397]
>>>> [drm:intel_hdmi_compute_config] forcing pipe bpc to 24 for HDMI
>>>> Aug  1 06:39:26 twisty kernel: [  154.175400] [drm:ironlake_check_fdi_lanes]
>>>> checking fdi config on pipe A, lanes 1
>>>> Aug  1 06:39:26 twisty kernel: [  154.175402]
>>>> [drm:intel_modeset_pipe_config] plane bpp: 24, pipe bpp: 24, dithering: 0
>>>> Aug  1 06:39:26 twisty kernel: [  154.175403] [drm:intel_dump_pipe_config]
>>>> [CRTC:20][modeset] config for pipe A
>>>> Aug  1 06:39:26 twisty kernel: [  154.175404] [drm:intel_dump_pipe_config]
>>>> cpu_transcoder: A
>>>> Aug  1 06:39:26 twisty kernel: [  154.175405] [drm:intel_dump_pipe_config]
>>>> pipe bpp: 24, dithering: 0
>>>>
>>>> Log output on Linux 4.2-rc4 (bad):
>>>>
>>>> Aug  1 06:21:31 twisty kernel: [  200.924831]
>>>> [drm:connected_sink_compute_bpp] [CONNECTOR:36:HDMI-A-1] checking for sink
>>>> bpp constrains
>>>> Aug  1 06:21:31 twisty kernel: [  200.924832]
>>>> [drm:connected_sink_compute_bpp] clamping display bpp (was 36) to default
>>>> limit of 24
>>>> Aug  1 06:21:31 twisty kernel: [  200.924834]
>>>> [drm:intel_hdmi_compute_config] picking bpc to 8 for HDMI output
>>>> Aug  1 06:21:31 twisty kernel: [  200.924835]
>>>> [drm:intel_hdmi_compute_config] forcing pipe bpc to 24 for HDMI
>>>> Aug  1 06:21:31 twisty kernel: [  200.924838] [drm:ironlake_check_fdi_lanes]
>>>> checking fdi config on pipe A, lanes 1
>>>> Aug  1 06:21:31 twisty kernel: [  200.924840]
>>>> [drm:intel_modeset_pipe_config] plane bpp: 36, pipe bpp: 24, dithering: 1
>>>> Aug  1 06:21:31 twisty kernel: [  200.924841] [drm:intel_dump_pipe_config]
>>>> [CRTC:21][modeset] config 880131a5c800 for pipe A
>>>> Aug  1 06:21:31 twisty kernel: [  200.924842] [drm:intel_dump_pipe_config]
>>>> cpu_transcoder: A
>>>> Aug  1 06:21:31 twisty kernel: [  200.924843] [drm:intel_dump_pipe_config]
>>>> pipe bpp: 24, dithering: 1
>>>>
>>>> Ideas what to do about this?
>>>
>>> Well I somehow assumed the dither bit would be sane and not wreak
>>> havoc with the lower bits when they would fit into the final bpc pipe
>>> mode ... Can you confirm with your equipment that we seem to be doing
>>> 8bpc->6bpc dithering on the 8bpc sink?
>>>
>>
>> It will need a bit of work to find this out when i'm back in the lab. So far
>> i just know something bad is happening to the signal and i assume it's the
>> dithering, because the visual error pattern of messiness looks like that
>> caused by dithering. E.g., on a static framebuffer i see some repeating
>> pattern over the screen, but the pattern changes with every OpenGL
>> bufferswap, even if i swap to the same fb content, as if the swap triggers
>> some change of the spatial dither pattern (assuming PIPECONF_DITHE

Intel-kms in Linux-4.2rc causes regression due to dithering always on.

2015-08-07 Thread Mario Kleiner
On 08/07/2015 12:12 AM, Daniel Vetter wrote:
> On Thu, Aug 6, 2015 at 11:56 PM, Mario Kleiner
>  wrote:
>> Hi Daniel and all,
>>
>> since Linux 4.2 (tested with rc4), i think this commit
>> d328c9d78d64ca11e744fe227096990430a88477
>> "drm/i915: Select starting pipe bpp irrespective or the primary plane"
>>
>> causes trouble for me and my users, as tested on Intel HD Ironlake and Ivy
>> Bridge with MiniDP->Singlelink-DVI adapter -> Measurement device.
>>
>> Afaics it causes dithering to always be enabled on a regular 8bpc
>> framebuffer, even when outputting to an 8 bpc DVI-D output, and that
>> dithering causes my display measurement equipment and other special display
>> devices used for neuro-science and medical applications to fail. This
>> equipment requires an identity passthrough of 8 bpc framebuffer pixels to
>> the digital outputs, iow. dithering off.
>>
>> Log output on Linux 4.1 (good):
>>
>> Aug  1 06:39:26 twisty kernel: [  154.175394]
>> [drm:connected_sink_compute_bpp] [CONNECTOR:35:HDMI-A-1] checking for sink
>> bpp constrains
>> Aug  1 06:39:26 twisty kernel: [  154.175396]
>> [drm:intel_hdmi_compute_config] picking bpc to 8 for HDMI output
>> Aug  1 06:39:26 twisty kernel: [  154.175397]
>> [drm:intel_hdmi_compute_config] forcing pipe bpc to 24 for HDMI
>> Aug  1 06:39:26 twisty kernel: [  154.175400] [drm:ironlake_check_fdi_lanes]
>> checking fdi config on pipe A, lanes 1
>> Aug  1 06:39:26 twisty kernel: [  154.175402]
>> [drm:intel_modeset_pipe_config] plane bpp: 24, pipe bpp: 24, dithering: 0
>> Aug  1 06:39:26 twisty kernel: [  154.175403] [drm:intel_dump_pipe_config]
>> [CRTC:20][modeset] config for pipe A
>> Aug  1 06:39:26 twisty kernel: [  154.175404] [drm:intel_dump_pipe_config]
>> cpu_transcoder: A
>> Aug  1 06:39:26 twisty kernel: [  154.175405] [drm:intel_dump_pipe_config]
>> pipe bpp: 24, dithering: 0
>>
>> Log output on Linux 4.2-rc4 (bad):
>>
>> Aug  1 06:21:31 twisty kernel: [  200.924831]
>> [drm:connected_sink_compute_bpp] [CONNECTOR:36:HDMI-A-1] checking for sink
>> bpp constrains
>> Aug  1 06:21:31 twisty kernel: [  200.924832]
>> [drm:connected_sink_compute_bpp] clamping display bpp (was 36) to default
>> limit of 24
>> Aug  1 06:21:31 twisty kernel: [  200.924834]
>> [drm:intel_hdmi_compute_config] picking bpc to 8 for HDMI output
>> Aug  1 06:21:31 twisty kernel: [  200.924835]
>> [drm:intel_hdmi_compute_config] forcing pipe bpc to 24 for HDMI
>> Aug  1 06:21:31 twisty kernel: [  200.924838] [drm:ironlake_check_fdi_lanes]
>> checking fdi config on pipe A, lanes 1
>> Aug  1 06:21:31 twisty kernel: [  200.924840]
>> [drm:intel_modeset_pipe_config] plane bpp: 36, pipe bpp: 24, dithering: 1
>> Aug  1 06:21:31 twisty kernel: [  200.924841] [drm:intel_dump_pipe_config]
>> [CRTC:21][modeset] config 880131a5c800 for pipe A
>> Aug  1 06:21:31 twisty kernel: [  200.924842] [drm:intel_dump_pipe_config]
>> cpu_transcoder: A
>> Aug  1 06:21:31 twisty kernel: [  200.924843] [drm:intel_dump_pipe_config]
>> pipe bpp: 24, dithering: 1
>>
>> Ideas what to do about this?
>
> Well I somehow assumed the dither bit would be sane and not wreak
> havoc with the lower bits when they would fit into the final bpc pipe
> mode ... Can you confirm with your equipment that we seem to be doing
> 8bpc->6bpc dithering on the 8bpc sink?
>

It will need a bit of work to find this out when i'm back in the lab. So 
far i just know something bad is happening to the signal and i assume 
it's the dithering, because the visual error pattern of messiness looks 
like that caused by dithering. E.g., on a static framebuffer i see some 
repeating pattern over the screen, but the pattern changes with every 
OpenGL bufferswap, even if i swap to the same fb content, as if the swap 
triggers some change of the spatial dither pattern (assuming 
PIPECONF_DITHER_TYPE_SP = spatial dithering?)

> If that's the case we simply limit to only ever dither when the sink
> is 6bpc, and not in any other case.
> -Daniel
>

That would be an improvement for my immediate problem if that works. But 
assuming we have 10 bpc framebuffers at some point, dithering 10 bpc -> 
8 bpc would also have some practical use.

Probably some dynamic check would be good, a la if there is a mismatch 
between the max(bpc) over all active planes and the supported depth of 
the sink then dither?
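
A minimal sketch of what such a dynamic check could look like, in plain userspace C. Everything here is hypothetical: `plane_state`, `max_active_plane_bpc()` and `want_dither()` are illustration-only names, not i915 code, and "mismatch" is read as "some active plane carries more bits per channel than the sink accepts".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-plane state, reduced to what the check needs. */
struct plane_state {
	bool active;
	int bpc;	/* bits per colour channel of the plane's framebuffer */
};

/* Largest bpc among the active planes, or 0 if no plane is active. */
static int max_active_plane_bpc(const struct plane_state *planes, size_t n)
{
	int max_bpc = 0;

	assert(planes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (planes[i].active && planes[i].bpc > max_bpc)
			max_bpc = planes[i].bpc;
	return max_bpc;
}

/* Dither only when the content is deeper than what the sink can show. */
static bool want_dither(const struct plane_state *planes, size_t n,
			int sink_bpc)
{
	return max_active_plane_bpc(planes, n) > sink_bpc;
}
```

Under this reading an 8 bpc framebuffer on an 8 bpc sink stays untouched, the same framebuffer on a 6 bpc panel gets dithered, and a future 10 bpc framebuffer on an 8 bpc sink would get dithered as well.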

It's not clear to me where the dithering happens on intel hw. I'd 
expected that with a 24 bpp framebuffer feeding into a 24 bpp pipe, 
dithering simply wouldn't do anything even if enabled.

-mario


Intel-kms in Linux-4.2rc causes regression due to dithering always on.

2015-08-07 Thread Mario Kleiner
Hi Daniel and all,

since Linux 4.2 (tested with rc4), i think this commit 
d328c9d78d64ca11e744fe227096990430a88477
"drm/i915: Select starting pipe bpp irrespective or the primary plane"

causes trouble for me and my users, as tested on Intel HD Ironlake and 
Ivy Bridge with MiniDP->Singlelink-DVI adapter -> Measurement device.

Afaics it causes dithering to always be enabled on a regular 8bpc 
framebuffer, even when outputting to an 8 bpc DVI-D output, and that 
dithering causes my display measurement equipment and other special 
display devices used for neuro-science and medical applications to fail. 
This equipment requires an identity passthrough of 8 bpc framebuffer 
pixels to the digital outputs, iow. dithering off.

Log output on Linux 4.1 (good):

Aug  1 06:39:26 twisty kernel: [  154.175394] 
[drm:connected_sink_compute_bpp] [CONNECTOR:35:HDMI-A-1] checking for 
sink bpp constrains
Aug  1 06:39:26 twisty kernel: [  154.175396] 
[drm:intel_hdmi_compute_config] picking bpc to 8 for HDMI output
Aug  1 06:39:26 twisty kernel: [  154.175397] 
[drm:intel_hdmi_compute_config] forcing pipe bpc to 24 for HDMI
Aug  1 06:39:26 twisty kernel: [  154.175400] 
[drm:ironlake_check_fdi_lanes] checking fdi config on pipe A, lanes 1
Aug  1 06:39:26 twisty kernel: [  154.175402] 
[drm:intel_modeset_pipe_config] plane bpp: 24, pipe bpp: 24, dithering: 0
Aug  1 06:39:26 twisty kernel: [  154.175403] 
[drm:intel_dump_pipe_config] [CRTC:20][modeset] config for pipe A
Aug  1 06:39:26 twisty kernel: [  154.175404] 
[drm:intel_dump_pipe_config] cpu_transcoder: A
Aug  1 06:39:26 twisty kernel: [  154.175405] 
[drm:intel_dump_pipe_config] pipe bpp: 24, dithering: 0

Log output on Linux 4.2-rc4 (bad):

Aug  1 06:21:31 twisty kernel: [  200.924831] 
[drm:connected_sink_compute_bpp] [CONNECTOR:36:HDMI-A-1] checking for 
sink bpp constrains
Aug  1 06:21:31 twisty kernel: [  200.924832] 
[drm:connected_sink_compute_bpp] clamping display bpp (was 36) to 
default limit of 24
Aug  1 06:21:31 twisty kernel: [  200.924834] 
[drm:intel_hdmi_compute_config] picking bpc to 8 for HDMI output
Aug  1 06:21:31 twisty kernel: [  200.924835] 
[drm:intel_hdmi_compute_config] forcing pipe bpc to 24 for HDMI
Aug  1 06:21:31 twisty kernel: [  200.924838] 
[drm:ironlake_check_fdi_lanes] checking fdi config on pipe A, lanes 1
Aug  1 06:21:31 twisty kernel: [  200.924840] 
[drm:intel_modeset_pipe_config] plane bpp: 36, pipe bpp: 24, dithering: 1
Aug  1 06:21:31 twisty kernel: [  200.924841] 
[drm:intel_dump_pipe_config] [CRTC:21][modeset] config 880131a5c800 
for pipe A
Aug  1 06:21:31 twisty kernel: [  200.924842] 
[drm:intel_dump_pipe_config] cpu_transcoder: A
Aug  1 06:21:31 twisty kernel: [  200.924843] 
[drm:intel_dump_pipe_config] pipe bpp: 24, dithering: 1

Ideas what to do about this?

thanks,
-mario


[PATCH 2/2] drm/amdgpu: Handle irqs only based on irq ring, not irq status regs.

2015-07-03 Thread Mario Kleiner
This is a translation of the patch ...
"drm/radeon: Handle irqs only based on irq ring, not irq status regs."
... for the vblank irq handling, to fix the same problem described
in that patch on the new driver.

Only compile tested due to lack of suitable hw.

Signed-off-by: Mario Kleiner 
CC: Michel Dänzer 
CC: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/dce_v10_0.c | 22 ++
 drivers/gpu/drm/amd/amdgpu/dce_v11_0.c | 22 ++
 drivers/gpu/drm/amd/amdgpu/dce_v8_0.c  | 22 ++
 3 files changed, 42 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
index 5cde635..6e77964 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
@@ -3403,19 +3403,25 @@ static int dce_v10_0_crtc_irq(struct amdgpu_device *adev,

switch (entry->src_data) {
case 0: /* vblank */
-   if (disp_int & interrupt_status_offsets[crtc].vblank) {
+   if (disp_int & interrupt_status_offsets[crtc].vblank)
dce_v10_0_crtc_vblank_int_ack(adev, crtc);
-   if (amdgpu_irq_enabled(adev, source, irq_type)) {
-   drm_handle_vblank(adev->ddev, crtc);
-   }
-   DRM_DEBUG("IH: D%d vblank\n", crtc + 1);
+   else
+   DRM_DEBUG("IH: IH event w/o asserted irq bit?\n");
+
+   if (amdgpu_irq_enabled(adev, source, irq_type)) {
+   drm_handle_vblank(adev->ddev, crtc);
}
+   DRM_DEBUG("IH: D%d vblank\n", crtc + 1);
+
break;
case 1: /* vline */
-   if (disp_int & interrupt_status_offsets[crtc].vline) {
+   if (disp_int & interrupt_status_offsets[crtc].vline)
dce_v10_0_crtc_vline_int_ack(adev, crtc);
-   DRM_DEBUG("IH: D%d vline\n", crtc + 1);
-   }
+   else
+   DRM_DEBUG("IH: IH event w/o asserted irq bit?\n");
+
+   DRM_DEBUG("IH: D%d vline\n", crtc + 1);
+
break;
default:
DRM_DEBUG("Unhandled interrupt: %d %d\n", entry->src_id, entry->src_data);
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
index 95efd98..7f7abb0 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
@@ -3402,19 +3402,25 @@ static int dce_v11_0_crtc_irq(struct amdgpu_device *adev,

switch (entry->src_data) {
case 0: /* vblank */
-   if (disp_int & interrupt_status_offsets[crtc].vblank) {
+   if (disp_int & interrupt_status_offsets[crtc].vblank)
dce_v11_0_crtc_vblank_int_ack(adev, crtc);
-   if (amdgpu_irq_enabled(adev, source, irq_type)) {
-   drm_handle_vblank(adev->ddev, crtc);
-   }
-   DRM_DEBUG("IH: D%d vblank\n", crtc + 1);
+   else
+   DRM_DEBUG("IH: IH event w/o asserted irq bit?\n");
+
+   if (amdgpu_irq_enabled(adev, source, irq_type)) {
+   drm_handle_vblank(adev->ddev, crtc);
}
+   DRM_DEBUG("IH: D%d vblank\n", crtc + 1);
+
break;
case 1: /* vline */
-   if (disp_int & interrupt_status_offsets[crtc].vline) {
+   if (disp_int & interrupt_status_offsets[crtc].vline)
dce_v11_0_crtc_vline_int_ack(adev, crtc);
-   DRM_DEBUG("IH: D%d vline\n", crtc + 1);
-   }
+   else
+   DRM_DEBUG("IH: IH event w/o asserted irq bit?\n");
+
+   DRM_DEBUG("IH: D%d vline\n", crtc + 1);
+
break;
default:
DRM_DEBUG("Unhandled interrupt: %d %d\n", entry->src_id, entry->src_data);
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
index 72c27ac..2694e54 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
@@ -3237,19 +3237,25 @@ static int dce_v8_0_crtc_irq(struct amdgpu_device *adev,

switch (entry->src_data) {
case 0: /* vblank */
-   if (disp_int & interrupt_status_offsets[crtc].vblank) {
+   if (disp_int & interrupt_status_offsets[crtc].vblank)
WREG32(mmLB_VBLANK_STATUS + crtc_offsets[crtc], LB_VBLANK_STATUS__VBLANK_ACK_MASK);
-   if (amdgpu_irq_enabled(adev, source, irq_type)) {
-

[PATCH 1/2] drm/radeon: Handle irqs only based on irq ring, not irq status regs.

2015-07-03 Thread Mario Kleiner
Trying to resolve issues with missed vblanks and impossible
values inside delivered kms pageflip completion events showed
that radeon's irq handling sometimes doesn't handle valid irqs,
but silently skips them. This was observed for vblank interrupts.

Although those irqs have corresponding events queued in the gpu's
irq ring at time of interrupt, and therefore the corresponding
handling code gets triggered by these events, the handling code
sometimes silently skipped processing the irq. The reason for those
skips is that the handling code double-checks for each irq event if
the corresponding irq status bits in the irq status registers
are set. Sometimes those bits are not set at time of check
for valid irqs, maybe due to some hardware race on some setups?

The problem only seems to happen on some machine + card combos
sometimes, e.g., never happened during my testing of different PC
cards of the DCE-2/3/4 generation a year ago, but happens consistently
now on two different Apple Mac cards (RV730, DCE-3, in an Apple iMac and
Evergreen JUNIPER, DCE-4, in an Apple MacPro). It also doesn't happen
at each interrupt but only occasionally, every couple of
hundred or thousand vblank interrupts.

This results in XOrg warning messages like

"[  7084.472] (WW) RADEON(0): radeon_dri2_flip_event_handler:
Pageflip completion event has impossible msc 420120 < target_msc 420121"

as well as skipped frames and problems for applications that
use kms pageflip events or vblank events, e.g., users of DRI2 and
DRI3/Present, Wayland's Weston compositor, etc. See also

https://bugs.freedesktop.org/show_bug.cgi?id=85203

After some talking to Alex and Michel, we decided to fix this
by turning the double-check for asserted irq status bits into a
warning. Whenever an irq event is queued in the IH ring, always
execute the corresponding interrupt handler. Still check the irq
status bits, but only to log a DRM_DEBUG message on a mismatch.
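
The change in control flow can be sketched in plain userspace C as follows. This is only a model with hypothetical names (`fake_crtc`, `handle_vblank_event_old()`, `handle_vblank_event_new()`), not the radeon code: the status register bit is a boolean and "handling" is a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of one crtc's vblank irq state. */
struct fake_crtc {
	bool status_bit_set;	/* models the bit in the irq status register */
	unsigned int handled;	/* vblank events actually processed */
	unsigned int mismatches;/* IH events seen without an asserted bit */
};

/* Old behaviour: an IH ring event is dropped if the status bit is clear. */
static void handle_vblank_event_old(struct fake_crtc *c)
{
	assert(c != NULL);
	if (c->status_bit_set) {
		c->handled++;
		c->status_bit_set = false;
	}
}

/* New behaviour: the IH ring event alone triggers handling; the status
 * bit is only consulted to log a debug message on a mismatch. */
static void handle_vblank_event_new(struct fake_crtc *c)
{
	assert(c != NULL);
	if (!c->status_bit_set) {
		c->mismatches++;
		fprintf(stderr, "IH: IH event w/o asserted irq bit?\n");
	}
	c->handled++;
	c->status_bit_set = false;
}
```

Feeding both variants an event while `status_bit_set` is false shows the difference: the old variant leaves `handled` at 0, i.e. the vblank is silently lost, while the new one counts it and records one mismatch.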

This fixed the problems reliably on both previously failing
cards, RV-730 dual-head tested on both crtcs (pipes D1 and D2)
and a triple-output Juniper HD-5770 card tested on all three
available crtcs (D1/D2/D3). The r600 and evergreen irq handling
is therefore tested, but the cik and si handling is only compile
tested due to lack of hw.

Signed-off-by: Mario Kleiner 
CC: Michel Dänzer 
CC: Alex Deucher 

CC:  # v3.16+
---
 drivers/gpu/drm/radeon/cik.c       | 336 +--
 drivers/gpu/drm/radeon/evergreen.c | 392 -
 drivers/gpu/drm/radeon/r600.c      | 155 ---
 drivers/gpu/drm/radeon/si.c        | 336 +--
 4 files changed, 688 insertions(+), 531 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index b0688b0..35917e7 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -7930,23 +7930,27 @@ restart_ih:
case 1: /* D1 vblank/vline */
switch (src_data) {
case 0: /* D1 vblank */
-   if (rdev->irq.stat_regs.cik.disp_int & LB_D1_VBLANK_INTERRUPT) {
-   if (rdev->irq.crtc_vblank_int[0]) {
-   drm_handle_vblank(rdev->ddev, 0);
-   rdev->pm.vblank_sync = true;
-   wake_up(&rdev->irq.vblank_queue);
-   }
-   if (atomic_read(&rdev->irq.pflip[0]))
-   radeon_crtc_handle_vblank(rdev, 0);
-   rdev->irq.stat_regs.cik.disp_int &= ~LB_D1_VBLANK_INTERRUPT;
-   DRM_DEBUG("IH: D1 vblank\n");
+   if (!(rdev->irq.stat_regs.cik.disp_int & LB_D1_VBLANK_INTERRUPT))
+   DRM_DEBUG("IH: IH event w/o asserted irq bit?\n");
+
+   if (rdev->irq.crtc_vblank_int[0]) {
+   drm_handle_vblank(rdev->ddev, 0);
+   rdev->pm.vblank_sync = true;
+   wake_up(&rdev->irq.vblank_queue);
}
+   if (atomic_read(&rdev->irq.pflip[0]))
+   radeon_crtc_handle_vblank(rdev, 0);
+   rdev->irq.stat_regs.cik.disp_int &= ~LB_D1_VBLANK_INTERRUPT;
+   DRM_DEBUG("IH: D1 vblank\n");
+
break;
case 1: /* D1 vline */
-   if (rdev->irq.stat_regs.cik.disp_int & LB_D1_VLINE_INTERRUPT) {
-   rdev->irq.stat_

[PATCH] drm/nouveau: Use drm_vblank_on/off consistently

2015-06-17 Thread Mario Kleiner


On 06/15/2015 08:07 AM, Daniel Vetter wrote:
> In
>
> commit 9cba5efab5a8145ae6c52ea273553f069c294482
> Author: Mario Kleiner 
> Date:   Tue Jul 29 02:36:44 2014 +0200
>
>  drm/nouveau: Dis/Enable vblank irqs during suspend/resume
>
> drm_vblank_on/off calls were added around suspend/resume to make sure
> vblank state doesn't go boom over that transition. But nouveau already
> used drm_vblank_pre/post_modeset over modesets. Instead use
> drm_vblank_on/off everywhere. The slight change here is that after
> _off drm_vblank_get will refuse to work right away, but nouveau
> doesn't seem to depend upon that anywhere outside of the pageflip
> paths.
>
> The longer-term plan here is to switch all kms drivers to
> drm_vblank_on/off so that common code like pending event cleanup can
> be done there, while drm_vblank_pre/post_modeset will be purely
> drm internal for the old UMS ioctl.
>
> Note that the drm_vblank_off still seems required in the suspend path
> since nouveau doesn't explicitly disable crtcs. But on the resume side
> drm_helper_resume_force_mode should end up calling drm_vblank_on
> through the nouveau crtc hooks already. Hence remove the call in the
> resume code.
>
> v2: Don't forget about nv50+, reported by Mario.
>
> Tested-by: Mario Kleiner 
> Cc: Mario Kleiner 
> Cc: Ben Skeggs 
> Signed-off-by: Daniel Vetter 
> ---
>   drivers/gpu/drm/nouveau/dispnv04/crtc.c   | 4 ++--
>   drivers/gpu/drm/nouveau/nouveau_display.c | 6 +-
>   drivers/gpu/drm/nouveau/nv50_display.c| 8 
>   3 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> index 3d96b49fe662..dab24066fa21 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> @@ -708,7 +708,7 @@ static void nv_crtc_prepare(struct drm_crtc *crtc)
>   if (nv_two_heads(dev))
>   NVSetOwner(dev, nv_crtc->index);
>
> - drm_vblank_pre_modeset(dev, nv_crtc->index);
> + drm_vblank_off(dev, nv_crtc->index);
>   funcs->dpms(crtc, DRM_MODE_DPMS_OFF);
>
>   NVBlankScreen(dev, nv_crtc->index, true);
> @@ -740,7 +740,7 @@ static void nv_crtc_commit(struct drm_crtc *crtc)
>   #endif
>
>   funcs->dpms(crtc, DRM_MODE_DPMS_ON);
> - drm_vblank_post_modeset(dev, nv_crtc->index);
> + drm_vblank_on(dev, nv_crtc->index);
>   }
>
>   static void nv_crtc_destroy(struct drm_crtc *crtc)
> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
> index 8670d90cdc11..9d2d647da3aa 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
> @@ -584,7 +584,7 @@ nouveau_display_resume(struct drm_device *dev, bool runtime)
>   {
>   struct nouveau_drm *drm = nouveau_drm(dev);
>   struct drm_crtc *crtc;
> - int ret, head;
> + int ret;
>
>   /* re-pin fb/cursors */
>   list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
> @@ -620,10 +620,6 @@ nouveau_display_resume(struct drm_device *dev, bool runtime)
>   nv_crtc->lut.depth = 0;
>   }
>
> - /* Make sure that drm and hw vblank irqs get resumed if needed. */
> - for (head = 0; head < dev->mode_config.num_crtc; head++)
> - drm_vblank_on(dev, head);
> -
>   /* This should ensure we don't hit a locking problem when someone
>* wakes us up via a connector.  We should never go into suspend
>* while the display is on anyways.
> diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
> index 7da7958556a3..a16c37d8f7e1 100644
> --- a/drivers/gpu/drm/nouveau/nv50_display.c
> +++ b/drivers/gpu/drm/nouveau/nv50_display.c
> @@ -997,6 +997,10 @@ nv50_crtc_cursor_show_hide(struct nouveau_crtc *nv_crtc, bool show, bool update)
>   static void
>   nv50_crtc_dpms(struct drm_crtc *crtc, int mode)
>   {
> + if (mode == DRM_MODE_DPMS_ON)
> + drm_crtc_vblank_on(crtc);
> + else
> + drm_crtc_vblank_off(crtc);
>   }
>
>   static void
> @@ -1036,6 +1040,8 @@ nv50_crtc_prepare(struct drm_crtc *crtc)
>   }
>
>   nv50_crtc_cursor_show_hide(nv_crtc, false, false);
> +
> + drm_crtc_vblank_on(crtc);
>   }
>
>   static void
> @@ -1045,6 +1051,8 @@ nv50_crtc_commit(struct drm_crtc *crtc)
>   struct nv50_mast *mast = nv50_mast(crtc->dev);
>   u32 *push;
>
> + drm_crtc_vblank_on(crtc);
> +
>   push = evo_wait(mast, 32);
>   if (push) {
>   if (nv50_vers(mast) < G82_DISP_CORE_CHANNEL_DMA) {
>

This is

Reviewed-and-tested-by: Mario Kleiner 

-mario



[PATCH 1/3] drm/nouveau: Use drm_vblank_on/off consistently

2015-06-05 Thread Mario Kleiner
On 05/29/2015 07:35 PM, Daniel Vetter wrote:
> On Fri, May 29, 2015 at 07:23:35PM +0200, Mario Kleiner wrote:
>>
>>
>> On 05/29/2015 07:19 PM, Daniel Vetter wrote:
>>> On Fri, May 29, 2015 at 06:50:06PM +0200, Mario Kleiner wrote:
>>>> On 05/27/2015 11:04 AM, Daniel Vetter wrote:
>>>>> In
>>>>>
>>>>> commit 9cba5efab5a8145ae6c52ea273553f069c294482
>>>>> Author: Mario Kleiner 
>>>>> Date:   Tue Jul 29 02:36:44 2014 +0200
>>>>>
>>>>>  drm/nouveau: Dis/Enable vblank irqs during suspend/resume
>>>>>
>>>>> drm_vblank_on/off calls were added around suspend/resume to make sure
>>>>> vblank state doesn't go boom over that transition. But nouveau already
>>>>> used drm_vblank_pre/post_modeset over modesets. Instead use
>>>>> drm_vblank_on/off everywhere. The slight change here is that after
>>>>> _off drm_vblank_get will refuse to work right away, but nouveau
>>>>> doesn't seem to depend upon that anywhere outside of the pageflip
>>>>> paths.
>>>>>
>>>>> The longer-term plan here is to switch all kms drivers to
>>>>> drm_vblank_on/off so that common code like pending event cleanup can
>>>>> be done there, while drm_vblank_pre/post_modeset will be purely
>>>>> drm internal for the old UMS ioctl.
>>>>>
>>>>> Note that the drm_vblank_off still seems required in the suspend path
>>>>> since nouveau doesn't explicitly disable crtcs. But on the resume side
>>>>> drm_helper_resume_force_mode should end up calling drm_vblank_on
>>>>> through the nouveau crtc hooks already. Hence remove the call in the
>>>>> resume code.
>>>>>
>>>>> Cc: Mario Kleiner 
>>>>> Cc: Ben Skeggs 
>>>>> Signed-off-by: Daniel Vetter 
>>>>> ---
>>>>>   drivers/gpu/drm/nouveau/dispnv04/crtc.c   | 4 ++--
>>>>>   drivers/gpu/drm/nouveau/nouveau_display.c | 4 
>>>>>   2 files changed, 2 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
>>>>> b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
>>>>> index 3d96b49fe662..dab24066fa21 100644
>>>>> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
>>>>> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
>>>>> @@ -708,7 +708,7 @@ static void nv_crtc_prepare(struct drm_crtc *crtc)
>>>>>   if (nv_two_heads(dev))
>>>>>   NVSetOwner(dev, nv_crtc->index);
>>>>>
>>>>> - drm_vblank_pre_modeset(dev, nv_crtc->index);
>>>>> + drm_vblank_off(dev, nv_crtc->index);
>>>>>   funcs->dpms(crtc, DRM_MODE_DPMS_OFF);
>>>>>
>>>>>   NVBlankScreen(dev, nv_crtc->index, true);
>>>>> @@ -740,7 +740,7 @@ static void nv_crtc_commit(struct drm_crtc *crtc)
>>>>>   #endif
>>>>>
>>>>>   funcs->dpms(crtc, DRM_MODE_DPMS_ON);
>>>>> - drm_vblank_post_modeset(dev, nv_crtc->index);
>>>>> + drm_vblank_on(dev, nv_crtc->index);
>>>>>   }
>>>>
>>>> The above hunk is probably correct, but i couldn't test it without
>>>> sufficiently old pre-nv 50 hardware.
>>>>
>>>>>
>>>>>   static void nv_crtc_destroy(struct drm_crtc *crtc)
>>>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
>>>>> b/drivers/gpu/drm/nouveau/nouveau_display.c
>>>>> index 8670d90cdc11..d824023f9fc6 100644
>>>>> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
>>>>> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
>>>>> @@ -620,10 +620,6 @@ nouveau_display_resume(struct drm_device *dev, bool 
>>>>> runtime)
>>>>>   nv_crtc->lut.depth = 0;
>>>>>   }
>>>>>
>>>>> - /* Make sure that drm and hw vblank irqs get resumed if needed. */
>>>>> - for (head = 0; head < dev->mode_config.num_crtc; head++)
>>>>> - drm_vblank_on(dev, head);
>>>>> -
>>>>>   /* This should ensure we don't hit a locking problem when 
>>>>> someone
>>>>>* wakes us up via a connector.  We should never go into suspend
>>>>>* while the 

[PATCH 1/3] drm/nouveau: Use drm_vblank_on/off consistently

2015-05-29 Thread Mario Kleiner


On 05/29/2015 07:19 PM, Daniel Vetter wrote:
> On Fri, May 29, 2015 at 06:50:06PM +0200, Mario Kleiner wrote:
>> On 05/27/2015 11:04 AM, Daniel Vetter wrote:
>>> In
>>>
>>> commit 9cba5efab5a8145ae6c52ea273553f069c294482
>>> Author: Mario Kleiner 
>>> Date:   Tue Jul 29 02:36:44 2014 +0200
>>>
>>>  drm/nouveau: Dis/Enable vblank irqs during suspend/resume
>>>
>>> drm_vblank_on/off calls were added around suspend/resume to make sure
>>> vblank state doesn't go boom over that transition. But nouveau already
>>> used drm_vblank_pre/post_modeset over modesets. Instead use
>>> drm_vblank_on/off everywhere. The slight change here is that after
>>> _off drm_vblank_get will refuse to work right away, but nouveau
>>> doesn't seem to depend upon that anywhere outside of the pageflip
>>> paths.
>>>
>>> The longer-term plan here is to switch all kms drivers to
>>> drm_vblank_on/off so that common code like pending event cleanup can
>>> be done there, while drm_vblank_pre/post_modeset will be purely
>>> drm internal for the old UMS ioctl.
>>>
>>> Note that the drm_vblank_off still seems required in the suspend path
>>> since nouveau doesn't explicitly disable crtcs. But on the resume side
>>> drm_helper_resume_force_mode should end up calling drm_vblank_on
>>> through the nouveau crtc hooks already. Hence remove the call in the
>>> resume code.
>>>
>>> Cc: Mario Kleiner 
>>> Cc: Ben Skeggs 
>>> Signed-off-by: Daniel Vetter 
>>> ---
>>>   drivers/gpu/drm/nouveau/dispnv04/crtc.c   | 4 ++--
>>>   drivers/gpu/drm/nouveau/nouveau_display.c | 4 
>>>   2 files changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
>>> b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
>>> index 3d96b49fe662..dab24066fa21 100644
>>> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
>>> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
>>> @@ -708,7 +708,7 @@ static void nv_crtc_prepare(struct drm_crtc *crtc)
>>> if (nv_two_heads(dev))
>>> NVSetOwner(dev, nv_crtc->index);
>>>
>>> -   drm_vblank_pre_modeset(dev, nv_crtc->index);
>>> +   drm_vblank_off(dev, nv_crtc->index);
>>> funcs->dpms(crtc, DRM_MODE_DPMS_OFF);
>>>
>>> NVBlankScreen(dev, nv_crtc->index, true);
>>> @@ -740,7 +740,7 @@ static void nv_crtc_commit(struct drm_crtc *crtc)
>>>   #endif
>>>
>>> funcs->dpms(crtc, DRM_MODE_DPMS_ON);
>>> -   drm_vblank_post_modeset(dev, nv_crtc->index);
>>> +   drm_vblank_on(dev, nv_crtc->index);
>>>   }
>>
>> The above hunk is probably correct, but i couldn't test it without
>> sufficiently old pre-nv 50 hardware.
>>
>>>
>>>   static void nv_crtc_destroy(struct drm_crtc *crtc)
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
>>> b/drivers/gpu/drm/nouveau/nouveau_display.c
>>> index 8670d90cdc11..d824023f9fc6 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
>>> @@ -620,10 +620,6 @@ nouveau_display_resume(struct drm_device *dev, bool 
>>> runtime)
>>> nv_crtc->lut.depth = 0;
>>> }
>>>
>>> -   /* Make sure that drm and hw vblank irqs get resumed if needed. */
>>> -   for (head = 0; head < dev->mode_config.num_crtc; head++)
>>> -   drm_vblank_on(dev, head);
>>> -
>>> /* This should ensure we don't hit a locking problem when someone
>>>  * wakes us up via a connector.  We should never go into suspend
>>>  * while the display is on anyways.
>>>
>>
>> Tested this one and this hunk breaks suspend/resume. After a suspend/resume
>> cycle, all OpenGL apps and composited desktop are dead, as the core can't
>> get any vblank irq's enabled anymore.
>>
>> So the drm_vblank_on() is still needed here.
>
> Hm that's very surprising. As mentioned above the force_mode_restore
> should be calling nv_crtc_prepare already and fix this all up for us. I
> guess I need to dig out my nv card and trace what's really going on here.
>
> Enabling interrupts when the crtc is off isn't a good idea.
> -Daniel
>

I think the nv_crtc_prepare() path modified in your first hunk is only 
for the original nv04 display engine for very old cards. nv50+ 
(GeForce-8 and later) take different paths.

-mario


[PATCH 1/3] drm/nouveau: Use drm_vblank_on/off consistently

2015-05-29 Thread Mario Kleiner
On 05/27/2015 11:04 AM, Daniel Vetter wrote:
> In
>
> commit 9cba5efab5a8145ae6c52ea273553f069c294482
> Author: Mario Kleiner 
> Date:   Tue Jul 29 02:36:44 2014 +0200
>
>  drm/nouveau: Dis/Enable vblank irqs during suspend/resume
>
> drm_vblank_on/off calls were added around suspend/resume to make sure
> vblank state doesn't go boom over that transition. But nouveau already
> used drm_vblank_pre/post_modeset over modesets. Instead use
> drm_vblank_on/off everywhere. The slight change here is that after
> _off drm_vblank_get will refuse to work right away, but nouveau
> doesn't seem to depend upon that anywhere outside of the pageflip
> paths.
>
> The longer-term plan here is to switch all kms drivers to
> drm_vblank_on/off so that common code like pending event cleanup can
> be done there, while drm_vblank_pre/post_modeset will be purely
> drm internal for the old UMS ioctl.
>
> Note that the drm_vblank_off still seems required in the suspend path
> since nouveau doesn't explicitly disable crtcs. But on the resume side
> drm_helper_resume_force_mode should end up calling drm_vblank_on
> through the nouveau crtc hooks already. Hence remove the call in the
> resume code.
>
> Cc: Mario Kleiner 
> Cc: Ben Skeggs 
> Signed-off-by: Daniel Vetter 
> ---
>   drivers/gpu/drm/nouveau/dispnv04/crtc.c   | 4 ++--
>   drivers/gpu/drm/nouveau/nouveau_display.c | 4 
>   2 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
> b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> index 3d96b49fe662..dab24066fa21 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> @@ -708,7 +708,7 @@ static void nv_crtc_prepare(struct drm_crtc *crtc)
>   if (nv_two_heads(dev))
>   NVSetOwner(dev, nv_crtc->index);
>
> - drm_vblank_pre_modeset(dev, nv_crtc->index);
> + drm_vblank_off(dev, nv_crtc->index);
>   funcs->dpms(crtc, DRM_MODE_DPMS_OFF);
>
>   NVBlankScreen(dev, nv_crtc->index, true);
> @@ -740,7 +740,7 @@ static void nv_crtc_commit(struct drm_crtc *crtc)
>   #endif
>
>   funcs->dpms(crtc, DRM_MODE_DPMS_ON);
> - drm_vblank_post_modeset(dev, nv_crtc->index);
> + drm_vblank_on(dev, nv_crtc->index);
>   }

The above hunk is probably correct, but i couldn't test it, since i
don't have sufficiently old pre-nv50 hardware.

>
>   static void nv_crtc_destroy(struct drm_crtc *crtc)
> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
> b/drivers/gpu/drm/nouveau/nouveau_display.c
> index 8670d90cdc11..d824023f9fc6 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
> @@ -620,10 +620,6 @@ nouveau_display_resume(struct drm_device *dev, bool 
> runtime)
>   nv_crtc->lut.depth = 0;
>   }
>
> - /* Make sure that drm and hw vblank irqs get resumed if needed. */
> - for (head = 0; head < dev->mode_config.num_crtc; head++)
> - drm_vblank_on(dev, head);
> -
>   /* This should ensure we don't hit a locking problem when someone
>* wakes us up via a connector.  We should never go into suspend
>* while the display is on anyways.
>

Tested this one and this hunk breaks suspend/resume. After a
suspend/resume cycle, all OpenGL apps and the composited desktop are dead,
as the core can't get any vblank irqs enabled anymore.

So the drm_vblank_on() is still needed here.

thanks,
-mario
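
For readers following the thread, here is a minimal user-space model of
the failure described above. It is not kernel code: the struct and the
model_* functions are hypothetical names, and the only behaviour taken
from the thread is that after _off a vblank_get is refused until _on has
been called again.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-crtc vblank state; not the kernel struct. */
struct model_vblank_state {
    bool allowed;   /* set by model_vblank_on(), cleared by model_vblank_off() */
    int refcount;   /* outstanding vblank references */
};

static void model_vblank_off(struct model_vblank_state *s)
{
    s->allowed = false;
}

static void model_vblank_on(struct model_vblank_state *s)
{
    s->allowed = true;
}

/* Refuses to hand out a reference while vblank handling is switched off. */
static int model_vblank_get(struct model_vblank_state *s)
{
    if (!s->allowed)
        return -EINVAL;
    s->refcount++;
    return 0;
}

/*
 * Suspend always calls _off. The flag selects whether the resume path
 * calls _on again, which is what the removed hunk used to guarantee.
 */
static int model_suspend_resume_then_get(bool resume_calls_on)
{
    struct model_vblank_state s = { .allowed = true, .refcount = 0 };

    model_vblank_off(&s);          /* suspend */
    if (resume_calls_on)
        model_vblank_on(&s);       /* resume */
    return model_vblank_get(&s);   /* first client request after resume */
}
```

In this model, resume without the _on call leaves every later get
failing, which matches the "can't get any vblank irqs enabled" symptom.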


[Intel-gfx] [PATCH] drm/plane-helper: Adapt cursor hack to transitional helpers

2015-05-21 Thread Mario Kleiner
On 05/20/2015 10:36 AM, Daniel Vetter wrote:
> In
>
> commit f02ad907cd9e7fe3a6405d2d005840912f1ed258
> Author: Daniel Vetter 
> Date:   Thu Jan 22 16:36:23 2015 +0100
>
>  drm/atomic-helpers: Recover full cursor plane behaviour
>
> we've added a hack to atomic helpers to never do vblank waits for
> cursor updates through the legacy apis since that's what X expects.
> Unfortunately we've (again) forgotten to adjust the transitional
> helpers. Do this now.
>
> This fixes regressions for drivers only partially converted over to
> atomic (like i915).
>
> Reported-by: Pekka Paalanen 
> Cc: Pekka Paalanen 
> Cc: stable at vger.kernel.org
> Signed-off-by: Daniel Vetter 
> ---
>   drivers/gpu/drm/drm_plane_helper.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_plane_helper.c 
> b/drivers/gpu/drm/drm_plane_helper.c
> index 40c1db9ad7c3..2f0ed11024eb 100644
> --- a/drivers/gpu/drm/drm_plane_helper.c
> +++ b/drivers/gpu/drm/drm_plane_helper.c
> @@ -465,6 +465,9 @@ int drm_plane_helper_commit(struct drm_plane *plane,
>   if (!crtc[i])
>   continue;
>
> + if (crtc[i]->cursor == plane)
> + continue;
> +
>   /* There's no other way to figure out whether the crtc is 
> running. */
>       ret = drm_crtc_vblank_get(crtc[i]);
>   if (ret == 0) {
>

This one is

Reviewed-and-tested-by: Mario Kleiner 

I was looking into Weston performance and the cursor problem, so had the
necessary tracing in place to test this. I can confirm that the cursor
related blocking in Weston's drm-backend execution is gone with this
patch applied, whereas it is still present when using hardware
overlays on Intel, as expected.

So hardware cursors should be fine again, once the patch also ends up in
stable kernels.

thanks,
-mario
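
A rough user-space sketch of what the added check does, for anyone
reading along. All names here (model_plane, cursor_model_crtc,
model_commit_waits) are made up for illustration; only the control flow,
skip empty slots, skip the crtc whose cursor is the plane being updated,
otherwise wait, follows the hunk quoted above.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the DRM structures. */
struct model_plane {
    int id;
};

struct cursor_model_crtc {
    const struct model_plane *cursor;  /* cursor plane of this crtc */
    int vblank_waits;                  /* how often we "waited" on it */
};

/*
 * Walk the crtcs touched by an update of 'plane' and count a vblank
 * wait for each one, except when the plane is that crtc's cursor.
 * Returns the number of crtcs that were waited on.
 */
static int model_commit_waits(const struct model_plane *plane,
                              struct cursor_model_crtc **crtcs, size_t n)
{
    int waited = 0;

    for (size_t i = 0; i < n; i++) {
        if (!crtcs[i])
            continue;
        if (crtcs[i]->cursor == plane)
            continue;              /* legacy cursor update: never block */
        crtcs[i]->vblank_waits++;
        waited++;
    }
    return waited;
}
```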


[xf86-video-nouveau] dri2: Enable BufferAge support

2015-05-09 Thread Mario Kleiner


On 01/19/2015 12:00 PM, Chris Wilson wrote:
> To enable BufferAge support, we just have to not be using the
> DRI2Buffer->flags field for any purpose (i.e. it is always expected to
> be 0, as it is now) and to be sure to swap the flags field whenever we
> exchange buffers. As nouveau does not exactly support TripleBuffer, we
> don't have to worry about copying the flags field when
> injecting the third buffer.
>
> Signed-off-by: Chris Wilson 
> Cc: Maarten Lankhorst 
> Cc: Mario Kleiner 
> ---
>   src/nouveau_dri2.c | 7 +++
>   1 file changed, 7 insertions(+)
>
> diff --git a/src/nouveau_dri2.c b/src/nouveau_dri2.c
> index e3445b2..428ef92 100644
> --- a/src/nouveau_dri2.c
> +++ b/src/nouveau_dri2.c
> @@ -711,6 +711,7 @@ nouveau_dri2_finish_swap(DrawablePtr draw, unsigned int 
> frame,
>   }
>
>   SWAP(s->dst->name, s->src->name);
> + SWAP(s->dst->flags, s->src->flags);
>   SWAP(nouveau_pixmap(dst_pix)->bo, nouveau_pixmap(src_pix)->bo);
>
>   DamageRegionProcessPending(draw);
> @@ -1003,6 +1004,12 @@ nouveau_dri2_init(ScreenPtr pScreen)
>   dri2.DestroyBuffer2 = nouveau_dri2_destroy_buffer2;
>   dri2.CopyRegion2 = nouveau_dri2_copy_region2;
>   #endif
> +
> +#if DRI2INFOREC_VERSION >= 10
> + dri2.version = 10;
> + dri2.bufferAge = 1;
> +#endif
> +
>   return DRI2ScreenInit(pScreen, &dri2);
>   }
>
>

Seems ok to me.

Reviewed-by: Mario Kleiner 

-mario
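
To illustrate the "swap the flags field whenever we exchange buffers"
rule from the patch description, here is a small stand-alone sketch. The
model_buffer struct and helper names are hypothetical and much simpler
than the real DRI2 buffer; the point is only that whatever value sits in
flags has to travel together with the buffer name.

```c
/* Hypothetical, simplified stand-in for a DRI2 buffer. */
struct model_buffer {
    unsigned int name;   /* identifies the underlying storage */
    unsigned int flags;  /* owned by the server side, must follow the storage */
};

static void swap_uint(unsigned int *a, unsigned int *b)
{
    unsigned int tmp = *a;

    *a = *b;
    *b = tmp;
}

/* Exchange front and back: name and flags move as a pair. */
static void model_exchange(struct model_buffer *dst, struct model_buffer *src)
{
    swap_uint(&dst->name, &src->name);
    swap_uint(&dst->flags, &src->flags);
}
```

If only the names were swapped, the flags value would end up attached to
the wrong storage after every exchange.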


Tiny fixes - resent

2015-05-09 Thread Mario Kleiner
On 05/04/2015 11:15 AM, Daniel Vetter wrote:
> On Mon, May 04, 2015 at 06:29:43AM +0200, Mario Kleiner wrote:
>> Hi, a resend of updated versions of the earlier patches:
>>
>> Patch 1 should really get into Linux 4.1 to avoid tegra breaking
>> user-space clients.
>>
>> Patch 2 reviewed-by Michel, updated to take his feedback into account.
>>
>> Patch 3 is modified to not conflict with Daniel Vetter's patch
>> "drm/vblank: Fixup and document timestamp update/read barriers"
>>
>> Patch 4 will be needed for drm/qxl to compile with Daniel's fixup
>> patch applied.
>
> Merged patch 2-4 to topic/drm-misc on top of my memory barrier patch. I'll
> leave 1 to Thierry for tegra-fixes. And there doesn't seem to be a patch 5
> somehow, was that intentional? The patches are numbered n/5.
> -Daniel
>

Thanks. There isn't a patch 5, that was just some hack of mine that 
accidentally slipped into the numbering.

-mario


[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-07 Thread Mario Kleiner
On 05/07/2015 01:56 PM, Peter Hurley wrote:
> On 05/06/2015 04:56 AM, Daniel Vetter wrote:
>> On Tue, May 05, 2015 at 11:57:42AM -0400, Peter Hurley wrote:
>>> On 05/05/2015 11:42 AM, Daniel Vetter wrote:
>>>> On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote:
>>>>> On 05/04/2015 12:52 AM, Mario Kleiner wrote:
>>>>>> On 04/16/2015 03:03 PM, Daniel Vetter wrote:
>>>>>>> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
>>>>>>>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
>>>>>>>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
>>>>>>>>>>> This was a bit too much cargo-culted, so lets make it solid:
>>>>>>>>>>> - vblank->count doesn't need to be an atomic, writes are always done
>>>>>>>>>>> under the protection of dev->vblank_time_lock. Switch to an 
>>>>>>>>>>> unsigned
>>>>>>>>>>> long instead and update comments. Note that atomic_read is just 
>>>>>>>>>>> a
>>>>>>>>>>> normal read of a volatile variable, so no need to audit all the
>>>>>>>>>>> read-side access specifically.
>>>>>>>>>>>
>>>>>>>>>>> - The barriers for the vblank counter seqlock weren't complete: The
>>>>>>>>>>> read-side was missing the first barrier between the counter 
>>>>>>>>>>> read and
>>>>>>>>>>> the timestamp read, it only had a barrier between the ts and the
>>>>>>>>>>> counter read. We need both.
>>>>>>>>>>>
>>>>>>>>>>> - Barriers weren't properly documented. Since barriers only work if
>>>>>>>>>>> you have them on both sides of the transaction it's prudent to
>>>>>>>>>>> reference where the other side is. To avoid duplicating the
>>>>>>>>>>> write-side comment 3 times extract a little store_vblank() 
>>>>>>>>>>> helper.
>>>>>>>>>>> In that helper also assert that we do indeed hold
>>>>>>>>>>> dev->vblank_time_lock, since in some cases the lock is acquired 
>>>>>>>>>>> a
>>>>>>>>>>> few functions up in the callchain.
>>>>>>>>>>>
>>>>>>>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath 
>>>>>>>>>>> to
>>>>>>>>>>> the vblank_wait ioctl.
>>>>>>>>>>>
>>>>>>>>>>> Cc: Chris Wilson 
>>>>>>>>>>> Cc: Mario Kleiner 
>>>>>>>>>>> Cc: Ville Syrjälä 
>>>>>>>>>>> Cc: Michel Dänzer 
>>>>>>>>>>> Signed-off-by: Daniel Vetter 
>>>>>>>>>>> ---
>>>>>>>>>>>drivers/gpu/drm/drm_irq.c | 92 
>>>>>>>>>>> ---
>>>>>>>>>>>include/drm/drmP.h|  8 +++--
>>>>>>>>>>>2 files changed, 54 insertions(+), 46 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>>>>>>>> index c8a34476570a..23bfbc61a494 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>>>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, 
>>>>>>>>>>> drm_vblank_offdelay, int, 0600);
>>>>>>>>>>>module_param_named(timestamp_precision_usec, 
>>>>>>>>>>> drm_timestamp_precision, int, 0600);
>>>>>>>>>>>module_param_named(timestamp_monotonic, drm_timestamp_monotonic, 
>>>>>>>>>>>

[PATCH] drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off)

2015-05-04 Thread Mario Kleiner


On 04/15/2015 03:03 AM, Mario Kleiner wrote:
> On 04/02/2015 01:34 PM, Chris Wilson wrote:
>> On vblank instant-off systems, we can get into a situation where the cost
>> of enabling and disabling the vblank IRQ around a drmWaitVblank query
>> dominates. However, we know that if the user wants the current vblank
>> counter, they are also very likely to immediately queue a vblank wait
>> and so we can keep the interrupt around and only turn it off if we have
>> no further vblank requests in the interrupt interval.
>>
>> After vblank event delivery there is a shadow of one vblank where the
>> interrupt is kept alive for the user to query and queue another vblank
>> event. Similarly, if the user is using blocking drmWaitVblanks, the
>> interrupt will be disabled on the IRQ following the wait completion.
>> However, if the user is simply querying the current vblank counter and
>> timestamp, the interrupt will be disabled after every IRQ and the user
>> will enable it again on the first query following the IRQ.
>>
>> Testcase: igt/kms_vblank
>> Signed-off-by: Chris Wilson 
>> Cc: Ville Syrjälä 
>> Cc: Daniel Vetter 
>> Cc: Michel Dänzer 
>> Cc: Laurent Pinchart 
>> Cc: Dave Airlie ,
>> Cc: Mario Kleiner 
>> ---
>>   drivers/gpu/drm/drm_irq.c | 15 +--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index c8a34476570a..6f5dc18779e2 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -1091,9 +1091,9 @@ void drm_vblank_put(struct drm_device *dev, int
>> crtc)
>>   if (atomic_dec_and_test(&vblank->refcount)) {
>>   if (drm_vblank_offdelay == 0)
>>   return;
>> -else if (dev->vblank_disable_immediate || drm_vblank_offdelay
>> < 0)
>> +else if (drm_vblank_offdelay < 0)
>>   vblank_disable_fn((unsigned long)vblank);
>> -else
>> +else if (!dev->vblank_disable_immediate)
>>   mod_timer(&vblank->disable_timer,
>> jiffies + ((drm_vblank_offdelay * HZ)/1000));
>>   }
>> @@ -1697,6 +1697,17 @@ bool drm_handle_vblank(struct drm_device *dev,
>> int crtc)
>>
>>   spin_lock_irqsave(&dev->event_lock, irqflags);
>>
>
> You could move the code before the spin_lock_irqsave(&dev->event_lock,
> irqflags); i think it doesn't need that lock?
>
>> +if (dev->vblank_disable_immediate &&
>> !atomic_read(&vblank->refcount)) {
>
> Also check for (drm_vblank_offdelay > 0) to make sure we have a way out
> of instant disable here, and the same meaning of drm_vblank_offdelay
> as we have in the current implementation.
>
> This hunk ...
>
>> +unsigned long vbl_lock_irqflags;
>> +
>> +spin_lock_irqsave(&dev->vbl_lock, vbl_lock_irqflags);
>> +if (atomic_read(&vblank->refcount) == 0 && vblank->enabled) {
>> +DRM_DEBUG("disabling vblank on crtc %d\n", crtc);
>> +vblank_disable_and_save(dev, crtc);
>> +}
>> +spin_unlock_irqrestore(&dev->vbl_lock, vbl_lock_irqflags);
>
> ... is the same as a call to vblank_disable_fn((unsigned long) vblank);
> Maybe replace by that call?
>
> You could also return here already, as the code below will just take a
> lock, realize vblanks are now disabled and then release the locks and exit.
>
>> +}
>> +
>>   /* Need timestamp lock to prevent concurrent execution with
>>* vblank enable/disable, as this would cause inconsistent
>>* or corrupted timestamps and vblank counts.
>>
>
> I think the logic itself is fine and at least basic testing of the patch
> on an Intel HD Ironlake didn't show problems, so with the above taken
> into account it would have my slightly uneasy reviewed-by.
>
> One thing that worries me a little bit about the disable inside vblank
> irq are the potential races between the disable code and the display
> engine which could cause really bad off-by-one errors for clients on an
> imperfect driver. These races can only happen if vblank enable or
> disable happens close to or inside the vblank. This approach lets the
> instant disable happen exactly inside vblank when there is the highest
> chance of triggering that condition.
>
> This doesn't seem to be a problem for intel kms, but other drivers don't
> have instant disable yet, so we don't know how well we could do it
> there. Additionally things like dynamic power management tend to operate
> inside vblank, some

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-04 Thread Mario Kleiner
On 04/16/2015 03:03 PM, Daniel Vetter wrote:
> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
>>>> Hi Daniel,
>>>>
>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
>>>>> This was a bit too much cargo-culted, so lets make it solid:
>>>>> - vblank->count doesn't need to be an atomic, writes are always done
>>>>>under the protection of dev->vblank_time_lock. Switch to an unsigned
>>>>>long instead and update comments. Note that atomic_read is just a
>>>>>normal read of a volatile variable, so no need to audit all the
>>>>>read-side access specifically.
>>>>>
>>>>> - The barriers for the vblank counter seqlock weren't complete: The
>>>>>read-side was missing the first barrier between the counter read and
>>>>>the timestamp read, it only had a barrier between the ts and the
>>>>>counter read. We need both.
>>>>>
>>>>> - Barriers weren't properly documented. Since barriers only work if
>>>>>you have them on both sides of the transaction it's prudent to
>>>>>reference where the other side is. To avoid duplicating the
>>>>>write-side comment 3 times extract a little store_vblank() helper.
>>>>>In that helper also assert that we do indeed hold
>>>>>dev->vblank_time_lock, since in some cases the lock is acquired a
>>>>>few functions up in the callchain.
>>>>>
>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>>>> the vblank_wait ioctl.
>>>>>
>>>>> Cc: Chris Wilson 
>>>>> Cc: Mario Kleiner 
>>>>> Cc: Ville Syrjälä 
>>>>> Cc: Michel Dänzer 
>>>>> Signed-off-by: Daniel Vetter 
>>>>> ---
>>>>>   drivers/gpu/drm/drm_irq.c | 92 
>>>>> ---
>>>>>   include/drm/drmP.h|  8 +++--
>>>>>   2 files changed, 54 insertions(+), 46 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>> index c8a34476570a..23bfbc61a494 100644
>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, 
>>>>> drm_vblank_offdelay, int, 0600);
>>>>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, 
>>>>> int, 0600);
>>>>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 
>>>>> 0600);
>>>>>
>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>>>> +  unsigned vblank_count_inc,
>>>>> +  struct timeval *t_vblank)
>>>>> +{
>>>>> + struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>>>> + u32 tslot;
>>>>> +
>>>>> + assert_spin_locked(&dev->vblank_time_lock);
>>>>> +
>>>>> + if (t_vblank) {
>>>>> + tslot = vblank->count + vblank_count_inc;
>>>>> + vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> +  * vblank timestamp updates are protected on the write side with
>>>>> +  * vblank_time_lock, but on the read side done locklessly using a
>>>>> +  * sequence-lock on the vblank counter. Ensure correct ordering using
>>>>> +  * memory barriers. We need the barrier both before and also after the
>>>>> +  * counter update to synchronize with the next timestamp write.
>>>>> +  * The read-side barriers for this are in drm_vblank_count_and_time.
>>>>> +  */
>>>>> + smp_wmb();
>>>>> + vblank->count += vblank_count_inc;
>>>>> + smp_wmb();
>>>>
>>>> The comment and the code are each self-contradictory.
>>>>
>>>> If vblank->count writes are always protected by vblank_time_lock 
>>>> (something I
>>>> did not verify but that the comment above asserts), then the trailing write
>>>> barrier is not required (and the assertion that it is in the comment is 
>>&
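
For orientation while reading this discussion, the ordering scheme in the
quoted store_vblank() can be modelled in portable C11 roughly as below.
This is only a sketch under assumptions: the model_* names and the
two-slot ring are invented, C11 release/acquire fences stand in for
smp_wmb()/smp_rmb(), and the writer mirrors the patch as posted,
including the trailing barrier that is being questioned here. It takes
no side on whether that barrier is needed.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_RING 2  /* assumed ring size, for illustration only */

/* Hypothetical model of a vblank counter plus timestamp ring. */
struct model_seq_vblank {
    atomic_uint count;
    _Atomic int64_t ts[MODEL_RING];
};

/* Writer side; the real code additionally holds a spinlock here. */
static void model_store(struct model_seq_vblank *v, unsigned int inc, int64_t t)
{
    unsigned int next =
        atomic_load_explicit(&v->count, memory_order_relaxed) + inc;

    /* Publish the timestamp for the slot the new count will select. */
    atomic_store_explicit(&v->ts[next % MODEL_RING], t, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* first write barrier */
    atomic_store_explicit(&v->count, next, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* trailing write barrier */
}

/* Lockless reader: retry until the count is unchanged around the ts read. */
static unsigned int model_read(struct model_seq_vblank *v, int64_t *t)
{
    unsigned int cur;

    do {
        cur = atomic_load_explicit(&v->count, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* count before ts */
        *t = atomic_load_explicit(&v->ts[cur % MODEL_RING],
                                  memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* ts before re-check */
    } while (cur != atomic_load_explicit(&v->count, memory_order_relaxed));

    return cur;
}
```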

[PATCH 4/5] drm/qxl: Fix qxl_noop_get_vblank_counter()

2015-05-04 Thread Mario Kleiner
This breaks under the vblank timestamp cleanup patch
by Daniel Vetter. Also it is pointless to return anything
but zero (or any other constant) if the function doesn't
actually query a hw vblank counter. The bogus return of
the current drm vblank counter via direct readout or via
drm_vblank_count() is found in many of the new kms drivers,
but it does exactly nothing different from returning any
arbitrary constant - it's a no operation.

Let's simply return 0 - Easy and fast.

Signed-off-by: Mario Kleiner 
Cc: Dave Airlie 
---
 drivers/gpu/drm/qxl/qxl_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
index 1d9b80c..577dc45 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -198,7 +198,7 @@ static int qxl_pm_restore(struct device *dev)

 static u32 qxl_noop_get_vblank_counter(struct drm_device *dev, int crtc)
 {
-   return dev->vblank[crtc].count.counter;
+   return 0;
 }

 static int qxl_noop_enable_vblank(struct drm_device *dev, int crtc)
-- 
1.9.1



[PATCH 3/5] drm: Zero out invalid vblank timestamp in drm_update_vblank_count. (v2)

2015-05-04 Thread Mario Kleiner
Since commit 844b03f27739135fe1fed2fef06da0ffc4c7a081 we make
sure that after vblank irq off, we return the last valid
(vblank count, vblank timestamp) pair to clients, e.g., during
modesets, which is good.

An overlooked side effect of that commit for kms drivers without
support for precise vblank timestamping is that at vblank irq
enable, when we update the vblank counter from the hw counter, we
can't update the corresponding vblank timestamp, so now we have a
totally mismatched timestamp for the new count to confuse clients.

Restore old client visible behaviour from before Linux 3.18, but
zero out the timestamp at vblank counter update (instead of disable
as in original implementation) if we can't generate a meaningful
timestamp immediately for the new vblank counter. This will fix
this regression, so callers know they need to retry again later
if they need a valid timestamp, but at the same time preserves
the improvements made in the commit mentioned above.

v2: Rebased on top of Daniel Vetter's fixup and documentation
patch for timestamp updates. Drop request for stable kernel
backport as this would be more difficult, unless the original
patch would get applied to stable kernels.

Signed-off-by: Mario Kleiner 

Cc: Ville Syrjälä 
Cc: Daniel Vetter 
Cc: Dave Airlie 
---
 drivers/gpu/drm/drm_irq.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 152d1de..44e6a20b 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -161,10 +161,13 @@ static void drm_update_vblank_count(struct drm_device 
*dev, int crtc)

/*
 * Only reinitialize corresponding vblank timestamp if high-precision 
query
-* available and didn't fail. Will reinitialize delayed at next vblank
-* interrupt in that case.
+* available and didn't fail. Otherwise reinitialize delayed at next 
vblank
+* interrupt and assign 0 for now, to mark the vblanktimestamp as 
invalid.
 */
-   store_vblank(dev, crtc, diff, rc ? &t_vblank : NULL);
+   if (!rc)
+   t_vblank = (struct timeval) {0, 0};
+
+   store_vblank(dev, crtc, diff, &t_vblank);
 }

 /*
-- 
1.9.1
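
The convention this patch restores can be sketched in a few lines of
stand-alone C. The model_timeval type and the helper names are
hypothetical; the only thing taken from the commit message is that a
failed precise query stores a zero timestamp, which callers treat as
"invalid, retry later".

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct timeval. */
struct model_timeval {
    long tv_sec;
    long tv_usec;
};

/*
 * What gets stored alongside the updated count: the queried timestamp
 * if the high-precision query worked, otherwise all zeros.
 */
static struct model_timeval model_pick_timestamp(bool query_ok,
                                                 struct model_timeval queried)
{
    if (!query_ok)
        queried = (struct model_timeval){ 0, 0 };
    return queried;
}

/* Client-side check: a zero timestamp means "not valid yet". */
static bool model_timestamp_valid(const struct model_timeval *tv)
{
    return tv->tv_sec != 0 || tv->tv_usec != 0;
}
```

A client seeing an invalid timestamp would query again after the next
vblank interrupt instead of pairing the new count with a stale time.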



[PATCH 2/5] drm: Prevent invalid use of vblank_disable_immediate. (v2)

2015-05-04 Thread Mario Kleiner
For a kms driver to support immediate disable of vblank
irq's reliably without introducing off by one errors or
other mayhem for clients, it must not only support a
hardware vblank counter query, but also high precision
vblank timestamping, so vblank count and timestamp can be
instantaneously reinitialized to valid values. Additionally
the exposed hardware counter must behave as if it is
incrementing at leading edge of vblank to avoid off by
one errors during reinitialization of the counter while
the display happens to be inside or close to vblank.

Check during drm_vblank_init that a driver which claims to
be capable of vblank_disable_immediate at least supports
high precision timestamping and prevent use of instant
disable if that isn't present as a minimum requirement.

v2: Changed from DRM_ERROR to DRM_INFO and made message
more clear, as suggested by Michel Dänzer.

Signed-off-by: Mario Kleiner 
Reviewed-by: Michel Dänzer 

Cc: Dave Airlie 
---
 drivers/gpu/drm/drm_irq.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 9c166b4..152d1de 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -352,6 +352,13 @@ int drm_vblank_init(struct drm_device *dev, int num_crtcs)
else
DRM_INFO("No driver support for vblank timestamp query.\n");

+   /* Must have precise timestamping for reliable vblank instant disable */
+   if (dev->vblank_disable_immediate && 
!dev->driver->get_vblank_timestamp) {
+   dev->vblank_disable_immediate = false;
+   DRM_INFO("Setting vblank_disable_immediate to false because "
+"get_vblank_timestamp == NULL\n");
+   }
+
dev->vblank_disable_allowed = false;

return 0;
-- 
1.9.1
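
As an illustration of the guard added in this patch, here is a
stand-alone model. The model_driver/model_device structs and
model_ts_query() are hypothetical simplifications; only the condition
itself (instant disable requested, but no timestamp hook) follows the
hunk above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced driver and device descriptions. */
struct model_driver {
    int (*get_vblank_timestamp)(void);  /* NULL if unsupported */
};

struct model_device {
    bool vblank_disable_immediate;
    const struct model_driver *driver;
};

/* Dummy timestamp hook for drivers that do support precise queries. */
static int model_ts_query(void)
{
    return 0;
}

/*
 * Mirror of the init-time check: instant disable is only kept if the
 * driver can provide high precision timestamps.
 */
static void model_vblank_init_check(struct model_device *dev)
{
    if (dev->vblank_disable_immediate && !dev->driver->get_vblank_timestamp)
        dev->vblank_disable_immediate = false;
}
```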



[PATCH 1/5] drm/tegra: Don't use vblank_disable_immediate on incapable driver.

2015-05-04 Thread Mario Kleiner
Tegra would not only need a hardware vblank counter that
increments at leading edge of vblank, but also support
for instantaneous high precision vblank timestamp queries, i.e.
a proper implementation of dev->driver->get_vblank_timestamp().

Without these, there can be off-by-one errors during vblank
disable/enable if the scanout is inside vblank at en/disable
time, and additionally clients will never see any useable
vblank timestamps when querying via drmWaitVblank ioctl. This
would negatively affect swap scheduling under X11 and Wayland.

Signed-off-by: Mario Kleiner 
Cc: Thierry Reding 
Cc: Dave Airlie 
---
 drivers/gpu/drm/tegra/drm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 1833abd..bfad15a 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -173,7 +173,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
drm->irq_enabled = true;

/* syncpoints are used for full 32-bit hardware VBLANK counters */
-   drm->vblank_disable_immediate = true;
drm->max_vblank_count = 0xffffffff;

err = drm_vblank_init(drm, drm->mode_config.num_crtc);
-- 
1.9.1



Tiny fixes - resent

2015-05-04 Thread Mario Kleiner
Hi, a resend of updated versions of the earlier patches:

Patch 1 should really get into Linux 4.1 to avoid tegra breaking
user-space clients.

Patch 2 reviewed-by Michel, updated to take his feedback into account.

Patch 3 is modified to not conflict with Daniel Vetter's patch
"drm/vblank: Fixup and document timestamp update/read barriers"

Patch 4 will be needed for drm/qxl to compile with Daniel's fixup
patch applied.

thanks,
-mario



[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Mario Kleiner
On 04/16/2015 03:29 AM, Peter Hurley wrote:
> On 04/15/2015 05:26 PM, Mario Kleiner wrote:
>> A couple of questions to educate me and one review comment.
>>
>> On 04/15/2015 07:34 PM, Daniel Vetter wrote:
>>> This was a bit too much cargo-culted, so lets make it solid:
>>> - vblank->count doesn't need to be an atomic, writes are always done
>>> under the protection of dev->vblank_time_lock. Switch to an unsigned
>>> long instead and update comments. Note that atomic_read is just a
>>> normal read of a volatile variable, so no need to audit all the
>>> read-side access specifically.
>>>
>>> - The barriers for the vblank counter seqlock weren't complete: The
>>> read-side was missing the first barrier between the counter read and
>>> the timestamp read, it only had a barrier between the ts and the
>>> counter read. We need both.
>>>
>>> - Barriers weren't properly documented. Since barriers only work if
>>> you have them on boths sides of the transaction it's prudent to
>>> reference where the other side is. To avoid duplicating the
>>> write-side comment 3 times extract a little store_vblank() helper.
>>> In that helper also assert that we do indeed hold
>>> dev->vblank_time_lock, since in some cases the lock is acquired a
>>> few functions up in the callchain.
>>>
>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>> the vblank_wait ioctl.
>>>
>>> v2: Add comment to better explain how store_vblank works, suggested by
>>> Chris.
>>>
>>> v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
>>> implicit barrier in the spin_unlock. But that can only be proven by
>>> auditing all callers and my point in extracting this little helper was
>>> to localize all the locking into just one place. Hence I think that
>>> additional optimization is too risky.
>>>
>>> Cc: Chris Wilson 
>>> Cc: Mario Kleiner 
>>> Cc: Ville Syrjälä 
>>> Cc: Michel Dänzer 
>>> Cc: Peter Hurley 
>>> Signed-off-by: Daniel Vetter 
>>> ---
>>>drivers/gpu/drm/drm_irq.c | 95 +--
>>>include/drm/drmP.h|  8 +++-
>>>2 files changed, 57 insertions(+), 46 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>> index c8a34476570a..8694b77d0002 100644
>>> --- a/drivers/gpu/drm/drm_irq.c
>>> +++ b/drivers/gpu/drm/drm_irq.c
>>> @@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
>>>module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
>>>module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>>>
>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>> + unsigned vblank_count_inc,
>>> + struct timeval *t_vblank)
>>> +{
>>> +struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>> +u32 tslot;
>>> +
>>> +assert_spin_locked(&dev->vblank_time_lock);
>>> +
>>> +if (t_vblank) {
>>> +/* All writers hold the spinlock, but readers are serialized by
>>> + * the latching of vblank->count below.
>>> + */
>>> +tslot = vblank->count + vblank_count_inc;
>>> +vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>> +}
>>> +
>>> +/*
>>> + * vblank timestamp updates are protected on the write side with
>>> + * vblank_time_lock, but on the read side done locklessly using a
>>> + * sequence-lock on the vblank counter. Ensure correct ordering using
>>> + * memory barrriers. We need the barrier both before and also after the
>>> + * counter update to synchronize with the next timestamp write.
>>> + * The read-side barriers for this are in drm_vblank_count_and_time.
>>> + */
>>> +smp_wmb();
>>> +vblank->count += vblank_count_inc;
>>> +smp_wmb();
>>> +}
>>> +
>>>/**
>>> * drm_update_vblank_count - update the master vblank counter
>>> * @dev: DRM device
>>> @@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>>>static void drm_update_vblank_co

[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Mario Kleiner
A couple of questions to educate me and one review comment.

On 04/15/2015 07:34 PM, Daniel Vetter wrote:
> This was a bit too much cargo-culted, so lets make it solid:
> - vblank->count doesn't need to be an atomic, writes are always done
>under the protection of dev->vblank_time_lock. Switch to an unsigned
>long instead and update comments. Note that atomic_read is just a
>normal read of a volatile variable, so no need to audit all the
>read-side access specifically.
>
> - The barriers for the vblank counter seqlock weren't complete: The
>read-side was missing the first barrier between the counter read and
>the timestamp read, it only had a barrier between the ts and the
>counter read. We need both.
>
> - Barriers weren't properly documented. Since barriers only work if
>you have them on boths sides of the transaction it's prudent to
>reference where the other side is. To avoid duplicating the
>write-side comment 3 times extract a little store_vblank() helper.
>In that helper also assert that we do indeed hold
>dev->vblank_time_lock, since in some cases the lock is acquired a
>few functions up in the callchain.
>
> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> the vblank_wait ioctl.
>
> v2: Add comment to better explain how store_vblank works, suggested by
> Chris.
>
> v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
> implicit barrier in the spin_unlock. But that can only be proven by
> auditing all callers and my point in extracting this little helper was
> to localize all the locking into just one place. Hence I think that
> additional optimization is too risky.
>
> Cc: Chris Wilson 
> Cc: Mario Kleiner 
> Cc: Ville Syrjälä 
> Cc: Michel Dänzer 
> Cc: Peter Hurley 
> Signed-off-by: Daniel Vetter 
> ---
>   drivers/gpu/drm/drm_irq.c | 95 +--
>   include/drm/drmP.h|  8 +++-
>   2 files changed, 57 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index c8a34476570a..8694b77d0002 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>
> +static void store_vblank(struct drm_device *dev, int crtc,
> +  unsigned vblank_count_inc,
> +  struct timeval *t_vblank)
> +{
> + struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> + u32 tslot;
> +
> + assert_spin_locked(&dev->vblank_time_lock);
> +
> + if (t_vblank) {
> + /* All writers hold the spinlock, but readers are serialized by
> +  * the latching of vblank->count below.
> +  */
> + tslot = vblank->count + vblank_count_inc;
> + vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> + }
> +
> + /*
> +  * vblank timestamp updates are protected on the write side with
> +  * vblank_time_lock, but on the read side done locklessly using a
> +  * sequence-lock on the vblank counter. Ensure correct ordering using
> +  * memory barrriers. We need the barrier both before and also after the
> +  * counter update to synchronize with the next timestamp write.
> +  * The read-side barriers for this are in drm_vblank_count_and_time.
> +  */
> + smp_wmb();
> + vblank->count += vblank_count_inc;
> + smp_wmb();
> +}
> +
>   /**
>* drm_update_vblank_count - update the master vblank counter
>* @dev: DRM device
> @@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>   static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>   {
>   struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> - u32 cur_vblank, diff, tslot;
> + u32 cur_vblank, diff;
>   bool rc;
>   struct timeval t_vblank;
>
> @@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>   if (diff == 0)
>   return;
>
> - /* Reinitialize corresponding vblank timestamp if high-precision query
> -  * available. Skip this step if query unsupported or failed. Will
> -  * reinitialize delayed at next vblank interrupt in that case.
> + /*
> +  * Only reinitialize corresponding vblank timestamp if high-precision query
> +  * available and didn't fail. Will reinitialize delayed at next vb

[PATCH] drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off)

2015-04-15 Thread Mario Kleiner
On 04/02/2015 01:34 PM, Chris Wilson wrote:
> On vblank instant-off systems, we can get into a situation where the cost
> of enabling and disabling the vblank IRQ around a drmWaitVblank query
> dominates. However, we know that if the user wants the current vblank
> counter, they are also very likely to immediately queue a vblank wait
> and so we can keep the interrupt around and only turn it off if we have
> no further vblank requests in the interrupt interval.
>
> After vblank event delivery there is a shadow of one vblank where the
> interrupt is kept alive for the user to query and queue another vblank
> event. Similarly, if the user is using blocking drmWaitVblanks, the
> interrupt will be disabled on the IRQ following the wait completion.
> However, if the user is simply querying the current vblank counter and
> timestamp, the interrupt will be disabled after every IRQ and the user
> will enable it again on the first query following the IRQ.
>
> Testcase: igt/kms_vblank
> Signed-off-by: Chris Wilson 
> Cc: Ville Syrjälä 
> Cc: Daniel Vetter 
> Cc: Michel Dänzer 
> Cc: Laurent Pinchart 
> Cc: Dave Airlie ,
> Cc: Mario Kleiner 
> ---
>   drivers/gpu/drm/drm_irq.c | 15 +--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index c8a34476570a..6f5dc18779e2 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1091,9 +1091,9 @@ void drm_vblank_put(struct drm_device *dev, int crtc)
>   if (atomic_dec_and_test(&vblank->refcount)) {
>   if (drm_vblank_offdelay == 0)
>   return;
> - else if (dev->vblank_disable_immediate || drm_vblank_offdelay < 0)
> + else if (drm_vblank_offdelay < 0)
>   vblank_disable_fn((unsigned long)vblank);
> - else
> + else if (!dev->vblank_disable_immediate)
>   mod_timer(&vblank->disable_timer,
> jiffies + ((drm_vblank_offdelay * HZ)/1000));
>   }
> @@ -1697,6 +1697,17 @@ bool drm_handle_vblank(struct drm_device *dev, int crtc)
>
>   spin_lock_irqsave(&dev->event_lock, irqflags);
>

You could move the code before the spin_lock_irqsave(&dev->event_lock, 
irqflags); i think it doesn't need that lock?

> + if (dev->vblank_disable_immediate && !atomic_read(&vblank->refcount)) {

Also check for (drm_vblank_offdelay > 0) to make sure we have a way out 
of instant disable here, and the same meaning of drm_vblank_offdelay 
as we have in the current implementation.

This hunk ...

> + unsigned long vbl_lock_irqflags;
> +
> + spin_lock_irqsave(&dev->vbl_lock, vbl_lock_irqflags);
> + if (atomic_read(&vblank->refcount) == 0 && vblank->enabled) {
> + DRM_DEBUG("disabling vblank on crtc %d\n", crtc);
> + vblank_disable_and_save(dev, crtc);
> + }
> + spin_unlock_irqrestore(&dev->vbl_lock, vbl_lock_irqflags);

... is the same as a call to vblank_disable_fn((unsigned long) vblank);
Maybe replace by that call?

You could also return here already, as the code below will just take a 
lock, realize vblanks are now disabled and then release the locks and exit.

> + }
> +
>   /* Need timestamp lock to prevent concurrent execution with
>* vblank enable/disable, as this would cause inconsistent
>* or corrupted timestamps and vblank counts.
>

I think the logic itself is fine and at least basic testing of the patch 
on an Intel HD Ironlake didn't show problems, so with the above taken 
into account it would have my slightly uneasy reviewed-by.

One thing that worries me a little bit about the disable inside vblank 
irq is the potential for races between the disable code and the display 
engine, which could cause really bad off-by-one errors for clients on an 
imperfect driver. These races can only happen if vblank enable or 
disable happens close to or inside the vblank. This approach lets the 
instant disable happen exactly inside vblank, when there is the highest 
chance of triggering that condition.

This doesn't seem to be a problem for intel kms, but other drivers don't 
have instant disable yet, so we don't know how well we could do it 
there. Additionally, things like dynamic power management tend to operate 
inside vblank, sometimes with "funny" side effects on other stuff, e.g., 
dpm on AMD, as i remember from some long debug session with Michel and 
Alex last summer where dpm played a role. Therefore it seems safer 
to me to avoid actions inside vblank that could be done outside. E.g., 
instead of doing the disable inside the vblank irq one could maybe just 
schedule an exact timer to do the disable a few milliseconds later in 
the middle of active scanout to avoid these potential issues?
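
As a rough illustration of the timing arithmetic behind such a deferred
disable, here is a minimal sketch. It assumes the timer would be armed at the
start of vblank; mid_scanout_delay_ns() and its parameters are hypothetical
names for this example, not existing DRM helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the DRM core: given the duration of one
 * full refresh cycle and of its vblank interval (both in nanoseconds),
 * return how long after the start of vblank the middle of active scanout
 * is reached.
 */
static uint64_t mid_scanout_delay_ns(uint64_t frame_ns, uint64_t vblank_ns)
{
	/* The vblank interval is part of the frame, so it cannot be longer. */
	assert(vblank_ns <= frame_ns);

	/* Skip the vblank interval, then half of the active scanout period. */
	return vblank_ns + (frame_ns - vblank_ns) / 2;
}
```

For a 16 ms frame with 1 ms of vblank this gives 8.5 ms, i.e. a timer firing
well away from either vblank edge.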

-mario


[PATCH 1/2] drm: Shortcircuit vblank queries

2015-04-14 Thread Mario Kleiner


On 04/14/2015 08:36 PM, Chris Wilson wrote:
> On Tue, Apr 14, 2015 at 08:22:20PM +0200, Mario Kleiner wrote:
>> On 04/05/2015 05:40 PM, Chris Wilson wrote:
>>> Avoid adding to the waitqueue and reprobing the current vblank if the
>>> caller is only querying the current vblank sequence and timestamp and
>>> so we would return immediately.
>>>
>>> Signed-off-by: Chris Wilson 
>>> Cc: Ville Syrjälä 
>>> Cc: Daniel Vetter 
>>> Cc: Michel Dänzer 
>>> Cc: Laurent Pinchart 
>>> Cc: Dave Airlie ,
>>> Cc: Mario Kleiner 
>>> ---
>>>   drivers/gpu/drm/drm_irq.c | 18 ++
>>>   1 file changed, 10 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>> index 6f5dc18779e2..ba80b51b4b00 100644
>>> --- a/drivers/gpu/drm/drm_irq.c
>>> +++ b/drivers/gpu/drm/drm_irq.c
>>> @@ -1617,14 +1617,16 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
>>> vblwait->request.sequence = seq + 1;
>>> }
>>>
>>> -   DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
>>> - vblwait->request.sequence, crtc);
>>> -   vblank->last_wait = vblwait->request.sequence;
>>> -   DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
>>> -   (((drm_vblank_count(dev, crtc) -
>>> -  vblwait->request.sequence) <= (1 << 23)) ||
>>> -!vblank->enabled ||
>>> -!dev->irq_enabled));
>>> +   if (vblwait->request.sequence != seq) {
>>> +   DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
>>> + vblwait->request.sequence, crtc);
>>> +   vblank->last_wait = vblwait->request.sequence;
>>> +   DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
>>> +   (((drm_vblank_count(dev, crtc) -
>>> +  vblwait->request.sequence) <= (1 << 23)) ||
>>> +!vblank->enabled ||
>>> +!dev->irq_enabled));
>>> +   }
>>>
>>> if (ret != -EINTR) {
>>> struct timeval now;
>>>
>>
>> It would be good to have some DRM_DEBUG output for the skip-the-wait
>> case as well, so one can still follow from dmesg output when a
>> client does a drmWaitVblank call even if it is only a query.
>
> We still have DRM_DEBUG("returning %d to client"), as well as the
> drmIoctl:DRM_DEBUG(ioctl->name), is that not sufficient?
> -Chris
>

Oh right, that's good enough. Maybe add "on crtc %d" to that DRM_DEBUG, 
to make it unambiguous for which crtc some count is returned?

-mario


[PATCH 2/2] drm: Shortcircuit vblank queries

2015-04-14 Thread Mario Kleiner
On 04/05/2015 05:40 PM, Chris Wilson wrote:
> Bypass all the spinlocks and return the last timestamp and counter from
> the last vblank if the driver declares that it is accurate (and stable
> across on/off), and the vblank is currently enabled.
>
> Signed-off-by: Chris Wilson 
> Cc: Ville Syrjälä 
> Cc: Daniel Vetter 
> Cc: Michel Dänzer 
> Cc: Laurent Pinchart 
> Cc: Dave Airlie ,
> Cc: Mario Kleiner 
> ---
>   drivers/gpu/drm/drm_irq.c | 26 ++
>   1 file changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index ba80b51b4b00..be9c210bb22e 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1538,6 +1538,17 @@ err_put:
>   return ret;
>   }
>
> +static bool drm_wait_vblank_is_query(union drm_wait_vblank *vblwait)
> +{
> + if (vblwait->request.sequence)
> + return false;
> +
> + return _DRM_VBLANK_RELATIVE ==
> + (vblwait->request.type & (_DRM_VBLANK_TYPES_MASK |
> +   _DRM_VBLANK_EVENT |
> +   _DRM_VBLANK_NEXTONMISS));
> +}
> +
>   /*
>* Wait for VBLANK.
>*
> @@ -1587,6 +1598,21 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
>
>   vblank = &dev->vblank[crtc];
>
> + /* If the counter is currently enabled and accurate, short-circuit queries
> +  * to return the cached timestamp of the last vblank.
> +  */

Maybe somehow stress in the comment that this location in 
drm_wait_vblank is really the only place where it is ok'ish to call
drm_vblank_count_and_time() without wrapping it into a 
drm_vblank_get/put(), so nobody thinks this approach is ok anywhere else.

> + if (dev->vblank_disable_immediate &&
> + drm_wait_vblank_is_query(vblwait) &&
> + vblank->enabled) {

You should also check for (drm_vblank_offdelay != 0) whenever checking 
for dev->vblank_disable_immediate. This is so one can override all the
vblank_disable_immediate related logic via the drm vblankoffdelay module 
parameter, both for debugging and as a safety switch for desperate users 
in case some driver+gpu combo screws up wrt. immediate disable and that 
makes it into distro kernels.
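
To make the intended override explicit, here is a minimal standalone sketch.
The first function models the drm_vblank_put() decision as it existed at the
time, as far as i understand it (0 = never disable, negative = disable at
once, positive = timer delay in msecs). The second shows the extra check
suggested above. Both function names and the enum are hypothetical and only
serve this illustration.

```c
#include <stdbool.h>

/* Possible outcomes when the last vblank reference is dropped. */
enum vblank_off_action {
	VBLANK_KEEP_ON,		/* offdelay == 0: never disable vblank irqs */
	VBLANK_OFF_NOW,		/* disable immediately */
	VBLANK_OFF_TIMER,	/* disable after offdelay milliseconds */
};

/*
 * Model of the disable decision. offdelay stands in for the
 * drm_vblank_offdelay module parameter, disable_immediate for
 * dev->vblank_disable_immediate.
 */
static enum vblank_off_action vblank_off_action(bool disable_immediate,
						int offdelay)
{
	if (offdelay == 0)
		return VBLANK_KEEP_ON;
	if (disable_immediate || offdelay < 0)
		return VBLANK_OFF_NOW;
	return VBLANK_OFF_TIMER;
}

/*
 * Hypothetical predicate for the short-circuit query path: offdelay == 0
 * switches off everything tied to vblank_disable_immediate, including
 * this fast path.
 */
static bool may_shortcircuit_query(bool disable_immediate, int offdelay,
				   bool is_query, bool enabled)
{
	return disable_immediate && offdelay != 0 && is_query && enabled;
}
```

With this shape, a user who sets the vblankoffdelay parameter to 0 always
gets the regular drm_vblank_get/put() path, whatever the driver claims.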

The other thing i'm not sure about is whether it would be a good idea to 
have some kind of write memory barrier in vblank_disable_and_save() after 
setting vblank->enabled = false; and some read memory barrier here 
before your check for vblank->enabled. I don't have a feeling for how 
much time can pass between one core executing the disable and the other 
core receiving the news that vblank->enabled is no longer true if those 
bits run on different cores.

I've run your patches through my standard tests on x86_64 and they don't 
seem to introduce errors or more skipped frames. Normally it would be so 
wrong to do this without drm_vblank_get/put(), but i think here 
potential errors introduced wouldn't be worse than what a userspace 
client would see due to preemption or other execution delays at the 
wrong moment, so it's probably ok. But i don't know if lack of memory 
barriers etc. could introduce large delays and trouble on other 
architectures?

> + struct timeval now;
> +
> + vblwait->reply.sequence =
> + drm_vblank_count_and_time(dev, crtc, &now);
> + vblwait->reply.tval_sec = now.tv_sec;
> + vblwait->reply.tval_usec = now.tv_usec;

Have some DRM_DEBUG here, so one can follow the client doing the instant 
query through this path.

> + return 0;
> + }
> +
>   ret = drm_vblank_get(dev, crtc);
>   if (ret) {
>   DRM_DEBUG("failed to acquire vblank counter, %d\n", ret);
>

With the above addressed i'd give you a Reviewed-and-tested-by, but it 
would be good if somebody else could look over it as well.

-mario


[PATCH 1/2] drm: Shortcircuit vblank queries

2015-04-14 Thread Mario Kleiner
On 04/05/2015 05:40 PM, Chris Wilson wrote:
> Avoid adding to the waitqueue and reprobing the current vblank if the
> caller is only querying the current vblank sequence and timestamp and
> so we would return immediately.
>
> Signed-off-by: Chris Wilson 
> Cc: Ville Syrjälä 
> Cc: Daniel Vetter 
> Cc: Michel Dänzer 
> Cc: Laurent Pinchart 
> Cc: Dave Airlie ,
> Cc: Mario Kleiner 
> ---
>   drivers/gpu/drm/drm_irq.c | 18 ++
>   1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 6f5dc18779e2..ba80b51b4b00 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1617,14 +1617,16 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
>   vblwait->request.sequence = seq + 1;
>   }
>
> - DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
> -   vblwait->request.sequence, crtc);
> - vblank->last_wait = vblwait->request.sequence;
> - DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
> - (((drm_vblank_count(dev, crtc) -
> -vblwait->request.sequence) <= (1 << 23)) ||
> -  !vblank->enabled ||
> -  !dev->irq_enabled));
> + if (vblwait->request.sequence != seq) {
> + DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
> +   vblwait->request.sequence, crtc);
> + vblank->last_wait = vblwait->request.sequence;
> + DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
> + (((drm_vblank_count(dev, crtc) -
> +vblwait->request.sequence) <= (1 << 23)) ||
> +  !vblank->enabled ||
> +  !dev->irq_enabled));
> + }
>
>   if (ret != -EINTR) {
>   struct timeval now;
>

It would be good to have some DRM_DEBUG output for the skip-the-wait 
case as well, so one can still follow from dmesg output when a client 
does a drmWaitVblank call even if it is only a query.

Other than that, this one [1/2] is

Reviewed-and-tested-by: Mario Kleiner 

-mario


[PATCH 1/2] drm: Shortcircuit vblank queries

2015-04-14 Thread Mario Kleiner
On 04/14/2015 04:21 PM, Chris Wilson wrote:
> On Tue, Apr 14, 2015 at 06:42:03PM +0900, Michel Dänzer wrote:
>> Also, the two patches should have different and more specific shortlogs.
>
> Second patch:
>
> drm: Query vblank counters directly for known accurate state
>
> ?
>

When i applied your patches, both patches showed a shortlog of
"drm: Shortcircuit vblank queries", i think that's what Michel means.

-mario


[PATCH 3/3] drm/tegra: Don't use vblank_disable_immediate on incapable driver.

2015-04-14 Thread Mario Kleiner
Tegra would not only need a hardware vblank counter that
increments at the leading edge of vblank, but also support
for instantaneous high precision vblank timestamp queries, i.e.,
a proper implementation of dev->driver->get_vblank_timestamp().

Without these, there can be off-by-one errors during vblank
disable/enable if the scanout is inside vblank at en/disable
time, and additionally clients will never see any usable
vblank timestamps when querying via drmWaitVblank ioctl. This
would negatively affect swap scheduling under X11 and Wayland.

Signed-off-by: Mario Kleiner 
Cc: Thierry Reding 
Cc: Dave Airlie 
---
 drivers/gpu/drm/tegra/drm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 1833abd..bfad15a 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -173,7 +173,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
drm->irq_enabled = true;

/* syncpoints are used for full 32-bit hardware VBLANK counters */
-   drm->vblank_disable_immediate = true;
drm->max_vblank_count = 0xffffffff;

err = drm_vblank_init(drm, drm->mode_config.num_crtc);
-- 
1.9.1



[PATCH 2/3] drm: Prevent invalid use of vblank_disable_immediate.

2015-04-14 Thread Mario Kleiner
For a kms driver to support immediate disable of vblank
irqs reliably without introducing off-by-one errors or
other mayhem for clients, it must not only support a
hardware vblank counter query, but also high precision
vblank timestamping, so vblank count and timestamp can be
instantaneously reinitialized to valid values. Additionally
the exposed hardware counter must behave as if it is
incrementing at the leading edge of vblank to avoid off-by-one
errors during reinitialization of the counter while
the display happens to be inside or close to vblank.

Check during drm_vblank_init that a driver which claims to
be capable of vblank_disable_immediate at least supports
high precision timestamping and prevent use of instant
disable if that isn't present as a minimum requirement.

Signed-off-by: Mario Kleiner 
Cc: Ville Syrjälä 
Cc: Michel Dänzer 
Cc: Thierry Reding 
Cc: Dave Airlie 
---
 drivers/gpu/drm/drm_irq.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index af9662e..6efe822 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -336,6 +336,12 @@ int drm_vblank_init(struct drm_device *dev, int num_crtcs)
else
DRM_INFO("No driver support for vblank timestamp query.\n");

+   /* Must have precise timestamping for reliable vblank instant disable */
+   if (dev->vblank_disable_immediate && !dev->driver->get_vblank_timestamp) {
+   dev->vblank_disable_immediate = false;
+   DRM_ERROR("Set vblank_disable_immediate, but not supported.\n");
+   }
+
dev->vblank_disable_allowed = false;

return 0;
-- 
1.9.1



[PATCH 1/3] drm: Zero out invalid vblank timestamp in drm_update_vblank_count.

2015-04-14 Thread Mario Kleiner
Since commit 844b03f27739135fe1fed2fef06da0ffc4c7a081 we make
sure that after vblank irq off, we return the last valid
(vblank count, vblank timestamp) pair to clients, e.g., during
modesets, which is good.

An overlooked side effect of that commit for kms drivers without
support for precise vblank timestamping is that at vblank irq
enable, when we update the vblank counter from the hw counter, we
can't update the corresponding vblank timestamp, so now we have a
totally mismatched timestamp for the new count to confuse clients.

Restore old client visible behaviour from before Linux 3.17, but
zero out the timestamp at vblank counter update (instead of disable
as in original implementation) if we can't generate a meaningful
timestamp immediately for the new vblank counter. This fixes
the regression, so callers know they need to retry later
if they need a valid timestamp, while preserving
the improvements made in the commit mentioned above.

Signed-off-by: Mario Kleiner 
Cc:  #v3.17+

Cc: Ville Syrjälä 
Cc: Daniel Vetter 
Cc: Dave Airlie 
---
 drivers/gpu/drm/drm_irq.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index c8a3447..af9662e 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -131,12 +131,11 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)

/* Reinitialize corresponding vblank timestamp if high-precision query
 * available. Skip this step if query unsupported or failed. Will
-* reinitialize delayed at next vblank interrupt in that case.
+* reinitialize delayed at next vblank interrupt in that case and
+* assign 0 for now, to mark the vblanktimestamp as invalid.
 */
-   if (rc) {
-   tslot = atomic_read(&vblank->count) + diff;
-   vblanktimestamp(dev, crtc, tslot) = t_vblank;
-   }
+   tslot = atomic_read(&vblank->count) + diff;
+   vblanktimestamp(dev, crtc, tslot) = rc ? t_vblank : (struct timeval) {0, 0};

smp_mb__before_atomic();
atomic_add(diff, &vblank->count);
-- 
1.9.1



[Intel-gfx] [PATCH] drm: Return current vblank value for drmWaitVBlank queries

2015-03-19 Thread Mario Kleiner
On 03/19/2015 04:04 PM, Ville Syrjälä wrote:
> On Thu, Mar 19, 2015 at 03:33:11PM +0100, Daniel Vetter wrote:
>> On Wed, Mar 18, 2015 at 03:52:56PM +0100, Mario Kleiner wrote:
>>> On 03/18/2015 10:30 AM, Chris Wilson wrote:
>>>> On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
>>>>> drm_vblank_count_and_time() doesn't return the correct sequence number
>>>>> while the vblank interrupt is disabled, does it? It returns the sequence
>>>>> number from the last time vblank_disable_and_save() was called (when the
>>>>> vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
>>>>
>>>> Ville enlightened me as well. I thought the value was cooked so that
>>>> time did not pass whilst the IRQ was disabled. Hopefully, I can impress
>>>> upon the Intel folks, at least, that enabling/disabling the interrupts
>>>> just to read the current hw counter is interesting to say the least and
>>>> sits at the top of the profiles when benchmarking Present.
>>>> -Chris
>>>>
>>>
>>> drm_wait_vblank() not only gets the counter but also the corresponding
>>> vblank timestamp. Counters are recalculated in vblank_disable_and_save() for
>>> irq off, then in the vblank irq on path, and every refresh in
>>> drm_handle_vblank at vblank irq time.
>>>
>>> The timestamps can be recalculated at any time iff the driver supports high
>>> precision timestamping, which currently intel kms, radeon kms, and nouveau
>>> kms do. But for other parts, like most SoC's, afaik you only get a valid
>>> timestamp by sampling system time in the vblank irq handler, so there you'd
>>> have a problem.
>>>
>>> There are also some races around the enable/disable path which require a lot
>>> of care and exact knowledge of when each hardware fires its vblanks, updates
>>> its hardware counters etc. to get rid of them. Ville did that - successfully
>>> as far as my tests go - for Intel kms, but other drivers would be less
>>> forgiving.
>>>
>>> Our current method is to:
>>>
>>> a) Only disable vblank irqs after a default idle period of 5 seconds, so we
>>> don't get races frequent/likely enough to cause problems for clients. And we
>>> save the overhead for all the vblank irq on/off.
>>>
>>> b) On drivers which have high precision timestamping and have been carefully
>>> checked to be race free (== intel kms only atm.) we have instant disable, so
>>> things like blinking cursors don't keep vblank irq on forever.
>>>
>>> If b) causes so much overhead, maybe we could change the "instant disable"
>>> into a "disable after a very short time", e.g., lowering the timeout from
>>> 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That would still
>>> disable vblank irqs for power saving if the desktop is really idle, but
>>> avoid on/off storms for the various drm_wait_vblank's that happen when
>>> preparing a swap.
>>
>> Yeah I think we could add code which only gets run for drivers which
>> support instant disable (i915 doesn't do that on gen2 because the hw is
>> lacking). There we should be able to update the vblank counter/timestamp
>> correctly without enabling interrupts temporarily. Ofc we need to make
>> sure we have enough nasty igt testcase to ensure there's not going to be
>> jumps and missed frame numbers in that case.

I'd rather go for the very simple "fast disable with short timeout" 
method. That would only be a tiny almost one-liner patch that reuses the 
existing timer for the default slow case, and we'd know already that it 
will work reliably on "instant off" capable drivers - no extra tests 
required. Those drm_vblank_get/put calls usually come in short bursts 
which should be covered by a timeout of maybe 1 to max. 3 refresh durations.
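
For a sense of the numbers involved, a small sketch of how such a short
timeout could be derived from the refresh rate. short_offdelay_ms() is a
hypothetical helper for illustration only; offdelay_to_jiffies() mirrors the
(drm_vblank_offdelay * HZ)/1000 conversion used for the existing disable
timer, with the tick rate passed in as a parameter so the example runs
outside the kernel.

```c
#include <assert.h>

/*
 * Hypothetical helper: timeout in milliseconds that spans the given number
 * of refresh durations at refresh_hz, truncated to whole milliseconds.
 */
static unsigned int short_offdelay_ms(unsigned int refreshes,
				      unsigned int refresh_hz)
{
	assert(refresh_hz > 0);
	return (refreshes * 1000u + refresh_hz - 1) / refresh_hz;
}

/* Milliseconds to timer ticks, same arithmetic as the existing timer. */
static unsigned long offdelay_to_jiffies(unsigned int offdelay_ms,
					 unsigned int hz)
{
	return ((unsigned long)offdelay_ms * hz) / 1000;
}
```

Three refreshes at 60 Hz come out at 50 msecs. Note that with a coarse tick
rate a timeout of only a few msecs would truncate to zero ticks, so a real
patch would probably want a lower bound of one tick.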

When we query the hw timestamps, we always have a little bit of 
unavoidable noise, even if it's often only +/- 1 usec on modern hw, so 
clients querying the timestamp for the same vblank would get slightly 
different results on repeated queries. On hw which only allows scanline 
granularity for queries, we can get variability up to 1 scanline 
duration. If the caller does things like delta calculations on those 
results (dT = currentts - lastts) it can get confusing results like time 
going backwards by a few microseconds. That's why the current code 
caches the last vblank ts, to save overhead and to make sure that 
repeated queries of the same vblank give identical results.
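
The caching idea can be sketched in a few lines of standalone C. This is only
a simplified model, not the kernel code: the struct, the ring size and the
jittery noisy_hw_query_us() stand-in are all made up for the example. The
point is that the hw is queried at most once per vblank count, so repeated
reads for the same vblank return the identical value.

```c
#include <stdbool.h>
#include <stdint.h>

#define TS_SLOTS 2	/* hypothetical ring size for this sketch */

struct ts_cache {
	uint32_t count;			/* current vblank count */
	int64_t ts_us[TS_SLOTS];	/* cached timestamp per slot */
	bool valid[TS_SLOTS];		/* slot holds a timestamp for its count */
};

/* Stand-in for a hw timestamp query with +/- 1 usec of jitter. */
static int64_t noisy_hw_query_us(void)
{
	static unsigned int calls;

	return 1000000 + ((calls++ % 2) ? -1 : 1);
}

/* Return the timestamp of the current vblank, querying hw at most once. */
static int64_t cached_vblank_ts_us(struct ts_cache *c)
{
	unsigned int slot = c->count % TS_SLOTS;

	if (!c->valid[slot]) {
		c->ts_us[slot] = noisy_hw_query_us();
		c->valid[slot] = true;
	}
	return c->ts_us[slot];
}

/* A new vblank arrived: bump the count and invalidate its slot. */
static void next_vblank(struct ts_cache *c)
{
	c->count++;
	c->valid[c->count % TS_SLOTS] = false;
}
```

Calling noisy_hw_query_us() twice in a row would return values 2 usecs apart
with the second one earlier, which is exactly the "time going backwards"
effect a client computing deltas would trip over. Going through
cached_vblank_ts_us() hides that.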

>
> 

[PATCH] drm: Return current vblank value for drmWaitVBlank queries

2015-03-18 Thread Mario Kleiner
On 03/18/2015 10:30 AM, Chris Wilson wrote:
> On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
>> drm_vblank_count_and_time() doesn't return the correct sequence number
>> while the vblank interrupt is disabled, does it? It returns the sequence
>> number from the last time vblank_disable_and_save() was called (when the
>> vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
>
> Ville enlightened me as well. I thought the value was cooked so that
> time did not pass whilst the IRQ was disabled. Hopefully, I can impress
> upon the Intel folks, at least, that enabling/disabling the interrupts
> just to read the current hw counter is interesting to say the least and
> sits at the top of the profiles when benchmarking Present.
> -Chris
>

drm_wait_vblank() not only gets the counter but also the corresponding 
vblank timestamp. Counters are recalculated in vblank_disable_and_save() 
for irq off, then in the vblank irq on path, and every refresh in 
drm_handle_vblank at vblank irq time.

The timestamps can be recalculated at any time iff the driver supports 
high precision timestamping, which currently intel kms, radeon kms, and 
nouveau kms do. But for other parts, like most SoC's, afaik you only get 
a valid timestamp by sampling system time in the vblank irq handler, so 
there you'd have a problem.

There are also some races around the enable/disable path which require a 
lot of care and exact knowledge of when each hardware fires its vblanks, 
updates its hardware counters etc. to get rid of them. Ville did that - 
successfully as far as my tests go - for Intel kms, but other drivers 
would be less forgiving.

Our current method is to:

a) Only disable vblank irqs after a default idle period of 5 seconds, so 
we don't get races frequent/likely enough to cause problems for clients. 
And we save the overhead for all the vblank irq on/off.

b) On drivers which have high precision timestamping and have been 
carefully checked to be race free (== intel kms only atm.) we have 
instant disable, so things like blinking cursors don't keep vblank irq 
on forever.

If b) causes so much overhead, maybe we could change the "instant 
disable" into a "disable after a very short time", e.g., lowering the 
timeout from 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That 
would still disable vblank irqs for power saving if the desktop is 
really idle, but avoid on/off storms for the various drm_wait_vblank's 
that happen when preparing a swap.
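
As a rough sketch of the arithmetic behind "2-3 video refresh durations ~ 50 msecs": the helper below is hypothetical (vblank_off_timeout_ms() does not exist in the drm core) and only shows how such a short timeout could be derived from the refresh rate, falling back to the 5000 msecs default when the rate is unknown.

```c
#include <stdint.h>

/*
 * Hypothetical helper: vblank-disable timeout in milliseconds that
 * covers 'nframes' refresh cycles at 'refresh_hz', rounded up.
 */
static uint32_t vblank_off_timeout_ms(uint32_t refresh_hz, uint32_t nframes)
{
	if (refresh_hz == 0)
		return 5000; /* unknown rate: keep the 5000 msecs default */

	/* ceil(nframes * 1000 / refresh_hz) using integer math */
	return (nframes * 1000 + refresh_hz - 1) / refresh_hz;
}
```

For a 60 Hz display, 3 refresh durations come out as 50 msecs, which is where the number above comes from.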

-mario


[PATCH] drm/qxl: Use drm_vblank_count()

2014-12-17 Thread Mario Kleiner
On 12/17/2014 10:37 AM, Ville Syrjälä wrote:
> On Wed, Dec 17, 2014 at 03:57:51AM +0100, Mario Kleiner wrote:
>> On 12/15/2014 04:56 PM, Thierry Reding wrote:
>>> From: Thierry Reding 
>>>
>>> The QXL driver duplicates part of the core's drm_vblank_count(), so it
>>> might as well use the core's variant for the extra goodies.
>>>
>>> Signed-off-by: Thierry Reding 
>>> ---
>>>drivers/gpu/drm/qxl/qxl_drv.c | 7 +--
>>>1 file changed, 1 insertion(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
>>> index 1d9b80c91a15..497024461a3c 100644
>>> --- a/drivers/gpu/drm/qxl/qxl_drv.c
>>> +++ b/drivers/gpu/drm/qxl/qxl_drv.c
>>> @@ -196,11 +196,6 @@ static int qxl_pm_restore(struct device *dev)
>>> return qxl_drm_resume(drm_dev, false);
>>>}
>>>
>>> -static u32 qxl_noop_get_vblank_counter(struct drm_device *dev, int crtc)
>>> -{
>>> -   return dev->vblank[crtc].count.counter;
>>> -}
>>> -
>>>static int qxl_noop_enable_vblank(struct drm_device *dev, int crtc)
>>>{
>>> return 0;
>>> @@ -231,7 +226,7 @@ static struct drm_driver qxl_driver = {
>>>DRIVER_HAVE_IRQ | DRIVER_IRQ_SHARED,
>>> .load = qxl_driver_load,
>>> .unload = qxl_driver_unload,
>>> -   .get_vblank_counter = qxl_noop_get_vblank_counter,
>>> +   .get_vblank_counter = drm_vblank_count,
>>> .enable_vblank = qxl_noop_enable_vblank,
>>> .disable_vblank = qxl_noop_disable_vblank,
>>>
>> Hi
>>
>> That doesn't really help, although it doesn't hurt either. Just wanted
>> to point out that both the old and new method implement a no-op. The
>> get_vblank_counter() driver function is meant to implement a hardware
>> vblank counter query. Its only use case atm. is to reinitialize the
>> dev->vblank[crtc].count.counter counter returned by drm_vblank_count().
>>
>> The most honest implementation if there isn't any way to get a hw vblank
>> count would be to just "return 0;" - Same net effect, but at least a
>> marker in the code that there is future work to do.
> Yeah 'return 0' is what we do in i915 when there's no hw counter. I did
> consider changing it to drm_vblank_count() since that seems to be the
> current fad. I was hoping it might allow removing some code from
> drm_irq.c, but after some more thought that might not be the case.
> I'll probably need to take another look at it.
>
>> I think a better solution would be if we wouldn't require
>> .get_vblank_counter to be non-NULL, don't fake implement it in
>> kms-drivers which can't do it, and make the drm core deal with lack of
>> hw counter queries, e.g., by not disabling vblank irqs.
> That seems a bit drastic. The current delayed disable
> seems quite reasonable to me. The count will remain accurate as long
> as the vblank irq is enabled, and if you wait for so long that the
> irq gets disabled, well, I don't think a very precise answer was
> needed anyway.

Agreed. The 5 second timeout is imho reasonable in practice. I just 
meant maybe we just check in the drm if that function is non-NULL, so 
drivers are not forced to implement no-op stubs of that function if they 
don't actually support it, just to avoid oopses.
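
Something along these lines, as a simplified sketch only. struct sketch_driver and core_get_hw_vblank_count() are made-up names for illustration and not the real struct drm_driver or any existing core function; example_hw_counter() is just a stand-in for a driver that does have a hw counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified driver vtable; not the real struct drm_driver. */
struct sketch_driver {
	/* Optional: left NULL when the hardware has no vblank counter. */
	uint32_t (*get_vblank_counter)(int crtc);
};

/* Stand-in for a driver hook that really reads a hw counter. */
static uint32_t example_hw_counter(int crtc)
{
	return 100 + (uint32_t)crtc;
}

/*
 * Core-side wrapper: a missing hook is treated like a counter that
 * always reads 0, so drivers need no no-op stub to avoid an oops.
 */
static uint32_t core_get_hw_vblank_count(const struct sketch_driver *drv,
					 int crtc)
{
	if (!drv->get_vblank_counter)
		return 0;
	return drv->get_vblank_counter(crtc);
}
```

The net effect for a driver without hw counter is the same as the "return 0;" stub, only without the stub.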

> I was hunting some bugs in the vblank code recently, and while doing
> that I thought that I might change the code to use the timestamp
> difference between disable->enable to calculate an approximate number
> of vblanks lost and bump the counter appropriately. Didn't try it
> yet though, but seems like a reasonable idea when there's no hw
> counter. Though some care will be needed when dealing with
> drm_vblank_off/on.
>

Would be an option. Unsure if it is worth it or not in practice.
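
For reference, the estimate Ville describes could look roughly like this. It is a sketch under assumptions: estimate_missed_vblanks() is a hypothetical name, the frame duration is assumed to be known and constant across the disabled period, and rounding to the nearest frame is my choice, not something stated in the thread.

```c
#include <stdint.h>

/*
 * Estimate how many vblanks elapsed while the irq was off, from the
 * timestamps taken at disable and at re-enable. Rounds to the nearest
 * whole frame. All names here are hypothetical.
 */
static uint32_t estimate_missed_vblanks(int64_t off_ts_ns, int64_t on_ts_ns,
					int64_t frame_duration_ns)
{
	int64_t diff_ns = on_ts_ns - off_ts_ns;

	/* No usable information: do not bump the counter at all. */
	if (frame_duration_ns <= 0 || diff_ns <= 0)
		return 0;

	return (uint32_t)((diff_ns + frame_duration_ns / 2) /
			  frame_duration_ns);
}
```

The result would be added to the software counter at re-enable time. A mode change between disable and enable (drm_vblank_off/on) invalidates the constant frame duration assumption, which is presumably the "some care" mentioned above.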

-mario



[PATCH] drm/qxl: Use drm_vblank_count()

2014-12-17 Thread Mario Kleiner
On 12/15/2014 04:56 PM, Thierry Reding wrote:
> From: Thierry Reding 
>
> The QXL driver duplicates part of the core's drm_vblank_count(), so it
> might as well use the core's variant for the extra goodies.
>
> Signed-off-by: Thierry Reding 
> ---
>   drivers/gpu/drm/qxl/qxl_drv.c | 7 +--
>   1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
> index 1d9b80c91a15..497024461a3c 100644
> --- a/drivers/gpu/drm/qxl/qxl_drv.c
> +++ b/drivers/gpu/drm/qxl/qxl_drv.c
> @@ -196,11 +196,6 @@ static int qxl_pm_restore(struct device *dev)
>   return qxl_drm_resume(drm_dev, false);
>   }
>   
> -static u32 qxl_noop_get_vblank_counter(struct drm_device *dev, int crtc)
> -{
> - return dev->vblank[crtc].count.counter;
> -}
> -
>   static int qxl_noop_enable_vblank(struct drm_device *dev, int crtc)
>   {
>   return 0;
> @@ -231,7 +226,7 @@ static struct drm_driver qxl_driver = {
>  DRIVER_HAVE_IRQ | DRIVER_IRQ_SHARED,
>   .load = qxl_driver_load,
>   .unload = qxl_driver_unload,
> - .get_vblank_counter = qxl_noop_get_vblank_counter,
> + .get_vblank_counter = drm_vblank_count,
>   .enable_vblank = qxl_noop_enable_vblank,
>   .disable_vblank = qxl_noop_disable_vblank,
>   

Hi

That doesn't really help, although it doesn't hurt either. Just wanted 
to point out that both the old and new method implement a no-op. The 
get_vblank_counter() driver function is meant to implement a hardware 
vblank counter query. Its only use case atm. is to reinitialize the 
dev->vblank[crtc].count.counter counter returned by drm_vblank_count().

The most honest implementation if there isn't any way to get a hw vblank 
count would be to just "return 0;" - Same net effect, but at least a 
marker in the code that there is future work to do.

I think a better solution would be if we wouldn't require 
.get_vblank_counter to be non-NULL, don't fake implement it in 
kms-drivers which can't do it, and make the drm core deal with lack of 
hw counter queries, e.g., by not disabling vblank irqs.

-mario



[PATCH weston 1/1] compositor: Abort on bad page flip timestamps

2014-11-07 Thread Mario Kleiner
On 06/11/14 17:42, Pekka Paalanen wrote:
> On Thu, 06 Nov 2014 16:08:56 +0900
> Michel Dänzer  wrote:
>
>> On 06.11.2014 03:06, Frederic Plourde wrote:
>>> Many features, like animations, depend heavily on page flip timestamps
>>> to work properly, but some DRM drivers do not correctly support page flip
>>> timestamps (or not at all) and in that case, things start to go wrong.
>>>
>>> This patch adds sanity check to weston_output_finish_frame. By solely
>>> verifying that page flip timestamps are monotonically increasing, we
>>> make sure that :
>>>
>>> 1) Underlying driver is not throwing zeroed-out timestamp series at us.
>>> 2) We have not mistakenly jumped backwards because of integer overflow.
>>>
>>> If a pathological case is detected, we gracefully exit Weston
>>> with an appropriate exit code to help developers debug their drivers.
>>
>> That seems a bit harsh. IIRC, zero can be returned for the timestamp
>> intermittently if no accurate timestamp value can be determined, e.g.
>> because the CRTC is disabled. At the very least, I'd recommend
>> double-checking this with Mario Kleiner (Cc'd) and the dri-devel mailing
>> list.
>
> Can that really happen if we are not doing stupid things like
> attempting to flip on a disabled crtc or output?
>

I don't think it could happen for non-stupid regular use and would 
indicate a driver bug.

> Or can it happen, if we schedule a flip, then disable the crtc
> before the flip completes? Or maybe when an output is hot-unplugged?
>

On kernels <= 3.17 disabling vblank irqs will clear the cached timestamp 
to zero, so a waitvblank ioctl() for a pure query of msc/ust could 
return zero as a signal of "invalid/undefined timestamp" on some drivers 
under some circumstances. Basically a -EAGAIN error code.

But that shouldn't ever happen for kms-pageflip completion timestamps, 
because vblank irqs don't get disabled while a pageflip is pending, as 
the pending pageflip keeps the vblank reference count > 0.

There are also some new patches in Linux 3.18-rc which should prevent 
returning zero timestamps even for pure waitvblank ioctl() queries if 
vblank irqs get disabled for one reason or another, cf.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/drm_irq.c?id=844b03f27739135fe1fed2fef06da0ffc4c7a081

kms drivers should usually use the drm_send_vblank_event() helper in 
their pageflip completion routine. That helper will ideally send out the 
cached vblank count and high precision timestamps which were computed 
during the most recent vblank interrupt for the crtc that pageflipped. 
There's also some fallback there which, if no crtc is specified (crtc == 
-1), will simply return a msc of zero and the current system time as 
timestamp. The fallback is for gpus with weird flip completion 
behaviour. Currently nouveau uses it for old < NV-50 gpu's where we 
couldn't find a better way to make pageflip completion behave 
properly/reliably due to some hardware weirdness.
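
The behaviour described above can be sketched as follows. This is a simplified illustration, not the real drm_send_vblank_event() implementation: the struct and function names are hypothetical, and the current system time is passed in as a parameter instead of being sampled.

```c
#include <stdint.h>

/* Hypothetical event payload: msc plus timestamp in microseconds. */
struct sketch_vblank_event {
	uint32_t sequence;
	int64_t ts_usec;
};

/* Hypothetical cached state from the most recent vblank irq of a crtc. */
struct sketch_crtc_state {
	uint32_t count;
	int64_t ts_usec;
};

/*
 * Fill a flip completion event. With a valid crtc index the cached
 * count/timestamp pair is used; with crtc < 0 (the "crtc == -1" case)
 * the fallback is a msc of zero and the current system time.
 */
static void fill_flip_event(struct sketch_vblank_event *ev,
			    const struct sketch_crtc_state *crtcs, int crtc,
			    int64_t now_usec)
{
	if (crtc < 0) {
		ev->sequence = 0;
		ev->ts_usec = now_usec;
		return;
	}

	ev->sequence = crtcs[crtc].count;
	ev->ts_usec = crtcs[crtc].ts_usec;
}
```

Note that even the fallback path delivers a non-zero timestamp, so a compositor seeing zero pageflip timestamps is looking at something else going wrong.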

> Is zero a special timestamp that simply cannot be produced during
> normal operations, like due to clock wrap-around?
>

The timestamps are CLOCK_MONOTONIC uint32 seconds.microseconds since 
bootup, so a wraparound would only happen after 2^32 seconds or 136 
years, so normal operation shouldn't cause a zero timestamp.
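
The 136 years figure is easy to check. The helper name below is made up for illustration, and it assumes a 365-day year, which is accurate enough here.

```c
#include <stdint.h>

/* Whole years until an unsigned 32-bit seconds counter wraps around. */
static uint32_t u32_seconds_wrap_years(void)
{
	const uint64_t wrap_seconds = (uint64_t)1 << 32;      /* 4294967296 */
	const uint64_t seconds_per_year = 365ull * 24 * 3600; /* 31536000 */

	return (uint32_t)(wrap_seconds / seconds_per_year);
}
```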

So i think observing zero pageflip timestamps would be a sign that the 
kms-driver needs fixing.

-mario

>
> Thanks,
> pq
>



[Intel-gfx] [PATCH 14/19] drm: Don't update vblank timestamp when the counter didn't change

2014-09-23 Thread Mario Kleiner
On 23/09/14 15:51, Daniel Vetter wrote:
> On Tue, Sep 23, 2014 at 03:48:25PM +0300, Jani Nikula wrote:
>> On Mon, 15 Sep 2014, Daniel Vetter  wrote:
>>> On Sat, Sep 13, 2014 at 06:25:54PM +0200, Mario Kleiner wrote:
>>>> The current drm-next misses Ville's original Patch 14/19, the one i first
>>>> objected to, then objected to my objection. It is needed to avoid actual
>>>> regressions. Attached a trivially rebased (v2) of Ville's patch to go on top
>>>> of drm-next, also as tgz in case my e-mail client mangles the patch again,
>>>> because it's one of those "email hates me" weeks.
>>>
>>> Oh dear, I've made a decent mess of all of this really. Picked up to make
>>> sure it doesn't get lost again.
>>
>> After all this nice ping pong our QA has reported a bisected regression
>> on this commit: https://bugs.freedesktop.org/show_bug.cgi?id=84161
>
> Looks like a minuscule timing change which resulted in us detecting a fifo
> underrun. Or at least I don't see any other related information that would
> indicate otherwise ...
> -Daniel
>

There's nothing in that code path which could cause this - except for 
altered execution timing. I've seen that warning as well on my Intel HD 
Ironlake Mobile (MBP 2010), but only spuriously when plugging/unplugging 
an external display into the laptop iirc, so i thought it would be 
unrelated.

-mario



[PATCH 14/19] drm: Don't update vblank timestamp when the counter didn't change

2014-09-13 Thread Mario Kleiner
The current drm-next misses Ville's original Patch 14/19, the one i 
first objected to, then objected to my objection. It is needed to avoid 
actual regressions. Attached a trivially rebased (v2) of Ville's patch 
to go on top of drm-next, also as tgz in case my e-mail client mangles 
the patch again, because it's one of those "email hates me" weeks.

-mario



On 08/06/2014 01:49 PM, ville.syrjala at linux.intel.com wrote:
> From: Ville Syrjälä
>
> If we already have a timestamp for the current vblank counter, don't
> update it with a new timestamp. Small errors can creep in between two
> timestamp queries for the same vblank count, which could be confusing to
> userspace when it queries the timestamp for the same vblank sequence
> number twice.
>
> This problem gets exposed when the vblank disable timer is not used
> (or is set to expire quickly) and thus we can get multiple vblank
> disable<->enable transition during the same frame which would all
> attempt to update the timestamp with the latest estimate.
>
> Testcase: igt/kms_flip/flip-vs-expired-vblank
> Signed-off-by: Ville Syrjälä
> ---
>   drivers/gpu/drm/drm_irq.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index af33df1..0523f5b 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -106,6 +106,9 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>   DRM_DEBUG("enabling vblank interrupts on crtc %d, missed %d\n",
> crtc, diff);
>   
> + if (diff == 0)
> + return;
> +
>   /* Reinitialize corresponding vblank timestamp if high-precision query
>* available. Skip this step if query unsupported or failed. Will
>* reinitialize delayed at next vblank interrupt in that case.



[PATCH 3/4] drm: Simplify return value of drm_get_last_vbltimestamp

2014-09-11 Thread Mario Kleiner
On 09/10/2014 05:36 PM, Daniel Vetter wrote:
> Imo u32 hints at a register value, but in reality all callers only
> care whether the sampled timestamp is precise or not. So give them
> just a bool.
>
> Also move the declaration out of drmP.h, it's only used in drm_irq.c.

All good. Maybe then also remove

EXPORT_SYMBOL(drm_get_last_vbltimestamp);

in this patch if the method is now static to drm_irq.c ? Up to you.

For all 4 patches...

Reviewed-by: Mario Kleiner 

-mario



> Cc: Mario Kleiner 
> Cc: Ville Syrjälä
> Signed-off-by: Daniel Vetter 
> ---
>   drivers/gpu/drm/drm_irq.c | 24 +++-
>   include/drm/drmP.h|  2 --
>   2 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 922721ead29a..b16f0bcef959 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -70,6 +70,10 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>   
> +static bool
> +drm_get_last_vbltimestamp(struct drm_device *dev, int crtc,
> +   struct timeval *tvblank, unsigned flags);
> +
>   /**
>* drm_update_vblank_count - update the master vblank counter
>* @dev: DRM device
> @@ -89,7 +93,8 @@ module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>   static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>   {
>   struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> - u32 cur_vblank, diff, tslot, rc;
> + u32 cur_vblank, diff, tslot;
> + bool rc;
>   struct timeval t_vblank;
>   
>   /*
> @@ -147,7 +152,7 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
>   unsigned long irqflags;
>   u32 vblcount;
>   s64 diff_ns;
> - int vblrc;
> + bool vblrc;
>   struct timeval tvblank;
>   int count = DRM_TIMESTAMP_MAXRETRIES;
>   
> @@ -171,7 +176,7 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
>* vblank interrupt is disabled.
>*/
>   if (!vblank->enabled &&
> - drm_get_last_vbltimestamp(dev, crtc, &tvblank, 0) > 0) {
> + drm_get_last_vbltimestamp(dev, crtc, &tvblank, 0)) {
>   drm_update_vblank_count(dev, crtc);
>   spin_unlock_irqrestore(&dev->vblank_time_lock, irqflags);
>   return;
> @@ -219,7 +224,7 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
>* available. In that case we can't account for this and just
>* hope for the best.
>*/
> - if ((vblrc > 0) && (abs64(diff_ns) > 100)) {
> + if (vblrc && (abs64(diff_ns) > 100)) {
>   /* Store new timestamp in ringbuffer. */
>   vblanktimestamp(dev, crtc, vblcount + 1) = tvblank;
>   
> @@ -786,10 +791,11 @@ static struct timeval get_drm_timestamp(void)
>* call, i.e., it isn't very precisely locked to the true vblank.
>*
>* Returns:
> - * Non-zero if timestamp is considered to be very precise, zero otherwise.
> + * True if timestamp is considered to be very precise, false otherwise.
>*/
> -u32 drm_get_last_vbltimestamp(struct drm_device *dev, int crtc,
> -   struct timeval *tvblank, unsigned flags)
> +static bool
> +drm_get_last_vbltimestamp(struct drm_device *dev, int crtc,
> +   struct timeval *tvblank, unsigned flags)
>   {
>   int ret;
>   
> @@ -801,7 +807,7 @@ u32 drm_get_last_vbltimestamp(struct drm_device *dev, int crtc,
>   ret = dev->driver->get_vblank_timestamp(dev, crtc, &max_error,
>   tvblank, flags);
>   if (ret > 0)
> - return (u32) ret;
> + return true;
>   }
>   
>   /* GPU high precision timestamp query unsupported or failed.
> @@ -809,7 +815,7 @@ u32 drm_get_last_vbltimestamp(struct drm_device *dev, int crtc,
>*/
>   *tvblank = get_drm_timestamp();
>   
> - return 0;
> + return false;
>   }
>   EXPORT_SYMBOL(drm_get_last_vbltimestamp);
>   
> diff --git a/include/drm/drmP.h b/include/drm/drmP.h
> index ad952b08711e..2ccb0e715569 100644
> --- a/include/drm/drmP.h
> +++ b/include/drm/drmP.h
> @@ -1004,8 +1004,6 @@ extern void drm_crtc_vblank_off(struct drm_crtc *crtc);
>   extern void drm_crtc_vblank_on(struct drm_crtc *crtc);
>   extern void drm_vblank_clean

[Intel-gfx] [PULL] topic/vblank-rework

2014-09-10 Thread Mario Kleiner
On Wed, Sep 10, 2014 at 5:29 PM, Daniel Vetter  
wrote:
> On Wed, Sep 10, 2014 at 4:19 PM, Mario Kleiner
>  wrote:
>> Hmm, not quite an ack from my side for the pull in its current form. I
>> said if the two remaining issues i mentioned are addressed, then i'm
>> happy with it and can have my reviewed/acked-by. Looking at the code
>> they haven't been addressed.
>
> Sorry about the confusion, I've somehow thought that you've retracted
> those comments in Message-ID:
> 
>
> But I've missed that that was about just one of the issues.
>

Thought so. That one patch turns out to be crucial. My own software
immediately complained loudly about broken vblank irqs and switched to
lower performance fallbacks when that patch was missing.

I'll test the patches on a few more cards in the next days - but so
far things look good at least as far as my special test cases go.

>> However, this is easily fixable on top of the current patches:
>>
>> 1. A vblank_disable_timeout module parameter of zero should always
>> leave vblank irq's enabled and also override the driver's choice,
>> otherwise a user can't override the driver on a broken driver/gpu
>> combo, which is the only use case for having that module parameter.
>> Currently the disable_immediately flag overrides the user's override ->
>> Ouch.
>>
>> So in drm_vblank_put():
>>
>> ...
>>
>> /* Last user schedules interrupt disable */
>> if (atomic_dec_and_test(&vblank->refcount)) {
>> >>> Insert zero -> opt-out check <<<
>>    if (drm_vblank_offdelay == 0)
>>        return;
>> >>> Remaining code continues <<<
>>    if (dev->vblank_disable_immediate || drm_vblank_offdelay < 0)
>>        vblank_disable_fn((unsigned long)vblank);
>>    else if (drm_vblank_offdelay > 0)
>>        mod_timer(&vblank->disable_timer, jiffies +
>>                  ((drm_vblank_offdelay * HZ)/1000));
>
> Yeah, I guess that makes sense. I'm not really a fan of giving users
> too powerful module options to hack around driver bugs since often
> that means they'll never report the bug :( But we have the support now
> to mark certain module options as debug-only and they'll taint the
> kernel if set, so this is fixable.
>
> I'll follow up with the patch you've suggested.
>

Thanks. I think the module parameters i usually care about will get
proper testing and reporting, because while my software and users are
good at detecting such problems, they wouldn't know how to fix them
themselves, and at the same time they crucially depend on this stuff
working, so this gets reported to me quickly and i can give them the
module param workaround in private e-mail and take it from there with
proper bug reports or patches.

>> ...
>>
>> 2. For the "drm: Have the vblank counter account for the time ... "
>> patch, we must opt-out of that last timestamp/counter update/bump if
>> the driver doesn't support high-precision vblank timestamping,
>> otherwise the vblank count and timestamp will be inconsistent with
>> each other - or outright wrong in case of the timestamp. Rather
>> deliver a slightly outdated, but correct count+timestamp pair to
>> userspace, which is still useable for practical purposes, than a pair
>> that's outright wrong and will definitely confuse clients.
>>
>> A simple fix in static void vblank_disable_and_save() would be to
>> replace the new...
>>
>> if (!vblank->enabled) {
>>
>> ... check by ...
>>
>> if (!vblank->enabled &&
>> ) {
>
> Yeah, makes sense (well the follow-up one ofc). I'll do a patch which
> adds this and adds a comment. Aside I think it would be useful to add
> a #define for the 0 return value, since the magic checks all over are
> imo fairly hard to understand.
>
> I'll also float a patch for rfc about that.
>

Good!

thanks,
-mario

> Thanks for your comments and again my apologies for missing that
> there's still outstanding work left to do on this.
>
> Cheers, Daniel
>
>>
>>
>> On Wed, Sep 10, 2014 at 2:05 PM, Daniel Vetter  
>> wrote:
>>> Hi Dave,
>>>
>>> So here's the final bits of Ville's vblank rework with a bit of cleanup
>>> from Mario on top.
>>>
>>> The neat thing this finally allows is to immediately disable the vblank
>>> interrupt on the last drm_vblank_put if the hardware has perfectly
>>> accurate vblank counter and timestamp readout support. On i915 that
>>> required piles of small adjustments from Ville since depending upon the
>>> platform and port the vblank happens at different scanout lines.
>>>
>>> Of cou

[Intel-gfx] [PULL] topic/vblank-rework

2014-09-10 Thread Mario Kleiner
e-mail snafu, sent it too early by accident, and from a gmail web
interface which i'm apparently incapable of using properly...

The second fix should look like this:

> A simple fix in static void vblank_disable_and_save() would be to
> replace the new...
>
> if (!vblank->enabled) {
>
> ... check by ...

if (!vblank->enabled &&
    drm_get_last_vbltimestamp(dev, crtc, &tvblank, 0)) {

... We need to make sure timestamp queries work and are actually
locked to the vblank, otherwise we can't do that last update there in
vblank_disable_and_save().
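
In other words, roughly the following. This is a simplified sketch with hypothetical names (struct sketch_vblank, final_update_if_precise()), not the code in drm_irq.c: the result of the timestamp query is passed in as 'ts_precise', and the number of missed vblanks is assumed to come from the hw counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced per-crtc vblank state. */
struct sketch_vblank {
	bool enabled;      /* vblank irq currently enabled? */
	uint32_t count;    /* software vblank counter */
	int64_t ts_usec;   /* timestamp belonging to 'count' */
};

/*
 * Last-chance update on the disable path when the irq is already off.
 * Count and timestamp are only bumped together, and only if a precise
 * timestamp query worked ('ts_precise'). Otherwise the older but
 * consistent pair is left untouched. Returns true if updated.
 */
static bool final_update_if_precise(struct sketch_vblank *v, bool ts_precise,
				    uint32_t missed, int64_t precise_ts_usec)
{
	if (v->enabled || !ts_precise)
		return false;

	/* Counter unchanged: keep the timestamp we already have. */
	if (missed == 0)
		return false;

	v->count += missed;
	v->ts_usec = precise_ts_usec;
	return true;
}
```

On a driver without high precision timestamping 'ts_precise' is always false here, so userspace keeps getting the slightly outdated but matching count+timestamp pair.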


With these two fixes or similar applied i'd be happy, otherwise it
will inflict pain and real bugs on real users.

thanks,
-mario



On Wed, Sep 10, 2014 at 4:19 PM, Mario Kleiner
 wrote:
> Hmm, not quite an ack from my side for the pull in its current form. I
> said if the two remaining issues i mentioned are addressed, then i'm
> happy with it and can have my reviewed/acked-by. Looking at the code
> they haven't been addressed.
>
> However, this is easily fixable on top of the current patches:
>
> 1. A vblank_disable_timeout module parameter of zero should always
> leave vblank irq's enabled and also override the driver's choice,
> otherwise a user can't override the driver on a broken driver/gpu
> combo, which is the only use case for having that module parameter.
> Currently the disable_immediately flag overrides the user's override ->
> Ouch.
>
> So in drm_vblank_put():
>
> ...
>
> /* Last user schedules interrupt disable */
> if (atomic_dec_and_test(&vblank->refcount)) {
> >>> Insert zero -> opt-out check <<<
>    if (drm_vblank_offdelay == 0)
>        return;
> >>> Remaining code continues <<<
>    if (dev->vblank_disable_immediate || drm_vblank_offdelay < 0)
>        vblank_disable_fn((unsigned long)vblank);
>    else if (drm_vblank_offdelay > 0)
>        mod_timer(&vblank->disable_timer, jiffies +
>                  ((drm_vblank_offdelay * HZ)/1000));
>
> ...
>
> 2. For the "drm: Have the vblank counter account for the time ... "
> patch, we must opt-out of that last timestamp/counter update/bump if
> the driver doesn't support high-precision vblank timestamping,
> otherwise the vblank count and timestamp will be inconsistent with
> each other - or outright wrong in case of the timestamp. Rather
> deliver a slightly outdated, but correct count+timestamp pair to
> userspace, which is still useable for practical purposes, than a pair
> that's outright wrong and will definitely confuse clients.
>
> A simple fix in static void vblank_disable_and_save() would be to
> replace the new...
>
> if (!vblank->enabled) {
>
> ... check by ...
>
> if (!vblank->enabled &&
> ) {
>
>
> On Wed, Sep 10, 2014 at 2:05 PM, Daniel Vetter  
> wrote:
>> Hi Dave,
>>
>> So here's the final bits of Ville's vblank rework with a bit of cleanup
>> from Mario on top.
>>
>> The neat thing this finally allows is to immediately disable the vblank
>> interrupt on the last drm_vblank_put if the hardware has perfectly
>> accurate vblank counter and timestamp readout support. On i915 that
>> required piles of small adjustments from Ville since depending upon the
>> platform and port the vblank happens at different scanout lines.
>>
>> Of course this is fully opt-in and per-device (we need that since gen2
>> doesn't have a hw vblank counter).
>>
>> Mario reviewed the entire pile too and after some initial hesitation
>> (about drivers without accurate timestamp support) acked it.
>>
>> Cheers, Daniel
>>
>>
>> The following changes since commit 21d70354bba9965a098382fc4d7fb17e138111f3:
>>
>>   drm: move drm_stub.c to drm_drv.c (2014-08-06 19:10:44 +1000)
>>
>> are available in the git repository at:
>>
>>   git://anongit.freedesktop.org/drm-intel tags/topic/vblank-rework-2014-09-10
>>
>> for you to fetch changes up to 2368ffb18b1d2b04eb80478d225676caa7a3c4c8:
>>
>>   drm: Use vblank_disable_and_save in drm_vblank_cleanup() (2014-09-10 09:41:29 +0200)
>>
>> 
>> Mario Kleiner (2):
>>   drm: Remove drm_vblank_cleanup from drm_vblank_init error path.
>>   drm: Use vblank_disable_and_save in drm_vblank_cleanup()
>>
>> Ville Syrjälä (16):
>>   drm: Always reject drm_vblank_get() after drm_vblank_off()
>>   drm/i915: Warn if drm_vblank_get() still works after drm_vblank_off()
>>   drm: Don't clear vblank timestamps when vblank interrupt is disabled
>>   drm: Move drm_update_vblank_count()
>>   drm: Have the vblank counter 

[Intel-gfx] [PULL] topic/vblank-rework

2014-09-10 Thread Mario Kleiner
Hmm, not quite an ack from my side for the pull in its current form. I
said if the two remaining issues i mentioned are addressed, then i'm
happy with it and can have my reviewed/acked-by. Looking at the code
they haven't been addressed.

However, this is easily fixable on top of the current patches:

1. A vblank_disable_timeout module parameter of zero should always
leave vblank irq's enabled and also override the driver's choice,
otherwise a user can't override the driver on a broken driver/gpu
combo, which is the only use case for having that module parameter.
Currently the disable_immediately flag overrides the user's override ->
Ouch.

So in drm_vblank_put():

...

/* Last user schedules interrupt disable */
if (atomic_dec_and_test(&vblank->refcount)) {
>>> Insert zero -> opt-out check <<<
   if (drm_vblank_offdelay == 0)
       return;
>>> Remaining code continues <<<
   if (dev->vblank_disable_immediate || drm_vblank_offdelay < 0)
       vblank_disable_fn((unsigned long)vblank);
   else if (drm_vblank_offdelay > 0)
       mod_timer(&vblank->disable_timer, jiffies +
                 ((drm_vblank_offdelay * HZ)/1000));

...

2. For the "drm: Have the vblank counter account for the time ... "
patch, we must opt-out of that last timestamp/counter update/bump if
the driver doesn't support high-precision vblank timestamping,
otherwise the vblank count and timestamp will be inconsistent with
each other - or outright wrong in case of the timestamp. Rather
deliver a slightly outdated, but correct count+timestamp pair to
userspace, which is still useable for practical purposes, than a pair
that's outright wrong and will definitely confuse clients.

A simple fix in static void vblank_disable_and_save() would be to
replace the new...

if (!vblank->enabled) {

... check by ...

if (!vblank->enabled &&
) {
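
The ordering proposed in point 1 (the drm_vblank_put() change) boils down to a small decision function. The sketch below is only an illustration with made-up names (enum vblank_put_action, vblank_put_policy()); it mirrors the order of the checks, not the actual kernel code.

```c
#include <stdbool.h>

/* Hypothetical outcome of dropping the last vblank reference. */
enum vblank_put_action {
	VBL_KEEP_ENABLED,        /* never disable the vblank irq */
	VBL_DISABLE_NOW,         /* disable immediately */
	VBL_DISABLE_AFTER_TIMER, /* arm the disable timer */
};

/*
 * 'offdelay_ms' stands for the module parameter, 'disable_immediate'
 * for the per-device driver flag. The zero check comes first so the
 * user's opt-out always wins over the driver's choice.
 */
static enum vblank_put_action vblank_put_policy(int offdelay_ms,
						bool disable_immediate)
{
	if (offdelay_ms == 0)
		return VBL_KEEP_ENABLED;
	if (disable_immediate || offdelay_ms < 0)
		return VBL_DISABLE_NOW;
	return VBL_DISABLE_AFTER_TIMER;
}
```

The case that matters is (0, true): without the first check an "instant off" driver would still disable the irq, and the user would have no way to work around a broken driver/gpu combo.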


On Wed, Sep 10, 2014 at 2:05 PM, Daniel Vetter  
wrote:
> Hi Dave,
>
> So here's the final bits of Ville's vblank rework with a bit of cleanup
> from Mario on top.
>
> The neat thing this finally allows is to immediately disable the vblank
> interrupt on the last drm_vblank_put if the hardware has perfectly
> accurate vblank counter and timestamp readout support. On i915 that
> required piles of small adjustments from Ville since depending upon the
> platform and port the vblank happens at different scanout lines.
>
> Of course this is fully opt-in and per-device (we need that since gen2
> doesn't have a hw vblank counter).
>
> Mario reviewed the entire pile too and after some initial hesitation
> (about drivers without accurate timestamp support) acked it.
>
> Cheers, Daniel
>
>
> The following changes since commit 21d70354bba9965a098382fc4d7fb17e138111f3:
>
>   drm: move drm_stub.c to drm_drv.c (2014-08-06 19:10:44 +1000)
>
> are available in the git repository at:
>
>   git://anongit.freedesktop.org/drm-intel tags/topic/vblank-rework-2014-09-10
>
> for you to fetch changes up to 2368ffb18b1d2b04eb80478d225676caa7a3c4c8:
>
>   drm: Use vblank_disable_and_save in drm_vblank_cleanup() (2014-09-10 09:41:29 +0200)
>
> 
> Mario Kleiner (2):
>   drm: Remove drm_vblank_cleanup from drm_vblank_init error path.
>   drm: Use vblank_disable_and_save in drm_vblank_cleanup()
>
> Ville Syrjälä (16):
>   drm: Always reject drm_vblank_get() after drm_vblank_off()
>   drm/i915: Warn if drm_vblank_get() still works after drm_vblank_off()
>   drm: Don't clear vblank timestamps when vblank interrupt is disabled
>   drm: Move drm_update_vblank_count()
>   drm: Have the vblank counter account for the time between vblank irq 
> disable and drm_vblank_off()
>   drm: Avoid random vblank counter jumps if the hardware counter has been 
> reset
>   drm: Reduce the amount of dev->vblank[crtc] in the code
>   drm: Fix deadlock between event_lock and vbl_lock/vblank_time_lock
>   drm: Fix race between drm_vblank_off() and drm_queue_vblank_event()
>   drm: Disable vblank interrupt immediately when drm_vblank_offdelay<0
>   drm: Add dev->vblank_disable_immediate flag
>   drm/i915: Opt out of vblank disable timer on >gen2
>   drm: Kick start vblank interrupts at drm_vblank_on()
>   drm/i915: Update scanline_offset only for active crtcs
>   drm: Fix confusing debug message in drm_update_vblank_count()
>   drm: Store the vblank timestamp when adjusting the counter during 
> disable
>
>  Documentation/DocBook/drm.tmpl       |   7 +
>  drivers/gpu/drm/drm_drv.c            |   4 +-
>  drivers/gpu/drm/drm_irq.c            | 345 ++-
>  drivers/gpu/drm/i915/i915_irq.c      |   8 +
>  drivers/gpu/drm/i915/intel_display.c |  17 +-

[PATCH 14/19] drm: Don't update vblank timestamp when the counter didn't change

2014-09-04 Thread Mario Kleiner
I thought about this one again and, contrary to my previous comment, now think
it's fine, also for drivers without hw vblank counter queries.

-mario



On Wed, Aug 6, 2014 at 1:49 PM,  wrote:

> From: Ville Syrjälä
>
> If we already have a timestamp for the current vblank counter, don't
> update it with a new timestamp. Small errors can creep in between two
> timestamp queries for the same vblank count, which could be confusing to
> userspace when it queries the timestamp for the same vblank sequence
> number twice.
>
> This problem gets exposed when the vblank disable timer is not used
> (or is set to expire quickly) and thus we can get multiple vblank
> disable<->enable transitions during the same frame which would all
> attempt to update the timestamp with the latest estimate.
>
> Testcase: igt/kms_flip/flip-vs-expired-vblank
> Signed-off-by: Ville Syrjälä
> ---
>  drivers/gpu/drm/drm_irq.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index af33df1..0523f5b 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -106,6 +106,9 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>  	DRM_DEBUG("enabling vblank interrupts on crtc %d, missed %d\n",
>  		  crtc, diff);
>
> +	if (diff == 0)
> +		return;
> +
>  	/* Reinitialize corresponding vblank timestamp if high-precision query
>  	 * available. Skip this step if query unsupported or failed. Will
>  	 * reinitialize delayed at next vblank interrupt in that case.
> --
> 1.8.5.5
>
>
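The effect of the added "diff == 0" check can be illustrated with a small standalone model. This is only a sketch under assumptions: struct vblank_model and model_update_vblank() are made-up names for this example and do not exist in drm_irq.c, and the timestamp is a plain microsecond value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-crtc vblank state; not a drm core type. */
struct vblank_model {
	uint32_t count;   /* vblank sequence number (msc) */
	int64_t ts_ust;   /* timestamp stored for 'count', in microseconds */
};

/*
 * Apply a counter update. If no vblank was missed (diff == 0), the stored
 * timestamp is left alone, so two queries for the same sequence number
 * cannot return slightly different timestamps.
 * Returns true if count and timestamp were updated.
 */
static bool model_update_vblank(struct vblank_model *v, uint32_t diff,
				int64_t new_ts_ust)
{
	if (diff == 0)
		return false;

	v->count += diff;
	v->ts_ust = new_ts_ust;
	return true;
}
```

Calling model_update_vblank() twice within the same frame, the second time with diff == 0 and a slightly different estimate, leaves the first timestamp in place.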



[PATCH 05/19] drm: Have the vblank counter account for the time between vblank irq disable and drm_vblank_off()

2014-09-02 Thread Mario Kleiner
Hi Ville,

went through the vblank rework patch set, mostly looks good to me. I 
couldn't find any bugs in the code. A first quick test-run on my old 
Intel GMA-950 (Gen 3'ish i think?) also didn't show apparent problems 
with the OML_sync_control functions. I'll try to test more carefully 
with that card and maybe with a few more cards in the next days, if i 
can get my hands on something more recent.

The problematic bits:

Patch 3/19 [Don't clear vblank timestamp...] in combination with [5/19 
below]:

I agree that not clearing the timestamps during drm_vblank_off() is 
probably the better thing to do for userspace. The idea behind clearing 
the timestamps was that a ust timestamp of zero signals to userspace 
that the timestamp is invalid/undefined at the moment, so the client 
should retry the query if it needs a valid timestamp. This worked in 
practice insofar as a value of zero can't happen normally, unless a 
client would query a timestamp during the first microsecond since 
machine powerup. But i guess returning the last valid (msc, ust) pair to 
a client during vblank off may be better for things like compositors 
etc. I also wonder if we ever documented this ust == 0 -> -EAGAIN behaviour?

The problem with patch 5/19 is gpus/drivers which don't provide precise 
instantaneous vblank timestamps - which are afaik everything except 
intel, radeon and nouveau. On such drivers, the old code would return a 
zero ust timestamp during queries after the first drm_vblank_get() and 
until the first vblank irq happens and initializes the timestamps to 
something valid. The zero ust would signal "please retry" to the client. 
With patch 5/19 you'd get an updated vblank counter with an outdated 
vblank timestamp - whatever is stored in the ringbuffer from the past, 
because drm_update_vblank_count() can't update the timestamp without 
support for the optional vblank-timestamp driver function. A mismatched 
msc, ust would be very confusing to clients.
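To make the concern concrete, here is a minimal standalone model of the (msc, ust) pair. It is a sketch under assumptions: struct vblank_sample, vblank_sample_valid() and model_update_on_enable() are hypothetical names invented for this illustration, not drm core functions. It only models the "ust == 0 means please retry" convention and what happens to the pair when the counter is bumped without a precise timestamp query.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical (msc, ust) pair as seen by a client; not a drm core type. */
struct vblank_sample {
	uint32_t msc;   /* vblank count */
	int64_t ust;    /* timestamp in microseconds, 0 == invalid */
};

/* Old convention: a zero ust tells the client to retry the query. */
static bool vblank_sample_valid(const struct vblank_sample *s)
{
	return s->ust != 0;
}

/*
 * Model of a counter bump when vblank irqs get re-enabled.
 * have_precise_ts: driver supports instantaneous high-precision timestamps.
 * clear_stale:     old behaviour, invalidate ust if it cannot be recomputed.
 */
static void model_update_on_enable(struct vblank_sample *s, uint32_t missed,
				   bool have_precise_ts, int64_t now_ust,
				   bool clear_stale)
{
	s->msc += missed;

	if (have_precise_ts)
		s->ust = now_ust;   /* consistent pair */
	else if (clear_stale)
		s->ust = 0;         /* client sees "retry" */
	/* else: new msc paired with an old ust, the mismatch described above */
}
```

With have_precise_ts and clear_stale both false, the model ends up with an updated msc and the old ust, which is the mismatched pair a client could not detect.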

The only way one could get valid msc + ust on such drivers would be to 
enable vblank irq's and then wait for the next vblank irq to actually 
update the timestamp, at the cost of a couple of msecs waiting time.

So either have drm_update_vblank_count() itself sleep until the next vblank 
via an "if (!rc) ..." check at the very end, as rc == 0 would signal an 
imprecise/wrong vblank timestamp. Or have all callers of it do this, if 
locking makes it necessary. Or only care about it for the 
drm_vblank_off() special case, e.g., if !vblank->enabled there, then 
drm_vblank_get() -> wait for a valid timestamp and count to show up -> 
drm_vblank_put() -> vblank_disable_and_save().


For Patch 11/19 [Add dev->vblank_disable_immediate flag]: Can we make it 
so that a drm_vblank_offdelay module parameter of zero always overrides 
the flag? The only reason why a user wants to set drm_vblank_offdelay to 
zero is if that user absolutely needs precise and reliable vblank 
counts/timestamps and finds out that something is broken with his 
driver+gpu, so uses this as an override to temporarily fix a broken 
driver. That doesn't work if the vblank_disable_immediate flag overrides 
the override from the user - the user couldn't opt out of the trouble.
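The precedence asked for here could look roughly like the following decision function. This is a sketch, not the drm core's code: the enum and pick_vblank_disable_policy() are names made up for this example, and the module parameter and the driver flag are passed in as plain arguments. It assumes the usual meaning of drm_vblank_offdelay (0: never disable, < 0: disable immediately, > 0: timer delay).

```c
#include <stdbool.h>

/* Hypothetical result of the last drm_vblank_put(); not a drm core type. */
enum vblank_disable_policy {
	VBLANK_NEVER_DISABLE,      /* keep the vblank irq running */
	VBLANK_DISABLE_IMMEDIATE,  /* turn the irq off right away */
	VBLANK_DISABLE_TIMER,      /* arm the disable timer */
};

/*
 * offdelay:          value of the drm_vblank_offdelay module parameter
 * disable_immediate: the driver's vblank_disable_immediate opt-in
 */
static enum vblank_disable_policy
pick_vblank_disable_policy(int offdelay, bool disable_immediate)
{
	/* A user-supplied 0 always wins, even over the driver's opt-in. */
	if (offdelay == 0)
		return VBLANK_NEVER_DISABLE;

	if (offdelay < 0 || disable_immediate)
		return VBLANK_DISABLE_IMMEDIATE;

	return VBLANK_DISABLE_TIMER;
}
```

Checking offdelay == 0 first is the whole point: the user's manual override is evaluated before the driver flag gets a say.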

This might not be such an issue with Intel cards, as you have test 
suites and a QA team, and i assume/hope you tested every single intel 
gpu shipped in the last decade or so if the whole vblank off/on logic 
really is perfectly race-free now? At least it seems to work with that 
one gen-3 card i quickly tested. But for most other drivers with small 
teams and no dedicated QA this could end badly quickly for the user 
without any manual override.

The docs should probably clarify that a hw vblank counter isn't enough 
for the vblank_disable_immediate flag to be set. Their vblank 
off/on/hardware counter query implementation must be completely race 
free. iirc this means the hw counter query must behave as if the vblank 
counter always increments at the leading edge of vblank. E.g., radeon 
has hw counter queries, but the counter increments either at the 
trailing edge, or somewhere in the middle of vblank, so there it 
wouldn't work without races, causing off-by-one errors sometimes.

For Patch 14/19 [Don't update vblank timestamp when the counter didn't 
change]:

That would go wrong if a driver doesn't implement a proper vblank 
counter query. E.g., nouveau has precise vblank timestamping since Linux 
3.14, but still no functional hw counter query.

Almost all embedded gpu drivers currently implement completely bogus hw 
vblank counter queries, because that driver hook is mandatory. I think 
it would make sense if we would make that hook optional, allow a NULL 
function pointer and adapt to the lack of that query, e.g., by never 
disabling vblank irq's, except in drm_vblank_off() when a kms-driver 
insists on disabling its irq during modeset/dpms off/suspend etc.

With these remarks somehow taken into a

CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-13 Thread Mario Kleiner
On 08/13/2014 03:50 AM, Michel Dänzer wrote:
> On 12.08.2014 00:17, Jerome Glisse wrote:
>> On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote:
>>> On 08/10/2014 08:02 PM, Mario Kleiner wrote:
>>>> On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:
>>>>> On 08/10/2014 05:11 AM, Mario Kleiner wrote:
>>>>>> The other problem is that probably TTM does not reuse pages from the
>>>>>> DMA pool. If i trace the __ttm_dma_alloc_page
>>>>>> <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page>
>>>>>> and
>>>>>> __ttm_dma_free_page
>>>>>> <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page>
>>>>>> calls for
>>>>>> those single page allocs/frees, then over a 20 second interval of
>>>>>> tracing and switching tabs in firefox, scrolling things around etc. i
>>>>>> find about as many alloc's as i find free's, e.g., 1607 allocs vs.
>>>>>> 1648 frees.
>>>>> This is because historically the pools have been designed to keep only
>>>>> pages with nonstandard caching attributes since changing page caching
>>>>> attributes have been very slow but the kernel page allocators have been
>>>>> reasonably fast.
>>>>>
>>>>> /Thomas
>>>> Ok. A bit more ftraceing showed my hang problem case goes through the
>>>> "if (is_cached)" paths, so the pool doesn't recycle anything and i see
>>>> it bouncing up and down by 4 pages all the time.
>>>>
>>>> But for the non-cached case, which i don't hit with my problem, could
>>>> one of you look at line 954...
>>>>
>>>> http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954
>>>>
>>>>
>>>> ... and tell me why that unconditional npages = count; assignment
>>>> makes sense? It seems to essentially disable all recycling for the dma
>>>> pool whenever the pool isn't filled up to/beyond its maximum with free
>>>> pages? When the pool is filled up, lots of stuff is recycled, but when
>>>> it is already somewhat below capacity, it gets "punished" by not
>>>> getting refilled? I'd just like to understand the logic behind that line.
>>>>
>>>> thanks,
>>>> -mario
>>> I'll happily forward that question to Konrad who wrote the code (or it
>>> may even stem from the ordinary page pool code which IIRC has Dave
>>> Airlie / Jerome Glisse as authors)
>> This is effectively bogus code, i now wonder how it came to stay alive.
>> Attached patch will fix that.
> I haven't tested Mario's scenario specifically, but it survived piglit
> and the UE4 Effects Cave Demo (for which 1GB of VRAM isn't enough, so
> some BOs ended up in GTT instead with write-combined CPU mappings) on
> radeonsi without any noticeable issues.
>
> Tested-by: Michel Dänzer
>
>

I haven't tested the patch yet. For the original bug it won't help 
directly, because the super-slow allocations which cause the desktop 
stall are tt_cached allocations, so they go through the if (is_cached) 
code path which isn't improved by Jerome's patch. is_cached always 
releases memory immediately, so the tt_cached pool just bounces up and 
down between 4 and 7 pages. So this was an independent issue. The slow 
allocations i noticed were mostly caused by exa allocating new gem bo's, 
i don't know which path is taken by 3d graphics?

However, the fixed ttm path could indirectly solve the DMA_CMA stalls by 
completely killing CMA for its intended purpose. Typical CMA sizes are 
probably around < 100 MB (kernel default is 16 MB, Ubuntu config is 64 
MB), and the limit for the page pool seems to be more like 50% of all 
system RAM? Iow. if the ttm dma pool is allowed to grow that big with 
recycled pages, it probably will almost compl

CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-12 Thread Mario Kleiner
On 08/11/2014 05:17 PM, Jerome Glisse wrote:
> On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote:
>> On 08/10/2014 08:02 PM, Mario Kleiner wrote:
>>> On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:
>>>> On 08/10/2014 05:11 AM, Mario Kleiner wrote:
>>>>> Resent this time without HTML formatting which lkml doesn't like.
>>>>> Sorry.
>>>>>
>>>>> On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:
>>>>>> On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>> On August 9, 2014 1:39:39 AM EDT, Thomas
>>>>>>> Hellstrom  wrote:
>>>>>>>> Hi.
>>>>>>>>
>>>>>>> Hey Thomas!
>>>>>>>
>>>>>>>> IIRC I don't think the TTM DMA pool allocates coherent pages more
>>>>>>>> than
>>>>>>>> one page at a time, and _if that's true_ it's pretty unnecessary for
>>>>>>>> the
>>>>>>>> dma subsystem to route those allocations to CMA. Maybe Konrad could
>>>>>>>> shed
>>>>>>>> some light over this?
>>>>>>> It should allocate in batches and keep them in the TTM DMA pool for
>>>>>>> some time to be reused.
>>>>>>>
>>>>>>> The pages that it gets are in 4kb granularity though.
>>>>>> Then I feel inclined to say this is a DMA subsystem bug. Single page
>>>>>> allocations shouldn't get routed to CMA.
>>>>>>
>>>>>> /Thomas
>>>>> Yes, seems you're both right. I read through the code a bit more and
>>>>> indeed the TTM DMA pool allocates only one page during each
>>>>> dma_alloc_coherent() call, so it doesn't need CMA memory. The current
>>>>> allocators don't check for single page CMA allocations and therefore
>>>>> try to get it from the CMA area anyway, instead of skipping to the
>>>>> much cheaper fallback.
>>>>>
>>>>> So the callers of dma_alloc_from_contiguous() could need that little
>>>>> optimization of skipping it if only one page is requested. For
>>>>>
>>>>> dma_generic_alloc_coherent
>>>>> <http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent>
>>>>>
>>>>> and intel_alloc_coherent
>>>>> <http://lxr.free-electrons.com/ident?i=intel_alloc_coherent>
>>>>> this
>>>>> seems easy to do. Looking at the arm arch variants, e.g.,
>>>>>
>>>>> http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194
>>>>>
>>>>>
>>>>> and
>>>>>
>>>>> http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44
>>>>>
>>>>>
>>>>> i'm not sure if it is that easily done, as there aren't any fallbacks
>>>>> for such a case and the code looks to me as if that's at least
>>>>> somewhat intentional.
>>>>>
>>>>> As far as TTM goes, one quick one-line fix to prevent it from using
>>>>> the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the
>>>>> above methods) would be to clear the __GFP_WAIT
>>>>> <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAIT=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=d56d076770d3416264be6c9ea28

CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-10 Thread Mario Kleiner
On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:
> On 08/10/2014 05:11 AM, Mario Kleiner wrote:
>> Resent this time without HTML formatting which lkml doesn't like. Sorry.
>>
>> On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:
>>> On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:
>>>> On August 9, 2014 1:39:39 AM EDT, Thomas
>>>> Hellstrom  wrote:
>>>>> Hi.
>>>>>
>>>> Hey Thomas!
>>>>
>>>>> IIRC I don't think the TTM DMA pool allocates coherent pages more than
>>>>> one page at a time, and _if that's true_ it's pretty unnecessary for
>>>>> the
>>>>> dma subsystem to route those allocations to CMA. Maybe Konrad could
>>>>> shed
>>>>> some light over this?
>>>> It should allocate in batches and keep them in the TTM DMA pool for
>>>> some time to be reused.
>>>>
>>>> The pages that it gets are in 4kb granularity though.
>>> Then I feel inclined to say this is a DMA subsystem bug. Single page
>>> allocations shouldn't get routed to CMA.
>>>
>>> /Thomas
>> Yes, seems you're both right. I read through the code a bit more and
>> indeed the TTM DMA pool allocates only one page during each
>> dma_alloc_coherent() call, so it doesn't need CMA memory. The current
>> allocators don't check for single page CMA allocations and therefore
>> try to get it from the CMA area anyway, instead of skipping to the
>> much cheaper fallback.
>>
>> So the callers of dma_alloc_from_contiguous() could need that little
>> optimization of skipping it if only one page is requested. For
>>
>> dma_generic_alloc_coherent
>> <http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent>
>> and intel_alloc_coherent
>> <http://lxr.free-electrons.com/ident?i=intel_alloc_coherent>  this
>> seems easy to do. Looking at the arm arch variants, e.g.,
>>
>> http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194
>>
>> and
>>
>> http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44
>>
>> i'm not sure if it is that easily done, as there aren't any fallbacks
>> for such a case and the code looks to me as if that's at least
>> somewhat intentional.
>>
>> As far as TTM goes, one quick one-line fix to prevent it from using
>> the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the
>> above methods) would be to clear the __GFP_WAIT
>> <http://lxr.free-electrons.com/ident?i=__GFP_WAIT> flag from the
>> passed gfp_t flags. That would trigger the well working fallback. So, is
>>
>> __GFP_WAIT  <http://lxr.free-electrons.com/ident?i=__GFP_WAIT>  needed
>> for those single page allocations that go through __ttm_dma_alloc_page
>> <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page>?
>>
>> It would be nice to have such a simple, non-intrusive one-line patch
>> that we still could get into 3.17 and then backported to older stable
>> kernels to avoid the same desktop hangs there if CMA is enabled. It
>> would be also nice for actual users of CMA to not use up lots of CMA
>> space for gpu's which don't need it. I think DMA_CMA was introduced
>> around 3.12.
>>
> I don't think that's a good idea. Omitting __GFP_WAIT would cause
> unnecessary memory allocation errors on systems under stress.
> I think this should be filed as a DMA subsystem kernel bug / regression
> and an appropriate solution should be worked out together with the DMA
> subsystem maintainers and then backported.

Ok, so it is needed. I'll file a bug report.

>> The other problem is that probably TTM does not reuse pages from the
>> DMA pool. If i trace the __ttm_dma_alloc_page
>> <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> and
>> __ttm_dma_free_page
>> <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> calls for
>> those single page allocs/frees, then over a 20 second interval of
>> tracing and switching tabs in firefox, scrolling things around etc. i
>> find about as many alloc's as i find free's, e.g., 1607 allocs vs.
>> 1648 frees.
> This is because historically the pools have been designed to keep only
> pages with nonstandard caching attributes since changing page caching
> attributes have been very slow but the kernel page allocators have been
> reasonably fast.
>
> /Thomas

Ok. A bit more ftraceing showed my hang problem case goes through the 
"if (is_cached)" paths, so the pool doesn't recycle anything and i see 
it bouncing up and

CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-10 Thread Mario Kleiner
Resent this time without HTML formatting which lkml doesn't like. Sorry.

On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:
> On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:
>> On August 9, 2014 1:39:39 AM EDT, Thomas Hellstrom 
>>  wrote:
>>> Hi.
>>>
>> Hey Thomas!
>>
>>> IIRC I don't think the TTM DMA pool allocates coherent pages more than
>>> one page at a time, and _if that's true_ it's pretty unnecessary for
>>> the
>>> dma subsystem to route those allocations to CMA. Maybe Konrad could
>>> shed
>>> some light over this?
>> It should allocate in batches and keep them in the TTM DMA pool for some 
>> time to be reused.
>>
>> The pages that it gets are in 4kb granularity though.
> Then I feel inclined to say this is a DMA subsystem bug. Single page
> allocations shouldn't get routed to CMA.
>
> /Thomas

Yes, seems you're both right. I read through the code a bit more and 
indeed the TTM DMA pool allocates only one page during each 
dma_alloc_coherent() call, so it doesn't need CMA memory. The current 
allocators don't check for single page CMA allocations and therefore try 
to get it from the CMA area anyway, instead of skipping to the much 
cheaper fallback.

So the callers of dma_alloc_from_contiguous() could need that little 
optimization of skipping it if only one page is requested. For

dma_generic_alloc_coherent  
<http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent>  
and intel_alloc_coherent  
<http://lxr.free-electrons.com/ident?i=intel_alloc_coherent>  this seems easy 
to do. Looking at the arm arch variants, e.g.,

http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194

and

http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44

i'm not sure if it is that easily done, as there aren't any fallbacks 
for such a case and the code looks to me as if that's at least somewhat 
intentional.

As far as TTM goes, one quick one-line fix to prevent it from using the 
CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above 
methods) would be to clear the __GFP_WAIT 
<http://lxr.free-electrons.com/ident?i=__GFP_WAIT> flag from the passed 
gfp_t flags. That would trigger the well working fallback. So, is

__GFP_WAIT  <http://lxr.free-electrons.com/ident?i=__GFP_WAIT>  needed for 
those single page allocations that go through __ttm_dma_alloc_page  
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page>?
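The single-page shortcut and the __GFP_WAIT behaviour described above can be modelled in a few lines. This is a sketch under assumptions, not the code of dma_generic_alloc_coherent() or intel_alloc_coherent(): the enum and pick_first_alloc_source() are hypothetical names, and the model only answers which allocator is tried first (the real code still falls back to the page allocator if CMA fails).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical label for the allocator that is tried first. */
enum alloc_source {
	ALLOC_FROM_CMA,
	ALLOC_FROM_PAGE_ALLOCATOR,
};

/*
 * page_count:  number of pages requested in one dma_alloc_coherent() call
 * may_sleep:   models __GFP_WAIT being set in the gfp flags
 * cma_enabled: models CONFIG_DMA_CMA=y with a non-zero CMA area
 */
static enum alloc_source
pick_first_alloc_source(size_t page_count, bool may_sleep, bool cma_enabled)
{
	/* Existing behaviour: CMA is only tried for sleeping allocations. */
	if (!cma_enabled || !may_sleep)
		return ALLOC_FROM_PAGE_ALLOCATOR;

	/* Proposed shortcut: one page never needs a contiguous region. */
	if (page_count <= 1)
		return ALLOC_FROM_PAGE_ALLOCATOR;

	return ALLOC_FROM_CMA;
}
```

Clearing __GFP_WAIT corresponds to may_sleep == false in this model, which is why it would also bypass CMA, at the cost of the allocation no longer being allowed to sleep.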

It would be nice to have such a simple, non-intrusive one-line patch 
that we still could get into 3.17 and then backported to older stable 
kernels to avoid the same desktop hangs there if CMA is enabled. It 
would be also nice for actual users of CMA to not use up lots of CMA 
space for gpu's which don't need it. I think DMA_CMA was introduced 
around 3.12.


The other problem is that probably TTM does not reuse pages from the DMA 
pool. If i trace the __ttm_dma_alloc_page 
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> and 
__ttm_dma_free_page 
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> calls for 
those single page allocs/frees, then over a 20 second interval of 
tracing and switching tabs in firefox, scrolling things around etc. i 
find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 
frees.

This bit of code from ttm_dma_unpopulate 
<http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate>()  (line 954 
in 3.16) looks suspicious:

http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954

Alloc's from a tt_cached cached pool ( if (is_cached)...) always get 
freed and are not given back to the cached pool. But in the uncached 
case, there's logic to make sure the pool doesn't grow forever (line 
955, checking against _manager->options.max_size), but before that check 
in line 954 there's an unconditional assignment of npages = count; which 
seems to force freeing all pages as well, instead of recycling? Is this 
some debug code left over, or intentional and just me not understanding 
what happens there?
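To spell out why that assignment looks wrong, here is a simplified standalone model of the shrink decision. It is a sketch based only on the description above, not a copy of ttm_page_alloc_dma.c: pages_to_free() and its parameters are hypothetical names, and the real code's minimum batch size for frees is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the uncached-pool path after 'count' pages were handed back.
 * npages_free:     free pages in the pool, already including 'count'
 * max_size:        pool limit (models _manager->options.max_size)
 * with_assignment: true models the questioned "npages = count;" line
 * Returns the number of pages that get freed instead of recycled.
 */
static size_t pages_to_free(size_t npages_free, size_t count,
			    size_t max_size, bool with_assignment)
{
	size_t npages = 0;

	if (with_assignment)
		npages = count;   /* frees everything just returned */

	/* Only shrink by the excess when the pool is over its limit. */
	if (npages_free > max_size)
		npages = npages_free - max_size;

	return npages;
}
```

In this model a pool far below its limit frees all returned pages when the assignment is present and none when it is absent, while an over-full pool frees only the excess in both cases.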

thanks,
-mario


>>> /Thomas
>>>
>>>
>>> On 08/08/2014 07:42 PM, Mario Kleiner wrote:
>>>> Hi all,
>>>>
>>>> there is a rather severe performance problem i accidentally found
>>> when
>>>> trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under
>>>> Ubuntu 14.04 LTS with nouveau as graphics driver.
>>>>
>>>> I was lazy and just installed the Ubuntu precompiled mainline kernel.
>>>> That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA
>>>> (contiguous memory allocator) size of 64 MB. Older Ubuntu kernels
>>>> weren't compiled with CMA, so i only observed this on 3.16, but
>>>> p

CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-10 Thread Mario Kleiner
On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:
>
> On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:
>> On August 9, 2014 1:39:39 AM EDT, Thomas Hellstrom wrote:
>>> Hi.
>>>
>> Hey Thomas!
>>
>>> IIRC I don't think the TTM DMA pool allocates coherent pages more than
>>> one page at a time, and _if that's true_ it's pretty unnecessary for
>>> the
>>> dma subsystem to route those allocations to CMA. Maybe Konrad could
>>> shed
>>> some light over this?
>> It should allocate in batches and keep them in the TTM DMA pool for some 
>> time to be reused.
>>
>> The pages that it gets are in 4kb granularity though.
> Then I feel inclined to say this is a DMA subsystem bug. Single page
> allocations shouldn't get routed to CMA.
>
> /Thomas

Yes, seems you're both right. I read through the code a bit more and 
indeed the TTM DMA pool allocates only one page during each 
dma_alloc_coherent() call, so it doesn't need CMA memory. The current 
allocators don't check for single page CMA allocations and therefore try 
to get it from the CMA area anyway, instead of skipping to the much 
cheaper fallback.

So the callers of dma_alloc_from_contiguous() could need that little 
optimization of skipping it if only one page is requested. For

dma_generic_alloc_coherent  
<http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent>  
and intel_alloc_coherent  
<http://lxr.free-electrons.com/ident?i=intel_alloc_coherent>  this seems easy 
to do. Looking at the arm arch variants, e.g.,

http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194

and

http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44

i'm not sure if it is that easily done, as there aren't any fallbacks 
for such a case and the code looks to me as if that's at least somewhat 
intentional.

As far as TTM goes, one quick one-line fix to prevent it from using the 
CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above 
methods) would be to clear the __GFP_WAIT 
<http://lxr.free-electrons.com/ident?i=__GFP_WAIT> flag from the passed 
gfp_t flags. That would trigger the well working fallback. So, is

__GFP_WAIT  <http://lxr.free-electrons.com/ident?i=__GFP_WAIT>  needed for 
those single page allocations that go through __ttm_dma_alloc_page  
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page>?

It would be nice to have such a simple, non-intrusive one-line patch 
that we still could get into 3.17 and then backported to older stable 
kernels to avoid the same desktop hangs there if CMA is enabled. It 
would be also nice for actual users of CMA to not use up lots of CMA 
space for gpu's which don't need it. I think DMA_CMA was introduced 
around 3.12.


The other problem is that probably TTM does not reuse pages from the DMA 
pool. If i trace the __ttm_dma_alloc_page 
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> and 
__ttm_dma_free_page 
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> calls for 
those single page allocs/frees, then over a 20 second interval of 
tracing and switching tabs in firefox, scrolling things around etc. i 
find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 
frees.

This bit of code from ttm_dma_unpopulate 
<http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate>()  (line 954 
in 3.16) looks suspicious:

http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954

Alloc's from a tt_cached cached pool ( if (is_cached)...) always get 
freed and are not given back to the cached pool. But in the uncached 
case, there's logic to make sure the pool doesn't grow forever (line 
955, checking against _manager->options.max_size), but before that check 
in line 954 there's an unconditional assignment of npages = count; which 
seems to force freeing all pages as well, instead of recycling? Is this 
some debug code left over, or intentional and just me not understanding 
what happens there?

thanks,
-mario


>
>>> /Thomas
>>>
>>>
>>> On 08/08/2014 07:42 PM, Mario Kleiner wrote:
>>>> Hi all,
>>>>
>>>> there is a rather severe performance problem i accidentally found
>>> when
>>>> trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under
>>>> Ubuntu 14.04 LTS with nouveau as graphics driver.
>>>>
>>>> I was lazy and just installed the Ubuntu precompiled mainline kernel.
>>>> That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA
>>>> (contiguous memory allocator) size of 64 MB. Older Ubuntu kernels
>>>> weren't compiled with CMA, so i only observed this on 3.16, but
>>>> previous kernels would likely be affected too.

CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-08 Thread Mario Kleiner
Hi all,

there is a rather severe performance problem i accidentally found when 
trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under 
Ubuntu 14.04 LTS with nouveau as graphics driver.

I was lazy and just installed the Ubuntu precompiled mainline kernel. 
That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA 
(contiguous memory allocator) size of 64 MB. Older Ubuntu kernels 
weren't compiled with CMA, so i only observed this on 3.16, but previous 
kernels would likely be affected too.

After a few minutes of regular desktop use like switching workspaces, 
scrolling text in a terminal window, Firefox with multiple tabs open, 
Thunderbird etc. (tested with KDE/Kwin, with/without desktop 
composition), i get chunky desktop updates, then multi-second freezes, 
after a few minutes the desktop hangs for over a minute on almost any 
GUI action like switching windows etc. --> Unusable.

ftrace'ing shows the culprit being this callchain (typical good/bad 
example ftrace snippets at the end of this mail):

...ttm dma coherent memory allocations, e.g., from 
__ttm_dma_alloc_page() ... --> dma_alloc_coherent() --> platform 
specific hooks ... -> dma_generic_alloc_coherent() [on x86_64] --> 
dma_alloc_from_contiguous()

dma_alloc_from_contiguous() is a no-op without CONFIG_DMA_CMA, or when 
the machine is booted with kernel boot cmdline parameter "cma=0", so it 
triggers the fast alloc_pages_node() fallback at least on x86_64.

With CMA, this function becomes progressively slower with every
minute of desktop use, e.g., runtimes going up from < 0.3 usecs to
hundreds or thousands of microseconds per call (before it gives up and
the alloc_pages_node() fallback is used), which causes the
multi-second/minute hangs of the desktop.

So it seems ttm memory allocations quickly fragment and/or exhaust the 
CMA memory area, and dma_alloc_from_contiguous() tries very hard to find 
a fitting hole big enough to satisfy allocations with a retry loop (see 
http://lxr.free-electrons.com/source/drivers/base/dma-contiguous.c#L339) 
that takes forever.

This is not good, also not for other devices which actually need a 
non-fragmented CMA for DMA, so what to do? I doubt most current gpus 
still need physically contiguous dma memory, maybe with exception of 
some embedded gpus?

My naive approach would be to add a new gfp_t flag a la ___GFP_AVOIDCMA, 
and make callers of dma_alloc_from_contiguous() refrain from doing so if 
they have some fallback for getting memory. And then add that flag to 
ttm's ttm_dma_populate() gfp_flags, e.g., around here: 
http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L884
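
A minimal userspace sketch of that idea, not kernel code. It assumes a
hypothetical flag MODEL_GFP_AVOIDCMA standing in for the proposed
___GFP_AVOIDCMA, and model_* helpers standing in for
dma_alloc_from_contiguous() and the alloc_pages_node() fallback; none of
these names exist in the kernel:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag modeled on the proposed ___GFP_AVOIDCMA; not a real gfp_t bit. */
#define MODEL_GFP_AVOIDCMA 0x1u

struct model_alloc_stats {
	unsigned long cma_attempts;    /* times the slow contiguous path was tried */
	unsigned long fallback_allocs; /* times the plain allocator was used */
};

/*
 * Stand-in for dma_alloc_from_contiguous(). It always fails here, which
 * models a CMA area that is exhausted or too fragmented to satisfy the request.
 */
static void *model_alloc_from_contiguous(struct model_alloc_stats *st, size_t size)
{
	(void)size;
	st->cma_attempts++;
	return NULL;
}

/*
 * Stand-in for the x86_64 coherent allocation path: try the contiguous
 * area unless the caller asked to avoid it, then fall back to the
 * ordinary allocator (malloc() here, alloc_pages_node() in the kernel).
 */
static void *model_alloc_coherent(struct model_alloc_stats *st, size_t size,
				  unsigned int flags)
{
	void *p = NULL;

	if (!(flags & MODEL_GFP_AVOIDCMA))
		p = model_alloc_from_contiguous(st, size);

	if (!p) {
		st->fallback_allocs++;
		p = malloc(size);
	}
	return p;
}
```

With the flag set, a caller that has a fallback (like ttm in this
scenario) never enters the expensive contiguous search at all; without
it, every allocation pays for a failed search first.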

However i'm not familiar enough with memory management, so likely 
greater minds here have much better ideas on how to deal with this?

thanks,
-mario

Typical snippet from an example trace of a badly stalling desktop with
CMA (the alloc_pages_node() fallback may be missing from this trace's
ftrace_filter settings):

 1)               |  ttm_dma_pool_get_pages [ttm]() {
 1)               |    ttm_dma_page_pool_fill_locked [ttm]() {
 1)               |      ttm_dma_pool_alloc_new_pages [ttm]() {
 1)               |        __ttm_dma_alloc_page [ttm]() {
 1)               |          dma_generic_alloc_coherent() {
 1) ! 1873.071 us |            dma_alloc_from_contiguous();
 1) ! 1874.292 us |          }
 1) ! 1875.400 us |        }
 1)               |        __ttm_dma_alloc_page [ttm]() {
 1)               |          dma_generic_alloc_coherent() {
 1) ! 1868.372 us |            dma_alloc_from_contiguous();
 1) ! 1869.586 us |          }
 1) ! 1870.053 us |        }
 1)               |        __ttm_dma_alloc_page [ttm]() {
 1)               |          dma_generic_alloc_coherent() {
 1) ! 1871.085 us |            dma_alloc_from_contiguous();
 1) ! 1872.240 us |          }
 1) ! 1872.669 us |        }
 1)               |        __ttm_dma_alloc_page [ttm]() {
 1)               |          dma_generic_alloc_coherent() {
 1) ! 1888.934 us |            dma_alloc_from_contiguous();
 1) ! 1890.179 us |          }
 1) ! 1890.608 us |        }
 1)   0.048 us    |        ttm_set_pages_caching [ttm]();
 1) ! 7511.000 us |      }
 1) ! 7511.306 us |    }
 1) ! 7511.623 us |  }

The good case (with cma=0 on the kernel cmdline, so
dma_alloc_from_contiguous() is a no-op):

 0)               |  ttm_dma_pool_get_pages [ttm]() {
 0)               |    ttm_dma_page_pool_fill_locked [ttm]() {
 0)               |      ttm_dma_pool_alloc_new_pages [ttm]() {
 0)               |        __ttm_dma_alloc_page [ttm]() {
 0)               |          dma_generic_alloc_coherent() {
 0)   0.171 us    |            dma_alloc_from_contiguous();
 0)   0.849 us    |            __alloc_pages_nodemask();
 0)   3.029 us    |          }
  0)   

[PATCH 3/3] drm: Use vblank_disable_and_save in drm_vblank_cleanup()

2014-08-07 Thread Mario Kleiner
On 08/07/2014 08:50 AM, Daniel Vetter wrote:
> On Thu, Aug 7, 2014 at 2:50 AM, Mario Kleiner
>  wrote:
>> I'm not sure about all the new embedded drivers, if they have hw vblank
>> counters?
> Quick grep says a lot don't have it or at least not implemented - they
> use drm_vblank_count. Thinking about this, should we use that as a
> signal to also set dev->vblank_disable_allowed = false in
> drm_vblank_init?
> -Daniel

dev->vblank_disable_allowed = false; is already the default set in
drm_vblank_init().
I think the idea of that flag was that drivers which support a somewhat
correct vblank dis/enable (= have usable hw vblank counters) opt in to the
vblank disable after some idle time by setting it to true.

The strange thing is the unconditional dev->vblank_disable_allowed =
true in drm_vblank_post_modeset().
It's been there since the first introduction of the flag.

The i915, gma500, armada and exynos drivers explicitly set the flag to
true to opt in to the auto vblank disable. radeon gets it implicitly
set by calling the vblank_post_modeset function. Tegra also gets it via
the post_modeset, as does nouveau on old cards with the nv04 display engine.

armada, exynos and tegra don't have proper hw vblank counter queries, 
but opt-in, so those will lose vblank counts whenever vblank irqs get 
turned off.

It's a bit all over the place.
-mario



[PATCH 3/3] drm: Use vblank_disable_and_save in drm_vblank_cleanup()

2014-08-07 Thread Mario Kleiner
On 08/06/2014 03:57 PM, Daniel Vetter wrote:
> On Wed, Aug 06, 2014 at 01:51:41PM +0300, Ville Syrjälä wrote:
>> On Wed, Aug 06, 2014 at 03:22:46AM +0200, Mario Kleiner wrote:
>>> Calling vblank_disable_fn() will cause that function to no-op
>>> if !dev->vblank_disable_allowed for some kms drivers, e.g.,
>>> on nouveau-kms. This can cause the gpu vblank irq's to not get
>>> disabled before freeing the dev->vblank array, so if a
>>> vblank irq fires and calls into drm_handle_vblank() after
>>> drm_vblank_cleanup() completes, it will cause use-after-free
>>> access to dev->vblank array.
>>>
>>> Call vblank_disable_and_save unconditionally, so vblank irqs
>>> are guaranteed to be off, before we delete the data structures
>>> on which they operate.
>>>
>>> Signed-off-by: Mario Kleiner 
>>> Cc: stable at vger.kernel.org
> Imo cc: stable isn't justified for these patches which fix stuff that
> normal users don't really see (driver load failure and module reload for
> kms drivers never tends to happen for normal users).
>
> So I've dropped that and pulled the 2 patches Ville reviewed into my
> topic/vblank-rework branch for 3.18.
>
> Thanks, Daniel

Ok, good with me, thanks. Ville, thanks for the review. I'll review and 
test your vblank series next week when i have access to suitable 
machines and enough time. I need to go through this in single-step mode, 
vblank on/off changes always make me nervous, given how dependent my 
main application is on this for its timing, so i want to move through it 
in slow motion.

Btw. wrt. nouveau: "No idea what games nouveau is playing with that
flag, but this patch should be fine at least for drivers that don't do
such things." (Ville's comment).

Nouveau currently doesn't support hw vblank counter queries at all. The 
dev->driver->get_vblank_count() is just hooked up to drm_vblank_count(), 
so it's a no-op. Therefore nouveau can't allow disabling of vblank irq 
during "normal" operation as it would lose all vblank counts during the 
off period. That's why it leaves dev->vblank_disable_allowed = false;
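
A small userspace model of why that matters, as a sketch only. The
struct and the model_* functions are hypothetical and only mimic the
idea of saving a hw counter snapshot at irq disable and adding the
difference back at irq enable; hw_count here doubles as the ground
truth of how many vblanks really happened:

```c
#include <stdbool.h>
#include <stdint.h>

struct model_crtc {
	bool has_hw_counter; /* true if the driver can query a hw vblank counter */
	uint32_t hw_count;   /* vblanks that really happened (hw keeps counting) */
	uint32_t sw_count;   /* software vblank count seen by clients */
	uint32_t last_hw;    /* hw_count snapshot taken when irqs were disabled */
	bool irq_enabled;
};

/* One vblank passes: the software count only advances if the irq fires. */
static void model_vblank(struct model_crtc *c)
{
	c->hw_count++;
	if (c->irq_enabled)
		c->sw_count++;
}

/* Disable vblank irqs, remembering where the hw counter stood. */
static void model_irq_disable(struct model_crtc *c)
{
	c->last_hw = c->hw_count;
	c->irq_enabled = false;
}

/*
 * Re-enable vblank irqs. Only a driver with a queryable hw counter can
 * add the vblanks missed during the off period; without one they are lost.
 */
static void model_irq_enable(struct model_crtc *c)
{
	if (c->has_hw_counter)
		c->sw_count += c->hw_count - c->last_hw;
	c->irq_enabled = true;
}
```

Running 3 vblanks, a disable, 5 more vblanks and an enable leaves
sw_count at 8 with a hw counter and at 3 without one, which is the
count loss described above.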

Pre NV-50 apparently doesn't have any hw vblank counter register, but 
NV50+ seems to have one. I'll probably give implementing this a try for 
3.18 if nobody else does.

I'm not sure about all the new embedded drivers, if they have hw vblank 
counters?

thanks,
-mario

>
>> No idea what games nouveau is playing with that flag, but this patch
>> should be fine at least for drivers that don't do such things.
>>
>> Reviewed-by: Ville Syrjälä
>>
>>> ---
>>>   drivers/gpu/drm/drm_irq.c | 5 ++++-
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>> index 89e91e3..22e2bba9 100644
>>> --- a/drivers/gpu/drm/drm_irq.c
>>> +++ b/drivers/gpu/drm/drm_irq.c
>>> @@ -164,6 +164,7 @@ static void vblank_disable_fn(unsigned long arg)
>>>   void drm_vblank_cleanup(struct drm_device *dev)
>>>   {
>>> int crtc;
>>> +   unsigned long irqflags;
>>>   
>>> /* Bail if the driver didn't call drm_vblank_init() */
>>> if (dev->num_crtcs == 0)
>>> @@ -171,7 +172,9 @@ void drm_vblank_cleanup(struct drm_device *dev)
>>>   
>>> for (crtc = 0; crtc < dev->num_crtcs; crtc++) {
>>> del_timer_sync(&dev->vblank[crtc].disable_timer);
>>> -   vblank_disable_fn((unsigned long)&dev->vblank[crtc]);
>>> +   spin_lock_irqsave(&dev->vbl_lock, irqflags);
>>> +   vblank_disable_and_save(dev, crtc);
>>> +   spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
>>> }
>>>   
>>> kfree(dev->vblank);
>>> -- 
>>> 1.9.1
>>>
>>> ___
>>> dri-devel mailing list
>>> dri-devel at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>> -- 
>> Ville Syrjälä
>> Intel OTC
>> ___
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel



[PATCH 3/3] drm: Use vblank_disable_and_save in drm_vblank_cleanup()

2014-08-06 Thread Mario Kleiner
Calling vblank_disable_fn() will cause that function to no-op
if !dev->vblank_disable_allowed for some kms drivers, e.g.,
on nouveau-kms. This can cause the gpu vblank irq's to not get
disabled before freeing the dev->vblank array, so if a
vblank irq fires and calls into drm_handle_vblank() after
drm_vblank_cleanup() completes, it will cause use-after-free
access to dev->vblank array.

Call vblank_disable_and_save unconditionally, so vblank irqs
are guaranteed to be off, before we delete the data structures
on which they operate.

Signed-off-by: Mario Kleiner 
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/drm_irq.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 89e91e3..22e2bba9 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -164,6 +164,7 @@ static void vblank_disable_fn(unsigned long arg)
 void drm_vblank_cleanup(struct drm_device *dev)
 {
 	int crtc;
+	unsigned long irqflags;

 	/* Bail if the driver didn't call drm_vblank_init() */
 	if (dev->num_crtcs == 0)
@@ -171,7 +172,9 @@ void drm_vblank_cleanup(struct drm_device *dev)

 	for (crtc = 0; crtc < dev->num_crtcs; crtc++) {
 		del_timer_sync(&dev->vblank[crtc].disable_timer);
-		vblank_disable_fn((unsigned long)&dev->vblank[crtc]);
+		spin_lock_irqsave(&dev->vbl_lock, irqflags);
+		vblank_disable_and_save(dev, crtc);
+		spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
 	}

 	kfree(dev->vblank);
-- 
1.9.1



[PATCH 2/3] drm: Fix emitted vblank timestamps in drm_vblank_off()

2014-08-06 Thread Mario Kleiner
Move the query for vblank count and time before the
vblank_disable_and_save(), because the disable fn
will invalidate the vblank timestamps, so all emitted
events would carry an invalid zero timestamp instead of
the timestamp of the vblank of vblank disable. This could
confuse clients.

Signed-off-by: Mario Kleiner 
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/drm_irq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 553a58c..89e91e3 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1019,13 +1019,14 @@ void drm_vblank_off(struct drm_device *dev, int crtc)
 	unsigned long irqflags;
 	unsigned int seq;

+	/* Get 'now' vblank ts before it gets cleared by vblank disable */
+	seq = drm_vblank_count_and_time(dev, crtc, &now);
+
 	spin_lock_irqsave(&dev->vbl_lock, irqflags);
 	vblank_disable_and_save(dev, crtc);
 	wake_up(&dev->vblank[crtc].queue);

 	/* Send any queued vblank events, lest the natives grow disquiet */
-	seq = drm_vblank_count_and_time(dev, crtc, &now);
-
 	spin_lock(&dev->event_lock);
 	list_for_each_entry_safe(e, t, &dev->vblank_event_list, base.link) {
 		if (e->pipe != crtc)
-- 
1.9.1
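
The effect of the reordering in this patch can be illustrated with a
small userspace model. This is a sketch under assumptions: the model_*
names are hypothetical, and the only behaviour taken from the commit
message is that the disable step invalidates (zeroes) the stored vblank
timestamp:

```c
#include <stdbool.h>
#include <stdint.h>

struct model_vblank {
	uint32_t count;
	int64_t timestamp_us; /* 0 means "invalid" in this model */
};

/* Models the side effect described above: disabling clears the timestamp. */
static void model_disable_and_save(struct model_vblank *v)
{
	v->timestamp_us = 0;
}

/* Returns the current count and stores the current timestamp in *now_us. */
static uint32_t model_count_and_time(const struct model_vblank *v, int64_t *now_us)
{
	*now_us = v->timestamp_us;
	return v->count;
}

/*
 * Returns the timestamp that queued events would carry.
 * query_first == true is the patched order, false the old order.
 */
static int64_t model_vblank_off(struct model_vblank *v, bool query_first)
{
	int64_t now_us = 0;

	if (query_first)
		(void)model_count_and_time(v, &now_us);

	model_disable_and_save(v);

	if (!query_first)
		(void)model_count_and_time(v, &now_us);

	return now_us;
}
```

In the old order the events go out with timestamp 0; in the patched
order they carry the timestamp of the last vblank before the disable.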



[PATCH 1/3] drm: Remove drm_vblank_cleanup from drm_vblank_init error path.

2014-08-06 Thread Mario Kleiner
drm_vblank_cleanup() would operate on non-existent dev->vblank
data structure, as failure to allocate that data structure is
what triggers the error path in the first place.

Signed-off-by: Mario Kleiner 
Cc: stable at vger.kernel.org
---
 drivers/gpu/drm/drm_irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 0de123a..553a58c 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -224,7 +224,7 @@ int drm_vblank_init(struct drm_device *dev, int num_crtcs)
 	return 0;

 err:
-	drm_vblank_cleanup(dev);
+	dev->num_crtcs = 0;
 	return ret;
 }
 EXPORT_SYMBOL(drm_vblank_init);
-- 
1.9.1



drm vblank fixes

2014-08-06 Thread Mario Kleiner
Hi all,

some small bug fixes for some small bugs i saw
when looking at the current drm vblank handling code.

All patches are rather simple, all compile-tested against
drm-next, but only the drm_vblank_off() one (patch 2)
tested in "real life" so far.

thanks,
-mario



[PATCH 13/14] drm/radeon: Move the early vblank IRQ fixup to radeon_get_crtc_scanoutpos()

2014-01-13 Thread Mario Kleiner
On 29/10/13 19:06, ville.syrjala at linux.intel.com wrote:
> From: Ville Syrjälä
>
> i915 doesn't need this kludge for most platforms. Although we do
> appear to need something similar on certain platforms, but we can
> be more accurate when we apply the adjustment since we know exactly
> why the scanline counter doesn't always quite match the vblank
> status.
>
> Also the current code doesn't handle interlaced modes correctly,
> and we already deal with interlaced modes in i915 code.
>
> So let's just move the current code to radeon_get_crtc_scanoutpos()
> since that's why it was added. For i915 we'll add a more finely
> targeted variant.
>

The logic itself looks correct and should work, although i couldn't test 
it because of the dying PC.

But see below for some bugfix and some little nit-pick.

Other than that

Reviewed-by: mario.kleiner.de at gmail.com


> Signed-off-by: Ville Syrjälä
> ---
>   drivers/gpu/drm/drm_irq.c   | 25 ++---
>   drivers/gpu/drm/radeon/radeon_display.c | 22 ++
>   2 files changed, 24 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index b39255f..a1cc1a3 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -542,7 +542,7 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct 
> drm_device *dev, int crtc,
>   {
>   ktime_t stime, etime, mono_time_offset;
>   struct timeval tv_etime;
> - int vbl_status, vtotal, vdisplay;
> + int vbl_status;
>   int vpos, hpos, i;
>   int framedur_ns, linedur_ns, pixeldur_ns, delta_ns, duration_ns;
>   bool invbl;
> @@ -558,9 +558,6 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct 
> drm_device *dev, int crtc,
>   return -EIO;
>   }
>
> - vtotal = mode->crtc_vtotal;
> - vdisplay = mode->crtc_vdisplay;
> -
>   /* Durations of frames, lines, pixels in nanoseconds. */
>   framedur_ns = refcrtc->framedur_ns;
>   linedur_ns  = refcrtc->linedur_ns;
> @@ -569,7 +566,7 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct 
> drm_device *dev, int crtc,
>   /* If mode timing undefined, just return as no-op:
>* Happens during initial modesetting of a crtc.
>*/
> - if (vtotal <= 0 || vdisplay <= 0 || framedur_ns == 0) {
> + if (framedur_ns == 0) {
>   DRM_DEBUG("crtc %d: Noop due to uninitialized mode.\n", crtc);
>   return -EAGAIN;
>   }
> @@ -631,24 +628,6 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct 
> drm_device *dev, int crtc,
>*/
>   delta_ns = vpos * linedur_ns + hpos * pixeldur_ns;
>
> - /* Is vpos outside nominal vblank area, but less than
> -  * 1/100 of a frame height away from start of vblank?
> -  * If so, assume this isn't a massively delayed vblank
> -  * interrupt, but a vblank interrupt that fired a few
> -  * microseconds before true start of vblank. Compensate
> -  * by adding a full frame duration to the final timestamp.
> -  * Happens, e.g., on ATI R500, R600.
> -  *
> -  * We only do this if DRM_CALLED_FROM_VBLIRQ.
> -  */
> - if ((flags & DRM_CALLED_FROM_VBLIRQ) && !invbl &&
> - ((vdisplay - vpos) < vtotal / 100)) {
> - delta_ns = delta_ns - framedur_ns;
> -
> - /* Signal this correction as "applied". */
> - vbl_status |= 0x8;
> - }
> -
>   if (!drm_timestamp_monotonic)
>   etime = ktime_sub(etime, mono_time_offset);
>
> diff --git a/drivers/gpu/drm/radeon/radeon_display.c 
> b/drivers/gpu/drm/radeon/radeon_display.c
> index 3581570..9d02fa7 100644
> --- a/drivers/gpu/drm/radeon/radeon_display.c
> +++ b/drivers/gpu/drm/radeon/radeon_display.c
> @@ -1709,5 +1709,27 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, 
> int crtc, unsigned int fl
>   if (in_vbl)
>   ret |= DRM_SCANOUTPOS_INVBL;
>
> + /* Is vpos outside nominal vblank area, but less than
> +  * 1/100 of a frame height away from start of vblank?
> +  * If so, assume this isn't a massively delayed vblank
> +  * interrupt, but a vblank interrupt that fired a few
> +  * microseconds before true start of vblank. Compensate
> +  * by adding a full frame duration to the final timestamp.
> +  * Happens, e.g., on ATI R500, R600.
> +  *
> +  * We only do this if DRM_CALLED_FROM_VBLIRQ.
> +  */
> + if ((flags & DRM_CALLED_FROM_VBLIRQ) && !in_vbl) {
> + vbl_start = 
> rdev->mode_info.crtcs[crtc]->base.hwmode.crtc_vdisplay;

vbl_start is already initialized by the code above, so the vbl_start
assignment here shouldn't be necessary. Only the vtotal assignment
below is really needed.

> + vtotal = rdev->mode_info.crtcs[crtc]->base.hwmode.crtc_vtotal;
> +
> + if (vbl_start - *vpos < vtotal / 100) {
> + vpos -= vtotal;

Here vpos is an int*, so the following line will corrupt kernel memory 

[PATCH 12/14] drm: Pass 'flags' from the caller to .get_scanout_position()

2014-01-13 Thread Mario Kleiner
On 29/10/13 19:06, ville.syrjala at linux.intel.com wrote:
> From: Ville Syrjälä
>
> Preparation for moving the early vblank IRQ logic into
> radeon_get_crtc_scanoutpos().
>
> Signed-off-by: Ville Syrjälä

Tiny compile fix needed for this one. The function prototype for 
radeon_get_crtc_scanoutpos() is also defined in radeon_drv.c, so it 
needs the same update as the one in radeon_mode.h

Other than that

Reviewed-by: mario.kleiner.de at gmail.com

-mario


> ---
>   drivers/gpu/drm/drm_irq.c   | 2 +-
>   drivers/gpu/drm/i915/i915_irq.c | 3 ++-
>   drivers/gpu/drm/radeon/radeon_display.c | 7 ---
>   drivers/gpu/drm/radeon/radeon_mode.h| 1 +
>   drivers/gpu/drm/radeon/radeon_pm.c  | 2 +-
>   include/drm/drmP.h  | 2 ++
>   6 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index b5c4d42..b39255f 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -585,7 +585,7 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct 
> drm_device *dev, int crtc,
>   /* Get vertical and horizontal scanout position vpos, hpos,
>* and bounding timestamps stime, etime, pre/post query.
>*/
> - vbl_status = dev->driver->get_scanout_position(dev, crtc, &vpos,
> + vbl_status = dev->driver->get_scanout_position(dev, crtc, flags, &vpos,
>  &hpos, &stime, &etime);
>
>   /* Get correction for CLOCK_MONOTONIC -> CLOCK_REALTIME if
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f6b3206..70daf3c 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -657,7 +657,8 @@ static bool intel_pipe_in_vblank_locked(struct drm_device 
> *dev, enum pipe pipe)
>   }
>
>   static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe,
> -  int *vpos, int *hpos, ktime_t *stime, ktime_t 
> *etime)
> + unsigned int flags, int *vpos, int *hpos,
> + ktime_t *stime, ktime_t *etime)
>   {
>   struct drm_i915_private *dev_priv = dev->dev_private;
>   struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
> diff --git a/drivers/gpu/drm/radeon/radeon_display.c 
> b/drivers/gpu/drm/radeon/radeon_display.c
> index ccd8751..3581570 100644
> --- a/drivers/gpu/drm/radeon/radeon_display.c
> +++ b/drivers/gpu/drm/radeon/radeon_display.c
> @@ -305,7 +305,7 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, 
> int crtc_id)
>* to complete in this vblank?
>*/
>   if (update_pending &&
> - (DRM_SCANOUTPOS_VALID & radeon_get_crtc_scanoutpos(rdev->ddev, crtc_id,
> + (DRM_SCANOUTPOS_VALID & radeon_get_crtc_scanoutpos(rdev->ddev, crtc_id, 0,
>  &vpos, &hpos, NULL, NULL)) &&
>   ((vpos >= (99 * rdev->mode_info.crtcs[crtc_id]->base.hwmode.crtc_vdisplay)/100) ||
>    (vpos < 0 && !ASIC_IS_AVIVO(rdev)))) {
> @@ -1544,6 +1544,7 @@ bool radeon_crtc_scaling_mode_fixup(struct drm_crtc 
> *crtc,
>*
>* \param dev Device to query.
>* \param crtc Crtc to query.
> + * \param flags Flags from caller (DRM_CALLED_FROM_VBLIRQ or 0).
>* \param *vpos Location where vertical scanout position should be stored.
>* \param *hpos Location where horizontal scanout position should go.
>* \param *stime Target location for timestamp taken immediately before
> @@ -1565,8 +1566,8 @@ bool radeon_crtc_scaling_mode_fixup(struct drm_crtc 
> *crtc,
>* unknown small number of scanlines wrt. real scanout position.
>*
>*/
> -int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, int *vpos, 
> int *hpos,
> -ktime_t *stime, ktime_t *etime)
> +int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, unsigned 
> int flags,
> +int *vpos, int *hpos, ktime_t *stime, ktime_t 
> *etime)
>   {
>   u32 stat_crtc = 0, vbl = 0, position = 0;
>   int vbl_start, vbl_end, vtotal, ret = 0;
> diff --git a/drivers/gpu/drm/radeon/radeon_mode.h 
> b/drivers/gpu/drm/radeon/radeon_mode.h
> index 3bfa910..c4016dc 100644
> --- a/drivers/gpu/drm/radeon/radeon_mode.h
> +++ b/drivers/gpu/drm/radeon/radeon_mode.h
> @@ -758,6 +758,7 @@ extern int radeon_crtc_cursor_move(struct drm_crtc *crtc,
>  int x, int y);
>
>   extern int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc,
> +   unsigned int flags,
> int *vpos, int *hpos, ktime_t *stime,
> ktime_t *etime);
>
> diff --git a/drivers/gpu/drm/radeon/radeon_pm.c 
> b/drivers/gpu/drm/radeon/radeon_pm.c
> index 98bf63b..a394049 100644
> --- 

[PATCH 00/14] drm: Some more vblank timestampi changes

2014-01-13 Thread Mario Kleiner
On 29/10/13 19:06, ville.syrjala at linux.intel.com wrote:
 > So I took another look at the vblank timestamping code, and got a bit
 > excited. The result is this patchset.
 >
 > Summary of changes:
 > - kill crtc->hwmode dependency
 > - eliminate a bunch of 64bit math
 > - fix timestamps for stereo and interlaced modes (on i915 at least)
 > - move the "early vbl irq" hack into radeon code
 > - add a similar hack to i915, but make it as finely targeted
 >as possible to minimize the chance of accidentally
 >applying it in the wrong place
 >
 > The s/clock/crtc_clock change could use some radeon people to verify
 > whether changing radeon_atom_get_tv_timings() is enough to make
 > crtc_clock always populated.
 >
 > This series applies on top of Mario's
 > "Vblank timestamping improvements/fixes for Linux drm." series.
 >
 > Ville Syrjälä (14):
 >drm: Pass the display mode to drm_calc_timestamping_constants()
 >drm: Pass the display mode to drm_calc_vbltimestamp_from_scanoutpos()
 >drm/i915: Kill hwmode save/restore
 >drm/i915: Call drm_calc_timestamping_constants() earlier
 >drm: Improve drm_calc_timestamping_constants() documentation
 >drm: Simplify the math in drm_calc_timestamping_constants()
 >drm/radeon: Populate crtc_clock in radeon_atom_get_tv_timings()
 >drm: Use crtc_clock in drm_calc_timestamping_constants()
 >drm: Change {pixel,line,frame}dur_ns from s64 to int
 >drm/i915: Fix scanoutpos calculations for interlaced modes
 >drm: Fix vblank timestamping constants for interlaced modes
 >drm: Pass 'flags' from the caller to .get_scanout_position()
 >drm/radeon: Move the early vblank IRQ fixup to radeon_get_crtc_scanoutpos()
 >drm/i915: Add a kludge for DSL incrementing too late and ISR not working
 >

Hi Ville,

sorry this took way longer than expected. I've reviewed all of your 
patches. Nice cleanups, nice improvements!

You can add a ...

Reviewed-by: mario.kleiner.de at gmail.com

... to all of them.

Patches 0 - 11 and 14 are fine as they are. Only tiny formatting/comment 
fixes needed so they apply cleanly against the current drm-next.

Patches 12 and 13 need some small fixes; after applying those i'm fine
with them. I'll send separate e-mails for those.

As far as testing goes, i had more encounters with Murphy's law in the 
last weeks than ever before, hence the long delay. You can add

Tested-by: mario.kleiner.de at gmail.com

to the drm core and intel patches with the following restrictions:

I was able to "sort of" test the patchset on Intel GMA-950 (Gen-3 hw).

- I didn't test if your interlaced scanout patches 10 and 11 work as
expected, because i tested the patches first and reviewed them afterwards,
so i didn't realize at that point that testing interlaced mode would be
necessary. The patches look correct to me though. I no longer have easy
access to that machine.

- My photodiode test equipment, which i need for Intel testing,
malfunctioned. Not sure if my testing hardware is dying, or if it is a
bug in the kernel's usb or serial/tty stack, or some kernel
misconfiguration wrt. low-latency, but there was so much timing noise in
my equipment that i couldn't test with it.

- As a workaround I ran the kms-timestamping for regular non-interlaced
mode against the original userspace implementation of the same code in
my own toolkit Psychtoolbox, which itself was verified with testing
equipment to do the right thing on that GMA-950 netbook earlier this
year. The difference was less than 40 microseconds and more likely caused
by userspace noisiness and off-by-one errors in Psychtoolbox than by
your code, so i assume that your code is essentially correct at least
for non-interlaced scanout, and that the DRM core changes are therefore
also correct. If you or somebody else wants to try this test, i
can guide you through the steps. Psychtoolbox is easily apt-get'able on
Debian and at least Ubuntu.

- The next limitation of my testing is wrt. your "early vbl irq
handling" improvements (patch 14). I currently only have Gen3 hardware,
which doesn't exercise those code paths at all, so while the patch looks
correct, it's not really tested by me.

As far as Radeon testing goes, i can't test it at all atm. After already
not working very stably for the last half year, my last machine
with an AMD card died during bootup for this test, but not without
trying to corrupt the filesystem on my development drive as a little
post-christmas gift to me. If somebody has an AMD card and wants to test
this, it could be tested against the Psychtoolbox userspace reference
implementation, which was last verified with very precise external
hardware a couple of months ago. However, patch 13 needs some fixes or
it would crash. The now dead PC wasn't mine, but i still have the AMD card.

I will try to hunt for a new PC soon, and hopefully will get your 
patches 

[PATCH 00/14] drm: Some more vblank timestampi changes

2013-11-30 Thread Mario Kleiner
On 29/11/13 14:36, Ville Syrjälä wrote:
> On Wed, Nov 06, 2013 at 01:46:41PM +1000, Dave Airlie wrote:
>> On Wed, Oct 30, 2013 at 4:06 AM,   wrote:
>>> So I took another look at the vblank timestamping code, and got a bit
>>> excited. The result is this patchset.
>>
>> I'd like to merge this, I was hoping Mario could ack it at least as it
>> seems mostly sane to my eyes.
>
> So we missed that boat, but maybe we'll get the next one...
>
> Pinging Mario. Any chance you can take a look at this stuff at some
> point?
>

I will, including testing. Hopefully within the coming week, but 
definitely safely before christmas.

> Hmm. Do I have the wrong email address for Mario? Adding the other one
> too just to make sure...
>

Both work, but the tuebingen.mpg.de one will probably soon turn into a 
pure forward to the gmail one.

-mario


[PATCH 4/4] drm/intel: Push get_scanout_position() timestamping into kms driver.

2013-10-30 Thread Mario Kleiner
Move the ktime_get() clock readouts and potential preempt_disable()
calls from drm core into kms driver to make it compatible with the
api changes in the drm core.

The intel-kms driver needs to take the uncore.lock inside
i915_get_crtc_scanoutpos() and intel_pipe_in_vblank().
This is incompatible with the preempt_disable() on a
PREEMPT_RT patched kernel, as regular spin locks must not
be taken within a preempt_disable'd section. Lock contention
on the uncore.lock also introduced too much uncertainty in vblank
timestamps.

Push the ktime_get() timestamping for scanoutpos queries and
potential preempt_disable_rt() into i915_get_crtc_scanoutpos(),
so these problems can be avoided:

1. First lock the uncore.lock (might sleep on a PREEMPT_RT kernel).
2. preempt_disable_rt() (will be added by the rt-linux folks).
3. ktime_get() a timestamp before scanout pos query.
4. Do all mmio reads as fast as possible without grabbing any new locks!
5. ktime_get() a post-query timestamp.
6. preempt_enable_rt()
7. Unlock the uncore.lock.

This reduces timestamp uncertainty on a low-end HP Atom Mini netbook
with Intel GMA-950 nicely:

Before: 3-8 usecs with spikes > 20 usecs, triggering query retries.
After : Typically 1 usec (98% of all samples), occasionally 2 usecs
(2% of all samples), with a maximum of 3 usecs (a handful).
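
The bracketing in steps 3 to 5 above, and the query retries mentioned
for the "Before" case, can be sketched in plain userspace C. This is an
illustration under assumptions, not the driver code: the clock and the
register read are injected as function pointers, all model_* names are
hypothetical, and the retry threshold and loop are modeled loosely on
the retries mentioned above rather than copied from the drm core:

```c
#include <stdint.h>

typedef int64_t (*model_clock_fn)(void *clk); /* stands in for ktime_get() */
typedef int (*model_read_fn)(void *hw);       /* stands in for the mmio reads */

struct model_sample {
	int vpos;
	int64_t stime_ns; /* timestamp taken right before the query */
	int64_t etime_ns; /* timestamp taken right after the query */
	int tries;
};

/*
 * Bracket the position query with two timestamps. The difference
 * etime - stime bounds the uncertainty of the sample; if it exceeds
 * max_error_ns the query is retried, up to max_tries attempts.
 */
static struct model_sample model_query_scanoutpos(model_clock_fn now, void *clk,
						   model_read_fn read_vpos, void *hw,
						   int64_t max_error_ns, int max_tries)
{
	struct model_sample s = { 0, 0, 0, 0 };

	do {
		s.tries++;
		/* Steps 1-2 (take lock, disable preemption) would go here. */
		s.stime_ns = now(clk);
		s.vpos = read_vpos(hw); /* keep this section as short as possible */
		s.etime_ns = now(clk);
		/* Steps 6-7 (enable preemption, unlock) would go here. */
	} while (s.etime_ns - s.stime_ns > max_error_ns && s.tries < max_tries);

	return s;
}

/* Fake clock for experiments: every read advances time by step_ns. */
struct model_fake_clock {
	int64_t t_ns;
	int64_t step_ns;
};

static int64_t model_fake_now(void *clk)
{
	struct model_fake_clock *c = clk;
	int64_t t = c->t_ns;

	c->t_ns += c->step_ns;
	return t;
}

/* Fake register read: returns the scanline stored behind hw. */
static int model_fake_read_vpos(void *hw)
{
	return *(int *)hw;
}
```

With a fake clock stepping 1 usec per read, one attempt is enough; with
a 30 usec step and a 20 usec limit, every attempt is rejected and the
loop stops at max_tries, which is the kind of retry that anything
stretching the bracketed section (such as lock contention) provokes.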

v2: Fix formatting of new multi-line code comments.

Signed-off-by: Mario Kleiner 
Reviewed-by: Ville Syrjälä
Reviewed-by: Alex Deucher 
---
 drivers/gpu/drm/i915/i915_irq.c |   54 +++
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 156a1a4..7cafe64 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -599,35 +599,40 @@ static u32 gm45_get_vblank_counter(struct drm_device 
*dev, int pipe)
return I915_READ(reg);
 }

-static bool intel_pipe_in_vblank(struct drm_device *dev, enum pipe pipe)
+/* raw reads, only for fast reads of display block, no need for forcewake etc. */
+#define __raw_i915_read32(dev_priv__, reg__) readl((dev_priv__)->regs + (reg__))
+#define __raw_i915_read16(dev_priv__, reg__) readw((dev_priv__)->regs + (reg__))
+
+static bool intel_pipe_in_vblank_locked(struct drm_device *dev, enum pipe pipe)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
uint32_t status;
+   int reg;

if (IS_VALLEYVIEW(dev)) {
status = pipe == PIPE_A ?
I915_DISPLAY_PIPE_A_VBLANK_INTERRUPT :
I915_DISPLAY_PIPE_B_VBLANK_INTERRUPT;

-   return I915_READ(VLV_ISR) & status;
+   reg = VLV_ISR;
} else if (IS_GEN2(dev)) {
status = pipe == PIPE_A ?
I915_DISPLAY_PIPE_A_VBLANK_INTERRUPT :
I915_DISPLAY_PIPE_B_VBLANK_INTERRUPT;

-   return I915_READ16(ISR) & status;
+   reg = ISR;
} else if (INTEL_INFO(dev)->gen < 5) {
status = pipe == PIPE_A ?
I915_DISPLAY_PIPE_A_VBLANK_INTERRUPT :
I915_DISPLAY_PIPE_B_VBLANK_INTERRUPT;

-   return I915_READ(ISR) & status;
+   reg = ISR;
} else if (INTEL_INFO(dev)->gen < 7) {
status = pipe == PIPE_A ?
DE_PIPEA_VBLANK :
DE_PIPEB_VBLANK;

-   return I915_READ(DEISR) & status;
+   reg = DEISR;
} else {
switch (pipe) {
default:
@@ -642,12 +647,17 @@ static bool intel_pipe_in_vblank(struct drm_device *dev, 
enum pipe pipe)
break;
}

-   return I915_READ(DEISR) & status;
+   reg = DEISR;
}
+
+   if (IS_GEN2(dev))
+   return __raw_i915_read16(dev_priv, reg) & status;
+   else
+   return __raw_i915_read32(dev_priv, reg) & status;
 }

 static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe,
-int *vpos, int *hpos)
+int *vpos, int *hpos, ktime_t *stime, ktime_t 
*etime)
 {
struct drm_i915_private *dev_priv = dev->dev_private;
struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
@@ -657,6 +667,7 @@ static int i915_get_crtc_scanoutpos(struct drm_device *dev, 
int pipe,
int vbl_start, vbl_end, htotal, vtotal;
bool in_vbl = true;
int ret = 0;
+   unsigned long irqflags;

if (!intel_crtc->active) {
DRM_DEBUG_DRIVER("trying to get scanoutpos for disabled "
@@ -671,14 +682,27 @@ static int i915_get_crtc_scanoutpos(struct drm_device 
*dev, int pipe,

ret |= DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE;

+   /*
+* Lock uncore.lock, as we will do multiple tim

[PATCH 3/4] drm/radeon: Push get_scanout_position() timestamping into kms driver.

2013-10-30 Thread Mario Kleiner
Move the ktime_get() clock readouts and potential preempt_disable()
calls from drm core into kms driver to make it compatible with the
api changes in the drm core.

This should not introduce any change in functionality or behaviour
in radeon-kms, just a reshuffling of code.

Signed-off-by: Mario Kleiner 
Reviewed-by: Ville Syrjälä
Reviewed-by: Alex Deucher 
---
 drivers/gpu/drm/radeon/radeon_display.c |   24 +---
 drivers/gpu/drm/radeon/radeon_drv.c |3 ++-
 drivers/gpu/drm/radeon/radeon_mode.h|3 ++-
 drivers/gpu/drm/radeon/radeon_pm.c  |2 +-
 4 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 0d1aa05..ccd8751 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -306,7 +306,7 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 */
if (update_pending &&
     (DRM_SCANOUTPOS_VALID & radeon_get_crtc_scanoutpos(rdev->ddev, crtc_id,
-  &vpos, &hpos)) &&
+  &vpos, &hpos, NULL, NULL)) &&
     ((vpos >= (99 * rdev->mode_info.crtcs[crtc_id]->base.hwmode.crtc_vdisplay)/100) ||
      (vpos < 0 && !ASIC_IS_AVIVO(rdev)))) {
/* crtc didn't flip in this target vblank interval,
@@ -1539,12 +1539,17 @@ bool radeon_crtc_scaling_mode_fixup(struct drm_crtc *crtc,
 }

 /*
- * Retrieve current video scanout position of crtc on a given gpu.
+ * Retrieve current video scanout position of crtc on a given gpu, and
+ * an optional accurate timestamp of when query happened.
  *
  * \param dev Device to query.
  * \param crtc Crtc to query.
  * \param *vpos Location where vertical scanout position should be stored.
  * \param *hpos Location where horizontal scanout position should go.
+ * \param *stime Target location for timestamp taken immediately before
+ *   scanout position query. Can be NULL to skip timestamp.
+ * \param *etime Target location for timestamp taken immediately after
+ *   scanout position query. Can be NULL to skip timestamp.
  *
  * Returns vpos as a positive number while in active scanout area.
  * Returns vpos as a negative number inside vblank, counting the number
@@ -1560,7 +1565,8 @@ bool radeon_crtc_scaling_mode_fixup(struct drm_crtc *crtc,
  * unknown small number of scanlines wrt. real scanout position.
  *
  */
-int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, int *vpos, int *hpos)
+int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, int *vpos, int *hpos,
+  ktime_t *stime, ktime_t *etime)
 {
u32 stat_crtc = 0, vbl = 0, position = 0;
int vbl_start, vbl_end, vtotal, ret = 0;
@@ -1568,6 +1574,12 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, int *vpos, int

struct radeon_device *rdev = dev->dev_private;

+   /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+
+   /* Get optional system timestamp before query. */
+   if (stime)
+   *stime = ktime_get();
+
if (ASIC_IS_DCE4(rdev)) {
if (crtc == 0) {
vbl = RREG32(EVERGREEN_CRTC_V_BLANK_START_END +
@@ -1650,6 +1662,12 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, int *vpos, int
}
}

+   /* Get optional system timestamp after query. */
+   if (etime)
+   *etime = ktime_get();
+
+   /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */
+
/* Decode into vertical and horizontal scanout position. */
*vpos = position & 0x1fff;
*hpos = (position >> 16) & 0x1fff;
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 22f6858..101e7c0 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -106,7 +106,8 @@ int radeon_gem_object_open(struct drm_gem_object *obj,
 void radeon_gem_object_close(struct drm_gem_object *obj,
struct drm_file *file_priv);
 extern int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc,
- int *vpos, int *hpos);
+ int *vpos, int *hpos, ktime_t *stime,
+ ktime_t *etime);
 extern const struct drm_ioctl_desc radeon_ioctls_kms[];
 extern int radeon_max_kms_ioctl;
 int radeon_mmap(struct file *filp, struct vm_area_struct *vma);
diff --git a/drivers/gpu/drm/radeon/radeon_mode.h b/drivers/gpu/drm/radeon/radeon_mode.h
index ef63d3f..3bfa910 100644
--- a/drivers/gpu/drm/radeon/radeon_mode.h
+++ b/drivers/gpu/drm/radeon/radeon_mode.h
@@ -758,

[PATCH 2/4] drm: Push latency sensitive bits of vblank scanoutpos timestamping into kms drivers.

2013-10-30 Thread Mario Kleiner
A change in the locking of some kms drivers (currently intel-kms) makes
the old approach too inaccurate and also incompatible with the
PREEMPT_RT realtime kernel patchset.

The driver->get_scanout_position() method of intel-kms now needs
to acquire a spinlock, which clashes badly with the former
preempt_disable() calls in the drm core, and on a contended lock it
also introduces larger delays and timing uncertainty than acceptable.

This patch changes the prototype of driver->get_scanout_position()
to require/allow kms drivers to perform the ktime_get() system time
queries which go along with the actual scanout position readout in a way
that provides maximum precision, and to return those timestamps to
the drm core. kms driver implementations of get_scanout_position() are
asked to implement timestamping and scanoutpos readout in a way
that is as precise as possible and compatible with preempt_disable()
on a PREEMPT_RT kernel. A driver should follow this pattern in
get_scanout_position() for precision and compatibility:

spin_lock...(...);
preempt_disable_rt(); // On a PREEMPT_RT kernel, otherwise omit.
if (stime) *stime = ktime_get();
... Minimum amount of MMIO register reads to get scanout position ...
... no taking of locks allowed here! ...
if (etime) *etime = ktime_get();
preempt_enable_rt(); // On PREEMPT_RT kernel, otherwise omit.
spin_unlock...(...);
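
For illustration only, here is a rough user-space sketch of the pattern
above, together with the kind of retry loop the drm core runs on top of it.
It is not kernel code and not part of this patch: clock_gettime() stands in
for ktime_get(), a C11 atomic_flag stands in for the driver spinlock, a
plain variable stands in for the MMIO position register, and all names
(now_ns, fake_position_reg, SCANOUTPOS_VALID, MAX_RETRIES,
query_with_bounded_uncertainty) are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define SCANOUTPOS_VALID 0x1 /* hypothetical stand-in for DRM_SCANOUTPOS_VALID */
#define MAX_RETRIES      3   /* hypothetical stand-in for DRM_TIMESTAMP_MAXRETRIES */

/* Stand-ins for the driver spinlock and the hardware position register. */
static atomic_flag reg_lock = ATOMIC_FLAG_INIT;
static uint32_t fake_position_reg = (7u << 16) | 42u; /* hpos = 7, vpos = 42 */

static void reg_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&reg_lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void reg_lock_release(void)
{
    atomic_flag_clear_explicit(&reg_lock, memory_order_release);
}

/* Stand-in for ktime_get(): monotonic time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Driver side: take the lock first, then bracket only the register read. */
int get_scanout_position(int *vpos, int *hpos, int64_t *stime, int64_t *etime)
{
    uint32_t position;

    assert(vpos && hpos);

    reg_lock_acquire();
    if (stime)                    /* optional timestamp before the query */
        *stime = now_ns();
    position = fake_position_reg; /* minimal readout, no locks taken here */
    if (etime)                    /* optional timestamp after the query */
        *etime = now_ns();
    reg_lock_release();

    /* Decode outside the timed window, as the radeon patch does. */
    *vpos = position & 0x1fff;
    *hpos = (position >> 16) & 0x1fff;
    return SCANOUTPOS_VALID;
}

/*
 * Core side: retry until the query window is short enough.
 * Returns 0 if a query met max_error_ns, 1 if every retry exceeded it
 * (the last result is still reported), -1 if the query is invalid.
 */
int query_with_bounded_uncertainty(int64_t max_error_ns, int *vpos, int *hpos,
                                   int64_t *duration_ns)
{
    int64_t stime = 0, etime = 0;
    int i;

    for (i = 0; i < MAX_RETRIES; i++) {
        if (!(get_scanout_position(vpos, hpos, &stime, &etime) & SCANOUTPOS_VALID))
            return -1;
        *duration_ns = etime - stime; /* uncertainty of this query */
        if (*duration_ns <= max_error_ns)
            return 0;
    }
    return 1;
}
```

A caller would do something like
query_with_bounded_uncertainty(20000, &vpos, &hpos, &dur) and treat a
return value of 1 as "timestamp noisier than requested". The point of the
ordering is that the time spent waiting for the lock falls outside the
stime/etime window, so contention does not inflate the measured uncertainty.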

v2: Fix formatting of new multi-line code comments.

Signed-off-by: Mario Kleiner 
Reviewed-by: Ville Syrjälä
Reviewed-by: Alex Deucher 
---
 drivers/gpu/drm/drm_irq.c |   20 
 include/drm/drmP.h|   10 --
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 33ee515..d80d952 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -219,7 +219,7 @@ int drm_vblank_init(struct drm_device *dev, int num_crtcs)
for (i = 0; i < num_crtcs; i++)
init_waitqueue_head(&dev->vblank[i].queue);

-   DRM_INFO("Supports vblank timestamp caching Rev 1 (10.10.2010).\n");
+   DRM_INFO("Supports vblank timestamp caching Rev 2 (21.10.2013).\n");

/* Driver specific high-precision vblank timestamping supported? */
if (dev->driver->get_vblank_timestamp)
@@ -586,14 +586,17 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct drm_device *dev, int crtc,
 * code gets preempted or delayed for some reason.
 */
for (i = 0; i < DRM_TIMESTAMP_MAXRETRIES; i++) {
-   /* Get system timestamp before query. */
-   stime = ktime_get();
-
-   /* Get vertical and horizontal scanout pos. vpos, hpos. */
-   vbl_status = dev->driver->get_scanout_position(dev, crtc, &vpos, &hpos);
+   /*
+* Get vertical and horizontal scanout position vpos, hpos,
+* and bounding timestamps stime, etime, pre/post query.
+*/
+   vbl_status = dev->driver->get_scanout_position(dev, crtc, &vpos,
+  &hpos, &stime, &etime);

-   /* Get system timestamp after query. */
-   etime = ktime_get();
+   /*
+* Get correction for CLOCK_MONOTONIC -> CLOCK_REALTIME if
+* CLOCK_REALTIME is requested.
+*/
if (!drm_timestamp_monotonic)
mono_time_offset = ktime_get_monotonic_offset();

@@ -604,6 +607,7 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct drm_device *dev, int crtc,
return -EIO;
}

+   /* Compute uncertainty in timestamp of scanout position query. */
duration_ns = ktime_to_ns(etime) - ktime_to_ns(stime);

/* Accept result with <  max_error nsecs timing uncertainty. */
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 2b954ad..48d15f0 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -835,12 +835,17 @@ struct drm_driver {
/**
 * Called by vblank timestamping code.
 *
-* Return the current display scanout position from a crtc.
+* Return the current display scanout position from a crtc, and an
+* optional accurate ktime_get timestamp of when position was measured.
 *
 * \param dev  DRM device.
 * \param crtc Id of the crtc to query.
 * \param *vpos Target location for current vertical scanout position.
 * \param *hpos Target location for current horizontal scanout position.
+* \param *stime Target location for timestamp taken immediately before
+*   scanout position query. Can be NULL to skip timestamp.
+* \param *etime Target location for timestamp taken immediately after
+*   scanout position query. Can be NULL to skip timestamp.
 *
 * Returns vpos as a positive n

[PATCH 1/4] drm: Remove preempt_disable() from vblank timestamping code.

2013-10-30 Thread Mario Kleiner
Preemption handling will get pushed into the kms
drivers in followup patches, to make timestamping
more robust and PREEMPT_RT friendly.

Signed-off-by: Mario Kleiner 
Reviewed-by: Ville Syrjälä
Reviewed-by: Alex Deucher 
---
 drivers/gpu/drm/drm_irq.c |7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index f9af048..33ee515 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -586,11 +586,6 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct drm_device *dev, int crtc,
 * code gets preempted or delayed for some reason.
 */
for (i = 0; i < DRM_TIMESTAMP_MAXRETRIES; i++) {
-   /* Disable preemption to make it very likely to
-* succeed in the first iteration even on PREEMPT_RT kernel.
-*/
-   preempt_disable();
-
/* Get system timestamp before query. */
stime = ktime_get();

@@ -602,8 +597,6 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct drm_device *dev, int crtc,
if (!drm_timestamp_monotonic)
mono_time_offset = ktime_get_monotonic_offset();

-   preempt_enable();
-
/* Return as no-op if scanout query unsupported or failed. */
if (!(vbl_status & DRM_SCANOUTPOS_VALID)) {
DRM_DEBUG("crtc %d : scanoutpos query failed [%d].\n",
-- 
1.7.10.4



Vblank timestamping improvements/fixes for Linux drm. [v2]

2013-10-30 Thread Mario Kleiner
Hi Dave,

this is v2 of the patch set for improving/restoring accuracy and
robustness of vblank timestamping and for fixing incompatibilities
with the PREEMPT_RT patches.

Could you please merge this for the next kernel? Would be good to have
the old accuracy restored as soon as possible. Thanks.

v2: Added the reviewed-by's of Ville and Alex, thanks for the review!
Fixed multi-line code formatting as suggested by Ville.

Successfully tested on Intel and AMD Radeon hardware.

thanks,
-mario


