subject:"Funky new vblank counter regressions in Linux 4.4\-rc1"

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-27 Thread Ville Syrjälä

On Wed, Nov 25, 2015 at 02:38:04PM -0500, Alex Deucher wrote:
> On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner
>  wrote:
> > On 11/25/2015 06:58 PM, Ville SyrjÃ¤lÃ¤ wrote:
> >>
> >> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
> >>>
> >>> On 11/23/2015 09:24 PM, Ville SyrjÃ¤lÃ¤ wrote:
> 
>  On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> >
> >
> >
> > On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
> >>
> >> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> >>>
> >>> On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
> 
>  On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> >>>
> >>>
> >>> ...
> >>> Ok, but why would that be a bad thing? I think we want it to think it
> >>> is
> >>> in the previous frame if it is called outside the vblank irq context.
> >>> The only reason we fudge it to the next frames vblank if i vblank irq
> >>> is
> >>> because we know the vblank irq handler we are executing atm. was
> >>> meant
> >>> to execute within the upcoming vblank for the next frame, so we fudge
> >>> the scanout positions and thereby timestamp to correspond to that new
> >>> frame. But if something called outside irq context it should get a
> >>> scanout position/timestamp that corresponds to "reality".
> >>
> >>
> >> It would be a bad thing since it would cause the timestamp to jump
> >> backwards, and that would also cause the frame count guesstimate to go
> >> backwards.
> >>
> >
> > But only if we don't use the dev->driver->get_vblank_counter() method,
> > which we try to use on AMD.
> 
> 
>  Well, if you do it that way then you have the problem of the hw counter
>  seeming to jump forward by one after crossing the start of vblank (when
>  compared to the value you sampled when you processed the early vblank
>  interrupt).
> 
> >>>
> >>> Ok, finally i see the bad scenario that wouldn't get prevented by our
> >>> current locking with the new vblank counting in the core. The vblank
> >>> enable path is safe due to locking and discounting of redundant
> >>> timestamps etc. But the disable path could go wrong:
> >>>
> >>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
> >>> updates timestamps and counts "as if" in vblank -> incremented vblank
> >>> count and timestamp now set in the future.
> >>>
> >>> 2. After vblank irq finishes, but just before leading edge of vblank,
> >>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
> >>> count because before vblank and not in vblank irq. Now
> >>> drm_update_vblank_count() would process a
> >>> "new" timestamp and count from the past and we'd have time and counts
> >>> going backwards, and bad things would happen.
> >>>
> >>> I haven't observed such a thing happening during testing so far,
> >>> probably because the time window in which it could happen is tiny, but
> >>> given how awfully bad it would be, it needs to be prevented.
> >>>
> >>> I had a look at the description of the Vblank irq in the "M76 Register
> >>> Reference Guide" for older asics and the description suggests that the
> >>> vblank irq fires when the crtc's line buffer is finished reading pixel
> >>> data from the scanout buffer in memory for a frame, ie., when the line
> >>> buffer read "enters" vblank.
> >>
> >>
> >> Hmm. Does that mean there's always at least one fullscreen plane enabled
> >> in the hw? As in you can't turn off the primary plane or make it smaller
> >> than the active video area? Othwewise it sounds like you'd could either
> >> not get it at all, or get it somewhere in the middle of the screen.
> >>
> >
> > It says "Interrupt that can be programmed to be generated by the
> > primary display controller's line buffer logic either when the
> > source image line counter is not requesting any active
> > display data (i.e. in the vertical blank) or the output CRTC
> > timing generator is within the vertical blanking region."
> >
> > So my statements were my interpretation of this quote, so i can make some
> > sense out of the vblank irq behaviour. I guess Alex or Harry would know? The
> > M76 reference refers to some older asics, i just assume it is the same for
> > the current ones, given that observed behaviour would be consistent with the
> > line buffer causing this lead of a couple of scanlines. I see about 2
> > scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the
> > line buffer is, how quickly it refills etc., but it sounds reasonable.
> 
> The size of the line buffer varies by generation, but the LB logic is
> still responsible for generating the vblank interrupt even on newer
> hw.

So there's no actual vblank interrupt available from the timing
generator?

-- 
Ville SyrjÃ¤lÃ¤
Intel OTC

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-27 Thread Mario Kleiner

On 11/25/2015 08:38 PM, Alex Deucher wrote:
> On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner
>  wrote:
>> On 11/25/2015 06:58 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>>
>>> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:

 On 11/23/2015 09:24 PM, Ville SyrjÃ¤lÃ¤ wrote:
>
> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>
>>
>>
>> On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>>
>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:

 On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
>
> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:


 ...
 Ok, but why would that be a bad thing? I think we want it to think it
 is
 in the previous frame if it is called outside the vblank irq context.
 The only reason we fudge it to the next frames vblank if i vblank irq
 is
 because we know the vblank irq handler we are executing atm. was
 meant
 to execute within the upcoming vblank for the next frame, so we fudge
 the scanout positions and thereby timestamp to correspond to that new
 frame. But if something called outside irq context it should get a
 scanout position/timestamp that corresponds to "reality".
>>>
>>>
>>> It would be a bad thing since it would cause the timestamp to jump
>>> backwards, and that would also cause the frame count guesstimate to go
>>> backwards.
>>>
>>
>> But only if we don't use the dev->driver->get_vblank_counter() method,
>> which we try to use on AMD.
>
>
> Well, if you do it that way then you have the problem of the hw counter
> seeming to jump forward by one after crossing the start of vblank (when
> compared to the value you sampled when you processed the early vblank
> interrupt).
>

 Ok, finally i see the bad scenario that wouldn't get prevented by our
 current locking with the new vblank counting in the core. The vblank
 enable path is safe due to locking and discounting of redundant
 timestamps etc. But the disable path could go wrong:

 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
 updates timestamps and counts "as if" in vblank -> incremented vblank
 count and timestamp now set in the future.

 2. After vblank irq finishes, but just before leading edge of vblank,
 vblank_disable_and_save() executes, doesn't get bumped timestamp or
 count because before vblank and not in vblank irq. Now
 drm_update_vblank_count() would process a
 "new" timestamp and count from the past and we'd have time and counts
 going backwards, and bad things would happen.

 I haven't observed such a thing happening during testing so far,
 probably because the time window in which it could happen is tiny, but
 given how awfully bad it would be, it needs to be prevented.

 I had a look at the description of the Vblank irq in the "M76 Register
 Reference Guide" for older asics and the description suggests that the
 vblank irq fires when the crtc's line buffer is finished reading pixel
 data from the scanout buffer in memory for a frame, ie., when the line
 buffer read "enters" vblank.
>>>
>>>
>>> Hmm. Does that mean there's always at least one fullscreen plane enabled
>>> in the hw? As in you can't turn off the primary plane or make it smaller
>>> than the active video area? Othwewise it sounds like you'd could either
>>> not get it at all, or get it somewhere in the middle of the screen.
>>>
>>
>> It says "Interrupt that can be programmed to be generated by the
>> primary display controller's line buffer logic either when the
>> source image line counter is not requesting any active
>> display data (i.e. in the vertical blank) or the output CRTC
>> timing generator is within the vertical blanking region."
>>
>> So my statements were my interpretation of this quote, so i can make some
>> sense out of the vblank irq behaviour. I guess Alex or Harry would know? The
>> M76 reference refers to some older asics, i just assume it is the same for
>> the current ones, given that observed behaviour would be consistent with the
>> line buffer causing this lead of a couple of scanlines. I see about 2
>> scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the
>> line buffer is, how quickly it refills etc., but it sounds reasonable.
>
> The size of the line buffer varies by generation, but the LB logic is
> still responsible for generating the vblank interrupt even on newer
> hw.
>
> Alex
>

Thanks for the pointer. I digged through the 
evergreen_line_buffer_adjust() function and its siblings from classic to 
DCE11. Seems the line buffer capacity goes up to a max 16384 pixels on 
single display, e.g., evergreen, and more like 8192 pixels on DCE8+? 
It's expressed as 8192 * 2 and talking

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-27 Thread Mario Kleiner

On 11/25/2015 08:36 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote:
>> On 11/25/2015 06:46 PM, Ville SyrjÃ¤lÃ¤ wrote:

...

>> Attached is my current patch i wanted to submit for the drm core's
>> drm_update_vblank_count(). I think it's good to make the core somewhat
>> robust against potential kms driver bugs or glitches. But if you
>> wouldn't like that patch, there wouldn't be much of a point sending it
>> out at all.
>>
>> thanks,
>> -mario
>>
>
>> >From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001
>> From: Mario Kleiner 
>> Date: Wed, 25 Nov 2015 18:48:31 +0100
>> Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust.
>>
>> The changes to drm_update_vblank_count() for Linux 4.4-rc
>> made the function more fragile wrt. some hw quirks. E.g.,
>> at dev->driver->enable_vblank(), AMD gpu's fire a spurious
>> redundant vblank irq shortly after enabling vblank irqs, not
>> locked to vblank. This causes a redundant call which needs
>> to be suppressed to avoid miscounting.
>>
>> To increase robustness, shuffle things around a bit:
>>
>> On drivers with high precision vblank timestamping always
>> evaluate the timestamp difference between current timestamp
>> and previously recorded timestamp to detect such redundant
>> invocations and no-op in that case.
>>
>> Also detect and warn about timestamps going backwards to
>> catch potential kms driver bugs.
>>
>> This patch is meant for Linux 4.4-rc and later.
>>
>> Signed-off-by: Mario Kleiner 
>> ---
>>   drivers/gpu/drm/drm_irq.c | 53 
>> ---
>>   1 file changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 819b8c1..8728c3c 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device 
>> *dev, unsigned int pipe,
>>  unsigned long flags)
>>   {
>>  struct drm_vblank_crtc *vblank = >vblank[pipe];
>> -u32 cur_vblank, diff;
>> +u32 cur_vblank, diff = 0;
>>  bool rc;
>>  struct timeval t_vblank;
>> +const struct timeval *t_old;
>> +u64 diff_ns;
>>  int count = DRM_TIMESTAMP_MAXRETRIES;
>>  int framedur_ns = vblank->framedur_ns;
>>
>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device 
>> *dev, unsigned int pipe,
>>  rc = drm_get_last_vbltimestamp(dev, pipe, _vblank, flags);
>>  } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && 
>> --count > 0);
>>
>> -if (dev->max_vblank_count != 0) {
>> -/* trust the hw counter when it's around */
>> -diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> -} else if (rc && framedur_ns) {
>> -const struct timeval *t_old;
>> -u64 diff_ns;
>> -
>> +/*
>> + * Always use vblank timestamping based method if supported to reject
>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>> + * due to some irq handling quirk.
>> + *
>> + * This also sets the diff value for use as fallback below in case the
>> + * hw does not support a suitable hw vblank counter.
>> + */
>> +if (rc && framedur_ns) {
>
> If you fudged everything properly why do you still need this? With
> working hw counter there should be no need to do this stuff.
>

As far as testing on one DCE4 card goes, i don't need it anymore with my 
fudged hw counters and timestamps. The fudging so far seems to work 
nicely. I just wanted to have a bit of extra robustness and a bit of 
extra available debug output against future or other broken drivers, or 
mistakes in fudging on the current driver, e.g., against things like 
timestamps going backwards. Especially since i can only test on two AMD 
cards atm., quite a limited sample. There are 3 display engine 
generations before and 5 generations after my test sample.

-mario


>>  t_old = (dev, pipe, vblank->count);
>>  diff_ns = timeval_to_ns(_vblank) - timeval_to_ns(t_old);
>>
>> @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device 
>> *dev, unsigned int pipe,
>>   */
>>  diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>
>> -if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
>> +/* Catch driver timestamping bugs and prevent worse. */
>> +if ((s32) diff < 0) {
>> +DRM_DEBUG_VBL("crtc %u: Timestamp going backward!"
>> +" diff_ns = %lld, framedur_ns = %d)\n",
>> +pipe, (long long) diff_ns, framedur_ns);
>> +return;
>> +}
>> +
>> +if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>>  DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>> -  " diff_ns

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Ville Syrjälä

On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote:
> On 11/25/2015 06:46 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
> >> On 11/23/2015 09:24 PM, Ville SyrjÃ¤lÃ¤ wrote:
> >>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> 
> 
>  On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> >> On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
> >>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> >>
> >> ...
> >> Ok, but why would that be a bad thing? I think we want it to think it 
> >> is
> >> in the previous frame if it is called outside the vblank irq context.
> >> The only reason we fudge it to the next frames vblank if i vblank irq 
> >> is
> >> because we know the vblank irq handler we are executing atm. was meant
> >> to execute within the upcoming vblank for the next frame, so we fudge
> >> the scanout positions and thereby timestamp to correspond to that new
> >> frame. But if something called outside irq context it should get a
> >> scanout position/timestamp that corresponds to "reality".
> >
> > It would be a bad thing since it would cause the timestamp to jump
> > backwards, and that would also cause the frame count guesstimate to go
> > backwards.
> >
> 
>  But only if we don't use the dev->driver->get_vblank_counter() method,
>  which we try to use on AMD.
> >>>
> >>> Well, if you do it that way then you have the problem of the hw counter
> >>> seeming to jump forward by one after crossing the start of vblank (when
> >>> compared to the value you sampled when you processed the early vblank
> >>> interrupt).
> >>>
> >>
> >> Ok, finally i see the bad scenario that wouldn't get prevented by our
> >> current locking with the new vblank counting in the core. The vblank
> >> enable path is safe due to locking and discounting of redundant
> >> timestamps etc. But the disable path could go wrong:
> >>
> >> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
> >> updates timestamps and counts "as if" in vblank -> incremented vblank
> >> count and timestamp now set in the future.
> >>
> >> 2. After vblank irq finishes, but just before leading edge of vblank,
> >> vblank_disable_and_save() executes, doesn't get bumped timestamp or
> >> count because before vblank and not in vblank irq. Now
> >> drm_update_vblank_count() would process a
> >> "new" timestamp and count from the past and we'd have time and counts
> >> going backwards, and bad things would happen.
> >>
> >> I haven't observed such a thing happening during testing so far,
> >> probably because the time window in which it could happen is tiny, but
> >> given how awfully bad it would be, it needs to be prevented.
> >>
> >> I had a look at the description of the Vblank irq in the "M76 Register
> >> Reference Guide" for older asics and the description suggests that the
> >> vblank irq fires when the crtc's line buffer is finished reading pixel
> >> data from the scanout buffer in memory for a frame, ie., when the line
> >> buffer read "enters" vblank. That would explain why the irq happens a
> >> few scanlines before actual vblank, because line buffer refills must
> >> obviously happen before the crtc can send pixel data from the line
> >> buffer to the encoders, so it would lead a bit in time. That also means
> >> we can't delay the vblank irq to actually happen at start of vblank and
> >> have to deal with the early vblank irq.
> >>
> >>> I guess one silly idea would be to defer the vblank interrupt processing
> >>> to a timer, and just schedule it a bit into the future from the actual
> >>> interrupt handler.
> >>>
> >>
> >> Timer would be bad because then we get problems with the pageflip
> >> completion irq sometimes being processed before the vblank irq,
> >
> > You you'd need to move page flip completion to happen from the vblank
> > timer too I suppose.
> >
> >> and
> >> because we want to be fast in vblank irq handling, delivering vblank
> >> events etc. I wouldn't trust a timer to be reliable enough for such
> >> short waits.
> >
> > hrtimers should be accurate. But maybe more expensive than the timer
> > wheel.
> >
> 
> Sounds all a bit complex and fraught with new possible complications. I 
> can't spend much more time on this than i did so far, and we need to get 
> this into 4.4-rc, so i am aiming for the most simple solution that would 
> work.
> 
> >> Busy waiting wouldn't be great either in irq.
> >>
> >> So what about this relatively simple one?
> >>
> >> 1. In radeon_get_crtc_scanoutpos() we artifically define the
> >> vblank_start line to be, e.g, 5 scanlines before the true start of
> >> vblank, so for the purpose of vblank counter queries and timestamping
> >> our "vblank" would start a bit earlier and the vblank irq would always
> >> execute in "vblank".

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Mario Kleiner

On 11/25/2015 06:46 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>> On 11/23/2015 09:24 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:


 On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>> On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>
>> ...
>> Ok, but why would that be a bad thing? I think we want it to think it is
>> in the previous frame if it is called outside the vblank irq context.
>> The only reason we fudge it to the next frames vblank if i vblank irq is
>> because we know the vblank irq handler we are executing atm. was meant
>> to execute within the upcoming vblank for the next frame, so we fudge
>> the scanout positions and thereby timestamp to correspond to that new
>> frame. But if something called outside irq context it should get a
>> scanout position/timestamp that corresponds to "reality".
>
> It would be a bad thing since it would cause the timestamp to jump
> backwards, and that would also cause the frame count guesstimate to go
> backwards.
>

 But only if we don't use the dev->driver->get_vblank_counter() method,
 which we try to use on AMD.
>>>
>>> Well, if you do it that way then you have the problem of the hw counter
>>> seeming to jump forward by one after crossing the start of vblank (when
>>> compared to the value you sampled when you processed the early vblank
>>> interrupt).
>>>
>>
>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>> current locking with the new vblank counting in the core. The vblank
>> enable path is safe due to locking and discounting of redundant
>> timestamps etc. But the disable path could go wrong:
>>
>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>> updates timestamps and counts "as if" in vblank -> incremented vblank
>> count and timestamp now set in the future.
>>
>> 2. After vblank irq finishes, but just before leading edge of vblank,
>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>> count because before vblank and not in vblank irq. Now
>> drm_update_vblank_count() would process a
>> "new" timestamp and count from the past and we'd have time and counts
>> going backwards, and bad things would happen.
>>
>> I haven't observed such a thing happening during testing so far,
>> probably because the time window in which it could happen is tiny, but
>> given how awfully bad it would be, it needs to be prevented.
>>
>> I had a look at the description of the Vblank irq in the "M76 Register
>> Reference Guide" for older asics and the description suggests that the
>> vblank irq fires when the crtc's line buffer is finished reading pixel
>> data from the scanout buffer in memory for a frame, ie., when the line
>> buffer read "enters" vblank. That would explain why the irq happens a
>> few scanlines before actual vblank, because line buffer refills must
>> obviously happen before the crtc can send pixel data from the line
>> buffer to the encoders, so it would lead a bit in time. That also means
>> we can't delay the vblank irq to actually happen at start of vblank and
>> have to deal with the early vblank irq.
>>
>>> I guess one silly idea would be to defer the vblank interrupt processing
>>> to a timer, and just schedule it a bit into the future from the actual
>>> interrupt handler.
>>>
>>
>> Timer would be bad because then we get problems with the pageflip
>> completion irq sometimes being processed before the vblank irq,
>
> You you'd need to move page flip completion to happen from the vblank
> timer too I suppose.
>
>> and
>> because we want to be fast in vblank irq handling, delivering vblank
>> events etc. I wouldn't trust a timer to be reliable enough for such
>> short waits.
>
> hrtimers should be accurate. But maybe more expensive than the timer
> wheel.
>

Sounds all a bit complex and fraught with new possible complications. I 
can't spend much more time on this than i did so far, and we need to get 
this into 4.4-rc, so i am aiming for the most simple solution that would 
work.

>> Busy waiting wouldn't be great either in irq.
>>
>> So what about this relatively simple one?
>>
>> 1. In radeon_get_crtc_scanoutpos() we artifically define the
>> vblank_start line to be, e.g, 5 scanlines before the true start of
>> vblank, so for the purpose of vblank counter queries and timestamping
>> our "vblank" would start a bit earlier and the vblank irq would always
>> execute in "vblank". Non-Irq invocations like vblank_disable_and_save()
>> would also be treated to this early vblank start and so what the DRM
>> core observes would always be consistent.
>>
>> 2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos()
>> internally for "dynpm" dynamic power

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Ville Syrjälä

On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
> On 11/23/2015 09:24 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> >>
> >>
> >> On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
> >>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>  On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> 
>  ...
>  Ok, but why would that be a bad thing? I think we want it to think it is
>  in the previous frame if it is called outside the vblank irq context.
>  The only reason we fudge it to the next frames vblank if i vblank irq is
>  because we know the vblank irq handler we are executing atm. was meant
>  to execute within the upcoming vblank for the next frame, so we fudge
>  the scanout positions and thereby timestamp to correspond to that new
>  frame. But if something called outside irq context it should get a
>  scanout position/timestamp that corresponds to "reality".
> >>>
> >>> It would be a bad thing since it would cause the timestamp to jump
> >>> backwards, and that would also cause the frame count guesstimate to go
> >>> backwards.
> >>>
> >>
> >> But only if we don't use the dev->driver->get_vblank_counter() method,
> >> which we try to use on AMD.
> >
> > Well, if you do it that way then you have the problem of the hw counter
> > seeming to jump forward by one after crossing the start of vblank (when
> > compared to the value you sampled when you processed the early vblank
> > interrupt).
> >
> 
> Ok, finally i see the bad scenario that wouldn't get prevented by our 
> current locking with the new vblank counting in the core. The vblank 
> enable path is safe due to locking and discounting of redundant 
> timestamps etc. But the disable path could go wrong:
> 
> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
> updates timestamps and counts "as if" in vblank -> incremented vblank 
> count and timestamp now set in the future.
> 
> 2. After vblank irq finishes, but just before leading edge of vblank, 
> vblank_disable_and_save() executes, doesn't get bumped timestamp or 
> count because before vblank and not in vblank irq. Now 
> drm_update_vblank_count() would process a
> "new" timestamp and count from the past and we'd have time and counts 
> going backwards, and bad things would happen.
> 
> I haven't observed such a thing happening during testing so far, 
> probably because the time window in which it could happen is tiny, but 
> given how awfully bad it would be, it needs to be prevented.
> 
> I had a look at the description of the Vblank irq in the "M76 Register 
> Reference Guide" for older asics and the description suggests that the 
> vblank irq fires when the crtc's line buffer is finished reading pixel 
> data from the scanout buffer in memory for a frame, ie., when the line 
> buffer read "enters" vblank.

Hmm. Does that mean there's always at least one fullscreen plane enabled
in the hw? As in you can't turn off the primary plane or make it smaller
than the active video area? Othwewise it sounds like you'd could either
not get it at all, or get it somewhere in the middle of the screen.

-- 
Ville SyrjÃ¤lÃ¤
Intel OTC

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Ville Syrjälä

On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
> On 11/23/2015 09:24 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> >>
> >>
> >> On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
> >>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>  On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> 
>  ...
>  Ok, but why would that be a bad thing? I think we want it to think it is
>  in the previous frame if it is called outside the vblank irq context.
>  The only reason we fudge it to the next frames vblank if i vblank irq is
>  because we know the vblank irq handler we are executing atm. was meant
>  to execute within the upcoming vblank for the next frame, so we fudge
>  the scanout positions and thereby timestamp to correspond to that new
>  frame. But if something called outside irq context it should get a
>  scanout position/timestamp that corresponds to "reality".
> >>>
> >>> It would be a bad thing since it would cause the timestamp to jump
> >>> backwards, and that would also cause the frame count guesstimate to go
> >>> backwards.
> >>>
> >>
> >> But only if we don't use the dev->driver->get_vblank_counter() method,
> >> which we try to use on AMD.
> >
> > Well, if you do it that way then you have the problem of the hw counter
> > seeming to jump forward by one after crossing the start of vblank (when
> > compared to the value you sampled when you processed the early vblank
> > interrupt).
> >
> 
> Ok, finally i see the bad scenario that wouldn't get prevented by our 
> current locking with the new vblank counting in the core. The vblank 
> enable path is safe due to locking and discounting of redundant 
> timestamps etc. But the disable path could go wrong:
> 
> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
> updates timestamps and counts "as if" in vblank -> incremented vblank 
> count and timestamp now set in the future.
> 
> 2. After vblank irq finishes, but just before leading edge of vblank, 
> vblank_disable_and_save() executes, doesn't get bumped timestamp or 
> count because before vblank and not in vblank irq. Now 
> drm_update_vblank_count() would process a
> "new" timestamp and count from the past and we'd have time and counts 
> going backwards, and bad things would happen.
> 
> I haven't observed such a thing happening during testing so far, 
> probably because the time window in which it could happen is tiny, but 
> given how awfully bad it would be, it needs to be prevented.
> 
> I had a look at the description of the Vblank irq in the "M76 Register 
> Reference Guide" for older asics and the description suggests that the 
> vblank irq fires when the crtc's line buffer is finished reading pixel 
> data from the scanout buffer in memory for a frame, ie., when the line 
> buffer read "enters" vblank. That would explain why the irq happens a 
> few scanlines before actual vblank, because line buffer refills must 
> obviously happen before the crtc can send pixel data from the line 
> buffer to the encoders, so it would lead a bit in time. That also means 
> we can't delay the vblank irq to actually happen at start of vblank and 
> have to deal with the early vblank irq.
> 
> > I guess one silly idea would be to defer the vblank interrupt processing
> > to a timer, and just schedule it a bit into the future from the actual
> > interrupt handler.
> >
> 
> Timer would be bad because then we get problems with the pageflip 
> completion irq sometimes being processed before the vblank irq,

You you'd need to move page flip completion to happen from the vblank
timer too I suppose.

> and 
> because we want to be fast in vblank irq handling, delivering vblank 
> events etc. I wouldn't trust a timer to be reliable enough for such 
> short waits.

hrtimers should be accurate. But maybe more expensive than the timer
wheel.

> Busy waiting wouldn't be great either in irq.
> 
> So what about this relatively simple one?
> 
> 1. In radeon_get_crtc_scanoutpos() we artifically define the 
> vblank_start line to be, e.g, 5 scanlines before the true start of 
> vblank, so for the purpose of vblank counter queries and timestamping 
> our "vblank" would start a bit earlier and the vblank irq would always 
> execute in "vblank". Non-Irq invocations like vblank_disable_and_save() 
> would also be treated to this early vblank start and so what the DRM 
> core observes would always be consistent.
> 
> 2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos() 
> internally for "dynpm" dynamic power management/reclocking, and to 
> implement pageflip completion detection on asics older than DCE3 which 
> don't have pageflip interrupts. For those cases we need to use the true 
> start of vblank, so for this internal use we pass in some special flag 
> into radeon_get_crtc_scanoutpos() to tell it to not

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Mario Kleiner

On 11/25/2015 06:58 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>> On 11/23/2015 09:24 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:


 On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>> On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>
>> ...
>> Ok, but why would that be a bad thing? I think we want it to think it is
>> in the previous frame if it is called outside the vblank irq context.
>> The only reason we fudge it to the next frames vblank if i vblank irq is
>> because we know the vblank irq handler we are executing atm. was meant
>> to execute within the upcoming vblank for the next frame, so we fudge
>> the scanout positions and thereby timestamp to correspond to that new
>> frame. But if something called outside irq context it should get a
>> scanout position/timestamp that corresponds to "reality".
>
> It would be a bad thing since it would cause the timestamp to jump
> backwards, and that would also cause the frame count guesstimate to go
> backwards.
>

 But only if we don't use the dev->driver->get_vblank_counter() method,
 which we try to use on AMD.
>>>
>>> Well, if you do it that way then you have the problem of the hw counter
>>> seeming to jump forward by one after crossing the start of vblank (when
>>> compared to the value you sampled when you processed the early vblank
>>> interrupt).
>>>
>>
>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>> current locking with the new vblank counting in the core. The vblank
>> enable path is safe due to locking and discounting of redundant
>> timestamps etc. But the disable path could go wrong:
>>
>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>> updates timestamps and counts "as if" in vblank -> incremented vblank
>> count and timestamp now set in the future.
>>
>> 2. After vblank irq finishes, but just before leading edge of vblank,
>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>> count because before vblank and not in vblank irq. Now
>> drm_update_vblank_count() would process a
>> "new" timestamp and count from the past and we'd have time and counts
>> going backwards, and bad things would happen.
>>
>> I haven't observed such a thing happening during testing so far,
>> probably because the time window in which it could happen is tiny, but
>> given how awfully bad it would be, it needs to be prevented.
>>
>> I had a look at the description of the Vblank irq in the "M76 Register
>> Reference Guide" for older asics and the description suggests that the
>> vblank irq fires when the crtc's line buffer is finished reading pixel
>> data from the scanout buffer in memory for a frame, ie., when the line
>> buffer read "enters" vblank.
>
> Hmm. Does that mean there's always at least one fullscreen plane enabled
> in the hw? As in you can't turn off the primary plane or make it smaller
> than the active video area? Othwewise it sounds like you'd could either
> not get it at all, or get it somewhere in the middle of the screen.
>

It says "Interrupt that can be programmed to be generated by the
primary display controller's line buffer logic either when the
source image line counter is not requesting any active
display data (i.e. in the vertical blank) or the output CRTC
timing generator is within the vertical blanking region."

So my statements were my interpretation of this quote, so i can make 
some sense out of the vblank irq behaviour. I guess Alex or Harry would 
know? The M76 reference refers to some older asics, i just assume it is 
the same for the current ones, given that observed behaviour would be 
consistent with the line buffer causing this lead of a couple of 
scanlines. I see about 2 scanlines on DCE4 and about 3 scanlines on 
DCE3. I don't know how big the line buffer is, how quickly it refills 
etc., but it sounds reasonable.

-mario

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Mario Kleiner

On 11/23/2015 09:24 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>
>>
>> On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
 On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:

 ...
 Ok, but why would that be a bad thing? I think we want it to think it is
 in the previous frame if it is called outside the vblank irq context.
 The only reason we fudge it to the next frames vblank if i vblank irq is
 because we know the vblank irq handler we are executing atm. was meant
 to execute within the upcoming vblank for the next frame, so we fudge
 the scanout positions and thereby timestamp to correspond to that new
 frame. But if something called outside irq context it should get a
 scanout position/timestamp that corresponds to "reality".
>>>
>>> It would be a bad thing since it would cause the timestamp to jump
>>> backwards, and that would also cause the frame count guesstimate to go
>>> backwards.
>>>
>>
>> But only if we don't use the dev->driver->get_vblank_counter() method,
>> which we try to use on AMD.
>
> Well, if you do it that way then you have the problem of the hw counter
> seeming to jump forward by one after crossing the start of vblank (when
> compared to the value you sampled when you processed the early vblank
> interrupt).
>

Ok, finally i see the bad scenario that wouldn't get prevented by our 
current locking with the new vblank counting in the core. The vblank 
enable path is safe due to locking and discounting of redundant 
timestamps etc. But the disable path could go wrong:

1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
updates timestamps and counts "as if" in vblank -> incremented vblank 
count and timestamp now set in the future.

2. After vblank irq finishes, but just before leading edge of vblank, 
vblank_disable_and_save() executes, doesn't get bumped timestamp or 
count because before vblank and not in vblank irq. Now 
drm_update_vblank_count() would process a
"new" timestamp and count from the past and we'd have time and counts 
going backwards, and bad things would happen.

I haven't observed such a thing happening during testing so far, 
probably because the time window in which it could happen is tiny, but 
given how awfully bad it would be, it needs to be prevented.

I had a look at the description of the Vblank irq in the "M76 Register 
Reference Guide" for older asics and the description suggests that the 
vblank irq fires when the crtc's line buffer is finished reading pixel 
data from the scanout buffer in memory for a frame, ie., when the line 
buffer read "enters" vblank. That would explain why the irq happens a 
few scanlines before actual vblank, because line buffer refills must 
obviously happen before the crtc can send pixel data from the line 
buffer to the encoders, so it would lead a bit in time. That also means 
we can't delay the vblank irq to actually happen at start of vblank and 
have to deal with the early vblank irq.

> I guess one silly idea would be to defer the vblank interrupt processing
> to a timer, and just schedule it a bit into the future from the actual
> interrupt handler.
>

Timer would be bad because then we get problems with the pageflip 
completion irq sometimes being processed before the vblank irq, and 
because we want to be fast in vblank irq handling, delivering vblank 
events etc. I wouldn't trust a timer to be reliable enough for such 
short waits. Busy waiting wouldn't be great either in irq.

So what about this relatively simple one?

1. In radeon_get_crtc_scanoutpos() we artifically define the 
vblank_start line to be, e.g, 5 scanlines before the true start of 
vblank, so for the purpose of vblank counter queries and timestamping 
our "vblank" would start a bit earlier and the vblank irq would always 
execute in "vblank". Non-Irq invocations like vblank_disable_and_save() 
would also be treated to this early vblank start and so what the DRM 
core observes would always be consistent.

2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos() 
internally for "dynpm" dynamic power management/reclocking, and to 
implement pageflip completion detection on asics older than DCE3 which 
don't have pageflip interrupts. For those cases we need to use the true 
start of vblank, so for this internal use we pass in some special flag 
into radeon_get_crtc_scanoutpos() to tell it to not shift the vblank 
start around.

3. I've added another check to my patch for drm_update_vblank_count() to 
catch timestamps going backwards and discounting such cases, for a bit 
of extra robustness against driver trouble.

Any ideas why this would be a stupid idea? I'll try this now and just 
hope meanwhile nobody finds a reason why this would be bad.

thanks,
-mario

>> Also dev->vblank_time_lock should protect us

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-25 Thread Alex Deucher

On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner
 wrote:
> On 11/25/2015 06:58 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>
>> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>>>
>>> On 11/23/2015 09:24 PM, Ville SyrjÃ¤lÃ¤ wrote:

 On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>
>
>
> On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>
>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>
>>> On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:

 On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>
>>>
>>> ...
>>> Ok, but why would that be a bad thing? I think we want it to think it
>>> is
>>> in the previous frame if it is called outside the vblank irq context.
>>> The only reason we fudge it to the next frames vblank if i vblank irq
>>> is
>>> because we know the vblank irq handler we are executing atm. was
>>> meant
>>> to execute within the upcoming vblank for the next frame, so we fudge
>>> the scanout positions and thereby timestamp to correspond to that new
>>> frame. But if something called outside irq context it should get a
>>> scanout position/timestamp that corresponds to "reality".
>>
>>
>> It would be a bad thing since it would cause the timestamp to jump
>> backwards, and that would also cause the frame count guesstimate to go
>> backwards.
>>
>
> But only if we don't use the dev->driver->get_vblank_counter() method,
> which we try to use on AMD.


 Well, if you do it that way then you have the problem of the hw counter
 seeming to jump forward by one after crossing the start of vblank (when
 compared to the value you sampled when you processed the early vblank
 interrupt).

>>>
>>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>>> current locking with the new vblank counting in the core. The vblank
>>> enable path is safe due to locking and discounting of redundant
>>> timestamps etc. But the disable path could go wrong:
>>>
>>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>>> updates timestamps and counts "as if" in vblank -> incremented vblank
>>> count and timestamp now set in the future.
>>>
>>> 2. After vblank irq finishes, but just before leading edge of vblank,
>>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>>> count because before vblank and not in vblank irq. Now
>>> drm_update_vblank_count() would process a
>>> "new" timestamp and count from the past and we'd have time and counts
>>> going backwards, and bad things would happen.
>>>
>>> I haven't observed such a thing happening during testing so far,
>>> probably because the time window in which it could happen is tiny, but
>>> given how awfully bad it would be, it needs to be prevented.
>>>
>>> I had a look at the description of the Vblank irq in the "M76 Register
>>> Reference Guide" for older asics and the description suggests that the
>>> vblank irq fires when the crtc's line buffer is finished reading pixel
>>> data from the scanout buffer in memory for a frame, ie., when the line
>>> buffer read "enters" vblank.
>>
>>
>> Hmm. Does that mean there's always at least one fullscreen plane enabled
>> in the hw? As in you can't turn off the primary plane or make it smaller
>> than the active video area? Othwewise it sounds like you'd could either
>> not get it at all, or get it somewhere in the middle of the screen.
>>
>
> It says "Interrupt that can be programmed to be generated by the
> primary display controller's line buffer logic either when the
> source image line counter is not requesting any active
> display data (i.e. in the vertical blank) or the output CRTC
> timing generator is within the vertical blanking region."
>
> So my statements were my interpretation of this quote, so i can make some
> sense out of the vblank irq behaviour. I guess Alex or Harry would know? The
> M76 reference refers to some older asics, i just assume it is the same for
> the current ones, given that observed behaviour would be consistent with the
> line buffer causing this lead of a couple of scanlines. I see about 2
> scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the
> line buffer is, how quickly it refills etc., but it sounds reasonable.

The size of the line buffer varies by generation, but the LB logic is
still responsible for generating the vblank interrupt even on newer
hw.

Alex

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Ville Syrjälä

On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> 
> 
> On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> >> On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
> >>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> >>
> >> ...
> >> Ok, but why would that be a bad thing? I think we want it to think it is
> >> in the previous frame if it is called outside the vblank irq context.
> >> The only reason we fudge it to the next frames vblank if i vblank irq is
> >> because we know the vblank irq handler we are executing atm. was meant
> >> to execute within the upcoming vblank for the next frame, so we fudge
> >> the scanout positions and thereby timestamp to correspond to that new
> >> frame. But if something called outside irq context it should get a
> >> scanout position/timestamp that corresponds to "reality".
> >
> > It would be a bad thing since it would cause the timestamp to jump
> > backwards, and that would also cause the frame count guesstimate to go
> > backwards.
> >
> 
> But only if we don't use the dev->driver->get_vblank_counter() method, 
> which we try to use on AMD.

Well, if you do it that way then you have the problem of the hw counter
seeming to jump forward by one after crossing the start of vblank (when
compared to the value you sampled when you processed the early vblank
interrupt).

I guess one silly idea would be to defer the vblank interrupt processing
to a timer, and just schedule it a bit into the future from the actual
interrupt handler.

> Also dev->vblank_time_lock should protect us 
> from concurrent execution of the vblank counting/timestamping in irq 
> context and non-irq context? At least that was one of the purposes of 
> that lock in the past?

The scenarion I outlined doesn't require anything to happen
concurrently.

> 
> -mario
> 
> >>
> >> It would maybe be a problem to get different answers for a query at the
> >> same scanout position if we used .get_scanout_position() for something
> >> else than for calculating vblank timestamps, but we don't do that atm.
> >>
> >> Maybe i'm overlooking something here related to the somewhat rewritten
> >> code from the last year or so? But in the original design this would be
> >> exactly what i intended?
> >>
> >> ...
> >>
>  So it's good enough for typical desktop
>  applications/compositors/games/media players, and a nice improvement
>  over the previous state, but not quite sufficient for applications that
>  need long time consistent vblank counts for long waits of multiple
>  seconds or even minutes. So it's still good to have hw counters if we
>  can get some that are reliable enough.
> >>>
> >>> Ah, I didn't realize you care about small errors in the counter for
> >>> such long periods of vblank off.
> >>>
> >>
> >> Actually, you are right, i was stupid - not enough sleep last friday. I
> >> do care about such small errors, but the long vblank off periods don't
> >> matter at all the way my software works. I query the current count and
> >> timestamp (glXGetSyncValuesOML), calculate a target count based on those
> >> and then schedule a swap for that target count via glXSwapBuffersMscOML.
> >> That swapbuffers call will keep vblank irqs on until the kms pageflip is
> >> queued. So i only care about vblank counts/timestamps being consistent
> >> for short amounts of time, typically 1 video refresh from vblank query
> >> to queueing a vblank event, and then from reception of that
> >> event/queuing the pageflip to pageflip completion event. So even if the
> >> system would be heavily loaded and my code would have big preemption
> >> delays i think counts that are consistent over a few seconds would be
> >> enough to keep things working. Otherwise it wouldn't work now either
> >> with a vblank off after 5 seconds and nouveau not having vblank hw 
> >> counters.
> >>
> >> -mario
> >>
> >>
> 
>  -mario
> 
> 
> >>
> >> -mario
> >>
> 
>  It almost sort of works on the rs600 code path, but i need a bit of 
>  info
>  from you:
> 
>  1. There's this register from the old specs for m76.pdf, which is not
>  part of the current register defines for radeon-kms:
> 
>  "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> 
>  It contains the lower 16 bits of framecounter and the 13 bits of
>  vertical scanout position. It seems to give the same readings as the 
>  24
>  bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. 
>  This
>  would come handy.
> 
>  Does Evergreen and later have a same/similar register and where is 
>  it?
> 
>  2. The hw framecounter seems to increment when the vertical scanout
>  position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
>  gpu i tested so

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Mario Kleiner

On 11/23/2015 09:04 PM, Harry Wentland wrote:
> Hi Mario,
>
> when we've had issues with this on amdgpu Christian fixed it by enabling
> page flip irq all the time, rather than turning it on when usermode
> request a flip and turning it back off after we handled it. I believe
> that fix exists on radeon already. Michel should have more info on that.
>
> See other comments inline.
>
> Thanks,
> Harry
>
>
> On 2015-11-23 11:02 AM, Mario Kleiner wrote:
>> On 11/20/2015 04:42 AM, Alex Deucher wrote:
>>> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
>>>  wrote:
 Hi Alex and Michel and Ville,

 it's "fix vblank stuff" time again ;-)
>>>
>>> Adding Harry from our display team.  He might be able to fill in the
>>> blanks of on some of this better than I can.  It might also be worth
>>> checking to see how our DAL (our new display code which is being
>>> developed directly by our display team) code handles this.  It could
>>> be that we are just missing register settings:
>>
>> Thanks Alex! And hello Harry :)
>>
>>> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip
>>
>> I'll have a look at this.
>>
>>> Additionally we've published full registers headers for the display
>>> block:
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce
>>>
>>> The DCE8 stuff should generally apply back to DCE4.  If you have
>>> questions about registers older asics not covered in the hw docs, let
>>> me know.  Note the new headers are dword aligned rather than byte
>>> aligned.
>>>
>>
>> I've tested now with two different progressive modes on DCE3 and one
>> progressive mode on DCE4, the only cards i have atm. So far it seems
>> that the framecounter indeed increments when the vpos from the scanout
>> position query jumps back to zero. Attached for reference is my
>> current patch to radeon-kms. This one seems to work reliably so far,
>> also if i enable the immediate vblank irq disable which we so far only
>> used on intel-kms.
>>
>> But according to this patch the framecounter increment happens
>> somewhere in the middle of vblank.
>>
>> Essentially from the vpos extracted from
>> EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start"
>> part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1
>> the framecounter stays on the count of the old previous scanout cycle.
>> Then when vpos wraps to zero the framecounter increments by 1. And
>> then we have another couple of dozen lines inside vblank until
>> reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and
>> entering active scanout for the new frame.
>>
>> So the position of observed framecounter increment seems to be not
>> close to the end of vblank ("start of frame" indicator?), but a couple
>> of scanlines after start of vblank.
>>
>> E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478,
>> vtotal is 1481, end of vblank is 38. So i enter the vblank and see the
>> old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and
>> the framecounter increments by 1, then 38 scanlines later the vblank
>> ends.
>>
>> So i seem to have something that seems to work in practice and this
>> "increment framecounter if vpos wraps back to zero" behavior makes
>> some sense. It just doesn't conform to what those descriptions for
>> start_line and "start of frame" indicator describe?
>>
> This is correct. Our HW doesn't really have a vblank counter but a frame
> counter. The framecounter increments at the start of vsync, which is
> when we wrap to zero which doesn't coincide with the start of vblank.
>
> What we're trying to do with get_vblank_counter isn't the same as what
> framecount gives us, but we could probably do something like:
>
> if (get_scanout_pos > vblank_start)
>return frame_count + 1;
> else
>return frame_count;
>

Great! That's what my current patch does and it seems to work well with 
different video modes on both DCE3 and DCE4. So theory agrees with 
practice again :) - thanks for clarifying this.

So the other problem we have since forever is the vblank irq firing 
before the start of vblank. We are typically 1-2 scanlines before 
vblank_start when we sample the scanout position and framecounter. This 
needs a slightly ugly workaround. Is there a way, maybe some config 
register, to fire the irq at leading edge of vblank instead of a bit too 
early?

thanks,
-mario

> Harry
>
>> I'll test with a few more video modes.
>>
>> thanks,
>> -mario
>>
>>

 Ville's changes to the DRM's drm_handle_vblank() /
 drm_update_vblank_count()
 code in Linux 4.4 not only made that code more elegant, but also
 removed the
 robustness against the vblank irq quirks in AMD hw and similar
 hardware. So
 now i get tons of off-by-one errors and

 "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
 completion event has impossible msc 24803 < target_msc 24804" XOrg
 messages
 from that

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Mario Kleiner

On 11/23/2015 04:51 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>> On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>
>> ...
>> Ok, but why would that be a bad thing? I think we want it to think it is
>> in the previous frame if it is called outside the vblank irq context.
>> The only reason we fudge it to the next frames vblank if i vblank irq is
>> because we know the vblank irq handler we are executing atm. was meant
>> to execute within the upcoming vblank for the next frame, so we fudge
>> the scanout positions and thereby timestamp to correspond to that new
>> frame. But if something called outside irq context it should get a
>> scanout position/timestamp that corresponds to "reality".
>
> It would be a bad thing since it would cause the timestamp to jump
> backwards, and that would also cause the frame count guesstimate to go
> backwards.
>

But only if we don't use the dev->driver->get_vblank_counter() method, 
which we try to use on AMD. Also dev->vblank_time_lock should protect us 
from concurrent execution of the vblank counting/timestamping in irq 
context and non-irq context? At least that was one of the purposes of 
that lock in the past?

-mario

>>
>> It would maybe be a problem to get different answers for a query at the
>> same scanout position if we used .get_scanout_position() for something
>> else than for calculating vblank timestamps, but we don't do that atm.
>>
>> Maybe i'm overlooking something here related to the somewhat rewritten
>> code from the last year or so? But in the original design this would be
>> exactly what i intended?
>>
>> ...
>>
 So it's good enough for typical desktop
 applications/compositors/games/media players, and a nice improvement
 over the previous state, but not quite sufficient for applications that
 need long time consistent vblank counts for long waits of multiple
 seconds or even minutes. So it's still good to have hw counters if we
 can get some that are reliable enough.
>>>
>>> Ah, I didn't realize you care about small errors in the counter for
>>> such long periods of vblank off.
>>>
>>
>> Actually, you are right, i was stupid - not enough sleep last friday. I
>> do care about such small errors, but the long vblank off periods don't
>> matter at all the way my software works. I query the current count and
>> timestamp (glXGetSyncValuesOML), calculate a target count based on those
>> and then schedule a swap for that target count via glXSwapBuffersMscOML.
>> That swapbuffers call will keep vblank irqs on until the kms pageflip is
>> queued. So i only care about vblank counts/timestamps being consistent
>> for short amounts of time, typically 1 video refresh from vblank query
>> to queueing a vblank event, and then from reception of that
>> event/queuing the pageflip to pageflip completion event. So even if the
>> system would be heavily loaded and my code would have big preemption
>> delays i think counts that are consistent over a few seconds would be
>> enough to keep things working. Otherwise it wouldn't work now either
>> with a vblank off after 5 seconds and nouveau not having vblank hw counters.
>>
>> -mario
>>
>>

 -mario

>>
>> -mario
>>

 It almost sort of works on the rs600 code path, but i need a bit of 
 info
 from you:

 1. There's this register from the old specs for m76.pdf, which is not
 part of the current register defines for radeon-kms:

 "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"

 It contains the lower 16 bits of framecounter and the 13 bits of
 vertical scanout position. It seems to give the same readings as the 24
 bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
 would come handy.

 Does Evergreen and later have a same/similar register and where is it?

 2. The hw framecounter seems to increment when the vertical scanout
 position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
 gpu i tested so far. Is this so on all asics? And is the hw counter
 increment happening exactly at the moment that vertical scanout 
 position
 jumps back to zero, ie. both events are driven by the same signal? Or 
 is
 the framecounter increment just happening somewhere inside either
 scanline VTOTAL-1 or scanline 0?

 If we can fix this and get it into rc2 or rc3 then we could avoid a bad
 regression and with a bit of luck at the same time improve by being 
 able
 to set dev->vblank_disable_immediate = true then and allow vblank irqs
 to get turned off more aggressively for a bit of extra power saving.

 thanks,
 -mario
>>>
 --

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Ville Syrjälä

On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> 
> ...
> 
> >> What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we
> >> are no more than 1% of the display height away from start of vblank we
> >> fudge scanout position in a way so that the timestamp gets computed for
> >> the soon-to-begin vblank, not the old one.
> >
> > The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is
> > that if you call .get_scanout_positon() from a non-irq context between
> > the irq firing and start of vblank, you'll think you're still in the
> > previous frame. Hence my suggestion to note down the frame counter when
> > called from the irq, and then keep doing the fudging until you've truly
> > crossed the start of vblank.
> >
> 
> Ok, but why would that be a bad thing? I think we want it to think it is 
> in the previous frame if it is called outside the vblank irq context. 
> The only reason we fudge it to the next frames vblank if i vblank irq is 
> because we know the vblank irq handler we are executing atm. was meant 
> to execute within the upcoming vblank for the next frame, so we fudge 
> the scanout positions and thereby timestamp to correspond to that new 
> frame. But if something called outside irq context it should get a 
> scanout position/timestamp that corresponds to "reality".

It would be a bad thing since it would cause the timestamp to jump
backwards, and that would also cause the frame count guesstimate to go
backwards.

> 
> It would maybe be a problem to get different answers for a query at the 
> same scanout position if we used .get_scanout_position() for something 
> else than for calculating vblank timestamps, but we don't do that atm.
> 
> Maybe i'm overlooking something here related to the somewhat rewritten 
> code from the last year or so? But in the original design this would be 
> exactly what i intended?
> 
> ...
> 
> >> So it's good enough for typical desktop
> >> applications/compositors/games/media players, and a nice improvement
> >> over the previous state, but not quite sufficient for applications that
> >> need long time consistent vblank counts for long waits of multiple
> >> seconds or even minutes. So it's still good to have hw counters if we
> >> can get some that are reliable enough.
> >
> > Ah, I didn't realize you care about small errors in the counter for
> > such long periods of vblank off.
> >
> 
> Actually, you are right, i was stupid - not enough sleep last friday. I 
> do care about such small errors, but the long vblank off periods don't 
> matter at all the way my software works. I query the current count and 
> timestamp (glXGetSyncValuesOML), calculate a target count based on those 
> and then schedule a swap for that target count via glXSwapBuffersMscOML. 
> That swapbuffers call will keep vblank irqs on until the kms pageflip is 
> queued. So i only care about vblank counts/timestamps being consistent 
> for short amounts of time, typically 1 video refresh from vblank query 
> to queueing a vblank event, and then from reception of that 
> event/queuing the pageflip to pageflip completion event. So even if the 
> system would be heavily loaded and my code would have big preemption 
> delays i think counts that are consistent over a few seconds would be 
> enough to keep things working. Otherwise it wouldn't work now either 
> with a vblank off after 5 seconds and nouveau not having vblank hw counters.
> 
> -mario
> 
> 
> >>
> >> -mario
> >>
> >>
> 
>  -mario
> 
> >>
> >> It almost sort of works on the rs600 code path, but i need a bit of 
> >> info
> >> from you:
> >>
> >> 1. There's this register from the old specs for m76.pdf, which is not
> >> part of the current register defines for radeon-kms:
> >>
> >> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> >>
> >> It contains the lower 16 bits of framecounter and the 13 bits of
> >> vertical scanout position. It seems to give the same readings as the 24
> >> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
> >> would come handy.
> >>
> >> Does Evergreen and later have a same/similar register and where is it?
> >>
> >> 2. The hw framecounter seems to increment when the vertical scanout
> >> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
> >> gpu i tested so far. Is this so on all asics? And is the hw counter
> >> increment happening exactly at the moment that vertical scanout 
> >> position
> >> jumps back to zero, ie. both events are driven by the same signal? Or 
> >> is
> >> the framecounter increment just happening somewhere inside either
> >> scanline VTOTAL-1 or scanline 0?
> >>
> >>
> >> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Mario Kleiner

On 11/20/2015 04:42 AM, Alex Deucher wrote:
> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
>  wrote:
>> Hi Alex and Michel and Ville,
>>
>> it's "fix vblank stuff" time again ;-)
>
> Adding Harry from our display team.  He might be able to fill in the
> blanks of on some of this better than I can.  It might also be worth
> checking to see how our DAL (our new display code which is being
> developed directly by our display team) code handles this.  It could
> be that we are just missing register settings:

Thanks Alex! And hello Harry :)

> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip

I'll have a look at this.

> Additionally we've published full registers headers for the display block:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce
> The DCE8 stuff should generally apply back to DCE4.  If you have
> questions about registers older asics not covered in the hw docs, let
> me know.  Note the new headers are dword aligned rather than byte
> aligned.
>

I've tested now with two different progressive modes on DCE3 and one 
progressive mode on DCE4, the only cards i have atm. So far it seems 
that the framecounter indeed increments when the vpos from the scanout 
position query jumps back to zero. Attached for reference is my current 
patch to radeon-kms. This one seems to work reliably so far, also if i 
enable the immediate vblank irq disable which we so far only used on 
intel-kms.

But according to this patch the framecounter increment happens somewhere 
in the middle of vblank.

Essentially from the vpos extracted from EVERGREEN_CRTC_STATUS_POSITION 
which defines start of vblank ("Start" part of 
EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1 the 
framecounter stays on the count of the old previous scanout cycle. Then 
when vpos wraps to zero the framecounter increments by 1. And then we 
have another couple of dozen lines inside vblank until reaching the 
"End" part of EVERGREEN_CRTC_V_BLANK_START_END and entering active 
scanout for the new frame.

So the position of observed framecounter increment seems to be not close 
to the end of vblank ("start of frame" indicator?), but a couple of 
scanlines after start of vblank.

E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478, 
vtotal is 1481, end of vblank is 38. So i enter the vblank and see the 
old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and the 
framecounter increments by 1, then 38 scanlines later the vblank ends.

So i seem to have something that seems to work in practice and this 
"increment framecounter if vpos wraps back to zero" behavior makes some 
sense. It just doesn't conform to what those descriptions for start_line 
and "start of frame" indicator describe?

I'll test with a few more video modes.

thanks,
-mario

>>
>> Ville's changes to the DRM's drm_handle_vblank() / drm_update_vblank_count()
>> code in Linux 4.4 not only made that code more elegant, but also removed the
>> robustness against the vblank irq quirks in AMD hw and similar hardware. So
>> now i get tons of off-by-one errors and
>>
>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>> completion event has impossible msc 24803 < target_msc 24804" XOrg messages
>> from that kernel.
>>
>> One of the reasons for trouble is that AMD hw quirk where the hw fires an
>> extra vblank irq shortly after vblank irq's get enabled, not synchronized to
>> vblank, but typically in the middle of active scanout, so we get a redundant
>> call to drm_handle_vblank in the middle of scanout.
>>
>> To fix that i have a minor patch to make drm_update_vblank_count() again
>> robust against such redundant calls, which i will send out later to the
>> mailing list. Diff attached for reference.
>>
>> The second quirk of AMD hw is that the vblank interrupt fires a few
>> scanlines before start of vblank, so drm_handle_vblank ->
>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets called
>> before the start of the vblank for which the new vblank count should be
>> queried.
>>
>> The third problem is that the DRM vblank handling always had the assumption
>> that hardware vblank counters would essentially increment at leading edge of
>> vblank - basically in sync with the firing of the vblank irq, so that a hw
>> counter readout from within the vblank irq handler would always deliver the
>> new incremented value. If this assumption is violated then the counting by
>> use of the hw counter gets unreliable, because depending on random small
>> delays in irq handling the code may end up sampling the hw counter pre- or
>> post-increment, leading to inconsistent updating and funky bugs. It just so
>> happens that AMD hardware doesn't increment the hw counter at leading edge
>> of vblank, so stuff falls apart.
>>
>> So to fix those two problems i'm tinkering with cooking the hw vblank
>> counter value returned by radeon_get_vblank_counter_kms() to

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Mario Kleiner

On 11/20/2015 04:34 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:

...

>> What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we
>> are no more than 1% of the display height away from start of vblank we
>> fudge scanout position in a way so that the timestamp gets computed for
>> the soon-to-begin vblank, not the old one.
>
> The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is
> that if you call .get_scanout_positon() from a non-irq context between
> the irq firing and start of vblank, you'll think you're still in the
> previous frame. Hence my suggestion to note down the frame counter when
> called from the irq, and then keep doing the fudging until you've truly
> crossed the start of vblank.
>

Ok, but why would that be a bad thing? I think we want it to think it is 
in the previous frame if it is called outside the vblank irq context. 
The only reason we fudge it to the next frames vblank if i vblank irq is 
because we know the vblank irq handler we are executing atm. was meant 
to execute within the upcoming vblank for the next frame, so we fudge 
the scanout positions and thereby timestamp to correspond to that new 
frame. But if something called outside irq context it should get a 
scanout position/timestamp that corresponds to "reality".

It would maybe be a problem to get different answers for a query at the 
same scanout position if we used .get_scanout_position() for something 
else than for calculating vblank timestamps, but we don't do that atm.

Maybe i'm overlooking something here related to the somewhat rewritten 
code from the last year or so? But in the original design this would be 
exactly what i intended?

...

>> So it's good enough for typical desktop
>> applications/compositors/games/media players, and a nice improvement
>> over the previous state, but not quite sufficient for applications that
>> need long time consistent vblank counts for long waits of multiple
>> seconds or even minutes. So it's still good to have hw counters if we
>> can get some that are reliable enough.
>
> Ah, I didn't realize you care about small errors in the counter for
> such long periods of vblank off.
>

Actually, you are right, i was stupid - not enough sleep last friday. I 
do care about such small errors, but the long vblank off periods don't 
matter at all the way my software works. I query the current count and 
timestamp (glXGetSyncValuesOML), calculate a target count based on those 
and then schedule a swap for that target count via glXSwapBuffersMscOML. 
That swapbuffers call will keep vblank irqs on until the kms pageflip is 
queued. So i only care about vblank counts/timestamps being consistent 
for short amounts of time, typically 1 video refresh from vblank query 
to queueing a vblank event, and then from reception of that 
event/queuing the pageflip to pageflip completion event. So even if the 
system would be heavily loaded and my code would have big preemption 
delays i think counts that are consistent over a few seconds would be 
enough to keep things working. Otherwise it wouldn't work now either 
with a vblank off after 5 seconds and nouveau not having vblank hw counters.

-mario

>>
>> -mario
>>
>>

 -mario

>>
>> It almost sort of works on the rs600 code path, but i need a bit of info
>> from you:
>>
>> 1. There's this register from the old specs for m76.pdf, which is not
>> part of the current register defines for radeon-kms:
>>
>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>
>> It contains the lower 16 bits of framecounter and the 13 bits of
>> vertical scanout position. It seems to give the same readings as the 24
>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>> would come handy.
>>
>> Does Evergreen and later have a same/similar register and where is it?
>>
>> 2. The hw framecounter seems to increment when the vertical scanout
>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
>> gpu i tested so far. Is this so on all asics? And is the hw counter
>> increment happening exactly at the moment that vertical scanout position
>> jumps back to zero, ie. both events are driven by the same signal? Or is
>> the framecounter increment just happening somewhere inside either
>> scanline VTOTAL-1 or scanline 0?
>>
>>
>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>> regression and with a bit of luck at the same time improve by being able
>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
>> to get turned off more aggressively for a bit of extra power saving.
>>
>> thanks,
>> -mario
>
>> -- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct 
>> drm_device

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-23 Thread Harry Wentland

Hi Mario,

when we've had issues with this on amdgpu Christian fixed it by enabling 
page flip irq all the time, rather than turning it on when usermode 
request a flip and turning it back off after we handled it. I believe 
that fix exists on radeon already. Michel should have more info on that.

See other comments inline.

Thanks,
Harry


On 2015-11-23 11:02 AM, Mario Kleiner wrote:
> On 11/20/2015 04:42 AM, Alex Deucher wrote:
>> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
>>  wrote:
>>> Hi Alex and Michel and Ville,
>>>
>>> it's "fix vblank stuff" time again ;-)
>>
>> Adding Harry from our display team.  He might be able to fill in the
>> blanks of on some of this better than I can.  It might also be worth
>> checking to see how our DAL (our new display code which is being
>> developed directly by our display team) code handles this.  It could
>> be that we are just missing register settings:
>
> Thanks Alex! And hello Harry :)
>
>> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip
>
> I'll have a look at this.
>
>> Additionally we've published full registers headers for the display 
>> block:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce
>>  
>>
>> The DCE8 stuff should generally apply back to DCE4.  If you have
>> questions about registers older asics not covered in the hw docs, let
>> me know.  Note the new headers are dword aligned rather than byte
>> aligned.
>>
>
> I've tested now with two different progressive modes on DCE3 and one 
> progressive mode on DCE4, the only cards i have atm. So far it seems 
> that the framecounter indeed increments when the vpos from the scanout 
> position query jumps back to zero. Attached for reference is my 
> current patch to radeon-kms. This one seems to work reliably so far, 
> also if i enable the immediate vblank irq disable which we so far only 
> used on intel-kms.
>
> But according to this patch the framecounter increment happens 
> somewhere in the middle of vblank.
>
> Essentially from the vpos extracted from 
> EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start" 
> part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1 
> the framecounter stays on the count of the old previous scanout cycle. 
> Then when vpos wraps to zero the framecounter increments by 1. And 
> then we have another couple of dozen lines inside vblank until 
> reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and 
> entering active scanout for the new frame.
>
> So the position of observed framecounter increment seems to be not 
> close to the end of vblank ("start of frame" indicator?), but a couple 
> of scanlines after start of vblank.
>
> E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478, 
> vtotal is 1481, end of vblank is 38. So i enter the vblank and see the 
> old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and 
> the framecounter increments by 1, then 38 scanlines later the vblank 
> ends.
>
> So i seem to have something that seems to work in practice and this 
> "increment framecounter if vpos wraps back to zero" behavior makes 
> some sense. It just doesn't conform to what those descriptions for 
> start_line and "start of frame" indicator describe?
>
This is correct. Our HW doesn't really have a vblank counter but a frame 
counter. The framecounter increments at the start of vsync, which is 
when we wrap to zero which doesn't coincide with the start of vblank.

What we're trying to do with get_vblank_counter isn't the same as what 
framecount gives us, but we could probably do something like:

if (get_scanout_pos > vblank_start)
   return frame_count + 1;
else
   return frame_count;

Harry

> I'll test with a few more video modes.
>
> thanks,
> -mario
>
>
>>>
>>> Ville's changes to the DRM's drm_handle_vblank() / 
>>> drm_update_vblank_count()
>>> code in Linux 4.4 not only made that code more elegant, but also 
>>> removed the
>>> robustness against the vblank irq quirks in AMD hw and similar 
>>> hardware. So
>>> now i get tons of off-by-one errors and
>>>
>>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>>> completion event has impossible msc 24803 < target_msc 24804" XOrg 
>>> messages
>>> from that kernel.
>>>
>>> One of the reasons for trouble is that AMD hw quirk where the hw 
>>> fires an
>>> extra vblank irq shortly after vblank irq's get enabled, not 
>>> synchronized to
>>> vblank, but typically in the middle of active scanout, so we get a 
>>> redundant
>>> call to drm_handle_vblank in the middle of scanout.
>>>
>>> To fix that i have a minor patch to make drm_update_vblank_count() 
>>> again
>>> robust against such redundant calls, which i will send out later to the
>>> mailing list. Diff attached for reference.
>>>
>>> The second quirk of AMD hw is that the vblank interrupt fires a few
>>> scanlines before start of vblank, so drm_handle_vblank ->
>>> drm_update_vblank_count() ->

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-20 Thread Ville Syrjälä

On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> 
> 
> On 11/19/2015 08:45 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote:
> >> On 11/19/2015 07:20 PM, Ville SyrjÃ¤lÃ¤ wrote:
> >>> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
>  Hi Alex and Michel and Ville,
> 
>  it's "fix vblank stuff" time again ;-)
> 
>  Ville's changes to the DRM's drm_handle_vblank() /
>  drm_update_vblank_count() code in Linux 4.4 not only made that code more
>  elegant, but also removed the robustness against the vblank irq quirks
>  in AMD hw and similar hardware. So now i get tons of off-by-one errors 
>  and
> 
>  "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>  completion event has impossible msc 24803 < target_msc 24804" XOrg
>  messages from that kernel.
> >>>
> >>> Argh. Sorry about that.
> >>>
> >>
> >> On the plus side, your "vblank timestamp deltas as fake vblank counters"
> >> code seems to work nicely on nouveau-kms, as far as testing with three
> >> Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and
> >> Ivybridge) i tested checked out nicely.
> >>
> >> And at least the recent nv50+ NVidia Tesla also have 16 bit vblank
> >> counters which we could implement in nouveau, maybe with the same
> >> trickery to allow long trouble-free vblank off periods, hopefully that
> >> would also apply to the Tegra-4 and later Kepler based parts. Tegra-3
> >> will probably also work. I think i read in the Tegra-3 PRM that the sync
> >> points they use to implement vblank counters do increment at leading
> >> edge of vblank.
> >>
> >> The only problem we may have is that some of the embedded gpus may not
> >> have compliant vblank counters and they probably also lack vblank
> >> timestamping, so it might be a good idea to rather not use vblank
> >> counters at all in those drivers - patch their kms drivers to
> >> max_vblank_count = 0;
> >>
> 
>  One of the reasons for trouble is that AMD hw quirk where the hw fires
>  an extra vblank irq shortly after vblank irq's get enabled, not
>  synchronized to vblank, but typically in the middle of active scanout,
>  so we get a redundant call to drm_handle_vblank in the middle of scanout.
> >>>
> >>> I think that should be fine as such. The code should ignore redudntant
> >>> vbl irqs. Well, assuming you have a reliable hw counter or you use the
> >>> timestamp guesstimate mechanism and your scanout position is reported
> >>> accurately. But I guess you have a bit of problem with both.
> >>>
> >>
> >> The problem is i'll need to treat calls to radeon kms
> >> driver->get_vblank_counter differently, depending if the function gets
> >> called from vblank irq, or from regular code, so that hw quirk that
> >> causes spontaneous misfiring of the vblank irq in the middle of scanout
> >> would confuse my hw vblank counter cooking method to produce a fake hw
> >> vblank counter increment. That's why i moved the filtering for redundant
> >> irqs based on vblank timestamps in drm_vblank_update() around to always
> >> apply. Makes us robust against that type of hw quirk in general and
> >> makes life for the vblank counter cooking so much easier.
> >>
> >> It's a beautiful collaboration of different hw bugs to make things
> >> interesting :)
> >>
> 
>  To fix that i have a minor patch to make drm_update_vblank_count() again
>  robust against such redundant calls, which i will send out later to the
>  mailing list. Diff attached for reference.
> 
>  The second quirk of AMD hw is that the vblank interrupt fires a few
>  scanlines before start of vblank, so drm_handle_vblank ->
>  drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
>  called before the start of the vblank for which the new vblank count
>  should be queried.
> >>>
> >>> Does it fire too soon, or is the scanout position register value(s)
> >>> just offset by a few lines perhaps?
> >>>
> >>> We have that with i915 and I simply fix up the value when reading it
> >>> out. Fortunately for us the offset is constant (or at least seems to
> >>> be) for a given platform/connector combo.
> >>>
> >>
> >> I think they fire too soon, from all i've seen so far on a few cards.
> >
> > That's unfortunate. Firing a bit too late would be perfectly fine for
> > most things. And that's actually what happens on older intel hw. Firing
> > too soon opens up some more races, as in you may have to wait a bit
> > more after getting woken up to make sure you've crossed into the
> > freame you were waiting for.
> >
> > Also not sure how to deal with that sort of thing in a reasonable way in
> > .get_vblank_timestamp(). DRM_CALLED_FROM_VBLIRQ gets passed there, so it
> > could fudge things a bit when the early irq arrives, but then if someone
> > calls drm_update_vblank_count() after the irq was handled but before
> > start of

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-20 Thread Mario Kleiner



On 11/19/2015 08:45 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote:
>> On 11/19/2015 07:20 PM, Ville SyrjÃ¤lÃ¤ wrote:
>>> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
 Hi Alex and Michel and Ville,

 it's "fix vblank stuff" time again ;-)

 Ville's changes to the DRM's drm_handle_vblank() /
 drm_update_vblank_count() code in Linux 4.4 not only made that code more
 elegant, but also removed the robustness against the vblank irq quirks
 in AMD hw and similar hardware. So now i get tons of off-by-one errors and

 "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
 completion event has impossible msc 24803 < target_msc 24804" XOrg
 messages from that kernel.
>>>
>>> Argh. Sorry about that.
>>>
>>
>> On the plus side, your "vblank timestamp deltas as fake vblank counters"
>> code seems to work nicely on nouveau-kms, as far as testing with three
>> Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and
>> Ivybridge) i tested checked out nicely.
>>
>> And at least the recent nv50+ NVidia Tesla also have 16 bit vblank
>> counters which we could implement in nouveau, maybe with the same
>> trickery to allow long trouble-free vblank off periods, hopefully that
>> would also apply to the Tegra-4 and later Kepler based parts. Tegra-3
>> will probably also work. I think i read in the Tegra-3 PRM that the sync
>> points they use to implement vblank counters do increment at leading
>> edge of vblank.
>>
>> The only problem we may have is that some of the embedded gpus may not
>> have compliant vblank counters and they probably also lack vblank
>> timestamping, so it might be a good idea to rather not use vblank
>> counters at all in those drivers - patch their kms drivers to
>> max_vblank_count = 0;
>>

 One of the reasons for trouble is that AMD hw quirk where the hw fires
 an extra vblank irq shortly after vblank irq's get enabled, not
 synchronized to vblank, but typically in the middle of active scanout,
 so we get a redundant call to drm_handle_vblank in the middle of scanout.
>>>
>>> I think that should be fine as such. The code should ignore redudntant
>>> vbl irqs. Well, assuming you have a reliable hw counter or you use the
>>> timestamp guesstimate mechanism and your scanout position is reported
>>> accurately. But I guess you have a bit of problem with both.
>>>
>>
>> The problem is i'll need to treat calls to radeon kms
>> driver->get_vblank_counter differently, depending if the function gets
>> called from vblank irq, or from regular code, so that hw quirk that
>> causes spontaneous misfiring of the vblank irq in the middle of scanout
>> would confuse my hw vblank counter cooking method to produce a fake hw
>> vblank counter increment. That's why i moved the filtering for redundant
>> irqs based on vblank timestamps in drm_vblank_update() around to always
>> apply. Makes us robust against that type of hw quirk in general and
>> makes life for the vblank counter cooking so much easier.
>>
>> It's a beautiful collaboration of different hw bugs to make things
>> interesting :)
>>

 To fix that i have a minor patch to make drm_update_vblank_count() again
 robust against such redundant calls, which i will send out later to the
 mailing list. Diff attached for reference.

 The second quirk of AMD hw is that the vblank interrupt fires a few
 scanlines before start of vblank, so drm_handle_vblank ->
 drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
 called before the start of the vblank for which the new vblank count
 should be queried.
>>>
>>> Does it fire too soon, or is the scanout position register value(s)
>>> just offset by a few lines perhaps?
>>>
>>> We have that with i915 and I simply fix up the value when reading it
>>> out. Fortunately for us the offset is constant (or at least seems to
>>> be) for a given platform/connector combo.
>>>
>>
>> I think they fire too soon, from all i've seen so far on a few cards.
>
> That's unfortunate. Firing a bit too late would be perfectly fine for
> most things. And that's actually what happens on older intel hw. Firing
> too soon opens up some more races, as in you may have to wait a bit
> more after getting woken up to make sure you've crossed into the
> freame you were waiting for.
>
> Also not sure how to deal with that sort of thing in a reasonable way in
> .get_vblank_timestamp(). DRM_CALLED_FROM_VBLIRQ gets passed there, so it
> could fudge things a bit when the early irq arrives, but then if someone
> calls drm_update_vblank_count() after the irq was handled but before
> start of vblank you'll end up with a -1 diff.
>
> Maybe something like:
> .get_scanout_positon()
> {
>   read frame counter and scaline position
>
>   if ((flags & DRM_CALLED_FROM_VBLIRQ || frame == last_fudge_frame) &&
>

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-19 Thread Alex Deucher

On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
 wrote:
> Hi Alex and Michel and Ville,
>
> it's "fix vblank stuff" time again ;-)

Adding Harry from our display team.  He might be able to fill in the
blanks of on some of this better than I can.  It might also be worth
checking to see how our DAL (our new display code which is being
developed directly by our display team) code handles this.  It could
be that we are just missing register settings:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip
Additionally we've published full registers headers for the display block:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce
The DCE8 stuff should generally apply back to DCE4.  If you have
questions about registers older asics not covered in the hw docs, let
me know.  Note the new headers are dword aligned rather than byte
aligned.

>
> Ville's changes to the DRM's drm_handle_vblank() / drm_update_vblank_count()
> code in Linux 4.4 not only made that code more elegant, but also removed the
> robustness against the vblank irq quirks in AMD hw and similar hardware. So
> now i get tons of off-by-one errors and
>
> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
> completion event has impossible msc 24803 < target_msc 24804" XOrg messages
> from that kernel.
>
> One of the reasons for trouble is that AMD hw quirk where the hw fires an
> extra vblank irq shortly after vblank irq's get enabled, not synchronized to
> vblank, but typically in the middle of active scanout, so we get a redundant
> call to drm_handle_vblank in the middle of scanout.
>
> To fix that i have a minor patch to make drm_update_vblank_count() again
> robust against such redundant calls, which i will send out later to the
> mailing list. Diff attached for reference.
>
> The second quirk of AMD hw is that the vblank interrupt fires a few
> scanlines before start of vblank, so drm_handle_vblank ->
> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets called
> before the start of the vblank for which the new vblank count should be
> queried.
>
> The third problem is that the DRM vblank handling always had the assumption
> that hardware vblank counters would essentially increment at leading edge of
> vblank - basically in sync with the firing of the vblank irq, so that a hw
> counter readout from within the vblank irq handler would always deliver the
> new incremented value. If this assumption is violated then the counting by
> use of the hw counter gets unreliable, because depending on random small
> delays in irq handling the code may end up sampling the hw counter pre- or
> post-increment, leading to inconsistent updating and funky bugs. It just so
> happens that AMD hardware doesn't increment the hw counter at leading edge
> of vblank, so stuff falls apart.
>
> So to fix those two problems i'm tinkering with cooking the hw vblank
> counter value returned by radeon_get_vblank_counter_kms() to make it appear
> as if the counter incremented at leading edge of vblank in sync with vblank
> irq.
>
> It almost sort of works on the rs600 code path, but i need a bit of info
> from you:
>
> 1. There's this register from the old specs for m76.pdf, which is not part
> of the current register defines for radeon-kms:
>
> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>
> It contains the lower 16 bits of framecounter and the 13 bits of vertical
> scanout position. It seems to give the same readings as the 24 bit
> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This would
> come handy.
>
> Does Evergreen and later have a same/similar register and where is it?

Yes.  CRTC_STATUS_VF_COUNT

CRTC:CRTC_STATUS_VF_COUNT  Â·  [R/W]  Â·  32 bits  Â·  Access: 8/16/32  Â·
 [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2]
GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4]
GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c
DESCRIPTION: Current composite vertical and frame count for CRTC
Field Name   BitsDefault   Description
CRTC_VF_COUNT
(Access: R)28:00x0Reports current vertical and frame count

>
> 2. The hw framecounter seems to increment when the vertical scanout position
> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i tested so
> far. Is this so on all asics? And is the hw counter increment happening
> exactly at the moment that vertical scanout position jumps back to zero, ie.
> both events are driven by the same signal? Or is the framecounter increment
> just happening somewhere inside either scanline VTOTAL-1 or scanline 0?

Unless Harry knows, we can ask the hw team, but I doubt they would
have changed it.  I think it's tied to the start line which is
controlled by this register:
CRTC:CRTC_START_LINE_CONTROL  Â·  [R/W]  Â·  32 bits  Â·  Access: 8/16/32
 Â·  [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2]
GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4]

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-19 Thread Ville Syrjälä

On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote:
> On 11/19/2015 07:20 PM, Ville SyrjÃ¤lÃ¤ wrote:
> > On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
> >> Hi Alex and Michel and Ville,
> >>
> >> it's "fix vblank stuff" time again ;-)
> >>
> >> Ville's changes to the DRM's drm_handle_vblank() /
> >> drm_update_vblank_count() code in Linux 4.4 not only made that code more
> >> elegant, but also removed the robustness against the vblank irq quirks
> >> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
> >>
> >> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
> >> completion event has impossible msc 24803 < target_msc 24804" XOrg
> >> messages from that kernel.
> >
> > Argh. Sorry about that.
> >
> 
> On the plus side, your "vblank timestamp deltas as fake vblank counters" 
> code seems to work nicely on nouveau-kms, as far as testing with three 
> Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and 
> Ivybridge) i tested checked out nicely.
> 
> And at least the recent nv50+ NVidia Tesla also have 16 bit vblank 
> counters which we could implement in nouveau, maybe with the same 
> trickery to allow long trouble-free vblank off periods, hopefully that 
> would also apply to the Tegra-4 and later Kepler based parts. Tegra-3 
> will probably also work. I think i read in the Tegra-3 PRM that the sync 
> points they use to implement vblank counters do increment at leading 
> edge of vblank.
> 
> The only problem we may have is that some of the embedded gpus may not 
> have compliant vblank counters and they probably also lack vblank 
> timestamping, so it might be a good idea to rather not use vblank 
> counters at all in those drivers - patch their kms drivers to 
> max_vblank_count = 0;
> 
> >>
> >> One of the reasons for trouble is that AMD hw quirk where the hw fires
> >> an extra vblank irq shortly after vblank irq's get enabled, not
> >> synchronized to vblank, but typically in the middle of active scanout,
> >> so we get a redundant call to drm_handle_vblank in the middle of scanout.
> >
> > I think that should be fine as such. The code should ignore redudntant
> > vbl irqs. Well, assuming you have a reliable hw counter or you use the
> > timestamp guesstimate mechanism and your scanout position is reported
> > accurately. But I guess you have a bit of problem with both.
> >
> 
> The problem is i'll need to treat calls to radeon kms 
> driver->get_vblank_counter differently, depending if the function gets 
> called from vblank irq, or from regular code, so that hw quirk that 
> causes spontaneous misfiring of the vblank irq in the middle of scanout 
> would confuse my hw vblank counter cooking method to produce a fake hw 
> vblank counter increment. That's why i moved the filtering for redundant 
> irqs based on vblank timestamps in drm_vblank_update() around to always 
> apply. Makes us robust against that type of hw quirk in general and 
> makes life for the vblank counter cooking so much easier.
> 
> It's a beautiful collaboration of different hw bugs to make things 
> interesting :)
> 
> >>
> >> To fix that i have a minor patch to make drm_update_vblank_count() again
> >> robust against such redundant calls, which i will send out later to the
> >> mailing list. Diff attached for reference.
> >>
> >> The second quirk of AMD hw is that the vblank interrupt fires a few
> >> scanlines before start of vblank, so drm_handle_vblank ->
> >> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
> >> called before the start of the vblank for which the new vblank count
> >> should be queried.
> >
> > Does it fire too soon, or is the scanout position register value(s)
> > just offset by a few lines perhaps?
> >
> > We have that with i915 and I simply fix up the value when reading it
> > out. Fortunately for us the offset is constant (or at least seems to
> > be) for a given platform/connector combo.
> >
> 
> I think they fire too soon, from all i've seen so far on a few cards.

That's unfortunate. Firing a bit too late would be perfectly fine for
most things. And that's actually what happens on older intel hw. Firing
too soon opens up some more races, as in you may have to wait a bit
more after getting woken up to make sure you've crossed into the
freame you were waiting for.

Also not sure how to deal with that sort of thing in a reasonable way in
.get_vblank_timestamp(). DRM_CALLED_FROM_VBLIRQ gets passed there, so it
could fudge things a bit when the early irq arrives, but then if someone
calls drm_update_vblank_count() after the irq was handled but before
start of vblank you'll end up with a -1 diff.

Maybe something like:
.get_scanout_positon()
{
read frame counter and scaline position

if ((flags & DRM_CALLED_FROM_VBLIRQ || frame == last_fudge_frame) &&
 scanline_slightly_below_start_of_vblank)) {
last_fudge_frame = frame;
scanline =

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-19 Thread Ville Syrjälä

On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
> Hi Alex and Michel and Ville,
> 
> it's "fix vblank stuff" time again ;-)
> 
> Ville's changes to the DRM's drm_handle_vblank() / 
> drm_update_vblank_count() code in Linux 4.4 not only made that code more 
> elegant, but also removed the robustness against the vblank irq quirks 
> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
> 
> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip 
> completion event has impossible msc 24803 < target_msc 24804" XOrg 
> messages from that kernel.

Argh. Sorry about that.

> 
> One of the reasons for trouble is that AMD hw quirk where the hw fires 
> an extra vblank irq shortly after vblank irq's get enabled, not 
> synchronized to vblank, but typically in the middle of active scanout, 
> so we get a redundant call to drm_handle_vblank in the middle of scanout.

I think that should be fine as such. The code should ignore redudntant
vbl irqs. Well, assuming you have a reliable hw counter or you use the
timestamp guesstimate mechanism and your scanout position is reported
accurately. But I guess you have a bit of problem with both.

> 
> To fix that i have a minor patch to make drm_update_vblank_count() again 
> robust against such redundant calls, which i will send out later to the 
> mailing list. Diff attached for reference.
> 
> The second quirk of AMD hw is that the vblank interrupt fires a few 
> scanlines before start of vblank, so drm_handle_vblank -> 
> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets 
> called before the start of the vblank for which the new vblank count 
> should be queried.

Does it fire too soon, or is the scanout position register value(s)
just offset by a few lines perhaps?

We have that with i915 and I simply fix up the value when reading it
out. Fortunately for us the offset is constant (or at least seems to
be) for a given platform/connector combo.

> 
> The third problem is that the DRM vblank handling always had the 
> assumption that hardware vblank counters would essentially increment at 
> leading edge of vblank - basically in sync with the firing of the vblank 
> irq, so that a hw counter readout from within the vblank irq handler 
> would always deliver the new incremented value. If this assumption is 
> violated then the counting by use of the hw counter gets unreliable, 
> because depending on random small delays in irq handling the code may 
> end up sampling the hw counter pre- or post-increment, leading to 
> inconsistent updating and funky bugs. It just so happens that AMD 
> hardware doesn't increment the hw counter at leading edge of vblank, so 
> stuff falls apart.
> 
> So to fix those two problems i'm tinkering with cooking the hw vblank 
> counter value returned by radeon_get_vblank_counter_kms() to make it 
> appear as if the counter incremented at leading edge of vblank in sync 
> with vblank irq.

Yeah, I do that in i915 too. Well, on some platforms where the
counter increments at the leading edge of active.

> 
> It almost sort of works on the rs600 code path, but i need a bit of info 
> from you:
> 
> 1. There's this register from the old specs for m76.pdf, which is not 
> part of the current register defines for radeon-kms:
> 
> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> 
> It contains the lower 16 bits of framecounter and the 13 bits of 
> vertical scanout position. It seems to give the same readings as the 24 
> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This 
> would come handy.
> 
> Does Evergreen and later have a same/similar register and where is it?
> 
> 2. The hw framecounter seems to increment when the vertical scanout 
> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 
> gpu i tested so far. Is this so on all asics? And is the hw counter 
> increment happening exactly at the moment that vertical scanout position 
> jumps back to zero, ie. both events are driven by the same signal? Or is 
> the framecounter increment just happening somewhere inside either 
> scanline VTOTAL-1 or scanline 0?
> 
> 
> If we can fix this and get it into rc2 or rc3 then we could avoid a bad 
> regression and with a bit of luck at the same time improve by being able 
> to set dev->vblank_disable_immediate = true then and allow vblank irqs 
> to get turned off more aggressively for a bit of extra power saving.
> 
> thanks,
> -mario

> -- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device 
> *dev, unsigned int pipe,
> unsigned long flags)
>  {
> struct drm_vblank_crtc *vblank = >vblank[pipe];
> -   u32 cur_vblank, diff;
> +   u32 cur_vblank, diff = 0;
> bool rc;
> struct timeval t_vblank;
> +   const struct timeval *t_old;
> +   u64 diff_ns;
> int count =

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-19 Thread Mario Kleiner

On 11/19/2015 07:20 PM, Ville SyrjÃ¤lÃ¤ wrote:
> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
>> Hi Alex and Michel and Ville,
>>
>> it's "fix vblank stuff" time again ;-)
>>
>> Ville's changes to the DRM's drm_handle_vblank() /
>> drm_update_vblank_count() code in Linux 4.4 not only made that code more
>> elegant, but also removed the robustness against the vblank irq quirks
>> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
>>
>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>> completion event has impossible msc 24803 < target_msc 24804" XOrg
>> messages from that kernel.
>
> Argh. Sorry about that.
>

On the plus side, your "vblank timestamp deltas as fake vblank counters" 
code seems to work nicely on nouveau-kms, as far as testing with three 
Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and 
Ivybridge) i tested checked out nicely.

And at least the recent nv50+ NVidia Tesla also have 16 bit vblank 
counters which we could implement in nouveau, maybe with the same 
trickery to allow long trouble-free vblank off periods, hopefully that 
would also apply to the Tegra-4 and later Kepler based parts. Tegra-3 
will probably also work. I think i read in the Tegra-3 PRM that the sync 
points they use to implement vblank counters do increment at leading 
edge of vblank.

The only problem we may have is that some of the embedded gpus may not 
have compliant vblank counters and they probably also lack vblank 
timestamping, so it might be a good idea to rather not use vblank 
counters at all in those drivers - patch their kms drivers to 
max_vblank_count = 0;

>>
>> One of the reasons for trouble is that AMD hw quirk where the hw fires
>> an extra vblank irq shortly after vblank irq's get enabled, not
>> synchronized to vblank, but typically in the middle of active scanout,
>> so we get a redundant call to drm_handle_vblank in the middle of scanout.
>
> I think that should be fine as such. The code should ignore redudntant
> vbl irqs. Well, assuming you have a reliable hw counter or you use the
> timestamp guesstimate mechanism and your scanout position is reported
> accurately. But I guess you have a bit of problem with both.
>

The problem is i'll need to treat calls to radeon kms 
driver->get_vblank_counter differently, depending if the function gets 
called from vblank irq, or from regular code, so that hw quirk that 
causes spontaneous misfiring of the vblank irq in the middle of scanout 
would confuse my hw vblank counter cooking method to produce a fake hw 
vblank counter increment. That's why i moved the filtering for redundant 
irqs based on vblank timestamps in drm_vblank_update() around to always 
apply. Makes us robust against that type of hw quirk in general and 
makes life for the vblank counter cooking so much easier.

It's a beautiful collaboration of different hw bugs to make things 
interesting :)

>>
>> To fix that i have a minor patch to make drm_update_vblank_count() again
>> robust against such redundant calls, which i will send out later to the
>> mailing list. Diff attached for reference.
>>
>> The second quirk of AMD hw is that the vblank interrupt fires a few
>> scanlines before start of vblank, so drm_handle_vblank ->
>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
>> called before the start of the vblank for which the new vblank count
>> should be queried.
>
> Does it fire too soon, or is the scanout position register value(s)
> just offset by a few lines perhaps?
>
> We have that with i915 and I simply fix up the value when reading it
> out. Fortunately for us the offset is constant (or at least seems to
> be) for a given platform/connector combo.
>

I think they fire too soon, from all i've seen so far on a few cards.

>>
>> The third problem is that the DRM vblank handling always had the
>> assumption that hardware vblank counters would essentially increment at
>> leading edge of vblank - basically in sync with the firing of the vblank
>> irq, so that a hw counter readout from within the vblank irq handler
>> would always deliver the new incremented value. If this assumption is
>> violated then the counting by use of the hw counter gets unreliable,
>> because depending on random small delays in irq handling the code may
>> end up sampling the hw counter pre- or post-increment, leading to
>> inconsistent updating and funky bugs. It just so happens that AMD
>> hardware doesn't increment the hw counter at leading edge of vblank, so
>> stuff falls apart.
>>
>> So to fix those two problems i'm tinkering with cooking the hw vblank
>> counter value returned by radeon_get_vblank_counter_kms() to make it
>> appear as if the counter incremented at leading edge of vblank in sync
>> with vblank irq.
>
> Yeah, I do that in i915 too. Well, on some platforms where the
> counter increments at the leading edge of active.
>

Yep. I think that's key to make the vblank instant off stuff work

Funky new vblank counter regressions in Linux 4.4-rc1

2015-11-19 Thread Mario Kleiner

Hi Alex and Michel and Ville,

it's "fix vblank stuff" time again ;-)

Ville's changes to the DRM's drm_handle_vblank() / 
drm_update_vblank_count() code in Linux 4.4 not only made that code more 
elegant, but also removed the robustness against the vblank irq quirks 
in AMD hw and similar hardware. So now i get tons of off-by-one errors and

"[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip 
completion event has impossible msc 24803 < target_msc 24804" XOrg 
messages from that kernel.

One of the reasons for trouble is that AMD hw quirk where the hw fires 
an extra vblank irq shortly after vblank irq's get enabled, not 
synchronized to vblank, but typically in the middle of active scanout, 
so we get a redundant call to drm_handle_vblank in the middle of scanout.

To fix that i have a minor patch to make drm_update_vblank_count() again 
robust against such redundant calls, which i will send out later to the 
mailing list. Diff attached for reference.

The second quirk of AMD hw is that the vblank interrupt fires a few 
scanlines before start of vblank, so drm_handle_vblank -> 
drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets 
called before the start of the vblank for which the new vblank count 
should be queried.

The third problem is that the DRM vblank handling always had the 
assumption that hardware vblank counters would essentially increment at 
leading edge of vblank - basically in sync with the firing of the vblank 
irq, so that a hw counter readout from within the vblank irq handler 
would always deliver the new incremented value. If this assumption is 
violated then the counting by use of the hw counter gets unreliable, 
because depending on random small delays in irq handling the code may 
end up sampling the hw counter pre- or post-increment, leading to 
inconsistent updating and funky bugs. It just so happens that AMD 
hardware doesn't increment the hw counter at leading edge of vblank, so 
stuff falls apart.

So to fix those two problems i'm tinkering with cooking the hw vblank 
counter value returned by radeon_get_vblank_counter_kms() to make it 
appear as if the counter incremented at leading edge of vblank in sync 
with vblank irq.

It almost sort of works on the rs600 code path, but i need a bit of info 
from you:

1. There's this register from the old specs for m76.pdf, which is not 
part of the current register defines for radeon-kms:

"D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"

It contains the lower 16 bits of framecounter and the 13 bits of 
vertical scanout position. It seems to give the same readings as the 24 
bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This 
would come handy.

Does Evergreen and later have a same/similar register and where is it?

2. The hw framecounter seems to increment when the vertical scanout 
position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 
gpu i tested so far. Is this so on all asics? And is the hw counter 
increment happening exactly at the moment that vertical scanout position 
jumps back to zero, ie. both events are driven by the same signal? Or is 
the framecounter increment just happening somewhere inside either 
scanline VTOTAL-1 or scanline 0?


If we can fix this and get it into rc2 or rc3 then we could avoid a bad 
regression and with a bit of luck at the same time improve by being able 
to set dev->vblank_disable_immediate = true then and allow vblank irqs 
to get turned off more aggressively for a bit of extra power saving.

thanks,
-mario
-- next part --
A non-text attachment was scrubbed...
Name: fixupForDRM.patch
Type: text/x-patch
Size: 3373 bytes
Desc: not available
URL:

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

Funky new vblank counter regressions in Linux 4.4-rc1

24 matches

Site Navigation

Mail list logo

Footer information