Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-05-26 Thread Axel Davy

hi,

I haven't done enough testing to have a full understanding of the issue, 
but it seems there could be a regression with gallium nine.


One user is reporting on irc huge slowdowns with d3d8 games (using the 
d3d8to9 wrapper) when using vsync. Reverting fixes the issues.


I don't know exactly why vsync would have issues.

Another thing though is that many d3d9 games do their own throttling, 
and if our throttling is stricter, maybe the game multithreading could 
be perturbed.

All in all, this needs more testing with gallium nine.

To prevent possible regression for next release, I'll force 
PIPE_CAP_MAX_FRAMES_IN_FLIGHT to be clamped to 2 at least for gallium nine.



Might be worth trying to see if gl could be affected.

Yours,

Axel

On 02/05/2019 03:19, Marek Olšák wrote:

If there is no other feedback, I'll push this tomorrow.

Marek

On Mon, Apr 29, 2019 at 6:12 PM Marek Olšák > wrote:


This patch might improve performance, because less submitted
unfinished work means less used memory by the unfinished work.

Marek

On Mon, Apr 29, 2019 at 11:07 AM Michel Dänzer mailto:mic...@daenzer.net>> wrote:

On 2019-04-27 6:13 p.m., Rob Clark wrote:
> On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák
mailto:mar...@gmail.com>> wrote:
>>
>> From: Marek Olšák mailto:marek.ol...@amd.com>>
>>
>> It's done by:
>> - decrease the number of frames in flight by 1
>> - flush before throttling in SwapBuffers
>>   (instead of wait-then-flush, do flush-then-wait)
>>
>> The improvement is apparent with Unigine Heaven.
>>
>> Previously:
>>     draw frame 2
>>     wait frame 0
>>     flush frame 2
>>     present frame 2
>>
>>     The input lag is 2 frames.
>>
>> Now:
>>     draw frame 2
>>     flush frame 2
>>     wait frame 1
>>     present frame 2
>>
>>     The input lag is 1 frame. Flushing is done before
waiting, because
>>     otherwise the device would be idle after waiting.
>>
>> Nine is affected because it also uses the pipe cap.
>> ---
>>  src/gallium/auxiliary/util/u_screen.c         | 2 +-
>>  src/gallium/state_trackers/dri/dri_drawable.c | 20
+--
>>  2 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/util/u_screen.c
b/src/gallium/auxiliary/util/u_screen.c
>> index 27f51e0898e..410f17421e6 100644
>> --- a/src/gallium/auxiliary/util/u_screen.c
>> +++ b/src/gallium/auxiliary/util/u_screen.c
>> @@ -349,21 +349,21 @@
u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
>>     case PIPE_CAP_MAX_VARYINGS:
>>        return 8;
>>
>>     case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>>        return 0;
>>
>>     case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>>        return 0;
>>
>>     case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>> -      return 2;
>> +      return 1;
>
> would it be safer to leave the current default and let
drivers opt-in
> to the lower # individually?  I guess triple buffering would
still be
> better for some of the smaller gpu's?

This patch doesn't prevent triple buffering. The application
can still
prepare up to one frame worth of GPU commands before the GPU has
finished processing the commands of the previous frame (while the
pre-previous frame is being displayed).

I just think the term "in flight" should maybe be defined a
bit better,
but it's not a blocker and could be done in a follow-up patch.


-- 
Earthling Michel Dänzer               | https://www.amd.com

Libre software enthusiast             |             Mesa and X
developer


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-05-01 Thread Marek Olšák
If there is no other feedback, I'll push this tomorrow.

Marek

On Mon, Apr 29, 2019 at 6:12 PM Marek Olšák  wrote:

> This patch might improve performance, because less submitted unfinished
> work means less used memory by the unfinished work.
>
> Marek
>
> On Mon, Apr 29, 2019 at 11:07 AM Michel Dänzer  wrote:
>
>> On 2019-04-27 6:13 p.m., Rob Clark wrote:
>> > On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:
>> >>
>> >> From: Marek Olšák 
>> >>
>> >> It's done by:
>> >> - decrease the number of frames in flight by 1
>> >> - flush before throttling in SwapBuffers
>> >>   (instead of wait-then-flush, do flush-then-wait)
>> >>
>> >> The improvement is apparent with Unigine Heaven.
>> >>
>> >> Previously:
>> >> draw frame 2
>> >> wait frame 0
>> >> flush frame 2
>> >> present frame 2
>> >>
>> >> The input lag is 2 frames.
>> >>
>> >> Now:
>> >> draw frame 2
>> >> flush frame 2
>> >> wait frame 1
>> >> present frame 2
>> >>
>> >> The input lag is 1 frame. Flushing is done before waiting, because
>> >> otherwise the device would be idle after waiting.
>> >>
>> >> Nine is affected because it also uses the pipe cap.
>> >> ---
>> >>  src/gallium/auxiliary/util/u_screen.c |  2 +-
>> >>  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
>> >>  2 files changed, 11 insertions(+), 11 deletions(-)
>> >>
>> >> diff --git a/src/gallium/auxiliary/util/u_screen.c
>> b/src/gallium/auxiliary/util/u_screen.c
>> >> index 27f51e0898e..410f17421e6 100644
>> >> --- a/src/gallium/auxiliary/util/u_screen.c
>> >> +++ b/src/gallium/auxiliary/util/u_screen.c
>> >> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct
>> pipe_screen *pscreen,
>> >> case PIPE_CAP_MAX_VARYINGS:
>> >>return 8;
>> >>
>> >> case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>> >>return 0;
>> >>
>> >> case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>> >>return 0;
>> >>
>> >> case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>> >> -  return 2;
>> >> +  return 1;
>> >
>> > would it be safer to leave the current default and let drivers opt-in
>> > to the lower # individually?  I guess triple buffering would still be
>> > better for some of the smaller gpu's?
>>
>> This patch doesn't prevent triple buffering. The application can still
>> prepare up to one frame worth of GPU commands before the GPU has
>> finished processing the commands of the previous frame (while the
>> pre-previous frame is being displayed).
>>
>> I just think the term "in flight" should maybe be defined a bit better,
>> but it's not a blocker and could be done in a follow-up patch.
>>
>>
>> --
>> Earthling Michel Dänzer   |  https://www.amd.com
>> Libre software enthusiast | Mesa and X developer
>>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-29 Thread Marek Olšák
This patch might improve performance, because less submitted unfinished
work means less used memory by the unfinished work.

Marek

On Mon, Apr 29, 2019 at 11:07 AM Michel Dänzer  wrote:

> On 2019-04-27 6:13 p.m., Rob Clark wrote:
> > On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:
> >>
> >> From: Marek Olšák 
> >>
> >> It's done by:
> >> - decrease the number of frames in flight by 1
> >> - flush before throttling in SwapBuffers
> >>   (instead of wait-then-flush, do flush-then-wait)
> >>
> >> The improvement is apparent with Unigine Heaven.
> >>
> >> Previously:
> >> draw frame 2
> >> wait frame 0
> >> flush frame 2
> >> present frame 2
> >>
> >> The input lag is 2 frames.
> >>
> >> Now:
> >> draw frame 2
> >> flush frame 2
> >> wait frame 1
> >> present frame 2
> >>
> >> The input lag is 1 frame. Flushing is done before waiting, because
> >> otherwise the device would be idle after waiting.
> >>
> >> Nine is affected because it also uses the pipe cap.
> >> ---
> >>  src/gallium/auxiliary/util/u_screen.c |  2 +-
> >>  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
> >>  2 files changed, 11 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/src/gallium/auxiliary/util/u_screen.c
> b/src/gallium/auxiliary/util/u_screen.c
> >> index 27f51e0898e..410f17421e6 100644
> >> --- a/src/gallium/auxiliary/util/u_screen.c
> >> +++ b/src/gallium/auxiliary/util/u_screen.c
> >> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct
> pipe_screen *pscreen,
> >> case PIPE_CAP_MAX_VARYINGS:
> >>return 8;
> >>
> >> case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
> >>return 0;
> >>
> >> case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
> >>return 0;
> >>
> >> case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> >> -  return 2;
> >> +  return 1;
> >
> > would it be safer to leave the current default and let drivers opt-in
> > to the lower # individually?  I guess triple buffering would still be
> > better for some of the smaller gpu's?
>
> This patch doesn't prevent triple buffering. The application can still
> prepare up to one frame worth of GPU commands before the GPU has
> finished processing the commands of the previous frame (while the
> pre-previous frame is being displayed).
>
> I just think the term "in flight" should maybe be defined a bit better,
> but it's not a blocker and could be done in a follow-up patch.
>
>
> --
> Earthling Michel Dänzer   |  https://www.amd.com
> Libre software enthusiast | Mesa and X developer
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-29 Thread Michel Dänzer
On 2019-04-27 6:13 p.m., Rob Clark wrote:
> On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:
>>
>> From: Marek Olšák 
>>
>> It's done by:
>> - decrease the number of frames in flight by 1
>> - flush before throttling in SwapBuffers
>>   (instead of wait-then-flush, do flush-then-wait)
>>
>> The improvement is apparent with Unigine Heaven.
>>
>> Previously:
>> draw frame 2
>> wait frame 0
>> flush frame 2
>> present frame 2
>>
>> The input lag is 2 frames.
>>
>> Now:
>> draw frame 2
>> flush frame 2
>> wait frame 1
>> present frame 2
>>
>> The input lag is 1 frame. Flushing is done before waiting, because
>> otherwise the device would be idle after waiting.
>>
>> Nine is affected because it also uses the pipe cap.
>> ---
>>  src/gallium/auxiliary/util/u_screen.c |  2 +-
>>  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
>>  2 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/util/u_screen.c 
>> b/src/gallium/auxiliary/util/u_screen.c
>> index 27f51e0898e..410f17421e6 100644
>> --- a/src/gallium/auxiliary/util/u_screen.c
>> +++ b/src/gallium/auxiliary/util/u_screen.c
>> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
>> *pscreen,
>> case PIPE_CAP_MAX_VARYINGS:
>>return 8;
>>
>> case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>>return 0;
>>
>> case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>>return 0;
>>
>> case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>> -  return 2;
>> +  return 1;
> 
> would it be safer to leave the current default and let drivers opt-in
> to the lower # individually?  I guess triple buffering would still be
> better for some of the smaller gpu's?

This patch doesn't prevent triple buffering. The application can still
prepare up to one frame worth of GPU commands before the GPU has
finished processing the commands of the previous frame (while the
pre-previous frame is being displayed).

I just think the term "in flight" should maybe be defined a bit better,
but it's not a blocker and could be done in a follow-up patch.


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Marek Olšák
The input lag is maxframes*1000/fps ms. For 10 fps and 2 frames, the lag is
200 ms. You might also say that it's the amount of time it will take the
GPU to catch up with the CPU after fence_finish returns.

Hopefully that clears up performance concerns.

Marek

On Sat, Apr 27, 2019, 4:07 PM Axel Davy  wrote:

> On 27/04/2019 21:02, Rob Clark wrote:
> > On Sat, Apr 27, 2019 at 9:52 AM Axel Davy  wrote:
> >> On 27/04/2019 18:13, Rob Clark wrote:
> >>> On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:
>  From: Marek Olšák 
> 
>  It's done by:
>  - decrease the number of frames in flight by 1
>  - flush before throttling in SwapBuffers
>  (instead of wait-then-flush, do flush-then-wait)
> 
>  The improvement is apparent with Unigine Heaven.
> 
>  Previously:
>    draw frame 2
>    wait frame 0
>    flush frame 2
>    present frame 2
> 
>    The input lag is 2 frames.
> 
>  Now:
>    draw frame 2
>    flush frame 2
>    wait frame 1
>    present frame 2
> 
>    The input lag is 1 frame. Flushing is done before waiting,
> because
>    otherwise the device would be idle after waiting.
> 
>  Nine is affected because it also uses the pipe cap.
>  ---
> src/gallium/auxiliary/util/u_screen.c |  2 +-
> src/gallium/state_trackers/dri/dri_drawable.c | 20
> +--
> 2 files changed, 11 insertions(+), 11 deletions(-)
> 
>  diff --git a/src/gallium/auxiliary/util/u_screen.c
> b/src/gallium/auxiliary/util/u_screen.c
>  index 27f51e0898e..410f17421e6 100644
>  --- a/src/gallium/auxiliary/util/u_screen.c
>  +++ b/src/gallium/auxiliary/util/u_screen.c
>  @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct
> pipe_screen *pscreen,
>    case PIPE_CAP_MAX_VARYINGS:
>   return 8;
> 
>    case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>   return 0;
> 
>    case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>   return 0;
> 
>    case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>  -  return 2;
>  +  return 1;
> >>> would it be safer to leave the current default and let drivers opt-in
> >>> to the lower # individually?  I guess triple buffering would still be
> >>> better for some of the smaller gpu's?
> >>>
> >>> disclaimer: I haven't tested this myself or looked very closely at the
> >>> dri code.. so could be misunderstanding something..
> >>>
> >>> BR,
> >>> -R
> >>
> >> I think I can shed some light on the issue to justify (or not) the
> change.
> >>
> >> If we don't do throttling and the CPU renders frames at a faster rate
> >> than what the GPU can render,
> >> the GPU can become quite late and cause huge frame lag.
> >>
> >> The throttling involves forcing a (CPU) wait when a frame is presented
> >> if the 'x' previous images have not finished rendering.
> >>
> >> The lower 'x', the lower the frame lag.
> >>
> >> However if 'x' is too low (waiting current frame is rendered for
> >> example), the GPU can be idle until the CPU is flushing new commands.
> >>
> >> Now there is a choice to be made for the value of 'x'. 1 or 2 are
> >> reasonable values.
> >>
> >> if 'x' is 1, we wait the previous frame was rendered when we present the
> >> current frame. For '2' we wait the frame before.
> >>
> >>
> >> Thus for smaller gpu's, a value of 1 is better than 2 as it is more
> >> affected by the frame lag (as frames take longer to render).
> >>
> > I get the latency aspect.. but my comment was more about latency vs
> > framerate (or maybe more cynically, about games vs benchmarks :-P)
> >
> > BR,
> > -R
>
>
> As long at the GPU has work to do, performance should be maximized.
>
> However in the case I described below, if CPU and GPU render at about
> the same framerate and
> the framerate has some variations (whether it being the GPU taking more
> time for one frame, or the CPU),
> using more than 1 would give a bit better performance.
>
>
> Axel
>
>
>
> >
> >> However if a game is rendering at a very unstable framerate (some frames
> >> needing more work than others), using a value of 2 is safer
> >> to maximize performance. (As using a value of 1 would lead to wait if we
> >> get a frame that takes particularly long, as using 2 smooths that a bit)
> >>
> >>
> >> I remember years ago I had extremely unstable fps when using catalyst on
> >> Portal for example. But I believe it is more a driver issue than a game
> >> issue.
> >>
> >> If we assume games don't have unstable framerate, (which seems
> >> reasonable assumption), using 1 as default makes sense.
> >>
> >>
> >> If one wants to test experimentally for regression, the ideal test case
> >> if when the GPU renders at about the same framerate as the CPU feeds it.
> >>
> >>
> >>
> >> Axel
> >>
> >>
> >>
> >>
>
>
___
mesa-dev mailing 

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Axel Davy

On 27/04/2019 21:02, Rob Clark wrote:

On Sat, Apr 27, 2019 at 9:52 AM Axel Davy  wrote:

On 27/04/2019 18:13, Rob Clark wrote:

On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:

From: Marek Olšák 

It's done by:
- decrease the number of frames in flight by 1
- flush before throttling in SwapBuffers
(instead of wait-then-flush, do flush-then-wait)

The improvement is apparent with Unigine Heaven.

Previously:
  draw frame 2
  wait frame 0
  flush frame 2
  present frame 2

  The input lag is 2 frames.

Now:
  draw frame 2
  flush frame 2
  wait frame 1
  present frame 2

  The input lag is 1 frame. Flushing is done before waiting, because
  otherwise the device would be idle after waiting.

Nine is affected because it also uses the pipe cap.
---
   src/gallium/auxiliary/util/u_screen.c |  2 +-
   src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
   2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_screen.c 
b/src/gallium/auxiliary/util/u_screen.c
index 27f51e0898e..410f17421e6 100644
--- a/src/gallium/auxiliary/util/u_screen.c
+++ b/src/gallium/auxiliary/util/u_screen.c
@@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
*pscreen,
  case PIPE_CAP_MAX_VARYINGS:
 return 8;

  case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
 return 0;

  case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
 return 0;

  case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
-  return 2;
+  return 1;

would it be safer to leave the current default and let drivers opt-in
to the lower # individually?  I guess triple buffering would still be
better for some of the smaller gpu's?

disclaimer: I haven't tested this myself or looked very closely at the
dri code.. so could be misunderstanding something..

BR,
-R


I think I can shed some light on the issue to justify (or not) the change.

If we don't do throttling and the CPU renders frames at a faster rate
than what the GPU can render,
the GPU can become quite late and cause huge frame lag.

The throttling involves forcing a (CPU) wait when a frame is presented
if the 'x' previous images have not finished rendering.

The lower 'x', the lower the frame lag.

However if 'x' is too low (waiting current frame is rendered for
example), the GPU can be idle until the CPU is flushing new commands.

Now there is a choice to be made for the value of 'x'. 1 or 2 are
reasonable values.

if 'x' is 1, we wait the previous frame was rendered when we present the
current frame. For '2' we wait the frame before.


Thus for smaller gpu's, a value of 1 is better than 2 as it is more
affected by the frame lag (as frames take longer to render).


I get the latency aspect.. but my comment was more about latency vs
framerate (or maybe more cynically, about games vs benchmarks :-P)

BR,
-R



As long at the GPU has work to do, performance should be maximized.

However in the case I described below, if CPU and GPU render at about 
the same framerate and
the framerate has some variations (whether it being the GPU taking more 
time for one frame, or the CPU),

using more than 1 would give a bit better performance.


Axel






However if a game is rendering at a very unstable framerate (some frames
needing more work than others), using a value of 2 is safer
to maximize performance. (As using a value of 1 would lead to wait if we
get a frame that takes particularly long, as using 2 smooths that a bit)


I remember years ago I had extremely unstable fps when using catalyst on
Portal for example. But I believe it is more a driver issue than a game
issue.

If we assume games don't have unstable framerate, (which seems
reasonable assumption), using 1 as default makes sense.


If one wants to test experimentally for regression, the ideal test case
if when the GPU renders at about the same framerate as the CPU feeds it.



Axel






___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Rob Clark
On Sat, Apr 27, 2019 at 9:52 AM Axel Davy  wrote:
>
> On 27/04/2019 18:13, Rob Clark wrote:
> > On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:
> >> From: Marek Olšák 
> >>
> >> It's done by:
> >> - decrease the number of frames in flight by 1
> >> - flush before throttling in SwapBuffers
> >>(instead of wait-then-flush, do flush-then-wait)
> >>
> >> The improvement is apparent with Unigine Heaven.
> >>
> >> Previously:
> >>  draw frame 2
> >>  wait frame 0
> >>  flush frame 2
> >>  present frame 2
> >>
> >>  The input lag is 2 frames.
> >>
> >> Now:
> >>  draw frame 2
> >>  flush frame 2
> >>  wait frame 1
> >>  present frame 2
> >>
> >>  The input lag is 1 frame. Flushing is done before waiting, because
> >>  otherwise the device would be idle after waiting.
> >>
> >> Nine is affected because it also uses the pipe cap.
> >> ---
> >>   src/gallium/auxiliary/util/u_screen.c |  2 +-
> >>   src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
> >>   2 files changed, 11 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/src/gallium/auxiliary/util/u_screen.c 
> >> b/src/gallium/auxiliary/util/u_screen.c
> >> index 27f51e0898e..410f17421e6 100644
> >> --- a/src/gallium/auxiliary/util/u_screen.c
> >> +++ b/src/gallium/auxiliary/util/u_screen.c
> >> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
> >> *pscreen,
> >>  case PIPE_CAP_MAX_VARYINGS:
> >> return 8;
> >>
> >>  case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
> >> return 0;
> >>
> >>  case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
> >> return 0;
> >>
> >>  case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> >> -  return 2;
> >> +  return 1;
> > would it be safer to leave the current default and let drivers opt-in
> > to the lower # individually?  I guess triple buffering would still be
> > better for some of the smaller gpu's?
> >
> > disclaimer: I haven't tested this myself or looked very closely at the
> > dri code.. so could be misunderstanding something..
> >
> > BR,
> > -R
>
>
> I think I can shed some light on the issue to justify (or not) the change.
>
> If we don't do throttling and the CPU renders frames at a faster rate
> than what the GPU can render,
> the GPU can become quite late and cause huge frame lag.
>
> The throttling involves forcing a (CPU) wait when a frame is presented
> if the 'x' previous images have not finished rendering.
>
> The lower 'x', the lower the frame lag.
>
> However if 'x' is too low (waiting current frame is rendered for
> example), the GPU can be idle until the CPU is flushing new commands.
>
> Now there is a choice to be made for the value of 'x'. 1 or 2 are
> reasonable values.
>
> if 'x' is 1, we wait the previous frame was rendered when we present the
> current frame. For '2' we wait the frame before.
>
>
> Thus for smaller gpu's, a value of 1 is better than 2 as it is more
> affected by the frame lag (as frames take longer to render).
>

I get the latency aspect.. but my comment was more about latency vs
framerate (or maybe more cynically, about games vs benchmarks :-P)

BR,
-R

>
> However if a game is rendering at a very unstable framerate (some frames
> needing more work than others), using a value of 2 is safer
> to maximize performance. (As using a value of 1 would lead to wait if we
> get a frame that takes particularly long, as using 2 smooths that a bit)
>
>
> I remember years ago I had extremely unstable fps when using catalyst on
> Portal for example. But I believe it is more a driver issue than a game
> issue.
>
> If we assume games don't have unstable framerate, (which seems
> reasonable assumption), using 1 as default makes sense.
>
>
> If one wants to test experimentally for regression, the ideal test case
> if when the GPU renders at about the same framerate as the CPU feeds it.
>
>
>
> Axel
>
>
>
> >
> >>  case PIPE_CAP_DMABUF:
> >>   #ifdef PIPE_OS_LINUX
> >> return 1;
> >>   #else
> >> return 0;
> >>   #endif
> >>
> >>  default:
> >> unreachable("bad PIPE_CAP_*");
> >> diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
> >> b/src/gallium/state_trackers/dri/dri_drawable.c
> >> index 26bfdbecc53..c1de3bed9dd 100644
> >> --- a/src/gallium/state_trackers/dri/dri_drawable.c
> >> +++ b/src/gallium/state_trackers/dri/dri_drawable.c
> >> @@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
> >>  *
> >>  * This pulls a fence off the throttling queue and waits for it if 
> >> the
> >>  * number of fences on the throttling queue has reached the desired
> >>  * number.
> >>  *
> >>  * Then flushes to insert a fence at the current rendering 
> >> position, and
> >>  * pushes that fence on the queue. This requires that the 
> >> st_context_iface
> >>  * flush method returns a fence even if there are no commands to 
> >> flush.
> >>  */
> >> struct 

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Axel Davy

On 27/04/2019 18:13, Rob Clark wrote:

On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:

From: Marek Olšák 

It's done by:
- decrease the number of frames in flight by 1
- flush before throttling in SwapBuffers
   (instead of wait-then-flush, do flush-then-wait)

The improvement is apparent with Unigine Heaven.

Previously:
 draw frame 2
 wait frame 0
 flush frame 2
 present frame 2

 The input lag is 2 frames.

Now:
 draw frame 2
 flush frame 2
 wait frame 1
 present frame 2

 The input lag is 1 frame. Flushing is done before waiting, because
 otherwise the device would be idle after waiting.

Nine is affected because it also uses the pipe cap.
---
  src/gallium/auxiliary/util/u_screen.c |  2 +-
  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
  2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_screen.c 
b/src/gallium/auxiliary/util/u_screen.c
index 27f51e0898e..410f17421e6 100644
--- a/src/gallium/auxiliary/util/u_screen.c
+++ b/src/gallium/auxiliary/util/u_screen.c
@@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
*pscreen,
 case PIPE_CAP_MAX_VARYINGS:
return 8;

 case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
return 0;

 case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
return 0;

 case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
-  return 2;
+  return 1;

would it be safer to leave the current default and let drivers opt-in
to the lower # individually?  I guess triple buffering would still be
better for some of the smaller gpu's?

disclaimer: I haven't tested this myself or looked very closely at the
dri code.. so could be misunderstanding something..

BR,
-R



I think I can shed some light on the issue to justify (or not) the change.

If we don't do throttling and the CPU renders frames at a faster rate 
than what the GPU can render,

the GPU can become quite late and cause huge frame lag.

The throttling involves forcing a (CPU) wait when a frame is presented 
if the 'x' previous images have not finished rendering.


The lower 'x', the lower the frame lag.

However if 'x' is too low (waiting current frame is rendered for 
example), the GPU can be idle until the CPU is flushing new commands.


Now there is a choice to be made for the value of 'x'. 1 or 2 are 
reasonable values.


if 'x' is 1, we wait the previous frame was rendered when we present the 
current frame. For '2' we wait the frame before.



Thus for smaller gpu's, a value of 1 is better than 2 as it is more 
affected by the frame lag (as frames take longer to render).



However if a game is rendering at a very unstable framerate (some frames 
needing more work than others), using a value of 2 is safer
to maximize performance. (As using a value of 1 would lead to wait if we 
get a frame that takes particularly long, as using 2 smooths that a bit)



I remember years ago I had extremely unstable fps when using catalyst on 
Portal for example. But I believe it is more a driver issue than a game 
issue.


If we assume games don't have unstable framerate, (which seems 
reasonable assumption), using 1 as default makes sense.



If one wants to test experimentally for regression, the ideal test case 
if when the GPU renders at about the same framerate as the CPU feeds it.




Axel






 case PIPE_CAP_DMABUF:
  #ifdef PIPE_OS_LINUX
return 1;
  #else
return 0;
  #endif

 default:
unreachable("bad PIPE_CAP_*");
diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
b/src/gallium/state_trackers/dri/dri_drawable.c
index 26bfdbecc53..c1de3bed9dd 100644
--- a/src/gallium/state_trackers/dri/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/dri_drawable.c
@@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
 *
 * This pulls a fence off the throttling queue and waits for it if the
 * number of fences on the throttling queue has reached the desired
 * number.
 *
 * Then flushes to insert a fence at the current rendering position, and
 * pushes that fence on the queue. This requires that the 
st_context_iface
 * flush method returns a fence even if there are no commands to flush.
 */
struct pipe_screen *screen = drawable->screen->base.screen;
-  struct pipe_fence_handle *fence;
+  struct pipe_fence_handle *oldest_fence, *new_fence = NULL;

-  fence = swap_fences_pop_front(drawable);
-  if (fence) {
- (void) screen->fence_finish(screen, NULL, fence, 
PIPE_TIMEOUT_INFINITE);
- screen->fence_reference(screen, , NULL);
-  }
+  st->flush(st, flush_flags, _fence);

-  st->flush(st, flush_flags, );
+  oldest_fence = swap_fences_pop_front(drawable);
+  if (oldest_fence) {
+ screen->fence_finish(screen, NULL, oldest_fence, 
PIPE_TIMEOUT_INFINITE);
+ screen->fence_reference(screen, _fence, NULL);
+  }

-  

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Rob Clark
On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:
>
> From: Marek Olšák 
>
> It's done by:
> - decrease the number of frames in flight by 1
> - flush before throttling in SwapBuffers
>   (instead of wait-then-flush, do flush-then-wait)
>
> The improvement is apparent with Unigine Heaven.
>
> Previously:
> draw frame 2
> wait frame 0
> flush frame 2
> present frame 2
>
> The input lag is 2 frames.
>
> Now:
> draw frame 2
> flush frame 2
> wait frame 1
> present frame 2
>
> The input lag is 1 frame. Flushing is done before waiting, because
> otherwise the device would be idle after waiting.
>
> Nine is affected because it also uses the pipe cap.
> ---
>  src/gallium/auxiliary/util/u_screen.c |  2 +-
>  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
>  2 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_screen.c 
> b/src/gallium/auxiliary/util/u_screen.c
> index 27f51e0898e..410f17421e6 100644
> --- a/src/gallium/auxiliary/util/u_screen.c
> +++ b/src/gallium/auxiliary/util/u_screen.c
> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
> *pscreen,
> case PIPE_CAP_MAX_VARYINGS:
>return 8;
>
> case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>return 0;
>
> case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>return 0;
>
> case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> -  return 2;
> +  return 1;

would it be safer to leave the current default and let drivers opt-in
to the lower # individually?  I guess triple buffering would still be
better for some of the smaller gpu's?

disclaimer: I haven't tested this myself or looked very closely at the
dri code.. so could be misunderstanding something..

BR,
-R

>
> case PIPE_CAP_DMABUF:
>  #ifdef PIPE_OS_LINUX
>return 1;
>  #else
>return 0;
>  #endif
>
> default:
>unreachable("bad PIPE_CAP_*");
> diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
> b/src/gallium/state_trackers/dri/dri_drawable.c
> index 26bfdbecc53..c1de3bed9dd 100644
> --- a/src/gallium/state_trackers/dri/dri_drawable.c
> +++ b/src/gallium/state_trackers/dri/dri_drawable.c
> @@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
> *
> * This pulls a fence off the throttling queue and waits for it if the
> * number of fences on the throttling queue has reached the desired
> * number.
> *
> * Then flushes to insert a fence at the current rendering position, 
> and
> * pushes that fence on the queue. This requires that the 
> st_context_iface
> * flush method returns a fence even if there are no commands to flush.
> */
>struct pipe_screen *screen = drawable->screen->base.screen;
> -  struct pipe_fence_handle *fence;
> +  struct pipe_fence_handle *oldest_fence, *new_fence = NULL;
>
> -  fence = swap_fences_pop_front(drawable);
> -  if (fence) {
> - (void) screen->fence_finish(screen, NULL, fence, 
> PIPE_TIMEOUT_INFINITE);
> - screen->fence_reference(screen, , NULL);
> -  }
> +  st->flush(st, flush_flags, _fence);
>
> -  st->flush(st, flush_flags, );
> +  oldest_fence = swap_fences_pop_front(drawable);
> +  if (oldest_fence) {
> + screen->fence_finish(screen, NULL, oldest_fence, 
> PIPE_TIMEOUT_INFINITE);
> + screen->fence_reference(screen, _fence, NULL);
> +  }
>
> -  if (fence) {
> - swap_fences_push_back(drawable, fence);
> - screen->fence_reference(screen, , NULL);
> +  if (new_fence) {
> + swap_fences_push_back(drawable, new_fence);
> + screen->fence_reference(screen, _fence, NULL);
>}
> }
> else if (flags & (__DRI2_FLUSH_DRAWABLE | __DRI2_FLUSH_CONTEXT)) {
>st->flush(st, flush_flags, NULL);
> }
>
> if (drawable) {
>drawable->flushing = FALSE;
> }
>
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Axel Davy

On 26/04/2019 20:40, Marek Olšák wrote:
On Fri, Apr 26, 2019 at 12:56 PM Axel Davy > wrote:


On 26/04/2019 10:08, Michel Dänzer wrote:
> On 2019-04-26 4:06 a.m., Marek Olšák wrote:
>> From: Marek Olšák mailto:marek.ol...@amd.com>>
>>
>> It's done by:
>> - decrease the number of frames in flight by 1
>> - flush before throttling in SwapBuffers
>>    (instead of wait-then-flush, do flush-then-wait)
>>
>> The improvement is apparent with Unigine Heaven.
>>
>> Previously:
>>      draw frame 2
>>      wait frame 0
>>      flush frame 2
>>      present frame 2
>>
>>      The input lag is 2 frames.
>>
>> Now:
>>      draw frame 2
>>      flush frame 2
>>      wait frame 1
>>      present frame 2
>>
>>      The input lag is 1 frame. Flushing is done before waiting,
because
>>      otherwise the device would be idle after waiting.
> Nice idea. Not sure offhand about all ramifications, but
certainly worth
> a go.
>
>
>> Nine is affected because it also uses the pipe cap.
>> ---
>>   src/gallium/auxiliary/util/u_screen.c         | 2 +-
>>   src/gallium/state_trackers/dri/dri_drawable.c | 20
+--
>>   2 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/util/u_screen.c
b/src/gallium/auxiliary/util/u_screen.c
>> index 27f51e0898e..410f17421e6 100644
>> --- a/src/gallium/auxiliary/util/u_screen.c
>> +++ b/src/gallium/auxiliary/util/u_screen.c
>> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct
pipe_screen *pscreen,
>>      case PIPE_CAP_MAX_VARYINGS:
>>         return 8;
>>
>>      case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>>         return 0;
>>
>>      case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>>         return 0;
>>
>>      case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>> -      return 2;
>> +      return 1;
> This might be slightly misleading, as there can still be two
frames in
> flight (on the GPU) at the same time. Might be better to leave
this at 2
> (so Nine isn't affected) and adjust its treatment in
> src/gallium/state_trackers/dri/dri_drawable.c .
>
>
Checking what gallium nine does currently, it seems we already do
flush
then wait,
however we call swap_fences_pop_front and swap_fences_push_back in
the
reverse order compared to your patch.
We compensate by taking PIPE_CAP_MAX_FRAMES_IN_FLIGHT + 1

In conclusion, with the proposed patch, gl and nine should have
the same
behaviour (and thus if gl benefits from a value of 1, nine should
as well).
I haven't have noticed input lag, I guess I have to test on heaven if
you see a difference.
How can I slow down my gpu to test that ? I use to use the
/sys/kernel/debug/dri/0/ vars to force low dpm, but it doesn't
seem to
be possible anymore as the related files are gone (rx480) ?


I set maximum settings, windowed, resolution: custom, and I type in 
the 4K resolution (I don't have a 4K monitor). When it's running, I 
enable wireframe. It should be pretty slow.


Marek



I couldn't notice any performance difference (card at low or high dpm - 
somehow the vars were still there, I was looking at the wrong place),
with and without the change and gallium nine. I couldn't notice a change 
in lag either (slowest I had was around 20 fps, which may not be the 
best to see that).


I'm fine with this change affecting nine.


Axel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-26 Thread Marek Olšák
On Fri, Apr 26, 2019 at 12:56 PM Axel Davy  wrote:

> On 26/04/2019 10:08, Michel Dänzer wrote:
> > On 2019-04-26 4:06 a.m., Marek Olšák wrote:
> >> From: Marek Olšák 
> >>
> >> It's done by:
> >> - decrease the number of frames in flight by 1
> >> - flush before throttling in SwapBuffers
> >>(instead of wait-then-flush, do flush-then-wait)
> >>
> >> The improvement is apparent with Unigine Heaven.
> >>
> >> Previously:
> >>  draw frame 2
> >>  wait frame 0
> >>  flush frame 2
> >>  present frame 2
> >>
> >>  The input lag is 2 frames.
> >>
> >> Now:
> >>  draw frame 2
> >>  flush frame 2
> >>  wait frame 1
> >>  present frame 2
> >>
> >>  The input lag is 1 frame. Flushing is done before waiting, because
> >>  otherwise the device would be idle after waiting.
> > Nice idea. Not sure offhand about all ramifications, but certainly worth
> > a go.
> >
> >
> >> Nine is affected because it also uses the pipe cap.
> >> ---
> >>   src/gallium/auxiliary/util/u_screen.c |  2 +-
> >>   src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
> >>   2 files changed, 11 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/src/gallium/auxiliary/util/u_screen.c
> b/src/gallium/auxiliary/util/u_screen.c
> >> index 27f51e0898e..410f17421e6 100644
> >> --- a/src/gallium/auxiliary/util/u_screen.c
> >> +++ b/src/gallium/auxiliary/util/u_screen.c
> >> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct
> pipe_screen *pscreen,
> >>  case PIPE_CAP_MAX_VARYINGS:
> >> return 8;
> >>
> >>  case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
> >> return 0;
> >>
> >>  case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
> >> return 0;
> >>
> >>  case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> >> -  return 2;
> >> +  return 1;
> > This might be slightly misleading, as there can still be two frames in
> > flight (on the GPU) at the same time. Might be better to leave this at 2
> > (so Nine isn't affected) and adjust its treatment in
> > src/gallium/state_trackers/dri/dri_drawable.c .
> >
> >
> Checking what gallium nine does currently, it seems we already do flush
> then wait,
> however we call swap_fences_pop_front and swap_fences_push_back in the
> reverse order compared to your patch.
> We compensate by taking PIPE_CAP_MAX_FRAMES_IN_FLIGHT + 1
>
> In conclusion, with the proposed patch, gl and nine should have the same
> behaviour (and thus if gl benefits from a value of 1, nine should as well).
> I haven't have noticed input lag, I guess I have to test on heaven if
> you see a difference.
> How can I slow down my gpu to test that ? I use to use the
> /sys/kernel/debug/dri/0/ vars to force low dpm, but it doesn't seem to
> be possible anymore as the related files are gone (rx480) ?
>

I set maximum settings, windowed, resolution: custom, and I type in the 4K
resolution (I don't have a 4K monitor). When it's running, I enable
wireframe. It should be pretty slow.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-26 Thread Marek Olšák
On Fri, Apr 26, 2019 at 4:08 AM Michel Dänzer  wrote:

> On 2019-04-26 4:06 a.m., Marek Olšák wrote:
> > From: Marek Olšák 
> >
> > It's done by:
> > - decrease the number of frames in flight by 1
> > - flush before throttling in SwapBuffers
> >   (instead of wait-then-flush, do flush-then-wait)
> >
> > The improvement is apparent with Unigine Heaven.
> >
> > Previously:
> > draw frame 2
> > wait frame 0
> > flush frame 2
> > present frame 2
> >
> > The input lag is 2 frames.
> >
> > Now:
> > draw frame 2
> > flush frame 2
> > wait frame 1
> > present frame 2
> >
> > The input lag is 1 frame. Flushing is done before waiting, because
> > otherwise the device would be idle after waiting.
>
> Nice idea. Not sure offhand about all ramifications, but certainly worth
> a go.
>
>
> > Nine is affected because it also uses the pipe cap.
> > ---
> >  src/gallium/auxiliary/util/u_screen.c |  2 +-
> >  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
> >  2 files changed, 11 insertions(+), 11 deletions(-)
> >
> > diff --git a/src/gallium/auxiliary/util/u_screen.c
> b/src/gallium/auxiliary/util/u_screen.c
> > index 27f51e0898e..410f17421e6 100644
> > --- a/src/gallium/auxiliary/util/u_screen.c
> > +++ b/src/gallium/auxiliary/util/u_screen.c
> > @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct
> pipe_screen *pscreen,
> > case PIPE_CAP_MAX_VARYINGS:
> >return 8;
> >
> > case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
> >return 0;
> >
> > case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
> >return 0;
> >
> > case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> > -  return 2;
> > +  return 1;
>
> This might be slightly misleading, as there can still be two frames in
> flight (on the GPU) at the same time. Might be better to leave this at 2
> (so Nine isn't affected) and adjust its treatment in
> src/gallium/state_trackers/dri/dri_drawable.c .
>

When dri_flush is waiting, there can be at most 2 frames submitted and not
finished (that's why it's waiting). When dri_flush is done, there is at
most 1 frame submitted and not finished. The CAP is enforced by dri_flush.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-26 Thread Axel Davy

On 26/04/2019 10:08, Michel Dänzer wrote:

On 2019-04-26 4:06 a.m., Marek Olšák wrote:

From: Marek Olšák 

It's done by:
- decrease the number of frames in flight by 1
- flush before throttling in SwapBuffers
   (instead of wait-then-flush, do flush-then-wait)

The improvement is apparent with Unigine Heaven.

Previously:
 draw frame 2
 wait frame 0
 flush frame 2
 present frame 2

 The input lag is 2 frames.

Now:
 draw frame 2
 flush frame 2
 wait frame 1
 present frame 2

 The input lag is 1 frame. Flushing is done before waiting, because
 otherwise the device would be idle after waiting.

Nice idea. Not sure offhand about all ramifications, but certainly worth
a go.



Nine is affected because it also uses the pipe cap.
---
  src/gallium/auxiliary/util/u_screen.c |  2 +-
  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
  2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_screen.c 
b/src/gallium/auxiliary/util/u_screen.c
index 27f51e0898e..410f17421e6 100644
--- a/src/gallium/auxiliary/util/u_screen.c
+++ b/src/gallium/auxiliary/util/u_screen.c
@@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
*pscreen,
 case PIPE_CAP_MAX_VARYINGS:
return 8;
  
 case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:

return 0;
  
 case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:

return 0;
  
 case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:

-  return 2;
+  return 1;

This might be slightly misleading, as there can still be two frames in
flight (on the GPU) at the same time. Might be better to leave this at 2
(so Nine isn't affected) and adjust its treatment in
src/gallium/state_trackers/dri/dri_drawable.c .


Checking what gallium nine does currently, it seems we already do flush 
then wait,
however we call swap_fences_pop_front and swap_fences_push_back in the 
reverse order compared to your patch.

We compensate by taking PIPE_CAP_MAX_FRAMES_IN_FLIGHT + 1

In conclusion, with the proposed patch, gl and nine should have the same 
behaviour (and thus if gl benefits from a value of 1, nine should as well).
I haven't have noticed input lag, I guess I have to test on heaven if 
you see a difference.
How can I slow down my gpu to test that ? I use to use the 
/sys/kernel/debug/dri/0/ vars to force low dpm, but it doesn't seem to 
be possible anymore as the related files are gone (rx480) ?



Axel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-26 Thread Michel Dänzer
On 2019-04-26 4:06 a.m., Marek Olšák wrote:
> From: Marek Olšák 
> 
> It's done by:
> - decrease the number of frames in flight by 1
> - flush before throttling in SwapBuffers
>   (instead of wait-then-flush, do flush-then-wait)
> 
> The improvement is apparent with Unigine Heaven.
> 
> Previously:
> draw frame 2
> wait frame 0
> flush frame 2
> present frame 2
> 
> The input lag is 2 frames.
> 
> Now:
> draw frame 2
> flush frame 2
> wait frame 1
> present frame 2
> 
> The input lag is 1 frame. Flushing is done before waiting, because
> otherwise the device would be idle after waiting.

Nice idea. Not sure offhand about all ramifications, but certainly worth
a go.


> Nine is affected because it also uses the pipe cap.
> ---
>  src/gallium/auxiliary/util/u_screen.c |  2 +-
>  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_screen.c 
> b/src/gallium/auxiliary/util/u_screen.c
> index 27f51e0898e..410f17421e6 100644
> --- a/src/gallium/auxiliary/util/u_screen.c
> +++ b/src/gallium/auxiliary/util/u_screen.c
> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
> *pscreen,
> case PIPE_CAP_MAX_VARYINGS:
>return 8;
>  
> case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>return 0;
>  
> case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>return 0;
>  
> case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> -  return 2;
> +  return 1;

This might be slightly misleading, as there can still be two frames in
flight (on the GPU) at the same time. Might be better to leave this at 2
(so Nine isn't affected) and adjust its treatment in
src/gallium/state_trackers/dri/dri_drawable.c .


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-25 Thread Marek Olšák
From: Marek Olšák 

It's done by:
- decrease the number of frames in flight by 1
- flush before throttling in SwapBuffers
  (instead of wait-then-flush, do flush-then-wait)

The improvement is apparent with Unigine Heaven.

Previously:
draw frame 2
wait frame 0
flush frame 2
present frame 2

The input lag is 2 frames.

Now:
draw frame 2
flush frame 2
wait frame 1
present frame 2

The input lag is 1 frame. Flushing is done before waiting, because
otherwise the device would be idle after waiting.

Nine is affected because it also uses the pipe cap.
---
 src/gallium/auxiliary/util/u_screen.c |  2 +-
 src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_screen.c 
b/src/gallium/auxiliary/util/u_screen.c
index 27f51e0898e..410f17421e6 100644
--- a/src/gallium/auxiliary/util/u_screen.c
+++ b/src/gallium/auxiliary/util/u_screen.c
@@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
*pscreen,
case PIPE_CAP_MAX_VARYINGS:
   return 8;
 
case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
   return 0;
 
case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
   return 0;
 
case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
-  return 2;
+  return 1;
 
case PIPE_CAP_DMABUF:
 #ifdef PIPE_OS_LINUX
   return 1;
 #else
   return 0;
 #endif
 
default:
   unreachable("bad PIPE_CAP_*");
diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
b/src/gallium/state_trackers/dri/dri_drawable.c
index 26bfdbecc53..c1de3bed9dd 100644
--- a/src/gallium/state_trackers/dri/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/dri_drawable.c
@@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
*
* This pulls a fence off the throttling queue and waits for it if the
* number of fences on the throttling queue has reached the desired
* number.
*
* Then flushes to insert a fence at the current rendering position, and
* pushes that fence on the queue. This requires that the 
st_context_iface
* flush method returns a fence even if there are no commands to flush.
*/
   struct pipe_screen *screen = drawable->screen->base.screen;
-  struct pipe_fence_handle *fence;
+  struct pipe_fence_handle *oldest_fence, *new_fence = NULL;
 
-  fence = swap_fences_pop_front(drawable);
-  if (fence) {
- (void) screen->fence_finish(screen, NULL, fence, 
PIPE_TIMEOUT_INFINITE);
- screen->fence_reference(screen, , NULL);
-  }
+  st->flush(st, flush_flags, _fence);
 
-  st->flush(st, flush_flags, );
+  oldest_fence = swap_fences_pop_front(drawable);
+  if (oldest_fence) {
+ screen->fence_finish(screen, NULL, oldest_fence, 
PIPE_TIMEOUT_INFINITE);
+ screen->fence_reference(screen, _fence, NULL);
+  }
 
-  if (fence) {
- swap_fences_push_back(drawable, fence);
- screen->fence_reference(screen, , NULL);
+  if (new_fence) {
+ swap_fences_push_back(drawable, new_fence);
+ screen->fence_reference(screen, _fence, NULL);
   }
}
else if (flags & (__DRI2_FLUSH_DRAWABLE | __DRI2_FLUSH_CONTEXT)) {
   st->flush(st, flush_flags, NULL);
}
 
if (drawable) {
   drawable->flushing = FALSE;
}
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev