Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Marek Olšák
The input lag is maxframes*1000/fps ms. For 10 fps and 2 frames, the lag is
200 ms. You might also say that it's the amount of time it will take the
GPU to catch up with the CPU after fence_finish returns.
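If it helps, here's a trivial standalone check of that formula (plain C, not
Mesa code; the numbers are just the example above):

#include <stdio.h>

int main(void)
{
   int max_frames_in_flight = 2;   /* frames the CPU may run ahead of the GPU */
   int fps = 10;                   /* GPU framerate */

   /* lag_ms = max_frames_in_flight * 1000 / fps */
   printf("input lag: %d ms\n", max_frames_in_flight * 1000 / fps);   /* 200 ms */
   return 0;
}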

Hopefully that clears up performance concerns.

Marek

On Sat, Apr 27, 2019, 4:07 PM Axel Davy  wrote:

> On 27/04/2019 21:02, Rob Clark wrote:
> > On Sat, Apr 27, 2019 at 9:52 AM Axel Davy  wrote:
> >> On 27/04/2019 18:13, Rob Clark wrote:
> >>> On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:
>  From: Marek Olšák 
> 
>  It's done by:
>  - decrease the number of frames in flight by 1
>  - flush before throttling in SwapBuffers
>  (instead of wait-then-flush, do flush-then-wait)
> 
>  The improvement is apparent with Unigine Heaven.
> 
>  Previously:
>    draw frame 2
>    wait frame 0
>    flush frame 2
>    present frame 2
> 
>    The input lag is 2 frames.
> 
>  Now:
>    draw frame 2
>    flush frame 2
>    wait frame 1
>    present frame 2
> 
>    The input lag is 1 frame. Flushing is done before waiting,
> because
>    otherwise the device would be idle after waiting.
> 
>  Nine is affected because it also uses the pipe cap.
>  ---
> src/gallium/auxiliary/util/u_screen.c |  2 +-
> src/gallium/state_trackers/dri/dri_drawable.c | 20
> +--
> 2 files changed, 11 insertions(+), 11 deletions(-)
> 
>  diff --git a/src/gallium/auxiliary/util/u_screen.c
> b/src/gallium/auxiliary/util/u_screen.c
>  index 27f51e0898e..410f17421e6 100644
>  --- a/src/gallium/auxiliary/util/u_screen.c
>  +++ b/src/gallium/auxiliary/util/u_screen.c
>  @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct
> pipe_screen *pscreen,
>    case PIPE_CAP_MAX_VARYINGS:
>   return 8;
> 
>    case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>   return 0;
> 
>    case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>   return 0;
> 
>    case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>  -  return 2;
>  +  return 1;
> >>> would it be safer to leave the current default and let drivers opt-in
> >>> to the lower # individually?  I guess triple buffering would still be
> >>> better for some of the smaller gpu's?
> >>>
> >>> disclaimer: I haven't tested this myself or looked very closely at the
> >>> dri code.. so could be misunderstanding something..
> >>>
> >>> BR,
> >>> -R
> >>
> >> I think I can shed some light on the issue to justify (or not) the
> change.
> >>
> >> If we don't do throttling and the CPU submits frames at a faster rate
> >> than the GPU can render them, the GPU can fall quite far behind and
> >> cause huge frame lag.
> >>
> >> The throttling involves forcing a (CPU) wait when a frame is presented
> >> if the previous 'x' images have not finished rendering.
> >>
> >> The lower 'x', the lower the frame lag.
> >>
> >> However, if 'x' is too low (for example, waiting until the current
> >> frame has rendered), the GPU can sit idle until the CPU flushes new
> >> commands.
> >>
> >> Now there is a choice to be made for the value of 'x'. 1 or 2 are
> >> reasonable values.
> >>
> >> If 'x' is 1, we wait until the previous frame has rendered when we
> >> present the current frame. For 2, we wait on the frame before that.
> >>
> >>
> >> Thus for smaller GPUs, a value of 1 is better than 2, as they are more
> >> affected by the frame lag (frames take longer to render).
> >>
> > I get the latency aspect.. but my comment was more about latency vs
> > framerate (or maybe more cynically, about games vs benchmarks :-P)
> >
> > BR,
> > -R
>
>
> As long as the GPU has work to do, performance should be maximized.
>
> However, in the case I described below, if the CPU and GPU render at
> about the same framerate and the framerate has some variation (whether
> it's the GPU taking more time for one frame, or the CPU), using more
> than 1 would give slightly better performance.
>
>
> Axel
>
>
>
> >
> >> However, if a game renders at a very unstable framerate (some frames
> >> needing more work than others), a value of 2 is safer for
> >> performance: a value of 1 forces a wait whenever a frame takes
> >> particularly long, while 2 smooths that out a bit.
> >>
> >>
> >> I remember years ago I had extremely unstable fps with Catalyst on
> >> Portal, for example, but I believe that was more a driver issue than
> >> a game issue.
> >>
> >> If we assume games don't have an unstable framerate (which seems a
> >> reasonable assumption), using 1 as the default makes sense.
> >>
> >>
> >> If one wants to test experimentally for regressions, the ideal test
> >> case is when the GPU renders at about the same framerate as the CPU
> >> feeds it.
> >>
> >>
> >>
> >> Axel
> >>
> >>
> >>
> >>
>
>

[Mesa-dev] [MR] Gallium nine fixes and improvements for Mesa 19.1

2019-04-27 Thread Axel Davy

Hi,


I usually send my patch series via mail, but as this one was rather 
long, and as people seem to find gitlab easier to work with for long 
patch series, this time I used the merge request system.



Don't hesitate to take a look:

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/748


Yours,

Axel


Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Axel Davy

On 27/04/2019 21:02, Rob Clark wrote:

On Sat, Apr 27, 2019 at 9:52 AM Axel Davy  wrote:

On 27/04/2019 18:13, Rob Clark wrote:

On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:

From: Marek Olšák 

It's done by:
- decrease the number of frames in flight by 1
- flush before throttling in SwapBuffers
(instead of wait-then-flush, do flush-then-wait)

The improvement is apparent with Unigine Heaven.

Previously:
  draw frame 2
  wait frame 0
  flush frame 2
  present frame 2

  The input lag is 2 frames.

Now:
  draw frame 2
  flush frame 2
  wait frame 1
  present frame 2

  The input lag is 1 frame. Flushing is done before waiting, because
  otherwise the device would be idle after waiting.

Nine is affected because it also uses the pipe cap.
---
   src/gallium/auxiliary/util/u_screen.c |  2 +-
   src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
   2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_screen.c 
b/src/gallium/auxiliary/util/u_screen.c
index 27f51e0898e..410f17421e6 100644
--- a/src/gallium/auxiliary/util/u_screen.c
+++ b/src/gallium/auxiliary/util/u_screen.c
@@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
*pscreen,
  case PIPE_CAP_MAX_VARYINGS:
 return 8;

  case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
 return 0;

  case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
 return 0;

  case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
-  return 2;
+  return 1;

would it be safer to leave the current default and let drivers opt-in
to the lower # individually?  I guess triple buffering would still be
better for some of the smaller gpu's?

disclaimer: I haven't tested this myself or looked very closely at the
dri code.. so could be misunderstanding something..

BR,
-R


I think I can shed some light on the issue to justify (or not) the change.

If we don't do throttling and the CPU submits frames at a faster rate
than the GPU can render them, the GPU can fall quite far behind and
cause huge frame lag.

The throttling involves forcing a (CPU) wait when a frame is presented
if the previous 'x' images have not finished rendering.

The lower 'x', the lower the frame lag.

However, if 'x' is too low (for example, waiting until the current
frame has rendered), the GPU can sit idle until the CPU flushes new
commands.

Now there is a choice to be made for the value of 'x'. 1 or 2 are
reasonable values.

If 'x' is 1, we wait until the previous frame has rendered when we
present the current frame. For 2, we wait on the frame before that.


Thus for smaller GPUs, a value of 1 is better than 2, as they are more
affected by the frame lag (frames take longer to render).


I get the latency aspect.. but my comment was more about latency vs
framerate (or maybe more cynically, about games vs benchmarks :-P)

BR,
-R



As long as the GPU has work to do, performance should be maximized.

However, in the case I described below, if the CPU and GPU render at
about the same framerate and the framerate has some variation (whether
it's the GPU taking more time for one frame, or the CPU), using more
than 1 would give slightly better performance.


Axel






However, if a game renders at a very unstable framerate (some frames
needing more work than others), a value of 2 is safer for performance:
a value of 1 forces a wait whenever a frame takes particularly long,
while 2 smooths that out a bit.


I remember years ago I had extremely unstable fps with Catalyst on
Portal, for example, but I believe that was more a driver issue than a
game issue.

If we assume games don't have an unstable framerate (which seems a
reasonable assumption), using 1 as the default makes sense.


If one wants to test experimentally for regressions, the ideal test case
is when the GPU renders at about the same framerate as the CPU feeds it.



Axel







Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Rob Clark
On Sat, Apr 27, 2019 at 9:52 AM Axel Davy  wrote:
>
> On 27/04/2019 18:13, Rob Clark wrote:
> > On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:
> >> From: Marek Olšák 
> >>
> >> It's done by:
> >> - decrease the number of frames in flight by 1
> >> - flush before throttling in SwapBuffers
> >>(instead of wait-then-flush, do flush-then-wait)
> >>
> >> The improvement is apparent with Unigine Heaven.
> >>
> >> Previously:
> >>  draw frame 2
> >>  wait frame 0
> >>  flush frame 2
> >>  present frame 2
> >>
> >>  The input lag is 2 frames.
> >>
> >> Now:
> >>  draw frame 2
> >>  flush frame 2
> >>  wait frame 1
> >>  present frame 2
> >>
> >>  The input lag is 1 frame. Flushing is done before waiting, because
> >>  otherwise the device would be idle after waiting.
> >>
> >> Nine is affected because it also uses the pipe cap.
> >> ---
> >>   src/gallium/auxiliary/util/u_screen.c |  2 +-
> >>   src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
> >>   2 files changed, 11 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/src/gallium/auxiliary/util/u_screen.c 
> >> b/src/gallium/auxiliary/util/u_screen.c
> >> index 27f51e0898e..410f17421e6 100644
> >> --- a/src/gallium/auxiliary/util/u_screen.c
> >> +++ b/src/gallium/auxiliary/util/u_screen.c
> >> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
> >> *pscreen,
> >>  case PIPE_CAP_MAX_VARYINGS:
> >> return 8;
> >>
> >>  case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
> >> return 0;
> >>
> >>  case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
> >> return 0;
> >>
> >>  case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> >> -  return 2;
> >> +  return 1;
> > would it be safer to leave the current default and let drivers opt-in
> > to the lower # individually?  I guess triple buffering would still be
> > better for some of the smaller gpu's?
> >
> > disclaimer: I haven't tested this myself or looked very closely at the
> > dri code.. so could be misunderstanding something..
> >
> > BR,
> > -R
>
>
> I think I can shed some light on the issue to justify (or not) the change.
>
> If we don't do throttling and the CPU submits frames at a faster rate
> than the GPU can render them, the GPU can fall quite far behind and
> cause huge frame lag.
>
> The throttling involves forcing a (CPU) wait when a frame is presented
> if the previous 'x' images have not finished rendering.
>
> The lower 'x', the lower the frame lag.
>
> However, if 'x' is too low (for example, waiting until the current
> frame has rendered), the GPU can sit idle until the CPU flushes new
> commands.
>
> Now there is a choice to be made for the value of 'x'. 1 or 2 are
> reasonable values.
>
> If 'x' is 1, we wait until the previous frame has rendered when we
> present the current frame. For 2, we wait on the frame before that.
>
>
> Thus for smaller GPUs, a value of 1 is better than 2, as they are more
> affected by the frame lag (frames take longer to render).
>

I get the latency aspect.. but my comment was more about latency vs
framerate (or maybe more cynically, about games vs benchmarks :-P)

BR,
-R

>
> However, if a game renders at a very unstable framerate (some frames
> needing more work than others), a value of 2 is safer for performance:
> a value of 1 forces a wait whenever a frame takes particularly long,
> while 2 smooths that out a bit.
>
>
> I remember years ago I had extremely unstable fps with Catalyst on
> Portal, for example, but I believe that was more a driver issue than a
> game issue.
>
> If we assume games don't have an unstable framerate (which seems a
> reasonable assumption), using 1 as the default makes sense.
>
>
> If one wants to test experimentally for regressions, the ideal test
> case is when the GPU renders at about the same framerate as the CPU
> feeds it.
>
>
>
> Axel
>
>
>
> >
> >>  case PIPE_CAP_DMABUF:
> >>   #ifdef PIPE_OS_LINUX
> >> return 1;
> >>   #else
> >> return 0;
> >>   #endif
> >>
> >>  default:
> >> unreachable("bad PIPE_CAP_*");
> >> diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
> >> b/src/gallium/state_trackers/dri/dri_drawable.c
> >> index 26bfdbecc53..c1de3bed9dd 100644
> >> --- a/src/gallium/state_trackers/dri/dri_drawable.c
> >> +++ b/src/gallium/state_trackers/dri/dri_drawable.c
> >> @@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
> >>  *
> >>  * This pulls a fence off the throttling queue and waits for it if 
> >> the
> >>  * number of fences on the throttling queue has reached the desired
> >>  * number.
> >>  *
> >>  * Then flushes to insert a fence at the current rendering 
> >> position, and
> >>  * pushes that fence on the queue. This requires that the 
> >> st_context_iface
> >>  * flush method returns a fence even if there are no commands to 
> >> flush.
> >>  */
> >> struct 

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Axel Davy

On 27/04/2019 18:13, Rob Clark wrote:

On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:

From: Marek Olšák 

It's done by:
- decrease the number of frames in flight by 1
- flush before throttling in SwapBuffers
   (instead of wait-then-flush, do flush-then-wait)

The improvement is apparent with Unigine Heaven.

Previously:
 draw frame 2
 wait frame 0
 flush frame 2
 present frame 2

 The input lag is 2 frames.

Now:
 draw frame 2
 flush frame 2
 wait frame 1
 present frame 2

 The input lag is 1 frame. Flushing is done before waiting, because
 otherwise the device would be idle after waiting.

Nine is affected because it also uses the pipe cap.
---
  src/gallium/auxiliary/util/u_screen.c |  2 +-
  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
  2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_screen.c 
b/src/gallium/auxiliary/util/u_screen.c
index 27f51e0898e..410f17421e6 100644
--- a/src/gallium/auxiliary/util/u_screen.c
+++ b/src/gallium/auxiliary/util/u_screen.c
@@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
*pscreen,
 case PIPE_CAP_MAX_VARYINGS:
return 8;

 case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
return 0;

 case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
return 0;

 case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
-  return 2;
+  return 1;

would it be safer to leave the current default and let drivers opt-in
to the lower # individually?  I guess triple buffering would still be
better for some of the smaller gpu's?

disclaimer: I haven't tested this myself or looked very closely at the
dri code.. so could be misunderstanding something..

BR,
-R



I think I can shed some light on the issue to justify (or not) the change.

If we don't do throttling and the CPU submits frames at a faster rate
than the GPU can render them, the GPU can fall quite far behind and
cause huge frame lag.

The throttling involves forcing a (CPU) wait when a frame is presented
if the previous 'x' images have not finished rendering.

The lower 'x', the lower the frame lag.

However, if 'x' is too low (for example, waiting until the current frame
has rendered), the GPU can sit idle until the CPU flushes new commands.

Now there is a choice to be made for the value of 'x'. 1 or 2 are
reasonable values.

If 'x' is 1, we wait until the previous frame has rendered when we
present the current frame. For 2, we wait on the frame before that.

Thus for smaller GPUs, a value of 1 is better than 2, as they are more
affected by the frame lag (frames take longer to render).

However, if a game renders at a very unstable framerate (some frames
needing more work than others), a value of 2 is safer for performance:
a value of 1 forces a wait whenever a frame takes particularly long,
while 2 smooths that out a bit.
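To make the mechanism concrete, here is a tiny standalone model of the
throttling (plain C, not the Mesa code; X plays the role of 'x' above, and a
queue of frame numbers stands in for the fence queue):

#include <stdio.h>

#define X 1                      /* frames allowed in flight */

static int queue[X];             /* pretend fences, identified by frame number */
static int queued = 0;

static void present_frame(int frame)
{
   printf("flush frame %d\n", frame);     /* flush-then-wait, as in the patch */

   if (queued == X) {                     /* throttle: wait on the oldest frame */
      printf("wait  frame %d\n", queue[0]);
      for (int i = 1; i < queued; i++)
         queue[i - 1] = queue[i];
      queued--;
   }
   queue[queued++] = frame;               /* our own fence joins the queue */

   printf("present frame %d\n", frame);
}

int main(void)
{
   for (int frame = 0; frame < 4; frame++)
      present_frame(frame);
   return 0;
}

With X set to 1 this prints the "flush N / wait N-1 / present N" sequence from
the commit message; with X set to 2 it waits on frame N-2 instead.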



I remember years ago I had extremely unstable fps with Catalyst on
Portal, for example, but I believe that was more a driver issue than a
game issue.

If we assume games don't have an unstable framerate (which seems a
reasonable assumption), using 1 as the default makes sense.

If one wants to test experimentally for regressions, the ideal test case
is when the GPU renders at about the same framerate as the CPU feeds it.




Axel






 case PIPE_CAP_DMABUF:
  #ifdef PIPE_OS_LINUX
return 1;
  #else
return 0;
  #endif

 default:
unreachable("bad PIPE_CAP_*");
diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
b/src/gallium/state_trackers/dri/dri_drawable.c
index 26bfdbecc53..c1de3bed9dd 100644
--- a/src/gallium/state_trackers/dri/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/dri_drawable.c
@@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
 *
 * This pulls a fence off the throttling queue and waits for it if the
 * number of fences on the throttling queue has reached the desired
 * number.
 *
 * Then flushes to insert a fence at the current rendering position, and
 * pushes that fence on the queue. This requires that the 
st_context_iface
 * flush method returns a fence even if there are no commands to flush.
 */
struct pipe_screen *screen = drawable->screen->base.screen;
-  struct pipe_fence_handle *fence;
+  struct pipe_fence_handle *oldest_fence, *new_fence = NULL;

-  fence = swap_fences_pop_front(drawable);
-  if (fence) {
-     (void) screen->fence_finish(screen, NULL, fence, PIPE_TIMEOUT_INFINITE);
-     screen->fence_reference(screen, &fence, NULL);
-  }
+  st->flush(st, flush_flags, &new_fence);

-  st->flush(st, flush_flags, &fence);
+  oldest_fence = swap_fences_pop_front(drawable);
+  if (oldest_fence) {
+     screen->fence_finish(screen, NULL, oldest_fence, PIPE_TIMEOUT_INFINITE);
+     screen->fence_reference(screen, &oldest_fence, NULL);
+  }

-  

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Rob Clark
On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák  wrote:
>
> From: Marek Olšák 
>
> It's done by:
> - decrease the number of frames in flight by 1
> - flush before throttling in SwapBuffers
>   (instead of wait-then-flush, do flush-then-wait)
>
> The improvement is apparent with Unigine Heaven.
>
> Previously:
> draw frame 2
> wait frame 0
> flush frame 2
> present frame 2
>
> The input lag is 2 frames.
>
> Now:
> draw frame 2
> flush frame 2
> wait frame 1
> present frame 2
>
> The input lag is 1 frame. Flushing is done before waiting, because
> otherwise the device would be idle after waiting.
>
> Nine is affected because it also uses the pipe cap.
> ---
>  src/gallium/auxiliary/util/u_screen.c |  2 +-
>  src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
>  2 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_screen.c 
> b/src/gallium/auxiliary/util/u_screen.c
> index 27f51e0898e..410f17421e6 100644
> --- a/src/gallium/auxiliary/util/u_screen.c
> +++ b/src/gallium/auxiliary/util/u_screen.c
> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
> *pscreen,
> case PIPE_CAP_MAX_VARYINGS:
>return 8;
>
> case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>return 0;
>
> case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>return 0;
>
> case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> -  return 2;
> +  return 1;

would it be safer to leave the current default and let drivers opt-in
to the lower # individually?  I guess triple buffering would still be
better for some of the smaller gpu's?
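Something like the following per-driver hook is what I have in mind
(untested sketch; "foo" stands in for a real driver, and the includes assume
the usual gallium headers):

#include "pipe/p_defines.h"
#include "pipe/p_screen.h"
#include "util/u_screen.h"

/* Hypothetical driver get_param: answer this cap locally, defer the rest
 * to the shared defaults. */
static int
foo_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
{
   switch (param) {
   case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
      return 1;   /* this driver opts in to the lower-latency value */
   default:
      return u_pipe_screen_get_param_defaults(pscreen, param);
   }
}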

disclaimer: I haven't tested this myself or looked very closely at the
dri code.. so could be misunderstanding something..

BR,
-R

>
> case PIPE_CAP_DMABUF:
>  #ifdef PIPE_OS_LINUX
>return 1;
>  #else
>return 0;
>  #endif
>
> default:
>unreachable("bad PIPE_CAP_*");
> diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
> b/src/gallium/state_trackers/dri/dri_drawable.c
> index 26bfdbecc53..c1de3bed9dd 100644
> --- a/src/gallium/state_trackers/dri/dri_drawable.c
> +++ b/src/gallium/state_trackers/dri/dri_drawable.c
> @@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
> *
> * This pulls a fence off the throttling queue and waits for it if the
> * number of fences on the throttling queue has reached the desired
> * number.
> *
> * Then flushes to insert a fence at the current rendering position, 
> and
> * pushes that fence on the queue. This requires that the 
> st_context_iface
> * flush method returns a fence even if there are no commands to flush.
> */
>struct pipe_screen *screen = drawable->screen->base.screen;
> -  struct pipe_fence_handle *fence;
> +  struct pipe_fence_handle *oldest_fence, *new_fence = NULL;
>
> -  fence = swap_fences_pop_front(drawable);
> -  if (fence) {
> -     (void) screen->fence_finish(screen, NULL, fence, PIPE_TIMEOUT_INFINITE);
> -     screen->fence_reference(screen, &fence, NULL);
> -  }
> +  st->flush(st, flush_flags, &new_fence);
>
> -  st->flush(st, flush_flags, &fence);
> +  oldest_fence = swap_fences_pop_front(drawable);
> +  if (oldest_fence) {
> +     screen->fence_finish(screen, NULL, oldest_fence, PIPE_TIMEOUT_INFINITE);
> +     screen->fence_reference(screen, &oldest_fence, NULL);
> +  }
>
> -  if (fence) {
> -     swap_fences_push_back(drawable, fence);
> -     screen->fence_reference(screen, &fence, NULL);
> +  if (new_fence) {
> +     swap_fences_push_back(drawable, new_fence);
> +     screen->fence_reference(screen, &new_fence, NULL);
>}
> }
> else if (flags & (__DRI2_FLUSH_DRAWABLE | __DRI2_FLUSH_CONTEXT)) {
>st->flush(st, flush_flags, NULL);
> }
>
> if (drawable) {
>drawable->flushing = FALSE;
> }
>
> --
> 2.17.1
>

Re: [Mesa-dev] [PATCH 1/3] mesa: rework error handling in glDrawBuffers

2019-04-27 Thread Mathias Fröhlich
Hi Marek,

one comment/failure inline:

On Saturday, 27 April 2019 05:31:45 CEST Marek Olšák wrote:
> From: Marek Olšák 
> 
> It's needed by the next pbuffer fix, which changes the behavior of
> draw_buffer_enum_to_bitmask, so it can't be used to help with error
> checking.
> ---
>  src/mesa/main/buffers.c | 53 ++---
>  1 file changed, 29 insertions(+), 24 deletions(-)
> 
> diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
> index d98c015bb24..2148fa1316c 100644
> --- a/src/mesa/main/buffers.c
> +++ b/src/mesa/main/buffers.c
> @@ -430,65 +430,70 @@ draw_buffers(struct gl_context *ctx, struct 
> gl_framebuffer *fb, GLsizei n,
>   _mesa_error(ctx, GL_INVALID_OPERATION, "%s(invalid buffers)", 
> caller);
>   return;
>}
> }
>  
> supportedMask = supported_buffer_bitmask(ctx, fb);
> usedBufferMask = 0x0;
>  
> /* complicated error checking... */
> for (output = 0; output < n; output++) {
> -  destMask[output] = draw_buffer_enum_to_bitmask(ctx, buffers[output]);
> -
>if (!no_error) {
> - /* From the OpenGL 3.0 specification, page 258:
> -  * "Each buffer listed in bufs must be one of the values from tables
> -  *  4.5 or 4.6.  Otherwise, an INVALID_ENUM error is generated.
> -  */
> - if (destMask[output] == BAD_MASK) {
> -_mesa_error(ctx, GL_INVALID_ENUM, "%s(invalid buffer %s)",
> -caller, _mesa_enum_to_string(buffers[output]));
> -return;
> - }
> -
>   /* From the OpenGL 4.5 specification, page 493 (page 515 of the PDF)
>* "An INVALID_ENUM error is generated if any value in bufs is 
> FRONT,
>* LEFT, RIGHT, or FRONT_AND_BACK . This restriction applies to both
>* the default framebuffer and framebuffer objects, and exists 
> because
>* these constants may themselves refer to multiple buffers, as 
> shown
>* in table 17.4."
>*
> -  * And on page 492 (page 514 of the PDF):
> +  * From the OpenGL 4.5 specification, page 492 (page 514 of the 
> PDF):
>* "If the default framebuffer is affected, then each of the 
> constants
>* must be one of the values listed in table 17.6 or the special 
> value
>* BACK. When BACK is used, n must be 1 and color values are written
>* into the left buffer for single-buffered contexts, or into the 
> back
>* left buffer for double-buffered contexts."
>*
>* Note "special value BACK". GL_BACK also refers to multiple 
> buffers,
>* but it is consider a special case here. This is a change on 4.5.
>* For OpenGL 4.x we check that behaviour. For any previous version 
> we
>* keep considering it wrong (as INVALID_ENUM).
>*/
> - if (util_bitcount(destMask[output]) > 1) {
> -if (_mesa_is_winsys_fbo(fb) && ctx->Version >= 40 &&
> -buffers[output] == GL_BACK) {
> -   if (n != 1) {
> -  _mesa_error(ctx, GL_INVALID_OPERATION, "%s(with GL_BACK n 
> must be 1)",
> -  caller);
> -  return;
> -   }
> -} else {
> -   _mesa_error(ctx, GL_INVALID_ENUM, "%s(invalid buffer %s)",
> -   caller, _mesa_enum_to_string(buffers[output]));
> + if (buffers[output] == GL_BACK &&
> + _mesa_is_winsys_fbo(fb) &&
> + _mesa_is_desktop_gl(ctx) &&
> + ctx->Version >= 40) {
> +if (n != 1) {
> +   _mesa_error(ctx, GL_INVALID_OPERATION, "%s(with GL_BACK n 
> must be 1)",
> +   caller);
> return;
>  }
> + } else if (buffers[output] == GL_FRONT ||
> +buffers[output] == GL_LEFT ||
> +buffers[output] == GL_RIGHT ||
> +buffers[output] == GL_FRONT_AND_BACK) {
> +_mesa_error(ctx, GL_INVALID_ENUM, "%s(invalid buffer %s)",
> +caller, _mesa_enum_to_string(buffers[output]));
> +return;
> + }
> +  }

According to the comment above "Note "special value BACK". GL_BACK...",
there must also be a GL_INVALID_ENUM on GL_BACK when not in opengles. Right?

That specific case can be tested using
MESA_GL_VERSION_OVERRIDE=3.3 piglit/bin/gl-3.1-draw-buffers-errors 

best

Mathias

> +
> +  destMask[output] = draw_buffer_enum_to_bitmask(ctx, buffers[output]);
> +
> +  if (!no_error) {
> + /* From the OpenGL 3.0 specification, page 258:
> +  * "Each buffer listed in bufs must be one of the values from tables
> +  *  4.5 or 4.6.  Otherwise, an INVALID_ENUM error is generated.
> +  */
> + if (destMask[output] == BAD_MASK) {
> +_mesa_error(ctx, GL_INVALID_ENUM, 

Re: [Mesa-dev] [PATCH 2/2] egl: add EGL_platform_device support

2019-04-27 Thread Marek Olšák
Those are all valid reasons, but I don't wanna expose swrast for AMD's
customers.

Marek

On Sat, Apr 27, 2019, 5:45 AM Mathias Fröhlich 
wrote:

> Hi Marek,
>
> On Wednesday, 24 April 2019 02:01:42 CEST Marek Olšák wrote:
> > Adam, did you notice my original suggestion "If there is at least 1 drm
> > device, swrast won't be in the list."? which means swrast would be in the
> > list for your "dumb" GPUs.
>
> Imagine a box with a low-end DRM-capable hardware chip like you sometimes
> find in server-type boxes (Intel/Matrox...). Otherwise the box is equipped
> with lots of CPU power. This is something you will find a lot in the major
> engineering application environment. Your application will be glad to find
> the swrast renderer, which ends up being more capable than the 'GPU' that
> is mostly there to drive an administration console.
> You do not want to lock out a swrast 'device' (or whatever you want to
> call it) because of a possibly less capable 'console GPU'.
>
> Besides that, having a second type of 'normalized renderer' like Eric was
> describing is another reason.
>
> Also, sometimes it may make sense to use the GPU for one set of work and
> a second GPU for another set of work in parallel.
> When you only find a single GPU device in a box, you may be glad to find
> a swrast device that you can use in parallel with the GPU without needing
> different code paths in your application.
>
> Maybe I can come up with other cases, but that's the 5 minutes for now ...
>
> best
>
> Mathias
>
>
>

Re: [Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-27 Thread Axel Davy

On 26/04/2019 20:40, Marek Olšák wrote:
On Fri, Apr 26, 2019 at 12:56 PM Axel Davy wrote:


On 26/04/2019 10:08, Michel Dänzer wrote:
> On 2019-04-26 4:06 a.m., Marek Olšák wrote:
>> From: Marek Olšák <marek.ol...@amd.com>
>>
>> It's done by:
>> - decrease the number of frames in flight by 1
>> - flush before throttling in SwapBuffers
>>    (instead of wait-then-flush, do flush-then-wait)
>>
>> The improvement is apparent with Unigine Heaven.
>>
>> Previously:
>>      draw frame 2
>>      wait frame 0
>>      flush frame 2
>>      present frame 2
>>
>>      The input lag is 2 frames.
>>
>> Now:
>>      draw frame 2
>>      flush frame 2
>>      wait frame 1
>>      present frame 2
>>
>>      The input lag is 1 frame. Flushing is done before waiting,
because
>>      otherwise the device would be idle after waiting.
> Nice idea. Not sure offhand about all ramifications, but
certainly worth
> a go.
>
>
>> Nine is affected because it also uses the pipe cap.
>> ---
>>   src/gallium/auxiliary/util/u_screen.c         | 2 +-
>>   src/gallium/state_trackers/dri/dri_drawable.c | 20
+--
>>   2 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/util/u_screen.c
b/src/gallium/auxiliary/util/u_screen.c
>> index 27f51e0898e..410f17421e6 100644
>> --- a/src/gallium/auxiliary/util/u_screen.c
>> +++ b/src/gallium/auxiliary/util/u_screen.c
>> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct
pipe_screen *pscreen,
>>      case PIPE_CAP_MAX_VARYINGS:
>>         return 8;
>>
>>      case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>>         return 0;
>>
>>      case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>>         return 0;
>>
>>      case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>> -      return 2;
>> +      return 1;
> This might be slightly misleading, as there can still be two
frames in
> flight (on the GPU) at the same time. Might be better to leave
this at 2
> (so Nine isn't affected) and adjust its treatment in
> src/gallium/state_trackers/dri/dri_drawable.c .
>
>
Checking what gallium nine does currently, it seems we already do flush
then wait; however we call swap_fences_pop_front and
swap_fences_push_back in the reverse order compared to your patch.
We compensate by taking PIPE_CAP_MAX_FRAMES_IN_FLIGHT + 1.

In conclusion, with the proposed patch, gl and nine should have the same
behaviour (and thus if gl benefits from a value of 1, nine should as well).
I haven't noticed input lag; I guess I have to test on Heaven if you see
a difference.
How can I slow down my GPU to test that? I used to use the
/sys/kernel/debug/dri/0/ vars to force low dpm, but it doesn't seem to
be possible anymore as the related files are gone (rx480)?


I set maximum settings, windowed, resolution: custom, and I type in 
the 4K resolution (I don't have a 4K monitor). When it's running, I 
enable wireframe. It should be pretty slow.


Marek



I couldn't notice any performance difference with gallium nine, with and
without the change (card at low or high dpm - somehow the vars were still
there, I was looking at the wrong place). I couldn't notice a change in
lag either (the slowest I had was around 20 fps, which may not be the
best case to see it).


I'm fine with this change affecting nine.


Axel


[Mesa-dev] [PATCH] radv: add missing VEGA20 chip in radv_get_device_name()

2019-04-27 Thread Samuel Pitoiset
Otherwise it returns "AMD RADV unknown".

Cc: 19.0 
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index b7f991fce72..8340bf52c11 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -112,6 +112,7 @@ radv_get_device_name(enum radeon_family family, char *name, 
size_t name_len)
case CHIP_VEGAM: chip_string = "AMD RADV VEGA M"; break;
case CHIP_VEGA10: chip_string = "AMD RADV VEGA10"; break;
case CHIP_VEGA12: chip_string = "AMD RADV VEGA12"; break;
+   case CHIP_VEGA20: chip_string = "AMD RADV VEGA20"; break;
case CHIP_RAVEN: chip_string = "AMD RADV RAVEN"; break;
case CHIP_RAVEN2: chip_string = "AMD RADV RAVEN2"; break;
default: chip_string = "AMD RADV unknown"; break;
-- 
2.21.0


[Mesa-dev] [PATCH v3] radv: fix set_output_usage_mask() with composite and 64-bit types

2019-04-27 Thread Rhys Perry
It previously used var->type instead of deref_instr->type and didn't
handle 64-bit outputs.

This fixes lots of transform feedback CTS tests involving transform
feedback and geometry shaders (mostly
dEQP-VK.transform_feedback.fuzz.random_geometry.*)

v2: fix writemask widening when comp != 0
v3: fix 64-bit variables when comp != 0, again

Signed-off-by: Rhys Perry 
Cc: 19.0 
---
 src/amd/vulkan/radv_shader_info.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_shader_info.c 
b/src/amd/vulkan/radv_shader_info.c
index 932a1852266..e771ad79878 100644
--- a/src/amd/vulkan/radv_shader_info.c
+++ b/src/amd/vulkan/radv_shader_info.c
@@ -112,6 +112,15 @@ gather_intrinsic_load_deref_info(const nir_shader *nir,
}
 }
 
+static uint32_t
+widen_writemask(uint32_t wrmask)
+{
+   uint32_t new_wrmask = 0;
+   for(unsigned i = 0; i < 4; i++)
+   new_wrmask |= (wrmask & (1 << i) ? 0x3 : 0x0) << (i * 2);
+   return new_wrmask;
+}
+
 static void
 set_output_usage_mask(const nir_shader *nir, const nir_intrinsic_instr *instr,
  uint8_t *output_usage_mask)
@@ -119,7 +128,7 @@ set_output_usage_mask(const nir_shader *nir, const 
nir_intrinsic_instr *instr,
nir_deref_instr *deref_instr =
nir_instr_as_deref(instr->src[0].ssa->parent_instr);
nir_variable *var = nir_deref_instr_get_variable(deref_instr);
-   unsigned attrib_count = glsl_count_attribute_slots(var->type, false);
+   unsigned attrib_count = glsl_count_attribute_slots(deref_instr->type, 
false);
unsigned idx = var->data.location;
unsigned comp = var->data.location_frac;
unsigned const_offset = 0;
@@ -127,15 +136,19 @@ set_output_usage_mask(const nir_shader *nir, const 
nir_intrinsic_instr *instr,
get_deref_offset(deref_instr, &const_offset);
 
if (var->data.compact) {
+   assert(!glsl_type_is_64bit(deref_instr->type));
const_offset += comp;
output_usage_mask[idx + const_offset / 4] |= 1 << (const_offset 
% 4);
return;
}
 
-   for (unsigned i = 0; i < attrib_count; i++) {
+   uint32_t wrmask = nir_intrinsic_write_mask(instr);
+   if (glsl_type_is_64bit(deref_instr->type))
+   wrmask = widen_writemask(wrmask);
+
+   for (unsigned i = 0; i < attrib_count; i++)
output_usage_mask[idx + i + const_offset] |=
-   instr->const_index[0] << comp;
-   }
+   ((wrmask >> (i * 4)) & 0xf) << comp;
 }
 
 static void
-- 
2.20.1
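
For reference, a standalone check of what the widening above is meant to
produce (the helper is copied from the patch; the example mask is mine):

#include <stdio.h>
#include <stdint.h>

/* Copied from the patch: each 32-bit component bit expands to two bits,
 * because a 64-bit component occupies two 32-bit slots. */
static uint32_t
widen_writemask(uint32_t wrmask)
{
   uint32_t new_wrmask = 0;
   for (unsigned i = 0; i < 4; i++)
      new_wrmask |= (wrmask & (1 << i) ? 0x3 : 0x0) << (i * 2);
   return new_wrmask;
}

int main(void)
{
   /* writing components x and z of a dvec4: mask 0b0101 widens to 0b00110011 */
   printf("0x%x\n", widen_writemask(0x5));   /* prints 0x33 */
   return 0;
}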


Re: [Mesa-dev] [PATCH 2/2] egl: add EGL_platform_device support

2019-04-27 Thread Mathias Fröhlich
Hi Marek,

On Wednesday, 24 April 2019 02:01:42 CEST Marek Olšák wrote:
> Adam, did you notice my original suggestion "If there is at least 1 drm
> device, swrast won't be in the list."? which means swrast would be in the
> list for your "dumb" GPUs.

Imagine a box with a low-end DRM-capable hardware chip like you sometimes
find in server-type boxes (Intel/Matrox...). Otherwise the box is equipped
with lots of CPU power. This is something you will find a lot in the major
engineering application environment. Your application will be glad to find
the swrast renderer, which ends up being more capable than the 'GPU' that
is mostly there to drive an administration console.
You do not want to lock out a swrast 'device' (or whatever you want to
call it) because of a possibly less capable 'console GPU'.

Besides that, having a second type of 'normalized renderer' like Eric was
describing is another reason.

Also, sometimes it may make sense to use the GPU for one set of work and
a second GPU for another set of work in parallel.
When you only find a single GPU device in a box, you may be glad to find
a swrast device that you can use in parallel with the GPU without needing
different code paths in your application.

Maybe I can come up with other cases, but that's the 5 minutes for now ...

best

Mathias



[Mesa-dev] [PATCH v2] radv: fix set_output_usage_mask() with composite and 64-bit types

2019-04-27 Thread Rhys Perry
It previously used var->type instead of deref_instr->type and didn't
handle 64-bit outputs.

This fixes lots of transform feedback CTS tests involving transform
feedback and geometry shaders (mostly
dEQP-VK.transform_feedback.fuzz.random_geometry.*)

v2: fix writemask widening when comp != 0

Signed-off-by: Rhys Perry 
Cc: 19.0 
---
 src/amd/vulkan/radv_shader_info.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_shader_info.c 
b/src/amd/vulkan/radv_shader_info.c
index 932a1852266..63ee25ab7c9 100644
--- a/src/amd/vulkan/radv_shader_info.c
+++ b/src/amd/vulkan/radv_shader_info.c
@@ -112,6 +112,15 @@ gather_intrinsic_load_deref_info(const nir_shader *nir,
}
 }
 
+static uint32_t
+widen_writemask(uint32_t wrmask)
+{
+   uint32_t new_wrmask = 0;
+   for(unsigned i = 0; i < 4; i++)
+   new_wrmask |= (wrmask & (1 << i) ? 0x3 : 0x0) << (i * 2);
+   return new_wrmask;
+}
+
 static void
 set_output_usage_mask(const nir_shader *nir, const nir_intrinsic_instr *instr,
  uint8_t *output_usage_mask)
@@ -119,7 +128,7 @@ set_output_usage_mask(const nir_shader *nir, const 
nir_intrinsic_instr *instr,
nir_deref_instr *deref_instr =
nir_instr_as_deref(instr->src[0].ssa->parent_instr);
nir_variable *var = nir_deref_instr_get_variable(deref_instr);
-   unsigned attrib_count = glsl_count_attribute_slots(var->type, false);
+   unsigned attrib_count = glsl_count_attribute_slots(deref_instr->type, 
false);
unsigned idx = var->data.location;
unsigned comp = var->data.location_frac;
unsigned const_offset = 0;
@@ -127,15 +136,21 @@ set_output_usage_mask(const nir_shader *nir, const 
nir_intrinsic_instr *instr,
get_deref_offset(deref_instr, &const_offset);
 
if (var->data.compact) {
+   assert(!glsl_type_is_64bit(deref_instr->type));
const_offset += comp;
output_usage_mask[idx + const_offset / 4] |= 1 << (const_offset 
% 4);
return;
}
 
-   for (unsigned i = 0; i < attrib_count; i++) {
+   uint32_t wrmask = nir_intrinsic_write_mask(instr);
+   if (glsl_type_is_64bit(deref_instr->type))
+   wrmask = widen_writemask(wrmask) << (comp * 2);
+   else
+   wrmask = wrmask << comp;
+
+   for (unsigned i = 0; i < attrib_count; i++)
output_usage_mask[idx + i + const_offset] |=
-   instr->const_index[0] << comp;
-   }
+   (wrmask >> (i * 4)) & 0xf;
 }
 
 static void
-- 
2.20.1


[Mesa-dev] [PATCH] radv: fix set_output_usage_mask() with composite and 64-bit types

2019-04-27 Thread Rhys Perry
It previously used var->type instead of deref_instr->type and didn't
handle 64-bit outputs.

This fixes lots of transform feedback CTS tests involving transform
feedback and geometry shaders (mostly
dEQP-VK.transform_feedback.fuzz.random_geometry.*)

Signed-off-by: Rhys Perry 
Cc: 19.0 
---
 src/amd/vulkan/radv_shader_info.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_shader_info.c 
b/src/amd/vulkan/radv_shader_info.c
index 932a1852266..a3bfc81808e 100644
--- a/src/amd/vulkan/radv_shader_info.c
+++ b/src/amd/vulkan/radv_shader_info.c
@@ -112,6 +112,15 @@ gather_intrinsic_load_deref_info(const nir_shader *nir,
}
 }
 
+static uint32_t
+widen_writemask(uint32_t wrmask)
+{
+   uint32_t new_wrmask = 0;
+   for(unsigned i = 0; i < 4; i++)
+   new_wrmask |= (wrmask & (1 << i) ? 0x3 : 0x0) << (i * 2);
+   return new_wrmask;
+}
+
 static void
 set_output_usage_mask(const nir_shader *nir, const nir_intrinsic_instr *instr,
  uint8_t *output_usage_mask)
@@ -119,7 +128,7 @@ set_output_usage_mask(const nir_shader *nir, const 
nir_intrinsic_instr *instr,
nir_deref_instr *deref_instr =
nir_instr_as_deref(instr->src[0].ssa->parent_instr);
nir_variable *var = nir_deref_instr_get_variable(deref_instr);
-   unsigned attrib_count = glsl_count_attribute_slots(var->type, false);
+   unsigned attrib_count = glsl_count_attribute_slots(deref_instr->type, 
false);
unsigned idx = var->data.location;
unsigned comp = var->data.location_frac;
unsigned const_offset = 0;
@@ -127,15 +136,19 @@ set_output_usage_mask(const nir_shader *nir, const 
nir_intrinsic_instr *instr,
get_deref_offset(deref_instr, &const_offset);
 
if (var->data.compact) {
+   assert(!glsl_type_is_64bit(deref_instr->type));
const_offset += comp;
output_usage_mask[idx + const_offset / 4] |= 1 << (const_offset 
% 4);
return;
}
 
-   for (unsigned i = 0; i < attrib_count; i++) {
+   uint32_t wrmask = nir_intrinsic_write_mask(instr) << comp;
+   if (glsl_type_is_64bit(deref_instr->type))
+   wrmask = widen_writemask(wrmask);
+
+   for (unsigned i = 0; i < attrib_count; i++)
output_usage_mask[idx + i + const_offset] |=
-   instr->const_index[0] << comp;
-   }
+   (wrmask >> (i * 4)) & 0xf;
 }
 
 static void
-- 
2.20.1
