Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-09-05 Thread Michel Dänzer
On 02/09/16 11:46 PM, Axel Davy wrote:
> Le 02/09/2016 à 16:41, Axel Davy a écrit :
>> Le 02/09/2016 à 03:06, Michel Dänzer a écrit :
>>> On 02/09/16 12:37 AM, Alex Deucher wrote:
 On Thu, Sep 1, 2016 at 11:28 AM, Jason Ekstrand
  wrote:
> On Aug 31, 2016 11:39 PM, "Michel Dänzer"  wrote:
>> On 01/09/16 02:05 PM, Jason Ekstrand wrote:
>>> On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer >> > wrote:
>>>
>>>  That said, I'd be interested in hearing about any test cases
>>> where 4
>>>  buffers provide a significant boost over 3.
>>>
>>>
>>> A little history that may be useful: Quadbuffering was originally
>>> added
>>> for DRI3+present here:
>>>
>>>
>>> https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a36ef5fe23056299a77414f9ad8b5e5a1d
>>>
>> So the commit message claims. If you look at the code after that
>> change
>> though, it's basically impossible to end up with 4 buffers (at least
>> permanently), since it would require all these conditions to be
>> true at
>> the same:
>>
>> 1. priv->flipping (the last Present request was completed as a flip)
>> 2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC)
>> (the X
>> driver doesn't support async flips)
>> 3. priv->swap_interval == 0
>>
>> Given 2, 1 & 3 are mutually exclusive.
> I'm not seeing how 1 & 3 are mutually exclusive. priv->swap_interval
> doesn't seem to have anything to do with whether or not you're
> flipping.
>>> priv->swap_interval == 0 can only use flips if async flips are
>>> supported.
>> This may not be always true.

It is with the current Present implementation in xserver.


>> In particular with my XWayland Present implementation (yeah it's been
>> years, perhaps time to work on it again), you could have:
>> . 1 buffer on the screen
>> . 1 buffer scheduled for next flip kernel side
>> . 1 buffer old than the previous buffers, and that will get scheduled
>> for flip after if no new buffer arrives (and will be released if a new
>> buffer arrives)
> I meant "newer" of course
>>
>> All three kept by the Wayland compositor.
>> Thus you need a fourth buffer to keep rendering.

The same logic could be implemented in the core Present code in xserver.
My main concern about this is the increased latency due to using a sync
flip. If the application or user sets swap interval 0, I suspect the
intention tends to be to minimize latency. So maybe this logic should
only be used if the driver supports async flips but rejects them in
specific cases, e.g. because the user enabled TearFree.

Anyway, in the meantime, 4 buffers can only make any difference vs 3
buffers with async flips even in theory, and in practice I can only
measure ~1% difference with vblank_mode=0 glxgears in fullscreen (if
anyone can measure a bigger difference with any test case, please let me
know!). Is that enough to justify the additional memory usage for the
4th buffer?

With GpuTest Triangle in fullscreen, I actually get ~0.3% higher numbers
with 3 vs 4 buffers.


> If you use one buffer for swap interval = 0 and no flipping, it means
> you wait for the xserver to handle the presentation request and send the
> answer, before rendering again.
> Before there wasn't this synchronization, because there were two buffers.
> 
> Could this be the cause of the slowdown ?

It was indeed. The thing is, even now I can only measure any significant
difference between one or two buffers with vblank_mode=0 glxgears (not
even e.g. with GpuTest triangle at 800x600 windowed). I thought I tested
that with my previous change but must have messed up somehow. :(


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-09-02 Thread Axel Davy

Le 02/09/2016 à 16:41, Axel Davy a écrit :

Le 02/09/2016 à 03:06, Michel Dänzer a écrit :

On 02/09/16 12:37 AM, Alex Deucher wrote:
On Thu, Sep 1, 2016 at 11:28 AM, Jason Ekstrand 
 wrote:

On Aug 31, 2016 11:39 PM, "Michel Dänzer"  wrote:

On 01/09/16 02:05 PM, Jason Ekstrand wrote:

On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer > wrote:

 On 31/08/16 11:21 PM, Jason Ekstrand wrote:
 > On Aug 19, 2016 12:07 AM, "Michel Dänzer" 
 > >> 
wrote:

 >> From: Michel Dänzer 
 > >>

 >>
 >> Always use 3 buffers when flipping. With only 2 buffers, 
we have

to wait
 >> for a flip to complete (which takes non-0 time even with
asynchronous
 >> flips) before we can start working on the next frame. We 
were

previously
 >> only using 2 buffers for flipping if the X server supports
asynchronous
 >> flips, even when we're not using asynchronous flips. This 
could

result
 >> in bad performance (the referenced bug report is an 
extreme case,

where
 >> the inter-frame stalls were preventing the GPU from 
reaching its

maximum
 >> clocks).
 >
 > Sorry for the post-push review but I don't usually pay much
attention to
 > the window system code.  In any case, I believe you're 
doing your
 > counting wrong.  When flipping with swapinterval=0, you 
need 4

buffers:
 >
 > 1. The buffer currently being scanned out (will be 
released at

next vblank)
 > 2. The buffer X has queued for scanout but is waiting on 
vblank


 s/vblank/flip/g, since async flips may not wait for vblank, but
yeah.

 > 3. The buffer the application has just submitted which X will
queue next
 > of it doesn't get another before the window closes.
 > 4. The buffer the application is using for rendering.
 >
 > With only 3, you get a stall during that window in which X 
has

queued
 > another flip but we're waiting on vblank before the flip 
begins.

An I
 > missing something?

 Nothing, except maybe the paragraph below stating that I 
couldn't
 measure any benefit from using 4 buffers. :) I'm not exactly 
sure

why,
 but I suspect it might be because even with just 3 buffers, 
the GPU

can
 always work on at least one frame ahead of time.

 Also note that even before my change, we were only using 3 
buffers

when
 the X driver supports async flips (with swap interval 0; only 2
buffers
 with swap interval > 0).


Yes, because with async flips you don't have a buffer sitting 
queued in

the kernel waiting to be flipped which you can't cancel.

Actually, there is. Even async flips take non-0 time to complete.



that makes perfect sense.
What exactly does? My change may not be perfect, but the logic 
before it

was mostly backwards.

I think perhaps the problem here is that I don't know what you mean by
"async flips".  It's an X term that obviously does not mean what I 
thought

it meant.

Async means immediate (or as close to it as possibly, maybe hsync
depending on the hw) not at vsync.

Exactly.


 That said, I'd be interested in hearing about any test cases 
where 4

 buffers provide a significant boost over 3.


A little history that may be useful: Quadbuffering was originally 
added

for DRI3+present here:


https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a36ef5fe23056299a77414f9ad8b5e5a1d 

So the commit message claims. If you look at the code after that 
change

though, it's basically impossible to end up with 4 buffers (at least
permanently), since it would require all these conditions to be 
true at

the same:

1. priv->flipping (the last Present request was completed as a flip)
2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC) 
(the X

driver doesn't support async flips)
3. priv->swap_interval == 0

Given 2, 1 & 3 are mutually exclusive.

I'm not seeing how 1 & 3 are mutually exclusive. priv->swap_interval
doesn't seem to have anything to do with whether or not you're 
flipping.
priv->swap_interval == 0 can only use flips if async flips are 
supported.
This may not be always true. In particular with my XWayland Present 
implementation (yeah it's been years, perhaps time to work on it again),

you could have:
. 1 buffer on the screen
. 1 buffer scheduled for next flip kernel side
. 1 buffer old than the previous buffers, and that will get scheduled 
for flip after if no new buffer arrives (and will be released if a new 
buffer arrives)

I meant "newer" of course


All three kept by the Wayland compositor.
Thus you need a fourth buffer to keep rendering.

Axel



So WRT https://bugs.freedesktop.org/show_bug.cgi?id=97549, let's not
jump to any conclusions but look at how 

Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-09-02 Thread Axel Davy

Le 02/09/2016 à 03:06, Michel Dänzer a écrit :

On 02/09/16 12:37 AM, Alex Deucher wrote:

On Thu, Sep 1, 2016 at 11:28 AM, Jason Ekstrand  wrote:

On Aug 31, 2016 11:39 PM, "Michel Dänzer"  wrote:

On 01/09/16 02:05 PM, Jason Ekstrand wrote:

On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer > wrote:

 On 31/08/16 11:21 PM, Jason Ekstrand wrote:
 > On Aug 19, 2016 12:07 AM, "Michel Dänzer" 
 > >> wrote:
 >> From: Michel Dänzer 
 > >>
 >>
 >> Always use 3 buffers when flipping. With only 2 buffers, we have
to wait
 >> for a flip to complete (which takes non-0 time even with
asynchronous
 >> flips) before we can start working on the next frame. We were
previously
 >> only using 2 buffers for flipping if the X server supports
asynchronous
 >> flips, even when we're not using asynchronous flips. This could
result
 >> in bad performance (the referenced bug report is an extreme case,
where
 >> the inter-frame stalls were preventing the GPU from reaching its
maximum
 >> clocks).
 >
 > Sorry for the post-push review but I don't usually pay much
attention to
 > the window system code.  In any case, I believe you're doing your
 > counting wrong.  When flipping with swapinterval=0, you need 4
buffers:
 >
 > 1. The buffer currently being scanned out  (will be released at
next vblank)
 > 2. The buffer X has queued for scanout but is waiting on vblank

 s/vblank/flip/g, since async flips may not wait for vblank, but
yeah.

 > 3. The buffer the application has just submitted which X will
queue next
 > of it doesn't get another before the window closes.
 > 4. The buffer the application is using for rendering.
 >
 > With only 3, you get a stall during that window in which X has
queued
 > another flip but we're waiting on vblank before the flip begins.
An I
 > missing something?

 Nothing, except maybe the paragraph below stating that I couldn't
 measure any benefit from using 4 buffers. :) I'm not exactly sure
why,
 but I suspect it might be because even with just 3 buffers, the GPU
can
 always work on at least one frame ahead of time.

 Also note that even before my change, we were only using 3 buffers
when
 the X driver supports async flips (with swap interval 0; only 2
buffers
 with swap interval > 0).


Yes, because with async flips you don't have a buffer sitting queued in
the kernel waiting to be flipped which you can't cancel.

Actually, there is. Even async flips take non-0 time to complete.



that makes perfect sense.

What exactly does? My change may not be perfect, but the logic before it
was mostly backwards.

I think perhaps the problem here is that I don't know what you mean by
"async flips".  It's an X term that obviously does not mean what I thought
it meant.

Async means immediate (or as close to it as possibly, maybe hsync
depending on the hw) not at vsync.

Exactly.



 That said, I'd be interested in hearing about any test cases where 4
 buffers provide a significant boost over 3.


A little history that may be useful: Quadbuffering was originally added
for DRI3+present here:


https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a36ef5fe23056299a77414f9ad8b5e5a1d

So the commit message claims. If you look at the code after that change
though, it's basically impossible to end up with 4 buffers (at least
permanently), since it would require all these conditions to be true at
the same:

1. priv->flipping (the last Present request was completed as a flip)
2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC) (the X
driver doesn't support async flips)
3. priv->swap_interval == 0

Given 2, 1 & 3 are mutually exclusive.

I'm not seeing how 1 & 3 are mutually exclusive.  priv->swap_interval
doesn't seem to have anything to do with whether or not you're flipping.

priv->swap_interval == 0 can only use flips if async flips are supported.
This may not be always true. In particular with my XWayland Present 
implementation (yeah it's been years, perhaps time to work on it again),

you could have:
. 1 buffer on the screen
. 1 buffer scheduled for next flip kernel side
. 1 buffer old than the previous buffers, and that will get scheduled 
for flip after if no new buffer arrives (and will be released if a new 
buffer arrives)


All three kept by the Wayland compositor.
Thus you need a fourth buffer to keep rendering.

Axel



So WRT https://bugs.freedesktop.org/show_bug.cgi?id=97549, let's not
jump to any conclusions but look at how many buffers actually end up
being used for what reasons in each case. I suspect there might be some
surprises. :)




Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-09-01 Thread Michel Dänzer
On 02/09/16 12:37 AM, Alex Deucher wrote:
> On Thu, Sep 1, 2016 at 11:28 AM, Jason Ekstrand  wrote:
>> On Aug 31, 2016 11:39 PM, "Michel Dänzer"  wrote:
>>> On 01/09/16 02:05 PM, Jason Ekstrand wrote:
 On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer > wrote:

 On 31/08/16 11:21 PM, Jason Ekstrand wrote:
 > On Aug 19, 2016 12:07 AM, "Michel Dänzer" 
 > >> wrote:
 >> From: Michel Dänzer 
 > >>
 >>
 >> Always use 3 buffers when flipping. With only 2 buffers, we have
 to wait
 >> for a flip to complete (which takes non-0 time even with
 asynchronous
 >> flips) before we can start working on the next frame. We were
 previously
 >> only using 2 buffers for flipping if the X server supports
 asynchronous
 >> flips, even when we're not using asynchronous flips. This could
 result
 >> in bad performance (the referenced bug report is an extreme case,
 where
 >> the inter-frame stalls were preventing the GPU from reaching its
 maximum
 >> clocks).
 >
 > Sorry for the post-push review but I don't usually pay much
 attention to
 > the window system code.  In any case, I believe you're doing your
 > counting wrong.  When flipping with swapinterval=0, you need 4
 buffers:
 >
 > 1. The buffer currently being scanned out  (will be released at
 next vblank)
 > 2. The buffer X has queued for scanout but is waiting on vblank

 s/vblank/flip/g, since async flips may not wait for vblank, but
 yeah.

 > 3. The buffer the application has just submitted which X will
 queue next
 > of it doesn't get another before the window closes.
 > 4. The buffer the application is using for rendering.
 >
 > With only 3, you get a stall during that window in which X has
 queued
 > another flip but we're waiting on vblank before the flip begins.
 An I
 > missing something?

 Nothing, except maybe the paragraph below stating that I couldn't
 measure any benefit from using 4 buffers. :) I'm not exactly sure
 why,
 but I suspect it might be because even with just 3 buffers, the GPU
 can
 always work on at least one frame ahead of time.

 Also note that even before my change, we were only using 3 buffers
 when
 the X driver supports async flips (with swap interval 0; only 2
 buffers
 with swap interval > 0).


 Yes, because with async flips you don't have a buffer sitting queued in
 the kernel waiting to be flipped which you can't cancel.
>>>
>>> Actually, there is. Even async flips take non-0 time to complete.
>>>
>>>
 that makes perfect sense.
>>>
>>> What exactly does? My change may not be perfect, but the logic before it
>>> was mostly backwards.
>>
>> I think perhaps the problem here is that I don't know what you mean by
>> "async flips".  It's an X term that obviously does not mean what I thought
>> it meant.
> 
> Async means immediate (or as close to it as possibly, maybe hsync
> depending on the hw) not at vsync.

Exactly.


 That said, I'd be interested in hearing about any test cases where 4
 buffers provide a significant boost over 3.


 A little history that may be useful: Quadbuffering was originally added
 for DRI3+present here:


 https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a36ef5fe23056299a77414f9ad8b5e5a1d
>>>
>>> So the commit message claims. If you look at the code after that change
>>> though, it's basically impossible to end up with 4 buffers (at least
>>> permanently), since it would require all these conditions to be true at
>>> the same:
>>>
>>> 1. priv->flipping (the last Present request was completed as a flip)
>>> 2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC) (the X
>>>driver doesn't support async flips)
>>> 3. priv->swap_interval == 0
>>>
>>> Given 2, 1 & 3 are mutually exclusive.
>>
>> I'm not seeing how 1 & 3 are mutually exclusive.  priv->swap_interval
>> doesn't seem to have anything to do with whether or not you're flipping.

priv->swap_interval == 0 can only use flips if async flips are supported.


So WRT https://bugs.freedesktop.org/show_bug.cgi?id=97549, let's not
jump to any conclusions but look at how many buffers actually end up
being used for what reasons in each case. I suspect there might be some
surprises. :)


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast

Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-09-01 Thread Alex Deucher
On Thu, Sep 1, 2016 at 11:28 AM, Jason Ekstrand  wrote:
> On Aug 31, 2016 11:39 PM, "Michel Dänzer"  wrote:
>>
>> On 01/09/16 02:05 PM, Jason Ekstrand wrote:
>> >
>> >
>> > On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer > > > wrote:
>> >
>> > On 31/08/16 11:21 PM, Jason Ekstrand wrote:
>> > > On Aug 19, 2016 12:07 AM, "Michel Dänzer" > > 
>> > > >> wrote:
>> > >> From: Michel Dänzer > > 
>> > > >>
>> > >>
>> > >> Always use 3 buffers when flipping. With only 2 buffers, we have
>> > to wait
>> > >> for a flip to complete (which takes non-0 time even with
>> > asynchronous
>> > >> flips) before we can start working on the next frame. We were
>> > previously
>> > >> only using 2 buffers for flipping if the X server supports
>> > asynchronous
>> > >> flips, even when we're not using asynchronous flips. This could
>> > result
>> > >> in bad performance (the referenced bug report is an extreme case,
>> > where
>> > >> the inter-frame stalls were preventing the GPU from reaching its
>> > maximum
>> > >> clocks).
>> > >
>> > > Sorry for the post-push review but I don't usually pay much
>> > attention to
>> > > the window system code.  In any case, I believe you're doing your
>> > > counting wrong.  When flipping with swapinterval=0, you need 4
>> > buffers:
>> > >
>> > > 1. The buffer currently being scanned out  (will be released at
>> > next vblank)
>> > > 2. The buffer X has queued for scanout but is waiting on vblank
>> >
>> > s/vblank/flip/g, since async flips may not wait for vblank, but
>> > yeah.
>> >
>> > > 3. The buffer the application has just submitted which X will
>> > queue next
>> > > of it doesn't get another before the window closes.
>> > > 4. The buffer the application is using for rendering.
>> > >
>> > > With only 3, you get a stall during that window in which X has
>> > queued
>> > > another flip but we're waiting on vblank before the flip begins.
>> > An I
>> > > missing something?
>> >
>> > Nothing, except maybe the paragraph below stating that I couldn't
>> > measure any benefit from using 4 buffers. :) I'm not exactly sure
>> > why,
>> > but I suspect it might be because even with just 3 buffers, the GPU
>> > can
>> > always work on at least one frame ahead of time.
>> >
>> > Also note that even before my change, we were only using 3 buffers
>> > when
>> > the X driver supports async flips (with swap interval 0; only 2
>> > buffers
>> > with swap interval > 0).
>> >
>> >
>> > Yes, because with async flips you don't have a buffer sitting queued in
>> > the kernel waiting to be flipped which you can't cancel.
>>
>> Actually, there is. Even async flips take non-0 time to complete.
>>
>>
>> > that makes perfect sense.
>>
>> What exactly does? My change may not be perfect, but the logic before it
>> was mostly backwards.
>
> I think perhaps the problem here is that I don't know what you mean by
> "async flips".  It's an X term that obviously does not mean what I thought
> it meant.

Async means immediate (or as close to it as possibly, maybe hsync
depending on the hw) not at vsync.

Alex

>
>> > That said, I'd be interested in hearing about any test cases where 4
>> > buffers provide a significant boost over 3.
>> >
>> >
>> > A little history that may be useful: Quadbuffering was originally added
>> > for DRI3+present here:
>> >
>> >
>> > https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a36ef5fe23056299a77414f9ad8b5e5a1d
>>
>> So the commit message claims. If you look at the code after that change
>> though, it's basically impossible to end up with 4 buffers (at least
>> permanently), since it would require all these conditions to be true at
>> the same:
>>
>> 1. priv->flipping (the last Present request was completed as a flip)
>> 2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC) (the X
>>driver doesn't support async flips)
>> 3. priv->swap_interval == 0
>>
>> Given 2, 1 & 3 are mutually exclusive.
>
> I'm not seeing how 1 & 3 are mutually exclusive.  priv->swap_interval
> doesn't seem to have anything to do with whether or not you're flipping.
>
>> > Unfortunately, neither of those specify precise metrics.  Eero's bug had
>> > some very concrete numbers.  Hopefully he can provide you with the
>> > details you need for further analysis.
>>
>> Indeed. FWIW, my feeling is that my change most likely just exposed
>> issues elsewhere.
>>
>>
>> --
>> Earthling Michel Dänzer   |   http://www.amd.com
>> Libre software enthusiast | Mesa and X developer
>
>
> 

Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-09-01 Thread Jason Ekstrand
On Aug 31, 2016 11:39 PM, "Michel Dänzer"  wrote:
>
> On 01/09/16 02:05 PM, Jason Ekstrand wrote:
> >
> >
> > On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer  > > wrote:
> >
> > On 31/08/16 11:21 PM, Jason Ekstrand wrote:
> > > On Aug 19, 2016 12:07 AM, "Michel Dänzer" 
> > > >> wrote:
> > >> From: Michel Dänzer  > > >>
> > >>
> > >> Always use 3 buffers when flipping. With only 2 buffers, we have
to wait
> > >> for a flip to complete (which takes non-0 time even with
asynchronous
> > >> flips) before we can start working on the next frame. We were
previously
> > >> only using 2 buffers for flipping if the X server supports
asynchronous
> > >> flips, even when we're not using asynchronous flips. This could
result
> > >> in bad performance (the referenced bug report is an extreme
case, where
> > >> the inter-frame stalls were preventing the GPU from reaching its
maximum
> > >> clocks).
> > >
> > > Sorry for the post-push review but I don't usually pay much
attention to
> > > the window system code.  In any case, I believe you're doing your
> > > counting wrong.  When flipping with swapinterval=0, you need 4
buffers:
> > >
> > > 1. The buffer currently being scanned out  (will be released at
next vblank)
> > > 2. The buffer X has queued for scanout but is waiting on vblank
> >
> > s/vblank/flip/g, since async flips may not wait for vblank, but
yeah.
> >
> > > 3. The buffer the application has just submitted which X will
queue next
> > > of it doesn't get another before the window closes.
> > > 4. The buffer the application is using for rendering.
> > >
> > > With only 3, you get a stall during that window in which X has
queued
> > > another flip but we're waiting on vblank before the flip begins.
An I
> > > missing something?
> >
> > Nothing, except maybe the paragraph below stating that I couldn't
> > measure any benefit from using 4 buffers. :) I'm not exactly sure
why,
> > but I suspect it might be because even with just 3 buffers, the GPU
can
> > always work on at least one frame ahead of time.
> >
> > Also note that even before my change, we were only using 3 buffers
when
> > the X driver supports async flips (with swap interval 0; only 2
buffers
> > with swap interval > 0).
> >
> >
> > Yes, because with async flips you don't have a buffer sitting queued in
> > the kernel waiting to be flipped which you can't cancel.
>
> Actually, there is. Even async flips take non-0 time to complete.
>
>
> > that makes perfect sense.
>
> What exactly does? My change may not be perfect, but the logic before it
> was mostly backwards.

I think perhaps the problem here is that I don't know what you mean by
"async flips".  It's an X term that obviously does not mean what I thought
it meant.

> > That said, I'd be interested in hearing about any test cases where 4
> > buffers provide a significant boost over 3.
> >
> >
> > A little history that may be useful: Quadbuffering was originally added
> > for DRI3+present here:
> >
> >
https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a36ef5fe23056299a77414f9ad8b5e5a1d
>
> So the commit message claims. If you look at the code after that change
> though, it's basically impossible to end up with 4 buffers (at least
> permanently), since it would require all these conditions to be true at
> the same:
>
> 1. priv->flipping (the last Present request was completed as a flip)
> 2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC) (the X
>driver doesn't support async flips)
> 3. priv->swap_interval == 0
>
> Given 2, 1 & 3 are mutually exclusive.

I'm not seeing how 1 & 3 are mutually exclusive.  priv->swap_interval
doesn't seem to have anything to do with whether or not you're flipping.

> > Unfortunately, neither of those specify precise metrics.  Eero's bug had
> > some very concrete numbers.  Hopefully he can provide you with the
> > details you need for further analysis.
>
> Indeed. FWIW, my feeling is that my change most likely just exposed
> issues elsewhere.
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-09-01 Thread Michel Dänzer
On 01/09/16 02:05 PM, Jason Ekstrand wrote:
> 
> 
> On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer  > wrote:
> 
> On 31/08/16 11:21 PM, Jason Ekstrand wrote:
> > On Aug 19, 2016 12:07 AM, "Michel Dänzer"  
> > >> wrote:
> >> From: Michel Dänzer  
> > >>
> >>
> >> Always use 3 buffers when flipping. With only 2 buffers, we have to 
> wait
> >> for a flip to complete (which takes non-0 time even with asynchronous
> >> flips) before we can start working on the next frame. We were 
> previously
> >> only using 2 buffers for flipping if the X server supports asynchronous
> >> flips, even when we're not using asynchronous flips. This could result
> >> in bad performance (the referenced bug report is an extreme case, where
> >> the inter-frame stalls were preventing the GPU from reaching its 
> maximum
> >> clocks).
> >
> > Sorry for the post-push review but I don't usually pay much attention to
> > the window system code.  In any case, I believe you're doing your
> > counting wrong.  When flipping with swapinterval=0, you need 4 buffers:
> >
> > 1. The buffer currently being scanned out  (will be released at next 
> vblank)
> > 2. The buffer X has queued for scanout but is waiting on vblank
> 
> s/vblank/flip/g, since async flips may not wait for vblank, but yeah.
> 
> > 3. The buffer the application has just submitted which X will queue next
> > of it doesn't get another before the window closes.
> > 4. The buffer the application is using for rendering.
> >
> > With only 3, you get a stall during that window in which X has queued
> > another flip but we're waiting on vblank before the flip begins. An I
> > missing something?
> 
> Nothing, except maybe the paragraph below stating that I couldn't
> measure any benefit from using 4 buffers. :) I'm not exactly sure why,
> but I suspect it might be because even with just 3 buffers, the GPU can
> always work on at least one frame ahead of time.
> 
> Also note that even before my change, we were only using 3 buffers when
> the X driver supports async flips (with swap interval 0; only 2 buffers
> with swap interval > 0).
> 
> 
> Yes, because with async flips you don't have a buffer sitting queued in
> the kernel waiting to be flipped which you can't cancel. 

Actually, there is. Even async flips take non-0 time to complete.


> that makes perfect sense.

What exactly does? My change may not be perfect, but the logic before it
was mostly backwards.


> That said, I'd be interested in hearing about any test cases where 4
> buffers provide a significant boost over 3.
> 
> 
> A little history that may be useful: Quadbuffering was originally added
> for DRI3+present here:
> 
> https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a36ef5fe23056299a77414f9ad8b5e5a1d

So the commit message claims. If you look at the code after that change
though, it's basically impossible to end up with 4 buffers (at least
permanently), since it would require all these conditions to be true at
the same:

1. priv->flipping (the last Present request was completed as a flip)
2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC) (the X
   driver doesn't support async flips)
3. priv->swap_interval == 0

Given 2, 1 & 3 are mutually exclusive.


> Unfortunately, neither of those specify precise metrics.  Eero's bug had
> some very concrete numbers.  Hopefully he can provide you with the
> details you need for further analysis.

Indeed. FWIW, my feeling is that my change most likely just exposed
issues elsewhere.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-31 Thread Jason Ekstrand
On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer  wrote:

> On 31/08/16 11:21 PM, Jason Ekstrand wrote:
> > On Aug 19, 2016 12:07 AM, "Michel Dänzer"  > > wrote:
> >> From: Michel Dänzer  > >
> >>
> >> Always use 3 buffers when flipping. With only 2 buffers, we have to wait
> >> for a flip to complete (which takes non-0 time even with asynchronous
> >> flips) before we can start working on the next frame. We were previously
> >> only using 2 buffers for flipping if the X server supports asynchronous
> >> flips, even when we're not using asynchronous flips. This could result
> >> in bad performance (the referenced bug report is an extreme case, where
> >> the inter-frame stalls were preventing the GPU from reaching its maximum
> >> clocks).
> >
> > Sorry for the post-push review but I don't usually pay much attention to
> > the window system code.  In any case, I believe you're doing your
> > counting wrong.  When flipping with swapinterval=0, you need 4 buffers:
> >
> > 1. The buffer currently being scanned out  (will be released at next
> vblank)
> > 2. The buffer X has queued for scanout but is waiting on vblank
>
> s/vblank/flip/g, since async flips may not wait for vblank, but yeah.
>
> > 3. The buffer the application has just submitted which X will queue next
> > of it doesn't get another before the window closes.
> > 4. The buffer the application is using for rendering.
> >
> > With only 3, you get a stall during that window in which X has queued
> > another flip but we're waiting on vblank before the flip begins. An I
> > missing something?
>
> Nothing, except maybe the paragraph below stating that I couldn't
> measure any benefit from using 4 buffers. :) I'm not exactly sure why,
> but I suspect it might be because even with just 3 buffers, the GPU can
> always work on at least one frame ahead of time.
>
> Also note that even before my change, we were only using 3 buffers when
> the X driver supports async flips (with swap interval 0; only 2 buffers
> with swap interval > 0).
>

Yes, because with async flips you don't have a buffer sitting queued in the
kernel waiting to be flipped which you can't cancel.  that makes perfect
sense.


> That said, I'd be interested in hearing about any test cases where 4
> buffers provide a significant boost over 3.
>

A little history that may be useful: Quadbuffering was originally added for
DRI3+present here:

https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a36ef5fe23056299a77414f9ad8b5e5a1d

In Wayland, the change was made here

https://cgit.freedesktop.org/mesa/mesa/commit/?id=992a2dbba80aba35efe83202e1013bd6143f0dba

Unfortunately, neither of those specify precise metrics.  Eero's bug had
some very concrete numbers.  Hopefully he can provide you with the details
you need for further analysis.


>
> >> I couldn't measure any performance boost using 4 buffers with flipping.
> >> Performance actually seemed to go down slightly, but that might have
> >> been just noise.
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-31 Thread Michel Dänzer
On 31/08/16 11:21 PM, Jason Ekstrand wrote:
> On Aug 19, 2016 12:07 AM, "Michel Dänzer"  > wrote:
>> From: Michel Dänzer  >
>>
>> Always use 3 buffers when flipping. With only 2 buffers, we have to wait
>> for a flip to complete (which takes non-0 time even with asynchronous
>> flips) before we can start working on the next frame. We were previously
>> only using 2 buffers for flipping if the X server supports asynchronous
>> flips, even when we're not using asynchronous flips. This could result
>> in bad performance (the referenced bug report is an extreme case, where
>> the inter-frame stalls were preventing the GPU from reaching its maximum
>> clocks).
> 
> Sorry for the post-push review but I don't usually pay much attention to
> the window system code.  In any case, I believe you're doing your
> counting wrong.  When flipping with swapinterval=0, you need 4 buffers:
> 
> 1. The buffer currently being scanned out  (will be released at next vblank)
> 2. The buffer X has queued for scanout but is waiting on vblank

s/vblank/flip/g, since async flips may not wait for vblank, but yeah.

> 3. The buffer the application has just submitted which X will queue next
> of it doesn't get another before the window closes.
> 4. The buffer the application is using for rendering.
> 
> With only 3, you get a stall during that window in which X has queued
> another flip but we're waiting on vblank before the flip begins. An I
> missing something?

Nothing, except maybe the paragraph below stating that I couldn't
measure any benefit from using 4 buffers. :) I'm not exactly sure why,
but I suspect it might be because even with just 3 buffers, the GPU can
always work on at least one frame ahead of time.

Also note that even before my change, we were only using 3 buffers when
the X driver supports async flips (with swap interval 0; only 2 buffers
with swap interval > 0).

That said, I'd be interested in hearing about any test cases where 4
buffers provide a significant boost over 3.


>> I couldn't measure any performance boost using 4 buffers with flipping.
>> Performance actually seemed to go down slightly, but that might have
>> been just noise.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-31 Thread Jason Ekstrand
On Aug 19, 2016 12:07 AM, "Michel Dänzer"  wrote:
>
> From: Michel Dänzer 
>
> Always use 3 buffers when flipping. With only 2 buffers, we have to wait
> for a flip to complete (which takes non-0 time even with asynchronous
> flips) before we can start working on the next frame. We were previously
> only using 2 buffers for flipping if the X server supports asynchronous
> flips, even when we're not using asynchronous flips. This could result
> in bad performance (the referenced bug report is an extreme case, where
> the inter-frame stalls were preventing the GPU from reaching its maximum
> clocks).

Sorry for the post-push review but I don't usually pay much attention to
the window system code.  In any case, I believe you're doing your counting
wrong.  When flipping with swapinterval=0, you need 4 buffers:

1. The buffer currently being scanned out  (will be released at next vblank)
2. The buffer X has queued for scanout but is waiting on vblank
3. The buffer the application has just submitted which X will queue next of
it doesn't get another before the window closes.
4. The buffer the application is using for rendering.

With only 3, you get a stall during that window in which X has queued
another flip but we're waiting on vblank before the flip begins. An I
missing something?

--Jason

> I couldn't measure any performance boost using 4 buffers with flipping.
> Performance actually seemed to go down slightly, but that might have
> been just noise.
>
> Without flipping, a single back buffer is enough for swap interval 0,
> but we need to use 2 back buffers when the swap interval is non-0,
> otherwise we have to wait for the swap interval to pass before we can
> start working on the next frame. This condition was previously reversed.
>
> Cc: "12.0 11.2" 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
> Signed-off-by: Michel Dänzer 
> ---
>  src/loader/loader_dri3_helper.c | 15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/src/loader/loader_dri3_helper.c
b/src/loader/loader_dri3_helper.c
> index e9fb97b..86ae5ae 100644
> --- a/src/loader/loader_dri3_helper.c
> +++ b/src/loader/loader_dri3_helper.c
> @@ -68,15 +68,12 @@ dri3_fence_await(xcb_connection_t *c, struct
loader_dri3_buffer *buffer)
>  static void
>  dri3_update_num_back(struct loader_dri3_drawable *draw)
>  {
> -   draw->num_back = 1;
> -   if (draw->flipping) {
> -  if (!draw->is_pixmap &&
> -  !(draw->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC))
> - draw->num_back++;
> -  draw->num_back++;
> -   }
> -   if (draw->vtable->get_swap_interval(draw) == 0)
> -  draw->num_back++;
> +   if (draw->flipping)
> +  draw->num_back = 3;
> +   else if (draw->vtable->get_swap_interval(draw) != 0)
> +  draw->num_back = 2;
> +   else
> +  draw->num_back = 1;
>  }
>
>  void
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-28 Thread Michel Dänzer
On 28/08/16 03:03 AM, Dieter Nützel wrote:
> Am 25.08.2016 11:09, schrieb Michel Dänzer:
>> On 24/08/16 06:35 AM, Eric Anholt wrote:
>>> Michel Dänzer  writes:
 On 20/08/16 04:42 AM, Eric Anholt wrote:
> Michel Dänzer  writes:
>
>> From: Michel Dänzer 
>>
>> Always use 3 buffers when flipping. With only 2 buffers, we have
>> to wait
>> for a flip to complete (which takes non-0 time even with asynchronous
>> flips) before we can start working on the next frame. We were
>> previously
>> only using 2 buffers for flipping if the X server supports
>> asynchronous
>> flips, even when we're not using asynchronous flips. This could
>> result
>> in bad performance (the referenced bug report is an extreme case,
>> where
>> the inter-frame stalls were preventing the GPU from reaching its
>> maximum
>> clocks).
>>
>> I couldn't measure any performance boost using 4 buffers with
>> flipping.
>> Performance actually seemed to go down slightly, but that might have
>> been just noise.
>>
>> Without flipping, a single back buffer is enough for swap interval 0,
>> but we need to use 2 back buffers when the swap interval is non-0,
>> otherwise we have to wait for the swap interval to pass before we can
>> start working on the next frame. This condition was previously
>> reversed.
>>
>> Cc: "12.0 11.2" 
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
>> Signed-off-by: Michel Dänzer 
>
> Reviewed-by: Eric Anholt 

 Thanks.
> 
> Hello Michel,
> 
> found this _before_ your commit, but haven't had time, so...
> 
> With this patch Blender at least 2.76b flicker horribly with
> 
> 'User Preferences...' -> 'Window Draw Method: Automatic'
> Switching to:
> 'Window Draw Method: Tripple Buffer'
> fix it.
> 
> I'll add this to the related bug if you want.

It's a blender bug:

https://bugs.freedesktop.org/show_bug.cgi?id=97059


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-27 Thread Dieter Nützel

Am 25.08.2016 11:09, schrieb Michel Dänzer:

On 24/08/16 06:35 AM, Eric Anholt wrote:

Michel Dänzer  writes:

On 20/08/16 04:42 AM, Eric Anholt wrote:

Michel Dänzer  writes:


From: Michel Dänzer 

Always use 3 buffers when flipping. With only 2 buffers, we have to 
wait
for a flip to complete (which takes non-0 time even with 
asynchronous
flips) before we can start working on the next frame. We were 
previously
only using 2 buffers for flipping if the X server supports 
asynchronous
flips, even when we're not using asynchronous flips. This could 
result
in bad performance (the referenced bug report is an extreme case, 
where
the inter-frame stalls were preventing the GPU from reaching its 
maximum

clocks).

I couldn't measure any performance boost using 4 buffers with 
flipping.
Performance actually seemed to go down slightly, but that might 
have

been just noise.

Without flipping, a single back buffer is enough for swap interval 
0,

but we need to use 2 back buffers when the swap interval is non-0,
otherwise we have to wait for the swap interval to pass before we 
can
start working on the next frame. This condition was previously 
reversed.


Cc: "12.0 11.2" 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
Signed-off-by: Michel Dänzer 


Reviewed-by: Eric Anholt 


Thanks.


Hello Michel,

found this _before_ your commit, but haven't had time, so...

With this patch Blender at least 2.76b flicker horribly with

'User Preferences...' -> 'Window Draw Method: Automatic'
Switching to:
'Window Draw Method: Tripple Buffer'
fix it.

I'll add this to the related bug if you want.

Greetings,

Dieter
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-25 Thread Michel Dänzer
On 24/08/16 06:35 AM, Eric Anholt wrote:
> Michel Dänzer  writes:
>> On 20/08/16 04:42 AM, Eric Anholt wrote:
>>> Michel Dänzer  writes:
>>>
 From: Michel Dänzer 

 Always use 3 buffers when flipping. With only 2 buffers, we have to wait
 for a flip to complete (which takes non-0 time even with asynchronous
 flips) before we can start working on the next frame. We were previously
 only using 2 buffers for flipping if the X server supports asynchronous
 flips, even when we're not using asynchronous flips. This could result
 in bad performance (the referenced bug report is an extreme case, where
 the inter-frame stalls were preventing the GPU from reaching its maximum
 clocks).

 I couldn't measure any performance boost using 4 buffers with flipping.
 Performance actually seemed to go down slightly, but that might have
 been just noise.

 Without flipping, a single back buffer is enough for swap interval 0,
 but we need to use 2 back buffers when the swap interval is non-0,
 otherwise we have to wait for the swap interval to pass before we can
 start working on the next frame. This condition was previously reversed.

 Cc: "12.0 11.2" 
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
 Signed-off-by: Michel Dänzer 
>>>
>>> Reviewed-by: Eric Anholt 
>>
>> Thanks.
>>
>> Note that four piglit EGL tests fail with this change, but AFAICT all of
>> those tests are broken. I submitted fixes for three of them:
>>
>> https://patchwork.freedesktop.org/patch/106809/
>> https://patchwork.freedesktop.org/patch/106807/
>> https://patchwork.freedesktop.org/patch/106808/
>>
>> The last one is egl_chromium_sync_control. It calls eglSwapBuffers twice
>> in a row and expects that the SBC is higher than before immediately
>> after that. I.e. it expects that there is only one back buffer, so that
>> at least the second eglSwapBuffers has to block until the first swap
>> finishes. Which is no longer true with this DRI3 change. Not sure what
>> to do about this one.
> 
> From a quick glance, change the test to wait on the SBC before querying
> the new one, maybe?

I'm not sure how to do that.

How about the attached patch? If it looks good, somebody please push it
for me.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
From 851a77647eb58e56bf909ac4c49217d6fa43c663 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= 
Date: Thu, 25 Aug 2016 18:05:53 +0900
Subject: [PATCH] EGL_CHROMIUM_sync_control: Wait for up to 2 seconds for SBC
 to increase
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Instead of expecting it to increase immediately after two eglSwapBuffers
calls.

Mesa's DRI3 platform can use more than one back buffer, in which case it
can queue up multiple buffer swaps before any of them completes.

We wait for up to 2 seconds so that we can pass with the Present
extension fake CRTC, which ticks at 1Hz.

Signed-off-by: Michel Dänzer 
---
 .../egl_chromium_sync_control.c| 32 +++---
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/tests/egl/spec/egl_chromium_sync_control/egl_chromium_sync_control.c b/tests/egl/spec/egl_chromium_sync_control/egl_chromium_sync_control.c
index 2df2a52..6f2aee8 100644
--- a/tests/egl/spec/egl_chromium_sync_control/egl_chromium_sync_control.c
+++ b/tests/egl/spec/egl_chromium_sync_control/egl_chromium_sync_control.c
@@ -161,10 +161,8 @@ test_eglGetSyncValuesCHROMIUM_ust_test(struct egl_state *state)
  *
  * Calling glFinish() won't force the swap buffers to happen; it will only flush
  * the command to the compositor.  By default, the buffer swap for each drawable
- * object is rate limited to happen no more than 60 times a second.
- * SwapBuffers() will block in order to enforce this.  Therefore, by the time
- * our second SwapBuffer() call returns, we are guaranteed that SBC will have
- * incremented at least once, possibly twice, but not more than twice.
+ * object is rate limited to happen no more than 60 times a second. Therefore,
+ * we have to wait for the SBC to increase, but it should happen eventually.
  *
  * When buffer swaps are synchronized to the vertical retrace (which is the
  * case when the surface's swap interval is non-zero), then the MSC increments
@@ -180,6 +178,7 @@ test_eglGetSyncValuesCHROMIUM_msc_and_sbc_test(struct egl_state *state)
 	EGLuint64KHR msc2 = canary;
 	EGLuint64KHR sbc2 = canary;
 	EGLint min_swap_interval;
+	int i;
 
 	if (!eglGetConfigAttrib(state->egl_dpy, state->cfg,
 EGL_MIN_SWAP_INTERVAL, _swap_interval)) {
@@ -188,7 +187,7 @@ 

Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-23 Thread Eric Anholt
Michel Dänzer  writes:

> On 20/08/16 04:42 AM, Eric Anholt wrote:
>> Michel Dänzer  writes:
>> 
>>> From: Michel Dänzer 
>>>
>>> Always use 3 buffers when flipping. With only 2 buffers, we have to wait
>>> for a flip to complete (which takes non-0 time even with asynchronous
>>> flips) before we can start working on the next frame. We were previously
>>> only using 2 buffers for flipping if the X server supports asynchronous
>>> flips, even when we're not using asynchronous flips. This could result
>>> in bad performance (the referenced bug report is an extreme case, where
>>> the inter-frame stalls were preventing the GPU from reaching its maximum
>>> clocks).
>>>
>>> I couldn't measure any performance boost using 4 buffers with flipping.
>>> Performance actually seemed to go down slightly, but that might have
>>> been just noise.
>>>
>>> Without flipping, a single back buffer is enough for swap interval 0,
>>> but we need to use 2 back buffers when the swap interval is non-0,
>>> otherwise we have to wait for the swap interval to pass before we can
>>> start working on the next frame. This condition was previously reversed.
>>>
>>> Cc: "12.0 11.2" 
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
>>> Signed-off-by: Michel Dänzer 
>> 
>> Reviewed-by: Eric Anholt 
>
> Thanks.
>
> Note that four piglit EGL tests fail with this change, but AFAICT all of
> those tests are broken. I submitted fixes for three of them:
>
> https://patchwork.freedesktop.org/patch/106809/
> https://patchwork.freedesktop.org/patch/106807/
> https://patchwork.freedesktop.org/patch/106808/
>
> The last one is egl_chromium_sync_control. It calls eglSwapBuffers twice
> in a row and expects that the SBC is higher than before immediately
> after that. I.e. it expects that there is only one back buffer, so that
> at least the second eglSwapBuffers has to block until the first swap
> finishes. Which is no longer true with this DRI3 change. Not sure what
> to do about this one.

From a quick glance, change the test to wait on the SBC before querying
the new one, maybe?  It does seem questionable to assume that a swap
waits on the previous swap.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-23 Thread Michel Dänzer
On 20/08/16 04:42 AM, Eric Anholt wrote:
> Michel Dänzer  writes:
> 
>> From: Michel Dänzer 
>>
>> Always use 3 buffers when flipping. With only 2 buffers, we have to wait
>> for a flip to complete (which takes non-0 time even with asynchronous
>> flips) before we can start working on the next frame. We were previously
>> only using 2 buffers for flipping if the X server supports asynchronous
>> flips, even when we're not using asynchronous flips. This could result
>> in bad performance (the referenced bug report is an extreme case, where
>> the inter-frame stalls were preventing the GPU from reaching its maximum
>> clocks).
>>
>> I couldn't measure any performance boost using 4 buffers with flipping.
>> Performance actually seemed to go down slightly, but that might have
>> been just noise.
>>
>> Without flipping, a single back buffer is enough for swap interval 0,
>> but we need to use 2 back buffers when the swap interval is non-0,
>> otherwise we have to wait for the swap interval to pass before we can
>> start working on the next frame. This condition was previously reversed.
>>
>> Cc: "12.0 11.2" 
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
>> Signed-off-by: Michel Dänzer 
> 
> Reviewed-by: Eric Anholt 

Thanks.

Note that four piglit EGL tests fail with this change, but AFAICT all of
those tests are broken. I submitted fixes for three of them:

https://patchwork.freedesktop.org/patch/106809/
https://patchwork.freedesktop.org/patch/106807/
https://patchwork.freedesktop.org/patch/106808/

The last one is egl_chromium_sync_control. It calls eglSwapBuffers twice
in a row and expects that the SBC is higher than before immediately
after that. I.e. it expects that there is only one back buffer, so that
at least the second eglSwapBuffers has to block until the first swap
finishes. Which is no longer true with this DRI3 change. Not sure what
to do about this one.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-19 Thread Eric Anholt
Michel Dänzer  writes:

> From: Michel Dänzer 
>
> Always use 3 buffers when flipping. With only 2 buffers, we have to wait
> for a flip to complete (which takes non-0 time even with asynchronous
> flips) before we can start working on the next frame. We were previously
> only using 2 buffers for flipping if the X server supports asynchronous
> flips, even when we're not using asynchronous flips. This could result
> in bad performance (the referenced bug report is an extreme case, where
> the inter-frame stalls were preventing the GPU from reaching its maximum
> clocks).
>
> I couldn't measure any performance boost using 4 buffers with flipping.
> Performance actually seemed to go down slightly, but that might have
> been just noise.
>
> Without flipping, a single back buffer is enough for swap interval 0,
> but we need to use 2 back buffers when the swap interval is non-0,
> otherwise we have to wait for the swap interval to pass before we can
> start working on the next frame. This condition was previously reversed.
>
> Cc: "12.0 11.2" 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
> Signed-off-by: Michel Dänzer 

Reviewed-by: Eric Anholt 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-19 Thread Keith Packard
Michel Dänzer  writes:

> From: Michel Dänzer 
>
> Always use 3 buffers when flipping. With only 2 buffers, we have to wait
> for a flip to complete (which takes non-0 time even with asynchronous
> flips) before we can start working on the next frame. We were previously
> only using 2 buffers for flipping if the X server supports asynchronous
> flips, even when we're not using asynchronous flips. This could result
> in bad performance (the referenced bug report is an extreme case, where
> the inter-frame stalls were preventing the GPU from reaching its maximum
> clocks).

Makes sense to me.

-- 
-keith


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-19 Thread Frank Binns

On 19/08/16 08:07, Michel Dänzer wrote:

From: Michel Dänzer 

Always use 3 buffers when flipping. With only 2 buffers, we have to wait
for a flip to complete (which takes non-0 time even with asynchronous
flips) before we can start working on the next frame. We were previously
only using 2 buffers for flipping if the X server supports asynchronous
flips, even when we're not using asynchronous flips. This could result
in bad performance (the referenced bug report is an extreme case, where
the inter-frame stalls were preventing the GPU from reaching its maximum
clocks).

I couldn't measure any performance boost using 4 buffers with flipping.
Performance actually seemed to go down slightly, but that might have
been just noise.

Without flipping, a single back buffer is enough for swap interval 0,
but we need to use 2 back buffers when the swap interval is non-0,
otherwise we have to wait for the swap interval to pass before we can
start working on the next frame. This condition was previously reversed.

Cc: "12.0 11.2" 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
Signed-off-by: Michel Dänzer 


Reviewed-by: Frank Binns 


---
  src/loader/loader_dri3_helper.c | 15 ++-
  1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index e9fb97b..86ae5ae 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -68,15 +68,12 @@ dri3_fence_await(xcb_connection_t *c, struct 
loader_dri3_buffer *buffer)
  static void
  dri3_update_num_back(struct loader_dri3_drawable *draw)
  {
-   draw->num_back = 1;
-   if (draw->flipping) {
-  if (!draw->is_pixmap &&
-  !(draw->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC))
- draw->num_back++;
-  draw->num_back++;
-   }
-   if (draw->vtable->get_swap_interval(draw) == 0)
-  draw->num_back++;
+   if (draw->flipping)
+  draw->num_back = 3;
+   else if (draw->vtable->get_swap_interval(draw) != 0)
+  draw->num_back = 2;
+   else
+  draw->num_back = 1;
  }
  
  void


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

2016-08-19 Thread Michel Dänzer
From: Michel Dänzer 

Always use 3 buffers when flipping. With only 2 buffers, we have to wait
for a flip to complete (which takes non-0 time even with asynchronous
flips) before we can start working on the next frame. We were previously
only using 2 buffers for flipping if the X server supports asynchronous
flips, even when we're not using asynchronous flips. This could result
in bad performance (the referenced bug report is an extreme case, where
the inter-frame stalls were preventing the GPU from reaching its maximum
clocks).

I couldn't measure any performance boost using 4 buffers with flipping.
Performance actually seemed to go down slightly, but that might have
been just noise.

Without flipping, a single back buffer is enough for swap interval 0,
but we need to use 2 back buffers when the swap interval is non-0,
otherwise we have to wait for the swap interval to pass before we can
start working on the next frame. This condition was previously reversed.

Cc: "12.0 11.2" 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
Signed-off-by: Michel Dänzer 
---
 src/loader/loader_dri3_helper.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index e9fb97b..86ae5ae 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -68,15 +68,12 @@ dri3_fence_await(xcb_connection_t *c, struct 
loader_dri3_buffer *buffer)
 static void
 dri3_update_num_back(struct loader_dri3_drawable *draw)
 {
-   draw->num_back = 1;
-   if (draw->flipping) {
-  if (!draw->is_pixmap &&
-  !(draw->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC))
- draw->num_back++;
-  draw->num_back++;
-   }
-   if (draw->vtable->get_swap_interval(draw) == 0)
-  draw->num_back++;
+   if (draw->flipping)
+  draw->num_back = 3;
+   else if (draw->vtable->get_swap_interval(draw) != 0)
+  draw->num_back = 2;
+   else
+  draw->num_back = 1;
 }
 
 void
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev