Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-11 Thread Christoph Hellwig
I think the prime issue is that dma_direct_alloc respects the dma
mask, which we don't need when actually using the iommu.  This would
be mostly harmless except for the SEV bit high in the address that
makes the checks fail.

For now I'd say revert this commit for 4.17/4.18-rc and I'll look into
addressing these issues properly.


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-06 Thread Christian König

On 06.06.2018 at 16:12, Michel Dänzer wrote:

On 2018-06-06 03:33 PM, Gabriel C wrote:

2018-06-06 14:19 GMT+02:00 Christian König :

On 06.06.2018 at 14:08, Gabriel C wrote:

2018-06-06 13:33 GMT+02:00 Christian König :

On 06.06.2018 at 13:28, Gabriel C wrote:

2018-04-11 7:02 GMT+02:00 Gabriel C :


[    6.337838] [drm] PCIE GART of 2048M enabled (table at
0x001D6000).
[    6.338210] radeon :21:00.0: (-12) create WB bo failed
[    6.338214] radeon :21:00.0: disabling GPU acceleration

...


I have the same issue now on final 4.17.


Please file a bug report, and ideally bisect which commit(s) 
introduced the issue(s).



http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt 



http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt 



Also nothing else changed in that setup just testing kernel 4.17.



That has nothing to do with the driver nor the original bug you
reported. The problem is that SME is active and that is currently not
supported at all with that hardware.


OK, so are we now playing kernel and AMD hardware roulette on each
release?


SME was like this in kernel 4.16.x here and all worked.


If that is true, again please bisect which commit broke it.

All the reports I've seen before this indicated that at least amdgpu 
has never worked with SME (which BTW doesn't mean it's never going to 
work or that we don't want to support it, just that as far as we know 
it's currently not working).


At least in theory it should work when we use the coherent DMA allocator.

If that really worked before, then the most likely commit that broke
this is:


commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
Author: Chunming Zhou 
Date:   Fri Feb 9 10:44:09 2018 +0800

    drm/amdgpu: only enable swiotlb alloc when need v2

    get the max io mapping address of system memory to see if it is over
    our card accessing range.
    v2: move checking later

    Signed-off-by: Chunming Zhou 
    Reviewed-by: Monk Liu 
    Reviewed-by: Christian König 
    Signed-off-by: Alex Deucher 

Currently looking into how we could somehow improve this detection.

Regards,
Christian.


Re: AMD graphics performance regression in 4.15 and later

2018-06-06 Thread Christian König

On 06.06.2018 at 14:08, Gabriel C wrote:

2018-06-06 13:33 GMT+02:00 Christian König :

On 06.06.2018 at 13:28, Gabriel C wrote:

2018-04-11 7:02 GMT+02:00 Gabriel C :

2018-04-11 6:00 GMT+02:00 Gabriel C :
2018-04-09 11:42 GMT+02:00 Christian König
:

On 07.04.2018 at 00:00, Jean-Marc Valin wrote:

...

I can help testing code for 4.17/++ if you wish but that is a *different*
story.


Quickly tested 4.16.0-11490-gb284d4d5a678; the amdgpu and radeon drivers
are both broken in this one.

radeon reports:

...

[6.337838] [drm] PCIE GART of 2048M enabled (table at
0x001D6000).
[6.338210] radeon :21:00.0: (-12) create WB bo failed
[6.338214] radeon :21:00.0: disabling GPU acceleration

...


I have the same issue now on final 4.17.


Actually Michel came up with a fix for the performance regression which is
now backported to older kernels as well.

So the original issue of this mail thread should be fixed by now.

OK, will test as soon as I get the GPU to work :))


I also played with BIOS options, which does not fix anything but
changes the error message.

With IOMMU && SR-IOV disabled, the error changes to this:

[7.092044] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0
test failed (scratch(0x850C)=0xCAFEDEAD)
[7.092059] radeon :21:00.0: disabling GPU acceleration


While I could work around the SWIOTLB bugs in 4.15 and 4.16, 4.17 seems to
kill the GPU with no way for me to make it work (at least I could not
find any workaround so far).


That actually sounds like something completely different. Can you provide a
full dmesg of radeon and/or amdgpu?

Sure, here from boot with IOMMU/SR-IOV ON/OFF in BIOS:

http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt

Also nothing else changed in that setup just testing kernel 4.17.


That has nothing to do with the driver nor the original bug you reported.
The problem is that SME is active and that is currently not supported at
all with that hardware.


Try to disable SME either in the BIOS or on the kernel command line.

Regards,
Christian.



I can force the GPU to use amdgpu if you wish and post the dmesg output too.
Just let me know.




Re: AMD graphics performance regression in 4.15 and later

2018-04-23 Thread Michel Dänzer
On 2018-04-20 09:40 PM, Felix Kuehling wrote:
> On 2018-04-20 10:47 AM, Michel Dänzer wrote:
>> On 2018-04-11 11:37 AM, Christian König wrote:
>>> On 11.04.2018 at 06:00, Gabriel C wrote:
>>>> 2018-04-09 11:42 GMT+02:00 Christian König:
>>>>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>>>> Feel free to comment since you have a better understanding of what's
>>>>>> going on.
>>>>>>
>>>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>>>> patch reverted. Is that safe to run or are there possible bad
>>>>>> interactions with other changes?
>>>>> That should work without problems.
>>>>>
>>>>> But I just had another idea as well: if you want, you could still test
>>>>> the new code path which will be used in 4.17.
>>>>>
>>>> While Firefox may do some strange things, this is not only about Firefox.
>>>>
>>>> With your patches my EPYC box is unusable with 4.15++ kernels.
>>>> The whole desktop is acting weird. This one is using
>>>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>>
>>>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>>>
>>>> Also a 14C Xeon box with a HD7700 is broken the same way.
>>> The hardware is irrelevant for this. We need to know what software stack
>>> you use on top of it.
>>>
>>> E.g. desktop environment/Mesa and DDX version etc...
>>>
>>>> Everything breaks in X: scrolling, moving windows, flickering etc.
>>>>
>>>>
>>>> Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>>>> 648bc3574716400acc06f99915815f80d9563783
>>>> from a 4.15 kernel makes things work again.
>>>>
>>>>
>>>>> Backporting all the detection logic is too invasive, but you could
>>>>> just go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully
>>>>> use the other code path.
>>>>>
>>>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>>>
>>>> Well, you really can't be serious about these suggestions? Are you?
>>>>
>>>> Telling people to #if 0 random code is not a solution.
>>> That is for testing and not a permanent solution.
>>>
>>>> You broke existing working userland with your patches and at least
>>>> please fix that for 4.16.
>>>>
>>>> I can help testing code for 4.17/++ if you wish but that is
>>>> a *different* story.
>>> Please test Alex's amd-staging-drm-next branch from
>>> git://people.freedesktop.org/~agd5f/linux.
>> I think we're still missing something here.
>>
>> I'm currently running 4.16.2 + the DRM subsystem changes which are going
>> into 4.17 (so I have the changes Christian is referring to) with a
>> Kaveri APU, and I'm seeing similar symptoms as described by Jean-Marc.
>> Some observations:
>>
>> Firefox, Thunderbird, or worst, gnome-shell, can freeze for up to on the
>> order of a minute, during which the kernel is spending most of one
>> core's cycles inside alloc_pages (__alloc_pages_nodemask to be more
>> precise), called from ttm_alloc_new_pages.
> Philip debugged a similar problem with a KFD memory stress test about
> two weeks ago, where the kernel was seemingly stuck in an infinite loop
> trying to allocate huge pages. I'm pasting his analysis for the record:
> 
>> [...] it uses huge_flags GFP_TRANSHUGE to call alloc_pages(), this
>> seems a corner case inside __alloc_pages_slowpath(), it never exits
>> but goes to retry path every time. It can reclaim pages and
>> did_some_progress (as a result, no_progress_loops is reset to 0 every
>> loop, never reach MAX_RECLAIM_RETRIES) but cannot finish huge page
>> allocations under this specific memory pressure.  
> As a workaround to unblock our release branch testing we removed
> transparent huge page allocation from  ttm_get_pages. We're seeing this
> as far back as 4.13 on our release branch.

Thanks for sharing this. In the future, please raise issues like this on
the public mailing lists from the beginning.


> If we're really talking about the same problem, I don't think it's
> caused by recent page allocator changes, but rather exposed by recent
> TTM changes.

It sounds related, but probably not exactly the same problem. I already
had the TTM code using GFP_TRANSHUGE before I ran into the issue. Also,
__alloc_pages_slowpath eventually succeeds for me, it can just take up
to about a minute.

I'm currently testing using (GFP_TRANSHUGE_LIGHT | __GFP_NORETRY)
instead of GFP_TRANSHUGE in TTM.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: AMD graphics performance regression in 4.15 and later

2018-04-20 Thread Felix Kuehling
[+Philip]

On 2018-04-20 10:47 AM, Michel Dänzer wrote:
> On 2018-04-11 11:37 AM, Christian König wrote:
>> On 11.04.2018 at 06:00, Gabriel C wrote:
>>> 2018-04-09 11:42 GMT+02:00 Christian König:
>>>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
>>>>> Hi Christian,
>>>>>
>>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>>> Feel free to comment since you have a better understanding of what's
>>>>> going on.
>>>>>
>>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>>> patch reverted. Is that safe to run or are there possible bad
>>>>> interactions with other changes?
>>>> That should work without problems.
>>>>
>>>> But I just had another idea as well: if you want, you could still test
>>>> the new code path which will be used in 4.17.
>>>>
>>> While Firefox may do some strange things, this is not only about Firefox.
>>>
>>> With your patches my EPYC box is unusable with 4.15++ kernels.
>>> The whole desktop is acting weird. This one is using
>>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>
>>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>>
>>> Also a 14C Xeon box with a HD7700 is broken the same way.
>> The hardware is irrelevant for this. We need to know what software stack
>> you use on top of it.
>>
>> E.g. desktop environment/Mesa and DDX version etc...
>>
>>> Everything breaks in X: scrolling, moving windows, flickering etc.
>>>
>>>
>>> Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>>> 648bc3574716400acc06f99915815f80d9563783
>>> from a 4.15 kernel makes things work again.
>>>
>>>
>>>> Backporting all the detection logic is too invasive, but you could
>>>> just go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully
>>>> use the other code path.
>>>>
>>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>>
>>> Well, you really can't be serious about these suggestions? Are you?
>>>
>>> Telling people to #if 0 random code is not a solution.
>> That is for testing and not a permanent solution.
>>
>>> You broke existing working userland with your patches and at least
>>> please fix that for 4.16.
>>>
>>> I can help testing code for 4.17/++ if you wish but that is
>>> a *different* story.
>> Please test Alex's amd-staging-drm-next branch from
>> git://people.freedesktop.org/~agd5f/linux.
> I think we're still missing something here.
>
> I'm currently running 4.16.2 + the DRM subsystem changes which are going
> into 4.17 (so I have the changes Christian is referring to) with a
> Kaveri APU, and I'm seeing similar symptoms as described by Jean-Marc.
> Some observations:
>
> Firefox, Thunderbird, or worst, gnome-shell, can freeze for up to on the
> order of a minute, during which the kernel is spending most of one
> core's cycles inside alloc_pages (__alloc_pages_nodemask to be more
> precise), called from ttm_alloc_new_pages.
Philip debugged a similar problem with a KFD memory stress test about
two weeks ago, where the kernel was seemingly stuck in an infinite loop
trying to allocate huge pages. I'm pasting his analysis for the record:

> [...] it uses huge_flags GFP_TRANSHUGE to call alloc_pages(), this
> seems a corner case inside __alloc_pages_slowpath(), it never exits
> but goes to retry path every time. It can reclaim pages and
> did_some_progress (as a result, no_progress_loops is reset to 0 every
> loop, never reach MAX_RECLAIM_RETRIES) but cannot finish huge page
> allocations under this specific memory pressure.  
As a workaround to unblock our release branch testing we removed
transparent huge page allocation from  ttm_get_pages. We're seeing this
as far back as 4.13 on our release branch.

If we're really talking about the same problem, I don't think it's
caused by recent page allocator changes, but rather exposed by recent
TTM changes.

Regards,
  Felix

>
> At least in the case of Firefox, this happens due to Mesa internal BO
> allocations for glTex(Sub)Image, so it's not obvious that Firefox is
> doing something wrong.
>
> I never noticed this before this week. Before, I was running 4.15.y +
> DRM subsystem changes from 4.16. Maybe something has changed in core
> code, trying harder to allocate huge pages.
>
>
> Maybe TTM should only try to use any huge pages that happen to be
> available, not spend any (/ "too much"?) additional effort trying to
> free up huge pages?
>
>



Re: AMD graphics performance regression in 4.15 and later

2018-04-20 Thread Michel Dänzer
On 2018-04-11 11:37 AM, Christian König wrote:
> On 11.04.2018 at 06:00, Gabriel C wrote:
>> 2018-04-09 11:42 GMT+02:00 Christian König:
>>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
>>>> Hi Christian,
>>>>
>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>> Feel free to comment since you have a better understanding of what's
>>>> going on.
>>>>
>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>> patch reverted. Is that safe to run or are there possible bad
>>>> interactions with other changes?
>>>
>>> That should work without problems.
>>>
>>> But I just had another idea as well: if you want, you could still test
>>> the new code path which will be used in 4.17.
>>>
>> While Firefox may do some strange things, this is not only about Firefox.
>>
>> With your patches my EPYC box is unusable with 4.15++ kernels.
>> The whole desktop is acting weird. This one is using
>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>
>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>
>> Also a 14C Xeon box with a HD7700 is broken the same way.
> 
> The hardware is irrelevant for this. We need to know what software stack
> you use on top of it.
> 
> E.g. desktop environment/Mesa and DDX version etc...
> 
>>
>> Everything breaks in X: scrolling, moving windows, flickering etc.
>>
>>
>> Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>> 648bc3574716400acc06f99915815f80d9563783
>> from a 4.15 kernel makes things work again.
>>
>>
>>> Backporting all the detection logic is too invasive, but you could
>>> just go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully
>>> use the other code path.
>>>
>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>
>> Well, you really can't be serious about these suggestions? Are you?
>>
>> Telling people to #if 0 random code is not a solution.
> 
> That is for testing and not a permanent solution.
> 
>> You broke existing working userland with your patches and at least
>> please fix that for 4.16.
>>
>> I can help testing code for 4.17/++ if you wish but that is
>> a *different* story.
> 
> Please test Alex's amd-staging-drm-next branch from
> git://people.freedesktop.org/~agd5f/linux.

I think we're still missing something here.

I'm currently running 4.16.2 + the DRM subsystem changes which are going
into 4.17 (so I have the changes Christian is referring to) with a
Kaveri APU, and I'm seeing similar symptoms as described by Jean-Marc.
Some observations:

Firefox, Thunderbird, or worst, gnome-shell, can freeze for up to on the
order of a minute, during which the kernel is spending most of one
core's cycles inside alloc_pages (__alloc_pages_nodemask to be more
precise), called from ttm_alloc_new_pages.

At least in the case of Firefox, this happens due to Mesa internal BO
allocations for glTex(Sub)Image, so it's not obvious that Firefox is
doing something wrong.

I never noticed this before this week. Before, I was running 4.15.y +
DRM subsystem changes from 4.16. Maybe something has changed in core
code, trying harder to allocate huge pages.


Maybe TTM should only try to use any huge pages that happen to be
available, not spend any (/ "too much"?) additional effort trying to
free up huge pages?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: AMD graphics performance regression in 4.15 and later

2018-04-20 Thread Michel Dänzer
On 2018-04-11 11:37 AM, Christian König wrote:
> On 11.04.2018 at 06:00, Gabriel C wrote:
>> 2018-04-09 11:42 GMT+02:00 Christian König:
>>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
>>>> Hi Christian,
>>>>
>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>> Feel free to comment since you have a better understanding of what's
>>>> going on.
>>>>
>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>> patch reverted. Is that safe to run or are there possible bad
>>>> interactions with other changes?
>>>
>>> That should work without problems.
>>>
>>> But I just had another idea as well: if you want, you could still test
>>> the new code path which will be used in 4.17.
>>>
>> While Firefox may do some strange things, this is not only about Firefox.
>>
>> With your patches my EPYC box is unusable with 4.15++ kernels.
>> The whole desktop is acting weird. This one is using
>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>
>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>
>> Also a 14C Xeon box with a HD7700 is broken the same way.
> 
> The hardware is irrelevant for this. We need to know what software stack
> you use on top of it.
> 
> E.g. desktop environment/Mesa and DDX version etc...
> 
>>
>> Everything breaks in X .. scrolling , moving windows , flickering etc.
>>
>>
>> reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>> 648bc3574716400acc06f99915815f80d9563783
>> from an 4.15 kernel makes things work again.
>>
>>
>>> Backporting all the detection logic is to invasive, but you could
>>> just go
>>> into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefull use the other
>>> code path.
>>>
>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>
>> Well you really can't be serious about these suggestions ? Are you ?
>>
>> Telling peoples to #if 0 random code is not a solution.
> 
> That is for testing and not a permanent solution.
> 
>> You broke existsing working userland with your patches and at least
>> please fix that for 4.16.
>>
>> I can help testing code for 4.17/++ if you wish but that is
>> *different* storry.
> 
> Please test Alex's amd-staging-drm-next branch from
> git://people.freedesktop.org/~agd5f/linux.

I think we're still missing something here.

I'm currently running 4.16.2 + the DRM subsystem changes which are going
into 4.17 (so I have the changes Christian is referring to) with a
Kaveri APU, and I'm seeing similar symptoms as described by Jean-Marc.
Some observations:

Firefox, Thunderbird, or worst, gnome-shell, can freeze for up to on the
order of a minute, during which the kernel is spending most of one
core's cycles inside alloc_pages (__alloc_pages_nodemask to be more
precise), called from ttm_alloc_new_pages.

At least in the case of Firefox, this happens due to Mesa internal BO
allocations for glTex(Sub)Image, so it's not obvious that Firefox is
doing something wrong.

I never noticed this before this week. Before, I was running 4.15.y +
DRM subsystem changes from 4.16. Maybe something has changed in core
code, trying harder to allocate huge pages.


Maybe TTM should only try to use any huge pages that happen to be
available, not spend any (/ "too much"?) additional effort trying to
free up huge pages?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: AMD graphics performance regression in 4.15 and later

2018-04-11 Thread Gabriel C
2018-04-12 0:20 GMT+02:00 Gabriel C :
> 2018-04-11 20:35 GMT+02:00 Jean-Marc Valin :
>> On 04/11/2018 05:37 AM, Christian König wrote:
>>>> With your patches my EPYC box is unusable with 4.15++ kernels.
>>>> The whole desktop is acting weird. This one is using
>>>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>>
>>>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>>>
>>>> Also, a 14C Xeon box with a HD7700 is broken the same way.
>>>
>>> The hardware is irrelevant for this. We need to know what software stack
>>> you use on top of it.
>>
>> Well, the hardware appears to be part of the issue too. I don't think
>> it's a coincidence that Gabriel has the problem on 2xEPYC, I have it on
>> 2xXeon and the previous reporter had it on a Core 2 Quad that internally
>> has two dies.
>>
>> I've not tested your disable CONFIG_SWIOTLB fix yet -- might try it
>> over the weekend and report what happens.
>>
>
> To get that right: is it only a matter of disabling the SWIOTLB *code*
> while CONFIG_SWIOTLB is still set?

OK, I tested that on 4.16.1 and yes, it does work. However, I didn't like the
#if 0 method, which means compiling a kernel twice just to compare and test.

I created a small patch that adds a swiotlb option to amdgpu and radeon,
so I can boot and compare/test with and without the SWIOTLB code.

(not meant for upstream)

http://ftp.frugalware.org/pub/other/people/crazy/0001-Make-it-possible-to-disable-SWIOTLB-code-on-admgpu-a.patch

With the SWIOTLB code off everything works fine, while all hell breaks loose
when it is turned on.

Maybe similar options should be added upstream until the code is more
stable in 4.17/4.18.

Regards


Re: AMD graphics performance regression in 4.15 and later

2018-04-11 Thread Gabriel C
2018-04-11 20:35 GMT+02:00 Jean-Marc Valin :
> On 04/11/2018 05:37 AM, Christian König wrote:
>>> With your patches my EPYC box is unusable with 4.15++ kernels.
>>> The whole desktop is acting weird. This one is using
>>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>>
>>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>>
>>> Also, a 14C Xeon box with a HD7700 is broken the same way.
>>
>> The hardware is irrelevant for this. We need to know what software stack
>> you use on top of it.
>
> Well, the hardware appears to be part of the issue too. I don't think
> it's a coincidence that Gabriel has the problem on 2xEPYC, I have it on
> 2xXeon and the previous reporter had it on a Core 2 Quad that internally
> has two dies.
>
> I've not tested your disable CONFIG_SWIOTLB fix yet -- might try it
> over the weekend and report what happens.
>

To get that right: is it only a matter of disabling the SWIOTLB *code*
while CONFIG_SWIOTLB is still set?


Re: AMD graphics performance regression in 4.15 and later

2018-04-11 Thread Jean-Marc Valin
On 04/11/2018 05:37 AM, Christian König wrote:
>> With your patches my EPYC box is unusable with 4.15++ kernels.
>> The whole desktop is acting weird. This one is using
>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>
>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>
>> Also, a 14C Xeon box with a HD7700 is broken the same way.
> 
> The hardware is irrelevant for this. We need to know what software stack
> you use on top of it.

Well, the hardware appears to be part of the issue too. I don't think
it's a coincidence that Gabriel has the problem on 2xEPYC, I have it on
2xXeon and the previous reporter had it on a Core 2 Quad that internally
has two dies.

I've not tested your disable CONFIG_SWIOTLB fix yet -- might try it
over the weekend and report what happens.

Cheers,

Jean-Marc


Re: AMD graphics performance regression in 4.15 and later

2018-04-11 Thread Gabriel C
2018-04-11 16:26 GMT+02:00 Gabriel C :
> 2018-04-11 11:37 GMT+02:00 Christian König :
>> On 11.04.2018 at 06:00, Gabriel C wrote:

...
>>
>> Please test Alex's amd-staging-drm-next branch from
>> git://people.freedesktop.org/~agd5f/linux.
>
> I'm on it, it's just that the connection to freedesktop.org is slow as hell.
> It will take a while to get that branch at 62 KiB/s :)
>

Testing done on that branch on commit 24110c70630998dc83da23cae910a9538f54ef64.

On the default Plasma OpenGL 2.0 profile things are still laggy but a lot better.
On OpenGL 3.1 things work much better, with just minor glitches when
maximizing/minimizing windows.

Firefox is still broken: frame drops, video stops, etc.
Chromium-browser works fine.
Otter-browser does not work at all.
Qupzilla/Falkon has Firefox-like issues too.

Things I noticed while testing Firefox or Qupzilla:
once these start acting up, it affects the whole desktop;
for some seconds scrolling lags, the mouse is slow, etc.
Once they are closed, the desktop starts working again after a few seconds.


Do you want me to test any mesa/xorg-server/driver git branches too?

If so, just let me know.

Regards


Re: AMD graphics performance regression in 4.15 and later

2018-04-11 Thread Gabriel C
2018-04-11 11:37 GMT+02:00 Christian König :
> On 11.04.2018 at 06:00, Gabriel C wrote:
>>
>> 2018-04-09 11:42 GMT+02:00 Christian König :
>>>
>>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
>>>>
>>>> Hi Christian,
>>>>
>>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>>> Feel free to comment since you have a better understanding of what's
>>>> going on.
>>>>
>>>> One last question: right now I'm running 4.15.0 with the "offending"
>>>> patch reverted. Is that safe to run or are there possible bad
>>>> interactions with other changes?
>>>
>>>
>>> That should work without problems.
>>>
>>> But I just had another idea as well, if you want you could still test the
>>> new code path which will be used in 4.17.
>>>
>> While Firefox may do some strange things, this is not only about Firefox.
>>
>> With your patches my EPYC box is unusable with 4.15++ kernels.
>> The whole desktop is acting weird. This one is using
>> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>>
>> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>>
>> Also, a 14C Xeon box with a HD7700 is broken the same way.
>
>
> The hardware is irrelevant for this. We need to know what software stack you
> use on top of it.
>
> E.g. desktop environment/Mesa and DDX version etc...

Plasma 5.12.4 compiled with Frameworks 5.44.0, Qt5 5.10.1
Mesa 18.0.0, and Mesa 17.3.7 on the other box
Xorg is 1.19.6
xf86-video-amdgpu and xf86-video-ati are both 18.0.1

>
>>
>> Everything breaks in X: scrolling, moving windows, flickering, etc.
>>
>>
>> Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
>> 648bc3574716400acc06f99915815f80d9563783
>> from a 4.15 kernel makes things work again.
>>
>>
>>> Backporting all the detection logic is too invasive, but you could just go
>>> into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully use the other
>>> code path.
>>>
>>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>>
>> Well, you really can't be serious about these suggestions? Are you?
>>
>> Telling people to #if 0 random code is not a solution.
>
>
> That is for testing and not a permanent solution.
>
>> You broke existing working userland with your patches and at least
>> please fix that for 4.16.
>>
>> I can help testing code for 4.17/++ if you wish but that is a *different*
>> story.
>
>
> Please test Alex's amd-staging-drm-next branch from
> git://people.freedesktop.org/~agd5f/linux.

I'm on it, it's just that the connection to freedesktop.org is slow as hell.
It will take a while to get that branch at 62 KiB/s :)

Regards


Re: AMD graphics performance regression in 4.15 and later

2018-04-11 Thread Christian König

On 11.04.2018 at 06:00, Gabriel C wrote:
> 2018-04-09 11:42 GMT+02:00 Christian König :
>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
>>> Hi Christian,
>>>
>>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>>> Feel free to comment since you have a better understanding of what's
>>> going on.
>>>
>>> One last question: right now I'm running 4.15.0 with the "offending"
>>> patch reverted. Is that safe to run or are there possible bad
>>> interactions with other changes?
>>
>> That should work without problems.
>>
>> But I just had another idea as well, if you want you could still test the
>> new code path which will be used in 4.17.
>>
> While Firefox may do some strange things, this is not only about Firefox.
>
> With your patches my EPYC box is unusable with 4.15++ kernels.
> The whole desktop is acting weird. This one is using
> a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.
>
> The box is 2 * EPYC 7281 with 128 GB ECC RAM.
>
> Also, a 14C Xeon box with a HD7700 is broken the same way.

The hardware is irrelevant for this. We need to know what software stack
you use on top of it.

E.g. desktop environment/Mesa and DDX version etc...

> Everything breaks in X: scrolling, moving windows, flickering, etc.
>
> Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
> 648bc3574716400acc06f99915815f80d9563783
> from a 4.15 kernel makes things work again.
>
>> Backporting all the detection logic is too invasive, but you could just go
>> into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully use the other
>> code path.
>>
>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>>
> Well, you really can't be serious about these suggestions? Are you?
>
> Telling people to #if 0 random code is not a solution.

That is for testing and not a permanent solution.

> You broke existing working userland with your patches and at least
> please fix that for 4.16.
>
> I can help testing code for 4.17/++ if you wish but that is a *different* story.

Please test Alex's amd-staging-drm-next branch from
git://people.freedesktop.org/~agd5f/linux.

Regards,
Christian.

> Regards,
>
> Gabriel C




Re: AMD graphics performance regression in 4.15 and later

2018-04-10 Thread Gabriel C
2018-04-11 6:00 GMT+02:00 Gabriel C :
> 2018-04-09 11:42 GMT+02:00 Christian König :
>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
...
> I can help testing code for 4.17/++ if you wish but that is a *different*
> story.
>
>

I quickly tested 4.16.0-11490-gb284d4d5a678; the amdgpu and radeon drivers
are both broken in this one.

radeon reports:

...

[6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000).
[6.338210] radeon :21:00.0: (-12) create WB bo failed
[6.338214] radeon :21:00.0: disabling GPU acceleration

...
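
The `(-12)` in the radeon line above is a negated errno value; 12 is ENOMEM on Linux, i.e. the allocation of the writeback buffer object failed for lack of memory. A small Python helper for decoding such values from log lines might look like this (the function name and the regular expression are illustrative and not part of any kernel tooling):

```python
import errno
import os
import re

# Matches a parenthesised negative number such as "(-12)" in a log line.
_ERR_RE = re.compile(r"\((-\d+)\)")


def decode_kernel_errno(line):
    """Return (symbolic name, description) for the first "(-N)" in line.

    Returns None if the line contains no such value.
    """
    match = _ERR_RE.search(line)
    if match is None:
        return None
    code = -int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    print(decode_kernel_errno("radeon :21:00.0: (-12) create WB bo failed"))
```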

And there is no way to start X: it flickers and hangs.

amdgpu hits a bug:

http://ftp.frugalware.org/pub/other/people/crazy/trace.txt


Do you have a git tree I can test from?

Also, if you need full logs or any other info, just let me know.

Regards


Re: AMD graphics performance regression in 4.15 and later

2018-04-10 Thread Gabriel C
2018-04-09 11:42 GMT+02:00 Christian König :
> On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
>>
>> Hi Christian,
>>
>> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
>> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
>> Feel free to comment since you have a better understanding of what's
>> going on.
>>
>> One last question: right now I'm running 4.15.0 with the "offending"
>> patch reverted. Is that safe to run or are there possible bad
>> interactions with other changes?
>
>
> That should work without problems.
>
> But I just had another idea as well, if you want you could still test the
> new code path which will be used in 4.17.
>

While Firefox may do some strange things, this is not only about Firefox.

With your patches my EPYC box is unusable with 4.15++ kernels.
The whole desktop is acting weird. This one is using
a Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU.

The box is 2 * EPYC 7281 with 128 GB ECC RAM.

Also, a 14C Xeon box with a HD7700 is broken the same way.

Everything breaks in X: scrolling, moving windows, flickering, etc.


Reverting f4c809914a7c3e4a59cf543da6c2a15d0f75ee38 and
648bc3574716400acc06f99915815f80d9563783
from a 4.15 kernel makes things work again.


> Backporting all the detection logic is too invasive, but you could just go
> into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully use the other
> code path.
>
> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>

Well, you really can't be serious about these suggestions? Are you?

Telling people to #if 0 random code is not a solution.

You broke existing working userland with your patches and at least
please fix that for 4.16.

I can help testing code for 4.17/++ if you wish but that is a *different* story.

Regards,

Gabriel C


Re: AMD graphics performance regression in 4.15 and later

2018-04-10 Thread Christian König

On 09.04.2018 at 17:17, Jean-Marc Valin wrote:
> On 04/09/2018 05:42 AM, Christian König wrote:
>> Backporting all the detection logic is too invasive, but you could just
>> go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully use the
>> other code path.
>>
>> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.
>
> Do you mean just taking the 4.15 code as is and replacing
> "#ifdef CONFIG_SWIOTLB" with "#if 0" in
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c or are you talking about using a
> different version of drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c ?


Yes, exactly. The code then won't work any more on some ARMs or systems 
with more than 1TB of memory, but I don't think you care about that :)
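
For anyone who wants to script the test-only change discussed above, here is a minimal Python sketch that rewrites `#ifdef CONFIG_SWIOTLB` lines to `#if 0` in a source string. The helper name and the comment it leaves behind are made up for illustration. It deliberately handles only the plain `#ifdef` form; `#if defined(...)` or `#ifndef` variants, if present, would need manual review, and any `#else` branch of a rewritten block becomes the active one.

```python
import re

# Whole lines of the form "#ifdef CONFIG_SWIOTLB" (optional indentation and
# optional spaces after '#'), but not longer names such as CONFIG_SWIOTLB_FOO.
_IFDEF_RE = re.compile(
    r"^([ \t]*)#[ \t]*ifdef[ \t]+CONFIG_SWIOTLB\b.*$", re.MULTILINE
)


def disable_swiotlb_ifdefs(source):
    """Return (patched_source, number_of_replacements)."""
    return _IFDEF_RE.subn(
        r"\1#if 0 /* was: #ifdef CONFIG_SWIOTLB, testing only */", source
    )


if __name__ == "__main__":
    sample = "#ifdef CONFIG_SWIOTLB\nuse_dma_pool();\n#endif\n"  # made-up snippet
    patched, count = disable_swiotlb_ifdefs(sample)
    print(count)
    print(patched)
```

As stated above, this is for testing only and not a permanent solution.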


Christian.



> Jean-Marc
___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: AMD graphics performance regression in 4.15 and later

2018-04-09 Thread Jean-Marc Valin
On 04/09/2018 05:42 AM, Christian König wrote:
> Backporting all the detection logic is too invasive, but you could just
> go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully use the
> other code path.
> 
> Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.

Do you mean just taking the 4.15 code as is and replacing
"#ifdef CONFIG_SWIOTLB" with "#if 0" in
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c or are you talking about using a
different version of drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c ?

Jean-Marc


Re: AMD graphics performance regression in 4.15 and later

2018-04-09 Thread Christian König

On 07.04.2018 at 00:00, Jean-Marc Valin wrote:
> Hi Christian,
>
> Thanks for the info. FYI, I've also opened a Firefox bug for that at:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
> Feel free to comment since you have a better understanding of what's
> going on.
>
> One last question: right now I'm running 4.15.0 with the "offending"
> patch reverted. Is that safe to run or are there possible bad
> interactions with other changes?

That should work without problems.

But I just had another idea as well, if you want you could still test
the new code path which will be used in 4.17.

Backporting all the detection logic is too invasive, but you could just
go into drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c and forcefully use the
other code path.

Just look out for "#ifdef CONFIG_SWIOTLB" checks and disable those.

Regards,
Christian.



Cheers,

Jean-Marc


___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: AMD graphics performance regression in 4.15 and later

2018-04-06 Thread Jean-Marc Valin
Hi Christian,

Thanks for the info. FYI, I've also opened a Firefox bug for that at:
https://bugzilla.mozilla.org/show_bug.cgi?id=1448778
Feel free to comment since you have a better understanding of what's
going on.

One last question: right now I'm running 4.15.0 with the "offending"
patch reverted. Is that safe to run or are there possible bad
interactions with other changes?

Cheers,

Jean-Marc

On 04/06/2018 01:20 PM, Christian König wrote:
> On 06.04.2018 at 18:42, Jean-Marc Valin wrote:
>> Hi Christian,
>>
>> On 04/09/2018 07:48 AM, Christian König wrote:
>>> On 06.04.2018 at 17:30, Jean-Marc Valin wrote:
>>>> Hi Christian,
>>>>
>>>> Is there a way to turn off these huge pages at boot-time/run-time?
>>> Only at compile time by not setting CONFIG_TRANSPARENT_HUGEPAGE.
>> Any reason why
>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>> doesn't solve the problem?
> 
> Because we unfortunately try to allocate huge pages anyway; we just
> fail in 100% of all cases.
> 
> That basically gives you both the extra allocation overhead and the
> still-bad throughput.
> 
>> Also, I assume that disabling CONFIG_TRANSPARENT_HUGEPAGE will disable
>> them for everything and not just what your patch added, right?
> 
> Correct, that's why I wrote that disabling SWIOTLBs might be better.
> 
>>>> I'm not sure what you mean by "We mitigated the problem by avoiding the
>>>> slow coherent DMA code path on almost all platforms on newer
>>>> kernels". I tested up to 4.16 and the performance regression is just
>>>> as bad as it is for 4.15.
>>> Indeed 4.16 still doesn't have that. You could use the
>>> amd-staging-drm-next branch or wait for 4.17.
>> Is there a way to pull just that change or are there too many
>> interactions with other changes?
> 
> It adds a new detection if memory allocation needs to be coherent or
> not, that is not something you can easily pull into older versions.
> 
>>> That isn't related to the GFX hardware, but to your CPU/motherboard and
>>> whatever else you have in the system.
>> Well, I have an nvidia GPU in the same system (normally only used for
>> CUDA) and if I use it instead of my RX 560 then I'm not seeing any
>> performance issue with 4.15.
> 
> That's because you are probably using the Nvidia binary driver which has
> a completely separate code base.
> 
>>> Some part of your system needs SWIOTLB and that makes allocating memory
>>> much slower.
>> What would that part be? FTR, I have a complete description of my system
>> at https://jmvalin.dreamwidth.org/15583.html
>>
>> I don't know if it's related, but I can maybe see one thing in common
>> between my machine and the Core 2 Quad from the other bug report and
>> that's the "NUMA part". I have a dual-socket Xeon and (AFAIK) the Core 2
>> Quad is made of two two-core CPUs glued together with little
>> communication between them.
> 
> Yeah, that is probably the reason.
> 
>>> Intel doesn't use TTM because they don't have dedicated VRAM, but the
>>> open source nvidia driver should be affected as well.
>> I'm using the proprietary nvidia driver (because CUDA). Is that supposed
>> to be affected as well?
> 
> No.
> 
>>> We already mitigated that problem and I don't see any solution which
>>> will arrive faster than 4.17.
>> Is that supposed to make the slowdown unnoticeable or just slightly
>> better?
> 
> It completely goes away. The issue with the coherent path is that it
> always tries to allocate the lowest possible memory to make sure that it
> fits into the DMA constraints of all devices in the system.
> 
> But since AMD GPUs can handle 40 bits of address, you would need at least
> 1TB of memory in the system to trigger that (or a NUMA system where some
> memory is in a low area and some in a high area).
> 
> Christian.
> 
>>> The only quick workaround I can see is to avoid Firefox; Chrome, for
>>> example, is reported to work perfectly fine.
>> Or use an unaffected GPU/driver ;-)
>>
>> Cheers,
>>
>> Jean-Marc
>>
> 


Re: AMD graphics performance regression in 4.15 and later

2018-04-06 Thread Christian König

On 06.04.2018 at 18:42, Jean-Marc Valin wrote:
> Hi Christian,
>
> On 04/09/2018 07:48 AM, Christian König wrote:
>> On 06.04.2018 at 17:30, Jean-Marc Valin wrote:
>>> Hi Christian,
>>>
>>> Is there a way to turn off these huge pages at boot-time/run-time?
>> Only at compile time by not setting CONFIG_TRANSPARENT_HUGEPAGE.
> Any reason why
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> doesn't solve the problem?

Because we unfortunately try to allocate huge pages anyway; we just
fail in 100% of all cases.

That basically gives you both the extra allocation overhead and the
still-bad throughput.

> Also, I assume that disabling CONFIG_TRANSPARENT_HUGEPAGE will disable
> them for everything and not just what your patch added, right?

Correct, that's why I wrote that disabling SWIOTLB might be better.

>>> I'm not sure what you mean by "We mitigated the problem by avoiding the
>>> slow coherent DMA code path on almost all platforms on newer kernels". I
>>> tested up to 4.16 and the performance regression is just as bad as it is
>>> for 4.15.
>> Indeed 4.16 still doesn't have that. You could use the
>> amd-staging-drm-next branch or wait for 4.17.
> Is there a way to pull just that change or are there too many
> interactions with other changes?

It adds new detection of whether memory allocation needs to be coherent
or not; that is not something you can easily pull into older versions.

>> That isn't related to the GFX hardware, but to your CPU/motherboard and
>> whatever else you have in the system.
> Well, I have an nvidia GPU in the same system (normally only used for
> CUDA) and if I use it instead of my RX 560 then I'm not seeing any
> performance issue with 4.15.

That's because you are probably using the Nvidia binary driver, which has
a completely separate code base.

>> Some part of your system needs SWIOTLB and that makes allocating memory
>> much slower.
> What would that part be? FTR, I have a complete description of my system
> at https://jmvalin.dreamwidth.org/15583.html
>
> I don't know if it's related, but I can maybe see one thing in common
> between my machine and the Core 2 Quad from the other bug report and
> that's the "NUMA part". I have a dual-socket Xeon and (AFAIK) the Core 2
> Quad is made of two two-core CPUs glued together with little
> communication between them.

Yeah, that is probably the reason.

>> Intel doesn't use TTM because they don't have dedicated VRAM, but the
>> open source nvidia driver should be affected as well.
> I'm using the proprietary nvidia driver (because CUDA). Is that supposed
> to be affected as well?

No.

>> We already mitigated that problem and I don't see any solution which
>> will arrive faster than 4.17.
> Is that supposed to make the slowdown unnoticeable or just slightly better?

It completely goes away. The issue with the coherent path is that it
always tries to allocate the lowest possible memory to make sure that it
fits into the DMA constraints of all devices in the system.

But since AMD GPUs can handle 40 bits of address, you would need at least
1TB of memory in the system to trigger that (or a NUMA system where some
memory is in a low area and some in a high area).

Christian.


>> The only quick workaround I can see is to avoid Firefox; Chrome, for
>> example, is reported to work perfectly fine.
> Or use an unaffected GPU/driver ;-)
>
> Cheers,
>
> Jean-Marc





Re: AMD graphics performance regression in 4.15 and later

2018-04-06 Thread Jean-Marc Valin
Hi Christian,

On 04/09/2018 07:48 AM, Christian König wrote:
> On 06.04.2018 at 17:30, Jean-Marc Valin wrote:
>> Hi Christian,
>>
>> Is there a way to turn off these huge pages at boot-time/run-time?
> 
> Only at compile time by not setting CONFIG_TRANSPARENT_HUGEPAGE.

Any reason why
echo never > /sys/kernel/mm/transparent_hugepage/enabled
doesn't solve the problem?

Also, I assume that disabling CONFIG_TRANSPARENT_HUGEPAGE will disable
them for everything and not just what your patch added, right?

>> I'm not sure what you mean by "We mitigated the problem by avoiding the
>> slow coherent DMA code path on almost all platforms on newer kernels". I
>> tested up to 4.16 and the performance regression is just as bad as it is
>> for 4.15.
> 
> Indeed 4.16 still doesn't have that. You could use the
> amd-staging-drm-next branch or wait for 4.17.

Is there a way to pull just that change or are there too many
interactions with other changes?

> That isn't related to the GFX hardware, but to your CPU/motherboard and
> whatever else you have in the system.

Well, I have an nvidia GPU in the same system (normally only used for
CUDA) and if I use it instead of my RX 560 then I'm not seeing any
performance issue with 4.15.

> Some part of your system needs SWIOTLB and that makes allocating memory
> much slower.

What would that part be? FTR, I have a complete description of my system
at https://jmvalin.dreamwidth.org/15583.html

I don't know if it's related, but I can maybe see one thing in common
between my machine and the Core 2 Quad from the other bug report and
that's the "NUMA part". I have a dual-socket Xeon and (AFAIK) the Core 2
Quad is made of two two-core CPUs glued together with little
communication between them.

> Intel doesn't use TTM because they don't have dedicated VRAM, but the
> open source nvidia driver should be affected as well.

I'm using the proprietary nvidia driver (because CUDA). Is that supposed
to be affected as well?

> We already mitigated that problem and I don't see any solution which
> will arrive faster than 4.17.

Is that supposed to make the slowdown unnoticeable or just slightly better?

> The only quick workaround I can see is to avoid Firefox; Chrome, for
> example, is reported to work perfectly fine.

Or use an unaffected GPU/driver ;-)

Cheers,

Jean-Marc



Re: AMD graphics performance regression in 4.15 and later

2018-04-06 Thread Christian König

On 06.04.2018 at 17:30, Jean-Marc Valin wrote:
> Hi Christian,
>
> Is there a way to turn off these huge pages at boot-time/run-time?

Only at compile time by not setting CONFIG_TRANSPARENT_HUGEPAGE.

Alternatively you can avoid enabling CONFIG_SWIOTLB which will avoid the
slow DMA path as well.



> Right now the recent kernels are making Firefox pretty much unusable for me.
> I've been able to revert the patch from 4.15 but it's not really a
> long-term solution.
>
> You mention that the purpose of the patch is to improve performance, but
> I haven't actually noticed anything running faster on my system. Is
> there any particular test where I'm supposed to see an improvement
> compared to 4.14?

Mostly crypto mining, maybe some games as well.

> I'm not sure what you mean by "We mitigated the problem by avoiding the
> slow coherent DMA code path on almost all platforms on newer kernels". I
> tested up to 4.16 and the performance regression is just as bad as it is
> for 4.15.

Indeed 4.16 still doesn't have that. You could use the
amd-staging-drm-next branch or wait for 4.17.

> Unlike the older hardware reported on kernel bug 198511, the hardware I
> have is quite recent (RX 560) and still being sold.

That isn't related to the GFX hardware, but to your CPU/motherboard and
whatever else you have in the system.

Some part of your system needs SWIOTLB and that makes allocating memory
much slower.

> I've also confirmed that neither nvidia (on the same machine) nor intel
> GPUs (on a less powerful machine) are affected, so it seems like there's
> a way to avoid that slow performance.

Intel doesn't use TTM because they don't have dedicated VRAM, but the
open source nvidia driver should be affected as well.

> I'm not saying that what Firefox is doing is
> ideal (I don't know what it does and why), but it still seems like
> something that should still be avoided in the kernel.

We already mitigated that problem and I don't see any solution which
will arrive faster than 4.17.

The only quick workaround I can see is to avoid Firefox; Chrome, for
example, is reported to work perfectly fine.

Christian.

> Cheers,
>
> Jean-Marc
>
> On 04/06/2018 04:03 AM, Christian König wrote:
>> Hi Jean,
>>
>> yeah, that is a known problem. Using huge pages improves the performance
>> because of better TLB usage, but at the cost of higher allocation
>> overhead.
>>
>> What we found is that firefox is doing something rather strange by
>> allocating large textures and then just throwing them away again
>> immediately.
>>
>> We mitigated the problem by avoiding the slow coherent DMA code path on
>> almost all platforms on newer kernels, but essentially somebody needs to
>> figure out why firefox and/or the user space stack is doing this
>> constant allocation/freeing of memory.
>>
>> There is also a bug tracker entry on bugs.kernel.org about this, but I
>> can't find it any more offhand.
>>
>> Regards,
>> Christian.
>>
>> On 06.04.2018 at 02:30, Jean-Marc Valin wrote:
>>> Hi,
>>>
>>> I noticed a serious graphics performance regression between 4.14 and
>>> 4.15. It is most noticeable with Firefox (tried FF57 through FF60) and
>>> causes scrolling to be really choppy/sluggish. I've confirmed that the
>>> problem is also there on 4.16, while 4.13 works fine.
>>>
>>> After a bisection, I've narrowed the regression down to this commit:
>>>
>>> commit 648bc3574716400acc06f99915815f80d9563783
>>> Author: Christian König
>>> Date:   Thu Jul 6 09:59:43 2017 +0200
>>>
>>>     drm/ttm: add transparent huge page support for DMA allocations v2
>>>
>>> Some details about my system:
>>> Distro: Fedora 27 (up-to-date)
>>> Video: MSI Radeon RX 560 AERO
>>> CPU: Dual-socket Xeon E5-2640 v4 (20 cores total)
>>> RAM: 128 GB ECC
>>>
>>> As a comparison, when running Firefox with 4.15 on a Lenovo W540 laptop
>>> (with Intel graphics only) the responsiveness is much better than what
>>> I'm getting on the Xeon machine above with the Radeon card, so this
>>> really seems to be an AMD-only issue.
>>>
>>> Any way to fix the issue?
>>>
>>> Thanks,
>>>
>>>     Jean-Marc




Re: AMD graphics performance regression in 4.15 and later

2018-04-06 Thread Christian König

On 06.04.2018 at 17:30, Jean-Marc Valin wrote:
> Hi Christian,
>
> Is there a way to turn off these huge pages at boot-time/run-time?

Only at compile time, by not setting CONFIG_TRANSPARENT_HUGEPAGE.

Alternatively, you can avoid enabling CONFIG_SWIOTLB, which will avoid the
slow DMA path as well.
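
A minimal sketch of checking those two options, assuming the text of the
kernel's build configuration is at hand. The helper names and the sample
text are made up for illustration; only the "CONFIG_FOO=y" and
"# CONFIG_FOO is not set" line formats are taken from how kernel .config
files are written.

```python
import re


def kernel_config_options(config_text):
    """Parse kernel .config text into {option: value}; unset options map to None."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Enabled options look like "CONFIG_FOO=y" (or =m, =123, ="string").
        set_match = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if set_match:
            options[set_match.group(1)] = set_match.group(2)
            continue
        # Disabled options are recorded as "# CONFIG_FOO is not set".
        unset_match = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset_match:
            options[unset_match.group(1)] = None
    return options


def thp_and_swiotlb_enabled(config_text):
    """Report whether the two options named in this thread are built in (=y)."""
    opts = kernel_config_options(config_text)
    names = ("CONFIG_TRANSPARENT_HUGEPAGE", "CONFIG_SWIOTLB")
    return {name: opts.get(name) == "y" for name in names}


# Hypothetical sample, not taken from any real system.
SAMPLE_CONFIG = """\
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_SWIOTLB is not set
"""

print(thp_and_swiotlb_enabled(SAMPLE_CONFIG))
```

On many distributions the running kernel's configuration can be read from
/boot/config-$(uname -r), or from /proc/config.gz when the kernel was built
with CONFIG_IKCONFIG_PROC; either can be passed in as config_text.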



> Right now the recent kernels are making Firefox pretty much unusable for me.
> I've been able to revert the patch from 4.15 but it's not really a
> long-term solution.
>
> You mention that the purpose of the patch is to improve performance, but
> I haven't actually noticed anything running faster on my system. Is
> there any particular test where I'm supposed to see an improvement
> compared to 4.14?

Mostly crypto mining, maybe some games as well.

> I'm not sure what you mean by "We mitigated the problem by avoiding the
> slow coherent DMA code path on almost all platforms on newer kernels". I
> tested up to 4.16 and the performance regression is just as bad as it is
> for 4.15.

Indeed, 4.16 still doesn't have that. You could use the
amd-staging-drm-next branch or wait for 4.17.

> Unlike the older hardware reported on kernel bug 198511, the hardware I
> have is quite recent (RX 560) and still being sold.

That isn't related to the GFX hardware, but to your CPU/motherboard and
whatever else you have in the system.

Some part of your system needs SWIOTLB and that makes allocating memory
much slower.

> I've also confirmed that neither nvidia (on the same machine) nor intel
> GPUs (on a less powerful machine) are affected, so it seems like there's
> a way to avoid that slow performance.

Intel doesn't use TTM because they don't have dedicated VRAM, but the
open source nvidia driver should be affected as well.

> I'm not saying that what Firefox is doing is
> ideal (I don't know what it does and why), but it still seems like
> something that should still be avoided in the kernel.

We already mitigated that problem and I don't see any solution which
will arrive faster than 4.17.

The only quick workaround I can see is to avoid Firefox; Chrome, for
example, is reported to work perfectly fine.


Christian.



> Cheers,
>
> Jean-Marc
>
> On 04/06/2018 04:03 AM, Christian König wrote:
>> Hi Jean,
>>
>> yeah, that is a known problem. Using huge pages improves the performance
>> because of better TLB usage, but at the cost of higher allocation
>> overhead.
>>
>> What we found is that Firefox is doing something rather strange by
>> allocating large textures and then just throwing them away again
>> immediately.
>>
>> We mitigated the problem by avoiding the slow coherent DMA code path on
>> almost all platforms on newer kernels, but essentially somebody needs to
>> figure out why Firefox and/or the user space stack is doing this
>> constant allocation/freeing of memory.
>>
>> There is also a bug tracker on bugs.kernel.org about this, but I can't
>> find it any more offhand.
>>
>> Regards,
>> Christian.
>>
>> On 06.04.2018 at 02:30, Jean-Marc Valin wrote:
>>> Hi,
>>>
>>> I noticed a serious graphics performance regression between 4.14 and
>>> 4.15. It is most noticeable with Firefox (tried FF57 through FF60) and
>>> causes scrolling to be really choppy/sluggish. I've confirmed that the
>>> problem is also there on 4.16, while 4.13 works fine.
>>>
>>> After a bisection, I've narrowed the regression down to this commit:
>>>
>>> commit 648bc3574716400acc06f99915815f80d9563783
>>> Author: Christian König
>>> Date:   Thu Jul 6 09:59:43 2017 +0200
>>>
>>>   drm/ttm: add transparent huge page support for DMA allocations v2
>>>
>>>
>>> Some details about my system:
>>> Distro: Fedora 27 (up-to-date)
>>> Video: MSI Radeon RX 560 AERO
>>> CPU: Dual-socket Xeon E5-2640 v4 (20 cores total)
>>> RAM: 128 GB ECC
>>>
>>>
>>> As a comparison, when running Firefox with 4.15 on a Lenovo W540 laptop
>>> (with Intel graphics only) the responsiveness is much better than what
>>> I'm getting on the Xeon machine above with the Radeon card, so this
>>> really seems to be an AMD-only issue.
>>>
>>> Any way to fix the issue?
>>>
>>> Thanks,
>>>
>>> Jean-Marc




Re: AMD graphics performance regression in 4.15 and later

2018-04-06 Thread Jean-Marc Valin
Hi Christian,

Is there a way to turn off these huge pages at boot-time/run-time? Right
now the recent kernels are making Firefox pretty much unusable for me.
I've been able to revert the patch from 4.15 but it's not really a
long-term solution.

You mention that the purpose of the patch is to improve performance, but
I haven't actually noticed anything running faster on my system. Is
there any particular test where I'm supposed to see an improvement
compared to 4.14?

I'm not sure what you mean by "We mitigated the problem by avoiding the
slow coherent DMA code path on almost all platforms on newer kernels". I
tested up to 4.16 and the performance regression is just as bad as it is
for 4.15.

Unlike the older hardware reported on kernel bug 198511, the hardware I
have is quite recent (RX 560) and still being sold. I've also confirmed
that neither nvidia (on the same machine) nor intel GPUs (on a less
powerful machine) are affected, so it seems like there's a way to avoid
that slow performance. I'm not saying that what Firefox is doing is
ideal (I don't know what it does and why), but it still seems like
something that should still be avoided in the kernel.

Cheers,

Jean-Marc

On 04/06/2018 04:03 AM, Christian König wrote:
> Hi Jean,
> 
> yeah, that is a known problem. Using huge pages improves the performance
> because of better TLB usage, but at the cost of higher allocation
> overhead.
> 
> What we found is that Firefox is doing something rather strange by
> allocating large textures and then just throwing them away again
> immediately.
> 
> We mitigated the problem by avoiding the slow coherent DMA code path on
> almost all platforms on newer kernels, but essentially somebody needs to
> figure out why Firefox and/or the user space stack is doing this
> constant allocation/freeing of memory.
> 
> There is also a bug tracker on bugs.kernel.org about this, but I can't
> find it any more offhand.
> 
> Regards,
> Christian.
> 
> On 06.04.2018 at 02:30, Jean-Marc Valin wrote:
>> Hi,
>>
>> I noticed a serious graphics performance regression between 4.14 and
>> 4.15. It is most noticeable with Firefox (tried FF57 through FF60) and
>> causes scrolling to be really choppy/sluggish. I've confirmed that the
>> problem is also there on 4.16, while 4.13 works fine.
>>
>> After a bisection, I've narrowed the regression down to this commit:
>>
>> commit 648bc3574716400acc06f99915815f80d9563783
>> Author: Christian König 
>> Date:   Thu Jul 6 09:59:43 2017 +0200
>>
>>  drm/ttm: add transparent huge page support for DMA allocations v2
>>
>>
>> Some details about my system:
>> Distro: Fedora 27 (up-to-date)
>> Video: MSI Radeon RX 560 AERO
>> CPU: Dual-socket Xeon E5-2640 v4 (20 cores total)
>> RAM: 128 GB ECC
>>
>>
>> As a comparison, when running Firefox with 4.15 on a Lenovo W540 laptop
>> (with Intel graphics only) the responsiveness is much better than what
>> I'm getting on the Xeon machine above with the Radeon card, so this
>> really seems to be an AMD-only issue.
>>
>> Any way to fix the issue?
>>
>> Thanks,
>>
>> Jean-Marc
> 


Re: AMD graphics performance regression in 4.15 and later

2018-04-06 Thread Christian König

Hi Jean,

found the bug reports.

Here is the original bug report from the kernel: 
https://bugzilla.kernel.org/show_bug.cgi?id=198511


And here is an fdo (freedesktop.org) bug report where we tried to
investigate the root cause, but haven't had time for that yet:
https://bugs.freedesktop.org/show_bug.cgi?id=105038


Regards,
Christian.

On 06.04.2018 at 10:03, Christian König wrote:

Hi Jean,

yeah, that is a known problem. Using huge pages improves the
performance because of better TLB usage, but at the cost of higher
allocation overhead.

What we found is that Firefox is doing something rather strange by
allocating large textures and then just throwing them away again
immediately.

We mitigated the problem by avoiding the slow coherent DMA code path
on almost all platforms on newer kernels, but essentially somebody
needs to figure out why Firefox and/or the user space stack is doing
this constant allocation/freeing of memory.

There is also a bug tracker on bugs.kernel.org about this, but I can't
find it any more offhand.

Regards,
Christian.

On 06.04.2018 at 02:30, Jean-Marc Valin wrote:

Hi,

I noticed a serious graphics performance regression between 4.14 and
4.15. It is most noticeable with Firefox (tried FF57 through FF60) and
causes scrolling to be really choppy/sluggish. I've confirmed that the
problem is also there on 4.16, while 4.13 works fine.

After a bisection, I've narrowed the regression down to this commit:

commit 648bc3574716400acc06f99915815f80d9563783
Author: Christian König 
Date:   Thu Jul 6 09:59:43 2017 +0200

 drm/ttm: add transparent huge page support for DMA allocations v2


Some details about my system:
Distro: Fedora 27 (up-to-date)
Video: MSI Radeon RX 560 AERO
CPU: Dual-socket Xeon E5-2640 v4 (20 cores total)
RAM: 128 GB ECC


As a comparison, when running Firefox with 4.15 on a Lenovo W540 laptop
(with Intel graphics only) the responsiveness is much better than what
I'm getting on the Xeon machine above with the Radeon card, so this
really seems to be an AMD-only issue.

Any way to fix the issue?

Thanks,

Jean-Marc






Re: AMD graphics performance regression in 4.15 and later

2018-04-06 Thread Christian König

Hi Jean,

yeah, that is a known problem. Using huge pages improves the performance
because of better TLB usage, but at the cost of higher allocation overhead.
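
To put rough numbers on the TLB side of that trade-off, here is a minimal
sketch. It assumes x86-64 page sizes (4 KiB base pages, 2 MiB huge pages)
and a hypothetical 64 MiB texture; neither figure comes from this thread.

```python
KIB = 1024
MIB = 1024 * KIB

BASE_PAGE = 4 * KIB   # assumed x86-64 base page size
HUGE_PAGE = 2 * MIB   # assumed x86-64 huge page size


def pages_needed(buffer_bytes, page_bytes):
    """Number of pages (and so of translations) needed to map a buffer."""
    return -(-buffer_bytes // page_bytes)  # ceiling division


texture_bytes = 64 * MIB  # hypothetical texture size

base_entries = pages_needed(texture_bytes, BASE_PAGE)  # 16384
huge_entries = pages_needed(texture_bytes, HUGE_PAGE)  # 32

print(base_entries, huge_entries, base_entries // huge_entries)
```

Under these assumptions the same buffer needs 512 times fewer translations
with huge pages. The flip side is that each 2 MiB huge page has to be a
physically contiguous run of 512 base pages, which is plausibly where the
higher allocation overhead comes from.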


What we found is that Firefox is doing something rather strange by
allocating large textures and then just throwing them away again immediately.


We mitigated the problem by avoiding the slow coherent DMA code path on
almost all platforms on newer kernels, but essentially somebody needs to
figure out why Firefox and/or the user space stack is doing this
constant allocation/freeing of memory.


There is also a bug tracker on bugs.kernel.org about this, but I can't
find it any more offhand.


Regards,
Christian.

On 06.04.2018 at 02:30, Jean-Marc Valin wrote:

Hi,

I noticed a serious graphics performance regression between 4.14 and
4.15. It is most noticeable with Firefox (tried FF57 through FF60) and
causes scrolling to be really choppy/sluggish. I've confirmed that the
problem is also there on 4.16, while 4.13 works fine.

After a bisection, I've narrowed the regression down to this commit:

commit 648bc3574716400acc06f99915815f80d9563783
Author: Christian König 
Date:   Thu Jul 6 09:59:43 2017 +0200

 drm/ttm: add transparent huge page support for DMA allocations v2


Some details about my system:
Distro: Fedora 27 (up-to-date)
Video: MSI Radeon RX 560 AERO
CPU: Dual-socket Xeon E5-2640 v4 (20 cores total)
RAM: 128 GB ECC


As a comparison, when running Firefox with 4.15 on a Lenovo W540 laptop
(with Intel graphics only) the responsiveness is much better than what
I'm getting on the Xeon machine above with the Radeon card, so this
really seems to be an AMD-only issue.

Any way to fix the issue?

Thanks,

Jean-Marc




AMD graphics performance regression in 4.15 and later

2018-04-05 Thread Jean-Marc Valin
Hi,

I noticed a serious graphics performance regression between 4.14 and
4.15. It is most noticeable with Firefox (tried FF57 through FF60) and
causes scrolling to be really choppy/sluggish. I've confirmed that the
problem is also there on 4.16, while 4.13 works fine.

After a bisection, I've narrowed the regression down to this commit:

commit 648bc3574716400acc06f99915815f80d9563783
Author: Christian König 
Date:   Thu Jul 6 09:59:43 2017 +0200

drm/ttm: add transparent huge page support for DMA allocations v2
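
The bisection mentioned above amounts to a binary search over the commit
history. A minimal sketch, for illustration only: the commit names and the
is_bad test are hypothetical, history is assumed to be linear with a single
good-to-bad transition, and in practice `git bisect` drives this process
(building and testing a kernel at each step).

```python
def first_bad_commit(commits, is_bad):
    """Return the first bad commit.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad.
    """
    good, bad = 0, len(commits) - 1  # indices of a known-good / known-bad commit
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # regression was introduced at mid or earlier
        else:
            good = mid   # regression was introduced after mid
    return commits[bad]


# Hypothetical history of 100 commits where "c37" introduced the regression.
history = [f"c{i}" for i in range(100)]
tested = []


def scrolling_is_choppy(commit):
    tested.append(commit)  # record how many builds had to be tested
    return int(commit[1:]) >= 37


print(first_bad_commit(history, scrolling_is_choppy), len(tested))
```

Each step halves the remaining range, so even a release cycle with
thousands of commits needs only a dozen or so test builds.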


Some details about my system:
Distro: Fedora 27 (up-to-date)
Video: MSI Radeon RX 560 AERO
CPU: Dual-socket Xeon E5-2640 v4 (20 cores total)
RAM: 128 GB ECC


As a comparison, when running Firefox with 4.15 on a Lenovo W540 laptop
(with Intel graphics only) the responsiveness is much better than what
I'm getting on the Xeon machine above with the Radeon card, so this
really seems to be an AMD-only issue.

Any way to fix the issue?

Thanks,

Jean-Marc

