Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-11 Thread Linus Torvalds
On Mon, Jun 11, 2018 at 12:07 AM Christoph Hellwig  wrote:
>
> For now I'd say revert this commit for 4.17/4.18-rc and I'll look into
> addressing these issues properly.

Ok, reverted in my tree, and marked for stable (for 4.17). Thanks,

 Linus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-11 Thread Gabriel C
2018-06-08 8:52 GMT+02:00 Christian König :
> Am 08.06.2018 um 08:02 schrieb Christoph Hellwig:
>>
>> On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
>>>
>>> Ok done.. bisect points to:
>>
>> What is the failure mode you are seeing?  Can't find anything in the
>> mail unfortunately.
>
>
> As far as I analyzed it we now get an -ENOMEM from dma_alloc_attrs() in
> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when IOMMU is enabled.
>
> Still need to figure out which parameters we want to use for the allocation,
> but I think it is only 4k or 8k.

If you guys need me to test something, or run debug patches or patches
of any sort, just let me know.

>
> Regards,
> Christian.

BR


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-11 Thread Christoph Hellwig
I think the prime issue is that dma_direct_alloc respects the DMA
mask, which we don't need if we are actually using the IOMMU.  This would
be mostly harmless except for the SEV bit high in the address, which
makes the checks fail.
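To make Christoph's explanation concrete, here is a minimal Python model of the failing check. It is a sketch under stated assumptions, not kernel code: the function names are hypothetical, the C-bit position (bit 47) and the 40-bit device mask are assumed example values, and the check is only loosely modeled on the direct allocator's "does the whole buffer fit under the coherent mask" test. With SME active, the encryption bit (the one Christoph calls the SEV bit) becomes part of the bus address, so even a page at 4 GiB fails a 40-bit mask and the allocation returns -ENOMEM, which is plausibly the -12 in radeon's "create WB bo failed" message quoted further down.

```python
import errno

# Illustrative assumptions (not values taken from this thread):
C_BIT = 1 << 47                  # SME/SEV encryption bit; the real position is reported by CPUID
DEVICE_DMA_MASK = (1 << 40) - 1  # a 40-bit coherent DMA mask for the GPU


def phys_to_dma(phys: int, sme_active: bool) -> int:
    """Bus address the device is told to use; with SME the C-bit is ORed in."""
    return phys | C_BIT if sme_active else phys


def coherent_ok(phys: int, size: int, sme_active: bool,
                mask: int = DEVICE_DMA_MASK) -> bool:
    """Mask check in the spirit of the direct allocator: the last byte must fit."""
    return phys_to_dma(phys, sme_active) + size - 1 <= mask


def direct_alloc(phys: int, size: int, sme_active: bool) -> int:
    """Hypothetical direct allocation: bus address on success, -ENOMEM otherwise.

    Retrying with lower pages cannot help here: the C-bit alone exceeds the mask.
    """
    if not coherent_ok(phys, size, sme_active):
        return -errno.ENOMEM
    return phys_to_dma(phys, sme_active)


def iommu_alloc(iova: int, size: int, mask: int = DEVICE_DMA_MASK) -> int:
    """Hypothetical IOMMU-backed allocation: the device sees an IOVA picked below
    the mask, so the page's physical address (and its C-bit) is never checked."""
    if iova + size - 1 > mask:
        return -errno.ENOMEM
    return iova


if __name__ == "__main__":
    page = 0x1_0000_0000  # 4 GiB, comfortably inside a 40-bit mask
    print(direct_alloc(page, 8192, sme_active=False))  # 4294967296
    print(direct_alloc(page, 8192, sme_active=True))   # -12 (-ENOMEM on Linux)
    print(iommu_alloc(0x10000, 8192))                  # 65536
```

The design point is the one Christoph makes: when the IOMMU actually translates, the mask only constrains the IOVA, so applying it to the physical address is unnecessary.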

For now I'd say revert this commit for 4.17/4.18-rc and I'll look into
addressing these issues properly.


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-08 Thread Gabriel C
2018-06-07 9:07 GMT+02:00 Christian König :
> Am 06.06.2018 um 17:44 schrieb Gabriel C:
>>
>> 2018-06-06 17:03 GMT+02:00 Michel Dänzer :
>>>
>>> On 2018-06-06 04:44 PM, Christian König wrote:

>>>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>>> [SNIP]
>>>> At least in theory it should work when we use the coherent DMA
>>>> allocator.
>>>>
>>>> If that really worked before, then the most likely commit which broke
>>>> this is:
>>>>
>>>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>>>> Author: Chunming Zhou 
>>>> Date:   Fri Feb 9 10:44:09 2018 +0800
>>>>
>>>>     drm/amdgpu: only enable swiotlb alloc when need v2
>>>>
>>>>     get the max io mapping address of system memory to see if it is over
>>>>     our card accessing range.
>>>>     v2: move checking later
>>>>
>>>>     Signed-off-by: Chunming Zhou 
>>>>     Reviewed-by: Monk Liu 
>>>>     Reviewed-by: Christian König 
>>>>     Signed-off-by: Alex Deucher 
>>>>
>>>> Currently looking into how we could somehow improve this detection.
>>>
>>> I guess this could fit for Gabriel, but e.g.
>>> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
>>> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
>>> earlier).
>
>
> And what I totally missed is that Gabriel is using radeon and not amdgpu.
>
> So, Gabriel, you need to revert this one for testing:
> commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f
> Author: Chunming Zhou 
> Date:   Fri Feb 9 10:44:10 2018 +0800
>
> drm/radeon: only enable swiotlb path when need v2
>
> swiotlb expands our card accessing range, but its path always is slower
> than ttm pool allocation.
> So add condition to use it.
> v2: move a bit later
>
> Signed-off-by: Chunming Zhou 
> Reviewed-by: Monk Liu 
> Reviewed-by: Christian König 
> Signed-off-by: Alex Deucher 
> Link:
> https://patchwork.freedesktop.org/patch/msgid/20180209024410.1469-3-david1.z...@amd.com
>
>> I got strange performance issues with 4.15 and 4.16, but SME was ON
>> on that setup (even before it hit mainline) and it never broke the GPU like
>> this.
>
>
> Well, that is very interesting; you are the first one who reports that SME +
> GFX works in some way. So far we only got negative reports for that.
>
>> Here is a 4.16.13 boot dmesg which has no such issue:
>>
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>>
>> With the setup as is, booting 4.16.x works, while 4.17 throws the errors.
>
>
> Please do the bisect if the patch I've mentioned above doesn't help.

Ok done.. bisect points to:

b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
Author: Christoph Hellwig 
Date:   Mon Mar 19 11:38:19 2018 +0100

   iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()

   This cleans up the code a lot by removing duplicate logic.

   Tested-by: Tom Lendacky 
   Tested-by: Joerg Roedel 
   Signed-off-by: Christoph Hellwig 
   Reviewed-by: Thomas Gleixner 
   Acked-by: Joerg Roedel 
   Cc: David Woodhouse 
   Cc: Joerg Roedel 
   Cc: Jon Mason 
   Cc: Konrad Rzeszutek Wilk 
   Cc: Linus Torvalds 
   Cc: Muli Ben-Yehuda 
   Cc: Peter Zijlstra 
   Cc: io...@lists.linux-foundation.org
   Link: http://lkml.kernel.org/r/20180319103826.12853-8-...@lst.de
   Signed-off-by: Ingo Molnar 


I'll try to revert this once I'm home.

BR


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-08 Thread Gabriel C
>> Well, that is very interesting; you are the first one who reports that SME +
>> GFX works in some way. So far we only got negative reports for that.
>>
>>> Here is a 4.16.13 boot dmesg which has no such issue:
>>>
>>>
>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>>>
>>> With the setup as is, booting 4.16.x works, while 4.17 throws the errors.
>>
>>
>> Please do the bisect if the patch I've mentioned above doesn't help.
>
> Ok done.. bisect points to:
>
> b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
> commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
> Author: Christoph Hellwig 
> Date:   Mon Mar 19 11:38:19 2018 +0100
>
>iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
>
>This cleans up the code a lot by removing duplicate logic.
>
>Tested-by: Tom Lendacky 
>Tested-by: Joerg Roedel 
>Signed-off-by: Christoph Hellwig 
>Reviewed-by: Thomas Gleixner 
>Acked-by: Joerg Roedel 
>Cc: David Woodhouse 
>Cc: Joerg Roedel 
>Cc: Jon Mason 
>Cc: Konrad Rzeszutek Wilk 
>Cc: Linus Torvalds 
>Cc: Muli Ben-Yehuda 
>Cc: Peter Zijlstra 
>Cc: io...@lists.linux-foundation.org
>Link: http://lkml.kernel.org/r/20180319103826.12853-8-...@lst.de
>Signed-off-by: Ingo Molnar 
>
>
> I'll try to revert this once I'm home.

I can confirm reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
fixes that issue for me.

The GPU is working fine with SME enabled.

Now with working GPU :) I can also confirm performance is back to normal
without doing any other workarounds.

The only app still acting up a bit is Firefox: just minor frame drops,
but nothing too bad (probably a Firefox bug, too).

Chromium/Chrome is fine: even with 10 tabs open, each one playing
a video on YouTube, there are no glitches at all.

The desktop is also fine now; I could not find anything wrong.


BR


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-08 Thread Christian König

Am 08.06.2018 um 08:02 schrieb Christoph Hellwig:
> On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
>> Ok done.. bisect points to:
>
> What is the failure mode you are seeing?  Can't find anything in the
> mail unfortunately.


As far as I analyzed it we now get an -ENOMEM from dma_alloc_attrs() in 
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when IOMMU is enabled.


Still need to figure out which parameters we want to use for the 
allocation, but I think it is only 4k or 8k.


Regards,
Christian.


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-08 Thread Christian König

Hi Christoph,

Am 08.06.2018 um 08:01 schrieb Christoph Hellwig:
> On Thu, Jun 07, 2018 at 07:20:37PM +0200, Christian König wrote:
>> Hi Christopher,
>
> I don't see a Christopher on the Cc list..


Sorry, auto-uncorrection. I indeed meant you :)

Christian.


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Christoph Hellwig
On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
> Ok done.. bisect points to:

What is the failure mode you are seeing?  Can't find anything in the
mail unfortunately.


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Christoph Hellwig
On Thu, Jun 07, 2018 at 07:20:37PM +0200, Christian König wrote:
> Hi Christopher,

I don't see a Christopher on the Cc list..


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Christian König

Hi Christopher,

Am 07.06.2018 um 18:24 schrieb Gabriel C:
> [SNIP]
>> Ok done.. bisect points to:
>>
>> b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
>> commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
>> Author: Christoph Hellwig 
>> Date:   Mon Mar 19 11:38:19 2018 +0100
>>
>>     iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
>>
>>     This cleans up the code a lot by removing duplicate logic.
>>
>>     Tested-by: Tom Lendacky 
>>     Tested-by: Joerg Roedel 
>>     Signed-off-by: Christoph Hellwig 
>>     Reviewed-by: Thomas Gleixner 
>>     Acked-by: Joerg Roedel 
>>     Cc: David Woodhouse 
>>     Cc: Joerg Roedel 
>>     Cc: Jon Mason 
>>     Cc: Konrad Rzeszutek Wilk 
>>     Cc: Linus Torvalds 
>>     Cc: Muli Ben-Yehuda 
>>     Cc: Peter Zijlstra 
>>     Cc: io...@lists.linux-foundation.org
>>     Link: http://lkml.kernel.org/r/20180319103826.12853-8-...@lst.de
>>     Signed-off-by: Ingo Molnar 
>>
>>
>> I'll try to revert this once I'm home.
>
> I can confirm reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
> fixes that issue for me.

Any idea what could cause that? Basically, this patch breaks radeon when
SME is enabled.

> The GPU is working fine with SME enabled.
>
> Now with working GPU :) I can also confirm performance is back to normal
> without doing any other workarounds.
>
> The only app still acting up a bit is Firefox: just minor frame drops,
> but nothing too bad (probably a Firefox bug, too).
>
> Chromium/Chrome is fine: even with 10 tabs open, each one playing
> a video on YouTube, there are no glitches at all.
>
> The desktop is also fine now; I could not find anything wrong.

Thanks for testing,
Christian.

> BR




Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Christian König

Am 06.06.2018 um 17:44 schrieb Gabriel C:
> 2018-06-06 17:03 GMT+02:00 Michel Dänzer :
>> On 2018-06-06 04:44 PM, Christian König wrote:
>>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>> [SNIP]
>>> At least in theory it should work when we use the coherent DMA allocator.
>>>
>>> If that really worked before, then the most likely commit which broke
>>> this is:
>>>
>>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>>> Author: Chunming Zhou 
>>> Date:   Fri Feb 9 10:44:09 2018 +0800
>>>
>>>     drm/amdgpu: only enable swiotlb alloc when need v2
>>>
>>>     get the max io mapping address of system memory to see if it is over
>>>     our card accessing range.
>>>     v2: move checking later
>>>
>>>     Signed-off-by: Chunming Zhou 
>>>     Reviewed-by: Monk Liu 
>>>     Reviewed-by: Christian König 
>>>     Signed-off-by: Alex Deucher 
>>>
>>> Currently looking into how we could somehow improve this detection.
>>
>> I guess this could fit for Gabriel, but e.g.
>> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
>> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
>> earlier).

And what I totally missed is that Gabriel is using radeon and not amdgpu.

So, Gabriel, you need to revert this one for testing:
commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f
Author: Chunming Zhou 
Date:   Fri Feb 9 10:44:10 2018 +0800

    drm/radeon: only enable swiotlb path when need v2

    swiotlb expands our card accessing range, but its path always is slower
    than ttm pool allocation.
    So add condition to use it.
    v2: move a bit later

    Signed-off-by: Chunming Zhou 
    Reviewed-by: Monk Liu 
    Reviewed-by: Christian König 
    Signed-off-by: Alex Deucher 
    Link: https://patchwork.freedesktop.org/patch/msgid/20180209024410.1469-3-david1.z...@amd.com

> I got strange performance issues with 4.15 and 4.16, but SME was ON
> on that setup (even before it hit mainline) and it never broke the GPU like this.

Well, that is very interesting; you are the first one who reports that
SME + GFX works in some way. So far we only got negative reports for that.

> Here is a 4.16.13 boot dmesg which has no such issue:
>
> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>
> With the setup as is, booting 4.16.x works, while 4.17 throws the errors.

Please do the bisect if the patch I've mentioned above doesn't help.

Thanks,
Christian.

>> -- 
>> Earthling Michel Dänzer   |   http://www.amd.com
>> Libre software enthusiast | Mesa and X developer




Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Gabriel C
2018-06-06 17:03 GMT+02:00 Michel Dänzer :
> On 2018-06-06 04:44 PM, Christian König wrote:
>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>> 2018-06-06 14:19 GMT+02:00 Christian König :
>>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>>> 2018-06-06 13:33 GMT+02:00 Christian König :
>>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>>
>>>>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>>
>>>>>>
>>>>>> Also, nothing else changed in that setup; I am just testing kernel 4.17.
>>>>>
>>>>>
>>>>> That has nothing to do with the driver or the original bug you
>>>>> reported. The problem is that SME is active and that is currently not
>>>>> supported at all with that hardware.
>>>>
>>>> OK... so are we now playing kernel and AMD hardware roulette on each
>>>> release?
>>>>
>>>> SME was set up like this in kernel 4.16.x here and everything worked.
>>>
>>> If that is true, again please bisect which commit broke it.
>>>
>>> All the reports I've seen before this indicated that at least amdgpu
>>> has never worked with SME (which BTW doesn't mean it's never going to
>>> work or that we don't want to support it, just that as far as we know
>>> it's currently not working).
>>
>> At least in theory it should work when we use the coherent DMA allocator.
>>
>> If that really worked before, then the most likely commit which broke
>> this is:
>>
>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>> Author: Chunming Zhou 
>> Date:   Fri Feb 9 10:44:09 2018 +0800
>>
>> drm/amdgpu: only enable swiotlb alloc when need v2
>>
>> get the max io mapping address of system memory to see if it is over
>> our card accessing range.
>> v2: move checking later
>>
>> Signed-off-by: Chunming Zhou 
>> Reviewed-by: Monk Liu 
>> Reviewed-by: Christian König 
>> Signed-off-by: Alex Deucher 
>>
>> Currently looking into how we could somehow improve this detection.
>
> I guess this could fit for Gabriel, but e.g.
> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
> earlier).

I got strange performance issues with 4.15 and 4.16, but SME was ON
on that setup (even before it hit mainline) and it never broke the GPU like this.

Here is a 4.16.13 boot dmesg which has no such issue:

http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt

With the setup as is, booting 4.16.x works, while 4.17 throws the errors.

>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Gabriel C
2018-06-06 14:19 GMT+02:00 Christian König :
> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>
>> 2018-06-06 13:33 GMT+02:00 Christian König :
>>>
>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:

>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C :
>>>>>>
>>>>>> 2018-04-11 6:00 GMT+02:00 Gabriel C :
>>>>>> 2018-04-09 11:42 GMT+02:00 Christian König :
>>>>>>>
>>>>>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>>>>
>>>>> ...
>>>>>>
>>>>>> I can help testing code for 4.17/++ if you wish, but that is a
>>>>>> *different* story.
>>>>>>
>>>>> Quickly tested 4.16.0-11490-gb284d4d5a678; the amdgpu and radeon drivers
>>>>> are broken in this one.
>>>>>
>>>>> radeon tells:
>>>>>
>>>>> ...
>>>>>
>>>>> [    6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000).
>>>>> [    6.338210] radeon :21:00.0: (-12) create WB bo failed
>>>>> [    6.338214] radeon :21:00.0: disabling GPU acceleration
>>>>>
>>>>> ...
>>>>>
>>>> I have the same issue now on final 4.17.
>>>
>>>
>>> Actually, Michel came up with a fix for the performance regression, which
>>> is now backported to older kernels as well.
>>>
>>> So the original issue of this mail thread should be fixed by now.
>>
>> OK, will test as soon as I get the GPU to work :))
>>
>>>> I also played with BIOS options, which does not fix anything but
>>>> changes the error message.
>>>>
>>>> With IOMMU && SR-IOV disabled, the error changes to this:
>>>>
>>>> [    7.092044] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
>>>> [    7.092059] radeon :21:00.0: disabling GPU acceleration
>>>>
>>>>
>>>> While I could work around SWIOTLB bugs in 4.15 and 4.16, 4.17 seems to
>>>> kill the GPU with no way for me to make it work (at least I could not
>>>> find any workaround so far).
>>>
>>>
>>> That actually sounds like something completely different. Can you provide
>>> a full dmesg of radeon and/or amdgpu?
>>
>> Sure, here are boot logs with IOMMU/SR-IOV ON/OFF in the BIOS:
>>
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>
>> Also, nothing else changed in that setup; I am just testing kernel 4.17.
>
>
> That has nothing to do with the driver or the original bug you reported. The
> problem is that SME is active and that is currently not supported at all
> with that hardware.

OK... so are we now playing kernel and AMD hardware roulette on each release?

SME was set up like this in kernel 4.16.x here and everything worked.

Also, if you don't support SME at all on that hardware now, while it worked
before, please add proper error handling and proper dmesg messages
letting the user know.

radeon:  : SME not supported on that hardware anymore, please
disable SME...
radeon: : Update your GPU < or whatever >

How hard would that be?

No one but developers can guess from these error messages why their
hardware suddenly isn't working anymore after just updating the kernel.


>
> Try to disable SME either in the BIOS or on the kernel command line.

Yes, that works, but that is not the point.

Really, you just can't break users' setups like this.
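The `r600_ring_test` error in the BIOS experiment quoted above is easier to read with the test's mechanism in mind. The following is a minimal Python model, assuming the test works as in the radeon driver source (this is not described in the thread): the CPU writes 0xCAFEDEAD to a scratch register, submits a ring packet asking the GPU to write 0xDEADBEEF there, and polls until a timeout. `FakeGpu` and `ring_test` are hypothetical names. A scratch value still reading 0xCAFEDEAD therefore means the GPU never executed the packet; the model does not say why, and in this thread those failures were attributed to SME being active.

```python
from typing import Dict, List, Tuple

CANARY = 0xCAFEDEAD    # written by the CPU before submitting the test packet
EXPECTED = 0xDEADBEEF  # written by the GPU if it executes the packet


class FakeGpu:
    """Hypothetical stand-in for the hardware. It only executes ring packets
    if it can fetch them (for example, if its DMA to the ring buffer works)."""

    def __init__(self, can_fetch_ring: bool):
        self.can_fetch_ring = can_fetch_ring
        self.regs: Dict[int, int] = {}
        self.ring: List[Tuple[int, int]] = []

    def write_reg(self, reg: int, value: int) -> None:
        self.regs[reg] = value

    def read_reg(self, reg: int) -> int:
        return self.regs.get(reg, 0)

    def submit(self, reg: int, value: int) -> None:
        """Queue a 'write value to register' packet on the ring."""
        self.ring.append((reg, value))

    def tick(self) -> None:
        """Let the GPU process whatever it can fetch from the ring."""
        if self.can_fetch_ring:
            while self.ring:
                reg, value = self.ring.pop(0)
                self.regs[reg] = value


def ring_test(gpu: FakeGpu, ring_idx: int = 0, scratch: int = 0x850C,
              timeout_us: int = 1000) -> Tuple[bool, str]:
    """Write a canary, ask the GPU to overwrite it, then poll for the new value."""
    gpu.write_reg(scratch, CANARY)
    gpu.submit(scratch, EXPECTED)
    value = gpu.read_reg(scratch)
    for _ in range(timeout_us):  # one poll per simulated microsecond
        gpu.tick()
        value = gpu.read_reg(scratch)
        if value == EXPECTED:
            return True, f"ring test on {ring_idx} succeeded"
    return False, (f"radeon: ring {ring_idx} test failed "
                   f"(scratch(0x{scratch:04X})=0x{value:08X})")


if __name__ == "__main__":
    print(ring_test(FakeGpu(can_fetch_ring=False))[1])
    # radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
```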


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-07 Thread Gabriel C
2018-06-06 16:44 GMT+02:00 Christian König :
> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>
>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>
>>> 2018-06-06 14:19 GMT+02:00 Christian König :

>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>>
>>>>> 2018-06-06 13:33 GMT+02:00 Christian König :
>>>>>>
>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>>>>
>>>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C :
>>>>>>>>
>>>>>>>>
>>>>>>>> [    6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000).
>>>>>>>> [    6.338210] radeon :21:00.0: (-12) create WB bo failed
>>>>>>>> [    6.338214] radeon :21:00.0: disabling GPU acceleration
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>> I have the same issue now on final 4.17.
>>
>>
>> Please file a bug report, and ideally bisect which commit(s) introduced
>> the issue(s).
>>
>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>
>>>>> Also, nothing else changed in that setup; I am just testing kernel 4.17.
>>>>
>>>>
>>>>
>>>> That has nothing to do with the driver or the original bug you reported.
>>>> The problem is that SME is active and that is currently not supported at
>>>> all with that hardware.
>>>
>>>
>>> OK... so are we now playing kernel and AMD hardware roulette on each
>>> release?
>>>
>>> SME was set up like this in kernel 4.16.x here and everything worked.
>>
>>
>> If that is true, again please bisect which commit broke it.
>>
>> All the reports I've seen before this indicated that at least amdgpu has
>> never worked with SME (which BTW doesn't mean it's never going to work or
>> that we don't want to support it, just that as far as we know it's currently
>> not working).
>
>
> At least in theory it should work when we use the coherent DMA allocator.
>
> If that really worked before, then the most likely commit which broke this
> is:
>
> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
> Author: Chunming Zhou 
> Date:   Fri Feb 9 10:44:09 2018 +0800
>
> drm/amdgpu: only enable swiotlb alloc when need v2
>
> get the max io mapping address of system memory to see if it is over
> our card accessing range.
> v2: move checking later
>
> Signed-off-by: Chunming Zhou 
> Reviewed-by: Monk Liu 
> Reviewed-by: Christian König 
> Signed-off-by: Alex Deucher 
>
> Currently looking into how we could somehow improve this detection.

It is not this one; I've built a kernel with this reverted.

I'll do a bisect tonight or tomorrow.

>
> Regards,
> Christian.


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-06 Thread Michel Dänzer
On 2018-06-06 04:44 PM, Christian König wrote:
> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>> 2018-06-06 14:19 GMT+02:00 Christian König :
>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>> 2018-06-06 13:33 GMT+02:00 Christian König :
>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>
>>>>>
>>>>> Also, nothing else changed in that setup; I am just testing kernel 4.17.
>>>>
>>>>
>>>> That has nothing to do with the driver or the original bug you
>>>> reported. The problem is that SME is active and that is currently not
>>>> supported at all with that hardware.
>>>
>>> OK... so are we now playing kernel and AMD hardware roulette on each
>>> release?
>>>
>>> SME was set up like this in kernel 4.16.x here and everything worked.
>>
>> If that is true, again please bisect which commit broke it.
>>
>> All the reports I've seen before this indicated that at least amdgpu
>> has never worked with SME (which BTW doesn't mean it's never going to
>> work or that we don't want to support it, just that as far as we know
>> it's currently not working).
> 
> At least in theory it should work when we use the coherent DMA allocator.
> 
> If that really worked before, then the most likely commit which broke
> this is:
> 
> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
> Author: Chunming Zhou 
> Date:   Fri Feb 9 10:44:09 2018 +0800
> 
>     drm/amdgpu: only enable swiotlb alloc when need v2
> 
>     get the max io mapping address of system memory to see if it is over
>     our card accessing range.
>     v2: move checking later
> 
>     Signed-off-by: Chunming Zhou 
>     Reviewed-by: Monk Liu 
>     Reviewed-by: Christian König 
>     Signed-off-by: Alex Deucher 
> 
> Currently looking into how we could somehow improve this detection.

I guess this could fit for Gabriel, but e.g.
https://bugs.freedesktop.org/104437 says amdgpu was already broken with
SME in 4.15, if not 4.14 (I suspect there was simply no SME support
earlier).


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-06 Thread Christian König

Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
> On 2018-06-06 03:33 PM, Gabriel C wrote:
>> 2018-06-06 14:19 GMT+02:00 Christian König :
>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>> 2018-06-06 13:33 GMT+02:00 Christian König :
>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C :
>>>>>>>
>>>>>>> [    6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000).
>>>>>>> [    6.338210] radeon :21:00.0: (-12) create WB bo failed
>>>>>>> [    6.338214] radeon :21:00.0: disabling GPU acceleration
>>>>>>>
>>>>>>> ...
>>>>>>
>>>>>> I have the same issue now on final 4.17.
>
> Please file a bug report, and ideally bisect which commit(s)
> introduced the issue(s).
>
>>>>
>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>
>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>
>>>> Also, nothing else changed in that setup; I am just testing kernel 4.17.
>>>
>>> That has nothing to do with the driver or the original bug you
>>> reported. The problem is that SME is active and that is currently not
>>> supported at all with that hardware.
>>
>> OK... so are we now playing kernel and AMD hardware roulette on each
>> release?
>>
>> SME was set up like this in kernel 4.16.x here and everything worked.
>
> If that is true, again please bisect which commit broke it.
>
> All the reports I've seen before this indicated that at least amdgpu
> has never worked with SME (which BTW doesn't mean it's never going to
> work or that we don't want to support it, just that as far as we know
> it's currently not working).

At least in theory it should work when we use the coherent DMA allocator.

If that really worked before, then the most likely commit which broke
this is:


commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
Author: Chunming Zhou 
Date:   Fri Feb 9 10:44:09 2018 +0800

    drm/amdgpu: only enable swiotlb alloc when need v2

    get the max io mapping address of system memory to see if it is over
    our card accessing range.
    v2: move checking later

    Signed-off-by: Chunming Zhou 
    Reviewed-by: Monk Liu 
    Reviewed-by: Christian König 
    Signed-off-by: Alex Deucher 

Currently looking into how we could somehow improve this detection.

Regards,
Christian.


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-06 Thread Christian König

Am 06.06.2018 um 15:33 schrieb Gabriel C:
> 2018-06-06 14:19 GMT+02:00 Christian König :
>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>> [SNIP]
>> That has nothing to do with the driver or the original bug you reported. The
>> problem is that SME is active and that is currently not supported at all
>> with that hardware.
>
> OK... so are we now playing kernel and AMD hardware roulette on each release?
>
> SME was set up like this in kernel 4.16.x here and everything worked.
>
> Also, if you don't support SME at all on that hardware now, while it worked
> before, please add proper error handling and proper dmesg messages
> letting the user know.
>
> radeon:  : SME not supported on that hardware anymore, please
> disable SME...
> radeon: : Update your GPU < or whatever >
>
> How hard would that be?


Well, to be precise, that isn't the job of the GFX driver to care about
such things.

It is a well-known and documented limitation of SME that it is, in
general, mostly incompatible with GFX (or compute) hardware, and it
actually doesn't matter which hardware or driver you use.

In other words, what happens is that as soon as you use GFX (or compute),
SME gets disabled transparently.

The problem is that this happens only on the DMA slow path, which we just
disabled because of the performance problems.

I'm going to propose reverting that, or at least only taking the fast path
when SME is disabled.
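The trade-off Christian describes can be sketched as a path-selection predicate. This is a hypothetical Python model, not the radeon/amdgpu code: `Platform`, the helper names, the 32 GiB machine and the 40-bit card range are illustrative assumptions. The range-only variant mirrors the commit messages quoted in this thread ("get the max io mapping address of system memory to see if it is over our card accessing range"); the SME-aware variant adds the condition Christian proposes.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    max_phys_addr: int  # highest system RAM address
    sme_active: bool    # whether memory encryption is on


def card_can_reach(p: Platform, dma_bits: int) -> bool:
    """True if every RAM address fits in the card's dma_bits-wide address range."""
    return p.max_phys_addr <= (1 << dma_bits) - 1


def use_dma_path_range_only(p: Platform, dma_bits: int) -> bool:
    """Range-only decision, as described by the 'only enable swiotlb ... when
    need v2' commits: take the slow DMA-API path only if RAM is out of reach."""
    return not card_can_reach(p, dma_bits)


def use_dma_path_sme_aware(p: Platform, dma_bits: int) -> bool:
    """Christian's proposal: additionally keep the slow path whenever SME is on,
    since per his description that path is where SME is transparently dropped."""
    return use_dma_path_range_only(p, dma_bits) or p.sme_active


if __name__ == "__main__":
    box = Platform(max_phys_addr=(32 << 30) - 1, sme_active=True)  # 32 GiB, SME on
    print(use_dma_path_range_only(box, 40))  # False: fast TTM pool path despite SME
    print(use_dma_path_sme_aware(box, 40))   # True: slow DMA path
```

Forcing the slow path under SME gives up the performance the range check was added for, which is why Christian frames a conditional revert as the minimum.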


Regards,
Christian.


Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )

2018-06-06 Thread Michel Dänzer

On 2018-06-06 03:33 PM, Gabriel C wrote:
> 2018-06-06 14:19 GMT+02:00 Christian König :
>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>> 2018-06-06 13:33 GMT+02:00 Christian König :
>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C :
>>>>>>
>>>>>> [    6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000).
>>>>>> [    6.338210] radeon :21:00.0: (-12) create WB bo failed
>>>>>> [    6.338214] radeon :21:00.0: disabling GPU acceleration
>>>>>>
>>>>>> ...
>>>>>
>>>>> I have the same issue now on final 4.17.

Please file a bug report, and ideally bisect which commit(s) introduced
the issue(s).

>>>
>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>
>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>
>>> Also, nothing else changed in that setup; I am just testing kernel 4.17.
>>
>> That has nothing to do with the driver or the original bug you reported.
>> The problem is that SME is active and that is currently not supported at
>> all with that hardware.
>
> OK... so are we now playing kernel and AMD hardware roulette on each release?
>
> SME was set up like this in kernel 4.16.x here and everything worked.

If that is true, again please bisect which commit broke it.

All the reports I've seen before this indicated that at least amdgpu has
never worked with SME (which BTW doesn't mean it's never going to work
or that we don't want to support it, just that as far as we know it's
currently not working).

-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer