Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
On Mon, Jun 11, 2018 at 12:07 AM Christoph Hellwig wrote: > > For now I'd say revert this commit for 4.17/4.18-rc and I'll look into > addressing these issues properly. Ok, reverted in my tree, and marked for stable (for 4.17). Thanks, Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
2018-06-08 8:52 GMT+02:00 Christian König : > On 08.06.2018 at 08:02, Christoph Hellwig wrote: >> >> On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote: >>> >>> Ok, done. Bisect points to: >> >> What is the failure mode you are seeing? Can't find anything in the >> mail unfortunately. > > > As far as I analyzed it, we now get an -ENOMEM from dma_alloc_attrs() in > drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when IOMMU is enabled. > > Still need to figure out which parameters we want to use for the allocation, > but I think it is only 4k or 8k. If you need me to test something, or to run debug patches or patches of any sort, just let me know. > > Regards, > Christian. BR
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
I think the prime issue is that dma_direct_alloc respects the DMA mask, which we don't need if we are actually using the IOMMU. This would be mostly harmless except for the SEV bit high in the address, which makes the checks fail. For now I'd say revert this commit for 4.17/4.18-rc and I'll look into addressing these issues properly.
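To make the failure mode Christoph describes concrete, here is a minimal userspace model of a coherent-mask check. It is a sketch under stated assumptions, not kernel code: `dma_bit_mask()`, `phys_to_bus()` and `coherent_ok()` are hypothetical stand-ins, the memory-encryption bit is assumed to sit at bit 47 (the real position is CPU-specific and reported by CPUID), and a 40-bit device mask is only an example. It shows how an ordinary low physical address fails the check once the encryption bit is part of the device-visible address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the SME/SEV encryption bit ("C-bit") is bit 47. The real
 * position is CPU-specific; 47 is only an example value for this sketch. */
#define EXAMPLE_SME_CBIT (UINT64_C(1) << 47)

/* Same idea as the kernel's DMA_BIT_MASK(n): the n lowest bits set. */
static uint64_t dma_bit_mask(unsigned int bits)
{
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Hypothetical: the address handed to the device for a physical page.
 * With memory encryption active, the encryption bit is part of it. */
static uint64_t phys_to_bus(uint64_t phys, bool sme_active)
{
    return sme_active ? (phys | EXAMPLE_SME_CBIT) : phys;
}

/* Hypothetical, simplified coherent-mask check: the whole buffer must
 * lie at or below the device's DMA mask. */
static bool coherent_ok(uint64_t phys, uint64_t size, uint64_t mask,
                        bool sme_active)
{
    return phys_to_bus(phys, sme_active) + size - 1 <= mask;
}
```

With a 4 KiB buffer at physical address 0x10000000 and a 40-bit mask, `coherent_ok()` succeeds when `sme_active` is false and fails when it is true, although the page itself is far below 1 TiB. When an IOMMU remaps the buffer, the device is given an I/O virtual address rather than this physical address, which is presumably why Christoph says the check is not needed in that case.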
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
2018-06-07 9:07 GMT+02:00 Christian König : > On 06.06.2018 at 17:44, Gabriel C wrote: >> >> 2018-06-06 17:03 GMT+02:00 Michel Dänzer : >>> >>> On 2018-06-06 04:44 PM, Christian König wrote: On 06.06.2018 at 16:12, Michel Dänzer wrote: [SNIP] At least in theory it should work when we use the coherent DMA allocator. If that really worked before, the most likely commit which broke this is: commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f Author: Chunming Zhou Date: Fri Feb 9 10:44:09 2018 +0800 drm/amdgpu: only enable swiotlb alloc when need v2 get the max io mapping address of system memory to see if it is over our card accessing range. v2: move checking later Signed-off-by: Chunming Zhou Reviewed-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher Currently looking into how we could somehow improve this detection. >>> >>> I guess this could fit for Gabriel, but e.g. >>> https://bugs.freedesktop.org/104437 says amdgpu was already broken with >>> SME in 4.15, if not 4.14 (I suspect there was simply no SME support >>> earlier). > > > And what I totally missed is that Gabriel is using radeon and not amdgpu. > > So, Gabriel, you need to revert this one for testing: > commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f > Author: Chunming Zhou > Date: Fri Feb 9 10:44:10 2018 +0800 > > drm/radeon: only enable swiotlb path when need v2 > > swiotlb expands our card accessing range, but its path always is slower > than ttm pool allocation. > So add condition to use it. > v2: move a bit later > > Signed-off-by: Chunming Zhou > Reviewed-by: Monk Liu > Reviewed-by: Christian König > Signed-off-by: Alex Deucher > Link: > https://patchwork.freedesktop.org/patch/msgid/20180209024410.1469-3-david1.z...@amd.com > >> I got strange performance issues with 4.15 and 4.16, but SME was ON >> on that setup (even before it hit mainline) and it never broke the GPU like >> this.
> > > Well, that is very interesting. You are the first one to report that SME + > GFX works in some way; so far we have only gotten negative reports for that. > >> Here is a 4.16.13 boot dmesg which has no such issue: >> >> >> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt >> >> With the setup as is, booting 4.16.x works, while 4.17 throws the errors. > > > Please do the bisect if the patch I've mentioned above doesn't help. Ok, done. Bisect points to: b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 Author: Christoph Hellwig Date: Mon Mar 19 11:38:19 2018 +0100 iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}() This cleans up the code a lot by removing duplicate logic. Tested-by: Tom Lendacky Tested-by: Joerg Roedel Signed-off-by: Christoph Hellwig Reviewed-by: Thomas Gleixner Acked-by: Joerg Roedel Cc: David Woodhouse Cc: Joerg Roedel Cc: Jon Mason Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Muli Ben-Yehuda Cc: Peter Zijlstra Cc: io...@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-8-...@lst.de Signed-off-by: Ingo Molnar I'll try to revert this once I'm home. BR
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
>> Well, that is very interesting. You are the first one to report that SME + >> GFX works in some way; so far we have only gotten negative reports for that. >> >>> Here is a 4.16.13 boot dmesg which has no such issue: >>> >>> >>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt >>> >>> With the setup as is, booting 4.16.x works, while 4.17 throws the errors. >> >> >> Please do the bisect if the patch I've mentioned above doesn't help. > > Ok, done. Bisect points to: > > b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit > commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 > Author: Christoph Hellwig > Date: Mon Mar 19 11:38:19 2018 +0100 > >iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}() > >This cleans up the code a lot by removing duplicate logic. > >Tested-by: Tom Lendacky >Tested-by: Joerg Roedel >Signed-off-by: Christoph Hellwig >Reviewed-by: Thomas Gleixner >Acked-by: Joerg Roedel >Cc: David Woodhouse >Cc: Joerg Roedel >Cc: Jon Mason >Cc: Konrad Rzeszutek Wilk >Cc: Linus Torvalds >Cc: Muli Ben-Yehuda >Cc: Peter Zijlstra >Cc: io...@lists.linux-foundation.org >Link: http://lkml.kernel.org/r/20180319103826.12853-8-...@lst.de >Signed-off-by: Ingo Molnar > > > I'll try to revert this once I'm home. I can confirm reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 fixes that issue for me. The GPU is working fine with SME enabled. Now, with a working GPU :) I can also confirm performance is back to normal without any other workarounds. The only app still acting up a bit is Firefox, with just minor frame drops, but nothing too bad (probably a Firefox bug too). Chromium/Chrome is fine, even with 10 tabs open, each one playing a video on YouTube: no glitches at all. The desktop is also fine now; I could not find anything wrong. BR
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
On 08.06.2018 at 08:02, Christoph Hellwig wrote: On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote: Ok, done. Bisect points to: What is the failure mode you are seeing? Can't find anything in the mail unfortunately. As far as I analyzed it, we now get an -ENOMEM from dma_alloc_attrs() in drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when IOMMU is enabled. Still need to figure out which parameters we want to use for the allocation, but I think it is only 4k or 8k. Regards, Christian.
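The `(-12) create WB bo failed` line quoted elsewhere in this thread is consistent with that analysis: on Linux `ENOMEM` is 12, so a NULL return from the coherent allocator that is converted to `-ENOMEM` shows up as `-12` in the log. The sketch below models only that error propagation. `fake_dma_alloc()`, `create_bo()` and `format_wb_error()` are hypothetical names, and `malloc()` merely stands in for the real DMA allocation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a coherent DMA allocation: like
 * dma_alloc_attrs(), it returns NULL when the request cannot be met. */
static void *fake_dma_alloc(size_t size, int constraints_ok)
{
    return constraints_ok ? malloc(size) : NULL;
}

/* Hypothetical buffer-object creation: a failed allocation is reported
 * to the caller as -ENOMEM, the usual kernel convention. */
static int create_bo(size_t size, int constraints_ok, void **out)
{
    void *p = fake_dma_alloc(size, constraints_ok);

    if (!p)
        return -ENOMEM;
    *out = p;
    return 0;
}

/* Formats the error the way the radeon log line in the thread reads. */
static int format_wb_error(char *buf, size_t len, int r)
{
    return snprintf(buf, len, "(%d) create WB bo failed", r);
}
```

Calling `format_wb_error()` with the result of a failed `create_bo()` produces the same text as the dmesg line, which is why the `-12` there points at an allocation failure rather than, say, an invalid argument.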
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
Hi Christoph, On 08.06.2018 at 08:01, Christoph Hellwig wrote: On Thu, Jun 07, 2018 at 07:20:37PM +0200, Christian König wrote: Hi Christopher, I don't see a Christopher on the Cc list.. Sorry, auto-uncorrection. I indeed meant you :) Christian.
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote: > Ok, done. Bisect points to: What is the failure mode you are seeing? Can't find anything in the mail unfortunately.
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
On Thu, Jun 07, 2018 at 07:20:37PM +0200, Christian König wrote: > Hi Christopher, I don't see a Christopher on the Cc list..
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
Hi Christopher, On 07.06.2018 at 18:24, Gabriel C wrote: [SNIP] Ok, done. Bisect points to: b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 Author: Christoph Hellwig Date: Mon Mar 19 11:38:19 2018 +0100 iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}() This cleans up the code a lot by removing duplicate logic. Tested-by: Tom Lendacky Tested-by: Joerg Roedel Signed-off-by: Christoph Hellwig Reviewed-by: Thomas Gleixner Acked-by: Joerg Roedel Cc: David Woodhouse Cc: Joerg Roedel Cc: Jon Mason Cc: Konrad Rzeszutek Wilk Cc: Linus Torvalds Cc: Muli Ben-Yehuda Cc: Peter Zijlstra Cc: io...@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-8-...@lst.de Signed-off-by: Ingo Molnar I'll try to revert this once I'm home. I can confirm reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 fixes that issue for me. Any idea what could cause that? Basically, this patch breaks radeon when SME is enabled. The GPU is working fine with SME enabled. Now, with a working GPU :) I can also confirm performance is back to normal without any other workarounds. The only app still acting up a bit is Firefox, with just minor frame drops, but nothing too bad (probably a Firefox bug too). Chromium/Chrome is fine, even with 10 tabs open, each one playing a video on YouTube: no glitches at all. The desktop is also fine now; I could not find anything wrong. Thanks for testing, Christian. BR
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
On 06.06.2018 at 17:44, Gabriel C wrote: 2018-06-06 17:03 GMT+02:00 Michel Dänzer : On 2018-06-06 04:44 PM, Christian König wrote: On 06.06.2018 at 16:12, Michel Dänzer wrote: [SNIP] At least in theory it should work when we use the coherent DMA allocator. If that really worked before, the most likely commit which broke this is: commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f Author: Chunming Zhou Date: Fri Feb 9 10:44:09 2018 +0800 drm/amdgpu: only enable swiotlb alloc when need v2 get the max io mapping address of system memory to see if it is over our card accessing range. v2: move checking later Signed-off-by: Chunming Zhou Reviewed-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher Currently looking into how we could somehow improve this detection. I guess this could fit for Gabriel, but e.g. https://bugs.freedesktop.org/104437 says amdgpu was already broken with SME in 4.15, if not 4.14 (I suspect there was simply no SME support earlier). And what I totally missed is that Gabriel is using radeon and not amdgpu. So, Gabriel, you need to revert this one for testing: commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f Author: Chunming Zhou Date: Fri Feb 9 10:44:10 2018 +0800 drm/radeon: only enable swiotlb path when need v2 swiotlb expands our card accessing range, but its path always is slower than ttm pool allocation. So add condition to use it. v2: move a bit later Signed-off-by: Chunming Zhou Reviewed-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher Link: https://patchwork.freedesktop.org/patch/msgid/20180209024410.1469-3-david1.z...@amd.com I got strange performance issues with 4.15 and 4.16, but SME was ON on that setup (even before it hit mainline) and it never broke the GPU like this. Well, that is very interesting. You are the first one to report that SME + GFX works in some way; so far we have only gotten negative reports for that.
Here is a 4.16.13 boot dmesg which has no such issue: http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt With the setup as is, booting 4.16.x works, while 4.17 throws the errors. Please do the bisect if the patch I've mentioned above doesn't help. Thanks, Christian. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
2018-06-06 17:03 GMT+02:00 Michel Dänzer : > On 2018-06-06 04:44 PM, Christian König wrote: >> On 06.06.2018 at 16:12, Michel Dänzer wrote: >>> On 2018-06-06 03:33 PM, Gabriel C wrote: 2018-06-06 14:19 GMT+02:00 Christian König : > On 06.06.2018 at 14:08, Gabriel C wrote: >> 2018-06-06 13:33 GMT+02:00 Christian König : >>> On 06.06.2018 at 13:28, Gabriel C wrote: >>> >> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt >> >> >> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt >> >> >> Also, nothing else changed in that setup; I'm just testing kernel 4.17. > > > That has nothing to do with the driver or the original bug you > reported. The > problem is that SME is active, and that is currently not supported at > all > with that hardware. Ok... so are we now playing kernel and AMD hardware roulette on each release? SME was set up like this with kernel 4.16.x here and everything worked. >>> >>> If that is true, again please bisect which commit broke it. >>> >>> All the reports I've seen before this indicated that at least amdgpu >>> has never worked with SME (which BTW doesn't mean it's never going to >>> work or that we don't want to support it, just that as far as we know >>> it's currently not working). >> >> At least in theory it should work when we use the coherent DMA allocator. >> >> If that really worked before, the most likely commit which broke >> this is: >> >> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f >> Author: Chunming Zhou >> Date: Fri Feb 9 10:44:09 2018 +0800 >> >> drm/amdgpu: only enable swiotlb alloc when need v2 >> >> get the max io mapping address of system memory to see if it is over >> our card accessing range. >> v2: move checking later >> >> Signed-off-by: Chunming Zhou >> Reviewed-by: Monk Liu >> Reviewed-by: Christian König >> Signed-off-by: Alex Deucher >> >> Currently looking into how we could somehow improve this detection. > > I guess this could fit for Gabriel, but e.g. 
> https://bugs.freedesktop.org/104437 says amdgpu was already broken with > SME in 4.15, if not 4.14 (I suspect there was simply no SME support > earlier). I got strange performance issues with 4.15 and 4.16, but SME was ON on that setup (even before it hit mainline) and it never broke the GPU like this. Here is a 4.16.13 boot dmesg which has no such issue: http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt With the setup as is, booting 4.16.x works, while 4.17 throws the errors. > > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
2018-06-06 14:19 GMT+02:00 Christian König : > On 06.06.2018 at 14:08, Gabriel C wrote: >> >> 2018-06-06 13:33 GMT+02:00 Christian König : >>> >>> On 06.06.2018 at 13:28, Gabriel C wrote: 2018-04-11 7:02 GMT+02:00 Gabriel C : >> >> 2018-04-11 6:00 GMT+02:00 Gabriel C : >> 2018-04-09 11:42 GMT+02:00 Christian König >> : >>> >>> On 07.04.2018 at 00:00, Jean-Marc Valin wrote: > > ... >> >> I can help test code for 4.17/++ if you wish, but that is a >> *different* >> story. >> > Quick-tested a 4.16.0-11490-gb284d4d5a678; the amdgpu and radeon drivers > are broken in this one. > > radeon says: > > ... > > [6.337838] [drm] PCIE GART of 2048M enabled (table at > 0x001D6000). > [6.338210] radeon :21:00.0: (-12) create WB bo failed > [6.338214] radeon :21:00.0: disabling GPU acceleration > > ... > I have the same issue now on final 4.17. >>> >>> >>> Actually Michel came up with a fix for the performance regression which >>> is >>> now backported to older kernels as well. >>> >>> So the original issue of this mail thread should be fixed by now. >> >> Ok, will test as soon as I get the GPU to work :)) >> I also played with BIOS options, which does not fix anything but changes the error message. With IOMMU && SR-IOV disabled, the error changes to this: [7.092044] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD) [7.092059] radeon :21:00.0: disabling GPU acceleration While I could work around SWIOTLB bugs in 4.15 and 4.16, 4.17 seems to kill the GPU with no way for me to make it work (at least I have not found any workaround so far). >>> >>> >>> That actually sounds like something completely different. Can you provide >>> a >>> full dmesg of radeon and/or amdgpu?
>> >> Sure, here are boot logs with IOMMU/SR-IOV ON/OFF in the BIOS: >> >> >> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt >> >> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt >> >> Also, nothing else changed in that setup; I'm just testing kernel 4.17. > > > That has nothing to do with the driver or the original bug you reported. The > problem is that SME is active, and that is currently not supported at all > with that hardware. Ok... so are we now playing kernel and AMD hardware roulette on each release? SME was set up like this with kernel 4.16.x here and everything worked. Also, if you don't support SME at all on that hardware now, while it worked before, please add proper error handling and proper dmesg messages letting the user know. radeon: : SME not supported on that hardware anymore, please disable SME... radeon: : Update your GPU < or whatever > How hard would that be? No one but developers can guess from these error messages why their hardware suddenly stopped working just from updating the kernel. > > Try disabling SME either in the BIOS or on the kernel command line. Yes, that works, but that is not the point. You really just can't break users' setups like this.
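The `ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)` message from the BIOS-options experiment comes from a simple handshake in the radeon driver: roughly, the CPU seeds a scratch register with 0xCAFEDEAD, asks the GPU through the ring to write 0xDEADBEEF to the same register, and polls for a bounded time. Reading back the seed value means the GPU never executed the ring commands. The following is a simplified userspace model of that idea only; `struct fake_gpu`, `submit_scratch_write()` and `ring_test()` are hypothetical, and the real driver polls a hardware register with a short delay between reads.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEED_VALUE  UINT32_C(0xCAFEDEAD) /* written by the CPU before the test */
#define MAGIC_VALUE UINT32_C(0xDEADBEEF) /* the GPU is asked to write this */

/* Hypothetical model: one scratch register plus a flag saying whether the
 * GPU actually processes commands placed on its ring. */
struct fake_gpu {
    uint32_t scratch;
    bool executes_ring;
};

/* Hypothetical ring submission: only takes effect if the GPU runs it. */
static void submit_scratch_write(struct fake_gpu *gpu, uint32_t value)
{
    if (gpu->executes_ring)
        gpu->scratch = value;
}

/* Seeds the register, submits the write, then polls up to max_polls times.
 * Returns true on success; *last holds the last value read, which is what
 * a failure message would print. */
static bool ring_test(struct fake_gpu *gpu, unsigned int max_polls,
                      uint32_t *last)
{
    gpu->scratch = SEED_VALUE;
    submit_scratch_write(gpu, MAGIC_VALUE);

    *last = gpu->scratch;
    for (unsigned int i = 0; i < max_polls; i++) {
        *last = gpu->scratch;
        if (*last == MAGIC_VALUE)
            return true;
    }
    return false;
}
```

In this model a GPU that cannot run its ring leaves `*last` at 0xCAFEDEAD, matching the value in the quoted error, so that message says the command processor never got as far as the write, not that it wrote a wrong value.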
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
2018-06-06 16:44 GMT+02:00 Christian König : > On 06.06.2018 at 16:12, Michel Dänzer wrote: >> >> On 2018-06-06 03:33 PM, Gabriel C wrote: >>> >>> 2018-06-06 14:19 GMT+02:00 Christian König : On 06.06.2018 at 14:08, Gabriel C wrote: > > 2018-06-06 13:33 GMT+02:00 Christian König : >> >> On 06.06.2018 at 13:28, Gabriel C wrote: >>> >>> 2018-04-11 7:02 GMT+02:00 Gabriel C : [6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000). [6.338210] radeon :21:00.0: (-12) create WB bo failed [6.338214] radeon :21:00.0: disabling GPU acceleration ... >>> I have the same issue now on final 4.17. >> >> >> Please file a bug report, and ideally bisect which commit(s) introduced >> the issue(s). >> >> > > http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt > > > http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt > > Also, nothing else changed in that setup; I'm just testing kernel 4.17. That has nothing to do with the driver or the original bug you reported. The problem is that SME is active, and that is currently not supported at all with that hardware. >>> >>> >>> Ok... so are we now playing kernel and AMD hardware roulette on each >>> release? >>> >>> SME was set up like this with kernel 4.16.x here and everything worked. >> >> >> If that is true, again please bisect which commit broke it. >> >> All the reports I've seen before this indicated that at least amdgpu has >> never worked with SME (which BTW doesn't mean it's never going to work or >> that we don't want to support it, just that as far as we know it's currently >> not working). > > > At least in theory it should work when we use the coherent DMA allocator.
> > If that really worked before, the most likely commit which broke this > is: > > commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f > Author: Chunming Zhou > Date: Fri Feb 9 10:44:09 2018 +0800 > > drm/amdgpu: only enable swiotlb alloc when need v2 > > get the max io mapping address of system memory to see if it is over > our card accessing range. > v2: move checking later > > Signed-off-by: Chunming Zhou > Reviewed-by: Monk Liu > Reviewed-by: Christian König > Signed-off-by: Alex Deucher > > Currently looking into how we could somehow improve this detection. It is not this one; I've built a kernel with this reverted. I'll do a bisect tonight or tomorrow. > > Regards, > Christian.
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
On 2018-06-06 04:44 PM, Christian König wrote: > On 06.06.2018 at 16:12, Michel Dänzer wrote: >> On 2018-06-06 03:33 PM, Gabriel C wrote: >>> 2018-06-06 14:19 GMT+02:00 Christian König : On 06.06.2018 at 14:08, Gabriel C wrote: > 2018-06-06 13:33 GMT+02:00 Christian König : >> On 06.06.2018 at 13:28, Gabriel C wrote: >> > http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt > > > http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt > > > Also, nothing else changed in that setup; I'm just testing kernel 4.17. That has nothing to do with the driver or the original bug you reported. The problem is that SME is active, and that is currently not supported at all with that hardware. >>> >>> Ok... so are we now playing kernel and AMD hardware roulette on each >>> release? >>> >>> SME was set up like this with kernel 4.16.x here and everything worked. >> >> If that is true, again please bisect which commit broke it. >> >> All the reports I've seen before this indicated that at least amdgpu >> has never worked with SME (which BTW doesn't mean it's never going to >> work or that we don't want to support it, just that as far as we know >> it's currently not working). > > At least in theory it should work when we use the coherent DMA allocator. > > If that really worked before, the most likely commit which broke > this is: > > commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f > Author: Chunming Zhou > Date: Fri Feb 9 10:44:09 2018 +0800 > > drm/amdgpu: only enable swiotlb alloc when need v2 > > get the max io mapping address of system memory to see if it is over > our card accessing range. > v2: move checking later > > Signed-off-by: Chunming Zhou > Reviewed-by: Monk Liu > Reviewed-by: Christian König > Signed-off-by: Alex Deucher > > Currently looking into how we could somehow improve this detection. I guess this could fit for Gabriel, but e.g. 
https://bugs.freedesktop.org/104437 says amdgpu was already broken with SME in 4.15, if not 4.14 (I suspect there was simply no SME support earlier). -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
On 06.06.2018 at 16:12, Michel Dänzer wrote: On 2018-06-06 03:33 PM, Gabriel C wrote: 2018-06-06 14:19 GMT+02:00 Christian König : On 06.06.2018 at 14:08, Gabriel C wrote: 2018-06-06 13:33 GMT+02:00 Christian König : On 06.06.2018 at 13:28, Gabriel C wrote: 2018-04-11 7:02 GMT+02:00 Gabriel C : [ 6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000). [ 6.338210] radeon :21:00.0: (-12) create WB bo failed [ 6.338214] radeon :21:00.0: disabling GPU acceleration ... I have the same issue now on final 4.17. Please file a bug report, and ideally bisect which commit(s) introduced the issue(s). http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt Also, nothing else changed in that setup; I'm just testing kernel 4.17. That has nothing to do with the driver or the original bug you reported. The problem is that SME is active, and that is currently not supported at all with that hardware. Ok... so are we now playing kernel and AMD hardware roulette on each release? SME was set up like this with kernel 4.16.x here and everything worked. If that is true, again please bisect which commit broke it. All the reports I've seen before this indicated that at least amdgpu has never worked with SME (which BTW doesn't mean it's never going to work or that we don't want to support it, just that as far as we know it's currently not working). At least in theory it should work when we use the coherent DMA allocator. If that really worked before, the most likely commit which broke this is: commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f Author: Chunming Zhou Date: Fri Feb 9 10:44:09 2018 +0800 drm/amdgpu: only enable swiotlb alloc when need v2 get the max io mapping address of system memory to see if it is over our card accessing range.
v2: move checking later Signed-off-by: Chunming Zhou Reviewed-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher Currently looking into how we could somehow improve this detection. Regards, Christian.
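The commit Christian points to decides whether to keep the slower swiotlb allocation path by comparing the highest system memory address against the range the card can reach. A minimal sketch of that kind of check follows. `struct mem_range`, `max_mapped_address()` and `need_bounce_path()` are hypothetical names, and the memory map is passed in explicitly rather than read from the kernel. A check of this shape only looks at physical addresses, so it says nothing about extra bits, such as the SME encryption bit, that may be added to the address the device actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one system memory range (inclusive end). */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* Highest physical address covered by any of the ranges. */
static uint64_t max_mapped_address(const struct mem_range *ranges, size_t n)
{
    uint64_t max = 0;

    for (size_t i = 0; i < n; i++)
        if (ranges[i].end > max)
            max = ranges[i].end;
    return max;
}

/* True if some system memory lies beyond what a device with dma_bits of
 * addressing can reach, i.e. the slower bounce-buffer path is needed. */
static bool need_bounce_path(const struct mem_range *ranges, size_t n,
                             unsigned int dma_bits)
{
    uint64_t limit = dma_bits >= 64 ? UINT64_MAX
                                    : (UINT64_C(1) << dma_bits) - 1;

    return max_mapped_address(ranges, n) > limit;
}
```

For a machine whose RAM ends just below 32 GiB, a 40-bit card gets `false` (the fast path) and a 32-bit card gets `true`.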
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
On 06.06.2018 at 15:33, Gabriel C wrote: 2018-06-06 14:19 GMT+02:00 Christian König : On 06.06.2018 at 14:08, Gabriel C wrote: [SNIP] That has nothing to do with the driver or the original bug you reported. The problem is that SME is active, and that is currently not supported at all with that hardware. Ok... so are we now playing kernel and AMD hardware roulette on each release? SME was set up like this with kernel 4.16.x here and everything worked. Also, if you don't support SME at all on that hardware now, while it worked before, please add proper error handling and proper dmesg messages letting the user know. radeon: : SME not supported on that hardware anymore, please disable SME... radeon: : Update your GPU < or whatever > How hard would that be? Yes, to be precise, it isn't the job of the GFX driver to care about such things. It is a well-known and documented limitation of SME that it is in general mostly incompatible with GFX (or compute) hardware, and it actually doesn't matter which hardware or driver you use. In other words, what happens is that as soon as you use GFX (or compute), SME gets disabled transparently. The problem is that this happens only on the DMA slow path, which we just disabled because of the performance problems. I'm going to propose reverting that, or at least only using it when SME is disabled. Regards, Christian.
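Christian's proposal amounts to treating active memory encryption as a second reason to stay on the DMA slow path. Below is a minimal sketch of that decision. It assumes the driver already knows two things: whether bounce buffering is needed (for example from a max-address check like the one described in commit fd5fd480) and whether SME is active. The enum and function names are hypothetical, and how the real driver would query the SME state is deliberately not shown.

```c
#include <stdbool.h>

/* Hypothetical names for the two allocation strategies discussed. */
enum alloc_path {
    ALLOC_TTM_POOL, /* faster page pool, no bounce buffering */
    ALLOC_DMA_SLOW, /* DMA slow path, where per the thread SME is handled */
};

/* Use the fast pool only when no bounce buffering is required and memory
 * encryption is off; otherwise fall back to the slow but working path. */
static enum alloc_path choose_alloc_path(bool need_bounce, bool sme_active)
{
    if (need_bounce || sme_active)
        return ALLOC_DMA_SLOW;
    return ALLOC_TTM_POOL;
}
```

The design keeps the performance win for the common case (no encryption, card reaches all memory) while restoring the previous behaviour for configurations like Gabriel's, at the cost of the slower path whenever SME is on.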
Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
On 2018-06-06 03:33 PM, Gabriel C wrote: 2018-06-06 14:19 GMT+02:00 Christian König : On 06.06.2018 at 14:08, Gabriel C wrote: 2018-06-06 13:33 GMT+02:00 Christian König : On 06.06.2018 at 13:28, Gabriel C wrote: 2018-04-11 7:02 GMT+02:00 Gabriel C : [6.337838] [drm] PCIE GART of 2048M enabled (table at 0x001D6000). [6.338210] radeon :21:00.0: (-12) create WB bo failed [6.338214] radeon :21:00.0: disabling GPU acceleration ... I have the same issue now on final 4.17. Please file a bug report, and ideally bisect which commit(s) introduced the issue(s). http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt Also, nothing else changed in that setup; I'm just testing kernel 4.17. That has nothing to do with the driver or the original bug you reported. The problem is that SME is active, and that is currently not supported at all with that hardware. Ok... so are we now playing kernel and AMD hardware roulette on each release? SME was set up like this with kernel 4.16.x here and everything worked. If that is true, again please bisect which commit broke it. All the reports I've seen before this indicated that at least amdgpu has never worked with SME (which BTW doesn't mean it's never going to work or that we don't want to support it, just that as far as we know it's currently not working). -- Earthling Michel Dänzer| http://www.amd.com Libre software enthusiast |Mesa and X developer