Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On Fri, Dec 02, 2016 at 11:26:02AM +0100, Lucas Stach wrote: > Am Donnerstag, den 01.12.2016, 15:11 +0100 schrieb Michal Hocko: > > Let's also CC Marek > > > > On Thu 01-12-16 08:43:40, Vlastimil Babka wrote: > > > On 12/01/2016 08:21 AM, Michal Hocko wrote: > > > > Forgot to CC Joonsoo. The email thread starts more or less here > > > > http://lkml.kernel.org/r/20161130092239.gd18...@dhcp22.suse.cz > > > > > > > > On Thu 01-12-16 08:15:07, Michal Hocko wrote: > > > > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote: > > > > > [...] > > > > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy > > > > > > > > > > Huh, do I get it right that the request was for a _single_ page? Why > > > > > do > > > > > we need CMA for that? > > > > > > Ugh, good point. I assumed that was just the PFNs that it failed to > > > migrate > > > away, but it seems that's indeed the whole requested range. Yeah sounds > > > some > > > part of the dma-cma chain could be smarter and attempt CMA only for e.g. > > > costly orders. > > > > Is there any reason why the DMA api doesn't try the page allocator first > > before falling back to the CMA? I simply have a hard time to see why the > > CMA should be used (and fragment) for small requests size. > > On x86 that is true, but on ARM CMA is the only (low memory) region that > can change the memory attributes, by being excluded from the lowmem > section mapping. Changing the memory attributes to > uncached/writecombined for DMA is crucial on ARM to fulfill the > requirement that no there aren't any conflicting mappings of the same > physical page. > > On ARM we can possibly do the optimization of asking the page allocator, > but only if we can request _only_ highmem pages. 
So this memory allocation strategy should only apply to ARM and not x86. We already had fallout a couple of years ago when Ubuntu decided to enable CMA on x86, where it does not make sense, as I don't think we have any single device we care about that is not behind an IOMMU and thus requires contiguous memory allocation. The DMA API should only use CMA on architectures where it is necessary, not on all of them. Cheers, Jérôme
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
Am Donnerstag, den 01.12.2016, 15:11 +0100 schrieb Michal Hocko: > Let's also CC Marek > > On Thu 01-12-16 08:43:40, Vlastimil Babka wrote: > > On 12/01/2016 08:21 AM, Michal Hocko wrote: > > > Forgot to CC Joonsoo. The email thread starts more or less here > > > http://lkml.kernel.org/r/20161130092239.gd18...@dhcp22.suse.cz > > > > > > On Thu 01-12-16 08:15:07, Michal Hocko wrote: > > > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote: > > > > [...] > > > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy > > > > > > > > Huh, do I get it right that the request was for a _single_ page? Why do > > > > we need CMA for that? > > > > Ugh, good point. I assumed that was just the PFNs that it failed to migrate > > away, but it seems that's indeed the whole requested range. Yeah sounds some > > part of the dma-cma chain could be smarter and attempt CMA only for e.g. > > costly orders. > > Is there any reason why the DMA api doesn't try the page allocator first > before falling back to the CMA? I simply have a hard time to see why the > CMA should be used (and fragment) for small requests size. On x86 that is true, but on ARM CMA is the only (low memory) region that can change the memory attributes, by being excluded from the lowmem section mapping. Changing the memory attributes to uncached/writecombined for DMA is crucial on ARM to fulfill the requirement that there are no conflicting mappings of the same physical page. On ARM we can possibly do the optimization of asking the page allocator, but only if we can request _only_ highmem pages. Regards, Lucas
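Lucas's distinction above can be summarized as a small decision table. A toy userspace model of it (names invented for illustration, not kernel code): on x86 any page works for coherent DMA, so the page allocator is always a fine first choice; on ARM, lowmem pages sit in the kernel's section mapping with conflicting attributes, so only CMA pages (excluded from that mapping) or highmem pages (mapped on demand) are safe.

```c
#include <assert.h>
#include <stdbool.h>

enum dma_source { PAGE_ALLOCATOR, CMA_AREA };

/* Toy model, NOT kernel code: decide where a coherent DMA buffer
 * could come from, per the constraints Lucas describes. */
enum dma_source pick_dma_source(bool lowmem_has_section_mapping,
                                bool can_alloc_highmem_only)
{
    if (!lowmem_has_section_mapping)   /* e.g. x86: any page is fine */
        return PAGE_ALLOCATOR;
    if (can_alloc_highmem_only)        /* Lucas's proposed ARM optimization */
        return PAGE_ALLOCATOR;
    return CMA_AREA;                   /* ARM lowmem: must come from CMA */
}
```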
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On 12/01/2016 10:02 PM, Michal Nazarewicz wrote: On Thu, Dec 01 2016, Michal Hocko wrote: I am not familiar with this code so I cannot really argue but a quick look at rmem_cma_setup doesn't suggest any specific placing or anything... early_cma parses ‘cma’ command line argument which can specify where exactly the default CMA area is to be located. Furthermore, CMA areas can be assigned per-device (via the Device Tree IIRC). OK, but the context of this bug report is a generic cma pool and generic dma alloc, which tries cma first and then falls back to alloc_pages_node(). If a device really requires specific placing as you suggest, then it probably uses a different allocation interface, otherwise there would be some flag to disallow the alloc_pages_node() fallback?
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On Thu, Dec 01 2016, Michal Hocko wrote: > I am not familiar with this code so I cannot really argue but a quick > look at rmem_cma_setup doesn't suggest any specific placing or > anything... early_cma parses the ‘cma’ command line argument which can specify where exactly the default CMA area is to be located. Furthermore, CMA areas can be assigned per-device (via the Device Tree IIRC). -- Best regards ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ «If at first you don’t succeed, give up skydiving»
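The two mechanisms Michal Nazarewicz mentions can be made concrete. The default area's size and placement come from the kernel command line, e.g. `cma=64M@0-4G` (syntax: `cma=nn[MG]@[start[MG][-end[MG]]]`), while a per-device area is declared through a `reserved-memory` Device Tree node. A sketch, with all addresses, sizes, and labels invented for illustration:

```dts
/ {
	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		/* Hypothetical fixed-placement CMA pool for one device.
		 * "reusable" is what makes it a CMA area rather than a
		 * plain carveout. */
		display_pool: display-pool@10000000 {
			compatible = "shared-dma-pool";
			reusable;
			reg = <0x10000000 0x4000000>; /* 64 MiB at a fixed address */
		};
	};

	display@20000000 {
		/* hypothetical device claiming the pool above */
		memory-region = <&display_pool>;
	};
};
```

A device bound to such a node gets its DMA allocations from that exact physical range, which is the kind of placement requirement that would justify going to CMA even for small requests.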
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On Thu 01-12-16 17:03:52, Michal Nazarewicz wrote: > On Thu, Dec 01 2016, Michal Hocko wrote: > > Let's also CC Marek > > > > On Thu 01-12-16 08:43:40, Vlastimil Babka wrote: > >> On 12/01/2016 08:21 AM, Michal Hocko wrote: > >> > Forgot to CC Joonsoo. The email thread starts more or less here > >> > http://lkml.kernel.org/r/20161130092239.gd18...@dhcp22.suse.cz > >> > > >> > On Thu 01-12-16 08:15:07, Michal Hocko wrote: > >> > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote: > >> > > [...] > >> > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy > >> > > > >> > > Huh, do I get it right that the request was for a _single_ page? Why do > >> > > we need CMA for that? > >> > >> Ugh, good point. I assumed that was just the PFNs that it failed to migrate > >> away, but it seems that's indeed the whole requested range. Yeah sounds > >> some > >> part of the dma-cma chain could be smarter and attempt CMA only for e.g. > >> costly orders. > > > > Is there any reason why the DMA api doesn't try the page allocator first > > before falling back to the CMA? I simply have a hard time to see why the > > CMA should be used (and fragment) for small requests size. > > There actually may be reasons to always go with CMA even if small > regions are requested. CMA areas may be defined to map to particular > physical addresses and given device may require allocations from those > addresses. This may be more than just a matter of DMA address space. > I cannot give you specific examples though and I might be talking > nonsense. I am not familiar with this code so I cannot really argue but a quick look at rmem_cma_setup doesn't suggest any specific placing or anything... -- Michal Hocko SUSE Labs
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On Thu, Dec 01 2016, Michal Hocko wrote: > Let's also CC Marek > > On Thu 01-12-16 08:43:40, Vlastimil Babka wrote: >> On 12/01/2016 08:21 AM, Michal Hocko wrote: >> > Forgot to CC Joonsoo. The email thread starts more or less here >> > http://lkml.kernel.org/r/20161130092239.gd18...@dhcp22.suse.cz >> > >> > On Thu 01-12-16 08:15:07, Michal Hocko wrote: >> > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote: >> > > [...] >> > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy >> > > >> > > Huh, do I get it right that the request was for a _single_ page? Why do >> > > we need CMA for that? >> >> Ugh, good point. I assumed that was just the PFNs that it failed to migrate >> away, but it seems that's indeed the whole requested range. Yeah sounds some >> part of the dma-cma chain could be smarter and attempt CMA only for e.g. >> costly orders. > > Is there any reason why the DMA api doesn't try the page allocator first > before falling back to the CMA? I simply have a hard time to see why the > CMA should be used (and fragment) for small requests size. There actually may be reasons to always go with CMA even if small regions are requested. CMA areas may be defined to map to particular physical addresses and given device may require allocations from those addresses. This may be more than just a matter of DMA address space. I cannot give you specific examples though and I might be talking nonsense. > -- > Michal Hocko > SUSE Labs -- Best regards ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ «If at first you don’t succeed, give up skydiving»
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
Let's also CC Marek On Thu 01-12-16 08:43:40, Vlastimil Babka wrote: > On 12/01/2016 08:21 AM, Michal Hocko wrote: > > Forgot to CC Joonsoo. The email thread starts more or less here > > http://lkml.kernel.org/r/20161130092239.gd18...@dhcp22.suse.cz > > > > On Thu 01-12-16 08:15:07, Michal Hocko wrote: > > > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote: > > > [...] > > > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy > > > > > > Huh, do I get it right that the request was for a _single_ page? Why do > > > we need CMA for that? > > Ugh, good point. I assumed that was just the PFNs that it failed to migrate > away, but it seems that's indeed the whole requested range. Yeah sounds some > part of the dma-cma chain could be smarter and attempt CMA only for e.g. > costly orders. Is there any reason why the DMA api doesn't try the page allocator first before falling back to the CMA? I simply have a hard time to see why the CMA should be used (and fragment) for small requests size. -- Michal Hocko SUSE Labs
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On Thu, Dec 01, 2016 at 08:38:15AM +0100, Vlastimil Babka wrote: > >> By default config this should not be used on x86. > > What do you mean by that statement? > > I mean that the 16 mbytes for generic CMA area is not a default on x86: > > config CMA_SIZE_MBYTES > int "Size in Mega Bytes" > depends on !CMA_SIZE_SEL_PERCENTAGE > default 0 if X86 > default 16 d7be003a9d275299f5ee36bbdf156654f59e08e9 (v3.18-2122-gd7be003a9d27) is where the 0MB if-x86 default was added to the tree. Prior to that, it was 16MiB, and that's where my system picked up the value from. I have a record of all my kconfigs, because I use oldconfig each time (going back 8 years to 2.6.27) # Added in 3.12.0-1-g5f258d0 CONFIG_CMA=y # Added in 3.16.0-rc6-00042-g67dd8f3 CONFIG_CMA_ALIGNMENT=8 CONFIG_CMA_AREAS=7 CONFIG_CMA_SIZE_MBYTES=16 CONFIG_CMA_SIZE_SEL_MBYTES=y CONFIG_DMA_CMA=y So the next question is why I picked up CMA in 3.16.0-rc6-00042-g67dd8f3... I'll poke at that. > > Yes, I'd say if there's a fallback without much penalty, nowarn makes > > sense. If the fallback just tries multiple addresses until success, then > > the warning should only be issued when too many attempts have been made. > On the other hand, if the warnings are correlated with high kernel CPU usage, > it's arguably better to be warned. Keep the rate-limit on the warning for cases like this?
> > There's high kernel CPU usage that seems to roughly correlate with the > > messages, but I can't yet tell if that's due to the syslog itself, or > > repeated alloc_contig_range requests. > You could try running perf top. Will do in the morning. -- Robin Hugh Johnson Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer E-Mail : robb...@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
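The rate-limit Robin mentions is essentially what the kernel's printk_ratelimited() machinery does: allow a burst of messages per time window and drop the rest. A toy userspace model of that windowed limiting (the clock is passed in explicitly so the logic is easy to test; this is an illustration, not the kernel implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Allow at most `burst` messages per `interval` seconds; drop the rest. */
struct ratelimit_state {
    int interval;  /* window length, seconds */
    int burst;     /* messages allowed per window */
    int begin;     /* start of the current window */
    int printed;   /* messages emitted in the current window */
};

bool ratelimit_allow(struct ratelimit_state *rs, int now)
{
    if (now - rs->begin >= rs->interval) {  /* window expired: reset it */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;                        /* emit the message */
    }
    return false;                           /* suppress it */
}
```

With interval 5 and burst 3, a flood of "PFNs busy" lines would shrink to three per five seconds, which addresses the 2.5GB/hour syslog growth without hiding the condition entirely.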
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On 12/01/2016 08:21 AM, Michal Hocko wrote: Forgot to CC Joonsoo. The email thread starts more or less here http://lkml.kernel.org/r/20161130092239.gd18...@dhcp22.suse.cz On Thu 01-12-16 08:15:07, Michal Hocko wrote: On Wed 30-11-16 20:19:03, Robin H. Johnson wrote: [...] > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy Huh, do I get it right that the request was for a _single_ page? Why do we need CMA for that? Ugh, good point. I assumed that was just the PFNs that it failed to migrate away, but it seems that's indeed the whole requested range. Yeah sounds some part of the dma-cma chain could be smarter and attempt CMA only for e.g. costly orders.
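Vlastimil's "costly orders" idea can be sketched in plain C. PAGE_ALLOC_COSTLY_ORDER really is 3 in the kernel; the policy function itself is hypothetical, just illustrating the gate being proposed (a single page is order 0 and would never reach alloc_contig_range under it):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096UL
#define PAGE_ALLOC_COSTLY_ORDER 3  /* the kernel's "costly" threshold */

/* Allocation order: log2 of the page count, rounded up. */
unsigned int size_to_order(size_t bytes)
{
    size_t pages = (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
    unsigned int order = 0;

    while ((1UL << order) < pages)
        order++;
    return order;
}

/* Hypothetical dma-cma policy: leave small requests to the buddy
 * allocator and reserve CMA for orders the buddy allocator is likely
 * to fail on. */
int worth_trying_cma(size_t bytes)
{
    return size_to_order(bytes) > PAGE_ALLOC_COSTLY_ORDER;
}
```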
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On 12/01/2016 07:21 AM, Robin H. Johnson wrote: On Wed, Nov 30, 2016 at 10:24:59PM +0100, Vlastimil Babka wrote: [add more CC's] On 11/30/2016 09:19 PM, Robin H. Johnson wrote: > Somewhere in the Radeon/DRM codebase, CMA page allocation has either > regressed in the timeline of 4.5->4.9, and/or the drm/radeon code is > doing something different with pages. Could be that it didn't use dma_generic_alloc_coherent() before, or you didn't have the generic CMA pool configured. v4.9-rc7-23-gded6e842cf49: [0.00] cma: Reserved 16 MiB at 0x00083e40 [0.00] Memory: 32883108K/33519432K available (6752K kernel code, 1244K rwdata, 4716K rodata, 1772K init, 2720K bss, 619940K reserved, 16384K cma-reserved) What's the output of "grep CMA" on your .config? # grep CMA .config |grep -v -e SECMARK= -e CONFIG_BCMA -e CONFIG_USB_HCD_BCMA -e INPUT_CMA3000 -e CRYPTO_CMAC CONFIG_CMA=y # CONFIG_CMA_DEBUG is not set # CONFIG_CMA_DEBUGFS is not set CONFIG_CMA_AREAS=7 CONFIG_DMA_CMA=y CONFIG_CMA_SIZE_MBYTES=16 CONFIG_CMA_SIZE_SEL_MBYTES=y # CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set # CONFIG_CMA_SIZE_SEL_MIN is not set # CONFIG_CMA_SIZE_SEL_MAX is not set CONFIG_CMA_ALIGNMENT=8 Or any kernel boot options with cma in name? None. By default config this should not be used on x86. What do you mean by that statement? I mean that the 16 mbytes for generic CMA area is not a default on x86: config CMA_SIZE_MBYTES int "Size in Mega Bytes" depends on !CMA_SIZE_SEL_PERCENTAGE default 0 if X86 default 16 Which explains why it's rare to see these reports in the context such as yours. I'd recommend just disabling it, as the primary use case for CMA are devices on mobile phones that don't have any other fallback (unlike the dma alloc). It should be disallowed to enable CONFIG_CMA? Radeon and CMA should be mutually exclusive? I don't think this is a specific problem of radeon. But looks like it's a heavy user of the dma alloc. There might be others. 
> Given that I haven't seen ANY other reports of this, I'm inclined to > believe the problem is drm/radeon specific (if I don't start X, I can't > reproduce the problem). It's rather CMA specific, the allocation attemps just can't be 100% reliable due to how CMA works. The question is if it should be spewing in the log in the context of dma-cma, which has a fallback allocation option. It even uses __GFP_NOWARN, perhaps the CMA path should respect that? Yes, I'd say if there's a fallback without much penalty, nowarn makes sense. If the fallback just tries multiple addresses until success, then the warning should only be issued when too many attempts have been made. On the other hand, if the warnings are correlated with high kernel CPU usage, it's arguably better to be warned. > The rate of the problem starts slow, and also is relatively low on an idle > system (my screens blank at night, no xscreensaver running), but it still ramps > up over time (to the point of generating 2.5GB/hour of "(timestamp) > alloc_contig_range: [83e4d9, 83e4da) PFNs busy"), with various addresses (~100 > unique ranges for a day). > > My X workload is ~50 chrome tabs and ~20 terminals (over 3x 24" monitors w/ 9 > virtual desktops per monitor). So IIUC, except the messages, everything actually works fine? There's high kernel CPU usage that seems to roughly correlate with the messages, but I can't yet tell if that's due to the syslog itself, or repeated alloc_contig_range requests. You could try running perf top.
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
Forgot to CC Joonsoo. The email thread starts more or less here http://lkml.kernel.org/r/20161130092239.gd18...@dhcp22.suse.cz On Thu 01-12-16 08:15:07, Michal Hocko wrote: > On Wed 30-11-16 20:19:03, Robin H. Johnson wrote: > [...] > > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy > > Huh, do I get it right that the request was for a _single_ page? Why do > we need CMA for that? -- Michal Hocko SUSE Labs
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On Wed 30-11-16 20:19:03, Robin H. Johnson wrote: [...] > alloc_contig_range: [83f2a3, 83f2a4) PFNs busy Huh, do I get it right that the request was for a _single_ page? Why do we need CMA for that? -- Michal Hocko SUSE Labs
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
On Wed, Nov 30, 2016 at 10:24:59PM +0100, Vlastimil Babka wrote: > [add more CC's] > > On 11/30/2016 09:19 PM, Robin H. Johnson wrote: > > Somewhere in the Radeon/DRM codebase, CMA page allocation has either > > regressed in the timeline of 4.5->4.9, and/or the drm/radeon code is > > doing something different with pages. > > Could be that it didn't use dma_generic_alloc_coherent() before, or you > didn't > have the generic CMA pool configured. v4.9-rc7-23-gded6e842cf49: [0.00] cma: Reserved 16 MiB at 0x00083e40 [0.00] Memory: 32883108K/33519432K available (6752K kernel code, 1244K rwdata, 4716K rodata, 1772K init, 2720K bss, 619940K reserved, 16384K cma-reserved) > What's the output of "grep CMA" on your > .config? # grep CMA .config |grep -v -e SECMARK= -e CONFIG_BCMA -e CONFIG_USB_HCD_BCMA -e INPUT_CMA3000 -e CRYPTO_CMAC CONFIG_CMA=y # CONFIG_CMA_DEBUG is not set # CONFIG_CMA_DEBUGFS is not set CONFIG_CMA_AREAS=7 CONFIG_DMA_CMA=y CONFIG_CMA_SIZE_MBYTES=16 CONFIG_CMA_SIZE_SEL_MBYTES=y # CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set # CONFIG_CMA_SIZE_SEL_MIN is not set # CONFIG_CMA_SIZE_SEL_MAX is not set CONFIG_CMA_ALIGNMENT=8 > Or any kernel boot options with cma in name? None. > By default config this should not be used on x86. What do you mean by that statement? It should be disallowed to enable CONFIG_CMA? Radeon and CMA should be mutually exclusive? > > Given that I haven't seen ANY other reports of this, I'm inclined to > > believe the problem is drm/radeon specific (if I don't start X, I can't > > reproduce the problem). > > It's rather CMA specific, the allocation attemps just can't be 100% reliable > due > to how CMA works. The question is if it should be spewing in the log in the > context of dma-cma, which has a fallback allocation option. It even uses > __GFP_NOWARN, perhaps the CMA path should respect that? Yes, I'd say if there's a fallback without much penalty, nowarn makes sense. 
If the fallback just tries multiple addresses until success, then the warning should only be issued when too many attempts have been made. > > > The rate of the problem starts slow, and also is relatively low on an idle > > system (my screens blank at night, no xscreensaver running), but it still > > ramps > > up over time (to the point of generating 2.5GB/hour of "(timestamp) > > alloc_contig_range: [83e4d9, 83e4da) PFNs busy"), with various addresses > > (~100 > > unique ranges for a day). > > > > My X workload is ~50 chrome tabs and ~20 terminals (over 3x 24" monitors w/ > > 9 > > virtual desktops per monitor). > So IIUC, except the messages, everything actually works fine? There's high kernel CPU usage that seems to roughly correlate with the messages, but I can't yet tell if that's due to the syslog itself, or repeated alloc_contig_range requests. -- Robin Hugh Johnson Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer E-Mail : robb...@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
Re: drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
[add more CC's] On 11/30/2016 09:19 PM, Robin H. Johnson wrote: Somewhere in the Radeon/DRM codebase, CMA page allocation has either regressed in the timeline of 4.5->4.9, and/or the drm/radeon code is doing something different with pages. Could be that it didn't use dma_generic_alloc_coherent() before, or you didn't have the generic CMA pool configured. What's the output of "grep CMA" on your .config? Or any kernel boot options with cma in name? By default config this should not be used on x86. Given that I haven't seen ANY other reports of this, I'm inclined to believe the problem is drm/radeon specific (if I don't start X, I can't reproduce the problem). It's rather CMA specific, the allocation attempts just can't be 100% reliable due to how CMA works. The question is if it should be spewing in the log in the context of dma-cma, which has a fallback allocation option. It even uses __GFP_NOWARN, perhaps the CMA path should respect that? The rate of the problem starts slow, and also is relatively low on an idle system (my screens blank at night, no xscreensaver running), but it still ramps up over time (to the point of generating 2.5GB/hour of "(timestamp) alloc_contig_range: [83e4d9, 83e4da) PFNs busy"), with various addresses (~100 unique ranges for a day). My X workload is ~50 chrome tabs and ~20 terminals (over 3x 24" monitors w/ 9 virtual desktops per monitor). So IIUC, except the messages, everything actually works fine? I added a stack trace & rate limit to alloc_contig_range's PFNs busy message (patch in previous email on LKML/-MM lists); and they point to radeon.
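The __GFP_NOWARN suggestion amounts to threading a "no warn" bit from the dma-cma caller down into the CMA allocator, so the "PFNs busy" message is suppressed when a fallback exists. A toy userspace model of that idea (names are invented; this is not the real kernel API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int pfns_busy_warnings;  /* counts messages that reached the log */

/* Model: a CMA allocation that hits busy PFNs fails, and only logs the
 * failure when the caller did NOT ask for silence (__GFP_NOWARN-style). */
static bool model_cma_alloc(bool range_is_busy, bool no_warn)
{
    if (range_is_busy) {
        if (!no_warn) {
            printf("alloc_contig_range: [..., ...) PFNs busy\n");
            pfns_busy_warnings++;
        }
        return false;  /* caller falls back to the page allocator */
    }
    return true;
}
```

The caller that has a fallback passes no_warn=true and retries elsewhere quietly; a caller with no fallback keeps the diagnostic.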
alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
CPU: 3 PID: 8518 Comm: X Not tainted 4.9.0-rc7-00024-g6ad4037e18ec #27
Hardware name: System manufacturer System Product Name/P8Z68 DELUXE, BIOS 0501 05/09/2011
 ad50c3d7f730 b236c873 0083f2a3 0083f2a4
 ad50c3d7f810 b2183b38 999dff4d8040 20fca8c0
 0083f400 0083f000 0083f2a3 0004
Call Trace:
 [] dump_stack+0x85/0xc2
 [] alloc_contig_range+0x368/0x370
 [] cma_alloc+0x127/0x2e0
 [] dma_alloc_from_contiguous+0x38/0x40
 [] dma_generic_alloc_coherent+0x91/0x1d0
 [] x86_swiotlb_alloc_coherent+0x25/0x50
 [] ttm_dma_populate+0x48a/0x9a0 [ttm]
 [] ? __kmalloc+0x1b6/0x250
 [] radeon_ttm_tt_populate+0x22a/0x2d0 [radeon]
 [] ? ttm_dma_tt_init+0x67/0xc0 [ttm]
 [] ttm_tt_bind+0x37/0x70 [ttm]
 [] ttm_bo_handle_move_mem+0x528/0x5a0 [ttm]
 [] ? shmem_alloc_inode+0x1a/0x30
 [] ttm_bo_validate+0x114/0x130 [ttm]
 [] ? _raw_write_unlock+0xe/0x10
 [] ttm_bo_init+0x31d/0x3f0 [ttm]
 [] radeon_bo_create+0x19b/0x260 [radeon]
 [] ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
 [] radeon_gem_object_create+0xad/0x180 [radeon]
 [] radeon_gem_create_ioctl+0x5f/0xf0 [radeon]
 [] drm_ioctl+0x21b/0x4d0 [drm]
 [] ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
 [] radeon_drm_ioctl+0x4c/0x80 [radeon]
 [] do_vfs_ioctl+0x92/0x5c0
 [] SyS_ioctl+0x79/0x90
 [] do_syscall_64+0x73/0x190
 [] entry_SYSCALL64_slow_path+0x25/0x25

The Radeon card in my case is a VisionTek HD 7750 Eyefinity 6, which is reported as:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] (prog-if 00 [VGA controller])
	Subsystem: VISIONTEK Cape Verde PRO [Radeon HD 7750/8740 / R7 250E]
	Flags: bus master, fast devsel, latency 0, IRQ 58
	Memory at c000 (64-bit, prefetchable) [size=256M]
	Memory at fbe0 (64-bit, non-prefetchable) [size=256K]
	I/O ports at e000 [size=256]
	Expansion ROM at 000c [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Legacy Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010
	Capabilities: [150] Advanced Error Reporting
	Kernel driver in use: radeon
	Kernel modules: radeon, amdgpu
drm/radeon spamming alloc_contig_range: [xxx, yyy) PFNs busy busy
Somewhere in the Radeon/DRM codebase, CMA page allocation has either regressed in the timeline of 4.5->4.9, and/or the drm/radeon code is doing something different with pages. Given that I haven't seen ANY other reports of this, I'm inclined to believe the problem is drm/radeon specific (if I don't start X, I can't reproduce the problem).

The rate of the problem starts slow, and also is relatively low on an idle system (my screens blank at night, no xscreensaver running), but it still ramps up over time (to the point of generating 2.5GB/hour of "(timestamp) alloc_contig_range: [83e4d9, 83e4da) PFNs busy"), with various addresses (~100 unique ranges for a day).

My X workload is ~50 chrome tabs and ~20 terminals (over 3x 24" monitors w/ 9 virtual desktops per monitor).

I added a stack trace & rate limit to alloc_contig_range's PFNs busy message (patch in previous email on LKML/-MM lists); and they point to radeon.

alloc_contig_range: [83f2a3, 83f2a4) PFNs busy
CPU: 3 PID: 8518 Comm: X Not tainted 4.9.0-rc7-00024-g6ad4037e18ec #27
Hardware name: System manufacturer System Product Name/P8Z68 DELUXE, BIOS 0501 05/09/2011
 ad50c3d7f730 b236c873 0083f2a3 0083f2a4
 ad50c3d7f810 b2183b38 999dff4d8040 20fca8c0
 0083f400 0083f000 0083f2a3 0004
Call Trace:
 [] dump_stack+0x85/0xc2
 [] alloc_contig_range+0x368/0x370
 [] cma_alloc+0x127/0x2e0
 [] dma_alloc_from_contiguous+0x38/0x40
 [] dma_generic_alloc_coherent+0x91/0x1d0
 [] x86_swiotlb_alloc_coherent+0x25/0x50
 [] ttm_dma_populate+0x48a/0x9a0 [ttm]
 [] ? __kmalloc+0x1b6/0x250
 [] radeon_ttm_tt_populate+0x22a/0x2d0 [radeon]
 [] ? ttm_dma_tt_init+0x67/0xc0 [ttm]
 [] ttm_tt_bind+0x37/0x70 [ttm]
 [] ttm_bo_handle_move_mem+0x528/0x5a0 [ttm]
 [] ? shmem_alloc_inode+0x1a/0x30
 [] ttm_bo_validate+0x114/0x130 [ttm]
 [] ? _raw_write_unlock+0xe/0x10
 [] ttm_bo_init+0x31d/0x3f0 [ttm]
 [] radeon_bo_create+0x19b/0x260 [radeon]
 [] ? radeon_update_memory_usage.isra.0+0x50/0x50 [radeon]
 [] radeon_gem_object_create+0xad/0x180 [radeon]
 [] radeon_gem_create_ioctl+0x5f/0xf0 [radeon]
 [] drm_ioctl+0x21b/0x4d0 [drm]
 [] ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
 [] radeon_drm_ioctl+0x4c/0x80 [radeon]
 [] do_vfs_ioctl+0x92/0x5c0
 [] SyS_ioctl+0x79/0x90
 [] do_syscall_64+0x73/0x190
 [] entry_SYSCALL64_slow_path+0x25/0x25

The Radeon card in my case is a VisionTek HD 7750 Eyefinity 6, which is reported as:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] (prog-if 00 [VGA controller])
	Subsystem: VISIONTEK Cape Verde PRO [Radeon HD 7750/8740 / R7 250E]
	Flags: bus master, fast devsel, latency 0, IRQ 58
	Memory at c000 (64-bit, prefetchable) [size=256M]
	Memory at fbe0 (64-bit, non-prefetchable) [size=256K]
	I/O ports at e000 [size=256]
	Expansion ROM at 000c [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Legacy Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010
	Capabilities: [150] Advanced Error Reporting
	Kernel driver in use: radeon
	Kernel modules: radeon, amdgpu

-- Robin Hugh Johnson E-Mail : robb...@orbis-terrarum.net Home Page : http://www.orbis-terrarum.net/?l=people.robbat2 ICQ# : 30269588 or 41961639 GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85