Long radeon stalls on recent kernels
On 11.12.2014 14:13, Andy Lutomirski wrote:
> On Wed, Dec 10, 2014 at 8:24 PM, Michel Dänzer wrote:
>> [...]
>> As I explained above, the stall happens because userspace does CPU
>> access to a BO which resides in the CPU-inaccessible part of VRAM. The
>> kernel has to move the BO into the CPU accessible part of VRAM before it
>> can let userspace proceed.
>
> Sure, but why does that take nearly 500ms? Even if the object in
> question is the entire framebuffer, that still seems extraordinarily
> slow.

It has to wait for any previously queued GPU operations and the eviction
of other buffers. Also, TTM buffer moves are currently synchronous, i.e.
TTM waits for a buffer to become idle before starting its move, which
means we don't get maximum throughput for a series of buffer moves.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Long radeon stalls on recent kernels
On 11.12.2014 05:28, Andy Lutomirski wrote:
> On Wed, Dec 10, 2014 at 1:44 AM, Michel Dänzer wrote:
>> On 10.12.2014 06:39, Andy Lutomirski wrote:
>>> [...]
>>> But I'm still waiting for the day that buggy userspace *can't* cause
>>> kernel graphics stalls.
>>
>> Actually, this looks more like buggy userspace stalling itself. :)
>
> I thought the stall was the kernel evicting things from vram. Why
> does it need to wait for userspace for that? Is it that userspace is
> actively using whatever's being evicted?

As I explained above, the stall happens because userspace does CPU
access to a BO which resides in the CPU-inaccessible part of VRAM. The
kernel has to move the BO into the CPU accessible part of VRAM before it
can let userspace proceed.

Current Mesa (10.4 or newer I think) sets a hint for BOs which will
likely be accessed by the CPU, so recent kernels can prioritize putting
those into the CPU accessible part of VRAM in the first place.

Or, if you're using EXA, the problem could be in the xf86-video-ati EXA
code.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Long radeon stalls on recent kernels
On Wed, Dec 10, 2014 at 8:24 PM, Michel Dänzer wrote:
> On 11.12.2014 05:28, Andy Lutomirski wrote:
>> [...]
>> I thought the stall was the kernel evicting things from vram. Why
>> does it need to wait for userspace for that? Is it that userspace is
>> actively using whatever's being evicted?
>
> As I explained above, the stall happens because userspace does CPU
> access to a BO which resides in the CPU-inaccessible part of VRAM. The
> kernel has to move the BO into the CPU accessible part of VRAM before it
> can let userspace proceed.

Sure, but why does that take nearly 500ms? Even if the object in
question is the entire framebuffer, that still seems extraordinarily
slow.

--Andy

> Current Mesa (10.4 or newer I think) sets a hint for BOs which will
> likely be accessed by the CPU, so recent kernels can prioritize putting
> those into the CPU accessible part of VRAM in the first place.
>
> Or, if you're using EXA, the problem could be in the xf86-video-ati EXA
> code.

--
Andy Lutomirski
AMA Capital Management, LLC
Long radeon stalls on recent kernels
On 10.12.2014 06:39, Andy Lutomirski wrote:
> On Tue, Dec 9, 2014 at 8:06 AM, Andy Lutomirski wrote:
>> [...]
>> mesa-dri-drivers-10.3.3-1.20141110.fc20.x86_64
>>
>> I'm planning on upgrading to Fedora 21 fairly soon.
>
> Upgrading to mesa-dri-drivers-10.3.3-1.20141110.fc21.x86_64 seems to
> have helped enough that my usual test (open a couple of Firefox tabs
> with graphics in them) doesn't hang anymore.

Hmm, since that looks like the exact same upstream version, maybe it was
actually upgrading something else that made the difference?

> This card still isn't *fast*.

I'm afraid it wasn't exactly a high-end card even when it was new. What
kind of operations are slow?

> Is there some way I can check that I'm actually using all 16 PCIe lanes?
> In my tinkering w/ power management settings, I got some odd logs
> suggesting that only one lane was in use.

You can try forcing off ASPM with radeon.aspm=0; other than that, I'm not
sure.

> But I'm still waiting for the day that buggy userspace *can't* cause
> kernel graphics stalls.

Actually, this looks more like buggy userspace stalling itself. :)

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
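On the question of whether all 16 PCIe lanes are in use: a minimal sketch, assuming the kernel exposes the standard `current_link_width` and `max_link_width` attributes in the GPU's PCI sysfs directory. The function names are illustrative only, and the device address in the usage note is a placeholder rather than anything taken from this thread.

```python
from pathlib import Path


def read_link_widths(device_dir):
    """Return (current, maximum) PCIe link width for one PCI device directory."""
    base = Path(device_dir)
    # Both attributes hold a single decimal lane count, e.g. "16".
    current = int((base / "current_link_width").read_text().strip())
    maximum = int((base / "max_link_width").read_text().strip())
    return current, maximum


def describe_link(current, maximum):
    """Summarize whether the link trained at fewer lanes than the device supports."""
    if current < maximum:
        return f"x{current} negotiated, below the x{maximum} maximum"
    return f"x{current} negotiated (device maximum)"
```

Usage would look like `describe_link(*read_link_widths("/sys/bus/pci/devices/0000:01:00.0"))`, with the address replaced by the one `lspci` reports for the card. The `LnkCap` and `LnkSta` lines of `lspci -vv` should show the same information. Note that a link may legitimately train down while idle if power management is active, so a reading taken under load is probably more meaningful.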
Long radeon stalls on recent kernels
On Wed, Dec 10, 2014 at 1:44 AM, Michel Dänzer wrote:
> On 10.12.2014 06:39, Andy Lutomirski wrote:
>> [...]
>> Upgrading to mesa-dri-drivers-10.3.3-1.20141110.fc21.x86_64 seems to
>> have helped enough that my usual test (open a couple of Firefox tabs
>> with graphics in them) doesn't hang anymore.
>
> Hmm, since that looks like the exact same upstream version, maybe it was
> actually upgrading something else that made the difference?

Maybe mutter?

>> This card still isn't *fast*.
>
> I'm afraid it wasn't exactly a high-end card even when it was new. What
> kind of operations are slow?

Things like scrolling in Google Maps. It's not *that* bad, but older
Intel IGPs still seem considerably smoother.

>> Is there some way I can check that I'm actually using all 16 PCIe lanes?
>> In my tinkering w/ power management settings, I got some odd logs
>> suggesting that only one lane was in use.
>
> You can try forcing off ASPM with radeon.aspm=0, other than that I'm not
> sure.
>
>> But I'm still waiting for the day that buggy userspace *can't* cause
>> kernel graphics stalls.
>
> Actually, this looks more like buggy userspace stalling itself. :)

I thought the stall was the kernel evicting things from vram. Why
does it need to wait for userspace for that? Is it that userspace is
actively using whatever's being evicted?

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
Long radeon stalls on recent kernels
On Tue, Dec 9, 2014 at 4:39 PM, Andy Lutomirski wrote:
> [...]
> This card still isn't *fast*. Is there some way I can check that I'm
> actually using all 16 PCIe lanes? In my tinkering w/ power management
> settings, I got some odd logs suggesting that only one lane was in
> use.

You should be using all the lanes available. The main issue with that
card is vram memory bandwidth. Those chips have a single channel memory
interface and most OEMs populate them with DDR3 memory rather than
GDDR5.

From your log:
[3.079577] [drm] RAM width 64bits DDR
...
[3.082589] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0

Alex
Long radeon stalls on recent kernels
On 09.12.2014 09:24, Andy Lutomirski wrote:
>
> The relevant line from latencytop seems to be:
>
> 154 20441402 489139 radeon_fence_default_wait [radeon]
> fence_wait_timeout ttm_bo_wait [ttm] ttm_bo_move_accel_cleanup [ttm]
> radeon_move_blit.isra.12 [radeon] radeon_bo_move [radeon]
> ttm_bo_handle_move_mem [ttm] ttm_bo_evict [ttm] ttm_mem_evict_first
> [ttm] ttm_bo_mem_space [ttm] ttm_bo_validate [ttm]
> radeon_bo_fault_reserve_notify [radeon]

Which process is this?

Looks like CPU access to a BO in VRAM, but the BO is located outside of
the CPU visible area of VRAM, so it has to be moved into the CPU visible
area first.

Which version of Mesa are you using?

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
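For readers decoding the quoted latencytop line: a rough sketch, assuming its three leading numbers follow the `/proc/latency_stats` layout (hit count, total latency in microseconds, maximum latency in microseconds), followed by the backtrace with the innermost function first. The helper name `parse_latency_line` is hypothetical.

```python
def parse_latency_line(line):
    """Split one latency record into its counters and backtrace frames."""
    fields = line.split()
    # Assumed layout: <hit count> <total usec> <max usec> <backtrace...>
    count, total_us, max_us = (int(f) for f in fields[:3])

    frames = []
    for token in fields[3:]:
        # A "[module]" token annotates the function name that precedes it.
        if token.startswith("[") and frames:
            frames[-1] = f"{frames[-1]} {token}"
        else:
            frames.append(token)

    return {
        "count": count,
        "avg_ms": total_us / count / 1000,
        "max_ms": max_us / 1000,
        "frames": frames,
    }
```

Under that assumption, the record above describes 154 waits with a worst case of about 489 ms, entered through `radeon_bo_fault_reserve_notify` (the last frame) and blocked in `radeon_fence_default_wait` (the first frame). That ordering fits the reading given in the reply: a CPU page fault on a BO triggers validation, eviction and a blit, and the caller then waits on the fence.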
Long radeon stalls on recent kernels
On Tue, Dec 9, 2014 at 8:06 AM, Andy Lutomirski wrote:
> On Tue, Dec 9, 2014 at 1:18 AM, Michel Dänzer wrote:
>> [...]
>> Which version of Mesa are you using?
>
> mesa-dri-drivers-10.3.3-1.20141110.fc20.x86_64
>
> I'm planning on upgrading to Fedora 21 fairly soon.

Upgrading to mesa-dri-drivers-10.3.3-1.20141110.fc21.x86_64 seems to
have helped enough that my usual test (open a couple of Firefox tabs
with graphics in them) doesn't hang anymore.

This card still isn't *fast*. Is there some way I can check that I'm
actually using all 16 PCIe lanes? In my tinkering w/ power management
settings, I got some odd logs suggesting that only one lane was in
use.

Other than that, maybe everything works :)

But I'm still waiting for the day that buggy userspace *can't* cause
kernel graphics stalls.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
Long radeon stalls on recent kernels
On Tue, Dec 9, 2014 at 1:18 AM, Michel Dänzer wrote:
> On 09.12.2014 09:24, Andy Lutomirski wrote:
>>
>> The relevant line from latencytop seems to be:
>>
>> 154 20441402 489139 radeon_fence_default_wait [radeon]
>> fence_wait_timeout ttm_bo_wait [ttm] ttm_bo_move_accel_cleanup [ttm]
>> radeon_move_blit.isra.12 [radeon] radeon_bo_move [radeon]
>> ttm_bo_handle_move_mem [ttm] ttm_bo_evict [ttm] ttm_mem_evict_first
>> [ttm] ttm_bo_mem_space [ttm] ttm_bo_validate [ttm]
>> radeon_bo_fault_reserve_notify [radeon]
>
> Which process is this?

Xorg

> Looks like CPU access to a BO in VRAM, but the BO is located outside of
> the CPU visible area of VRAM, so it has to be moved into the CPU visible
> area first.
>
> Which version of Mesa are you using?

mesa-dri-drivers-10.3.3-1.20141110.fc20.x86_64

I'm planning on upgrading to Fedora 21 fairly soon.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
Long radeon stalls on recent kernels
On Wed, Nov 26, 2014 at 7:38 AM, Andy Lutomirski wrote:
> On Tue, Nov 25, 2014 at 10:42 PM, Michel Dänzer wrote:
>> On 20.11.2014 09:58, Andy Lutomirski wrote:
>>> On Wed, Nov 19, 2014 at 4:07 PM, Andy Lutomirski wrote:
>>>> [...]
>>>> The cycles were with -e cycles:pp, so I think that iret would have
>>>> shown up if there were enough IRQs to cause the problem.
>>>>
>>>> I'll build a kernel with latencytop.
>>>
>>> I just caught
Long radeon stalls on recent kernels
On 20.11.2014 09:58, Andy Lutomirski wrote:
> On Wed, Nov 19, 2014 at 4:07 PM, Andy Lutomirski wrote:
>> [...]
>> The cycles were with -e cycles:pp, so I think that iret would have
>> shown up if there were enough IRQs to cause the problem.
>>
>> I'll build a kernel with latencytop.
>
> I just caught call_rwsem_down_write_failed for 5379 ms in khugepaged
> (holy crap) and radeon_fence_default_wait for 489.2ms in Xorg.
>
> Turning off THP gets rid of the khugepaged thing. The 489.2ms in
> radeon_fence_default_wait is amazingly reproducible -- I've seen that
> exact number three times now.

Sounds like the long stalls were THP, but the shorter ones might be
radeon?

Can you get some call graphs for the profile or from latencytop? Make
sure at least the kernel is built with frame pointers
Long radeon stalls on recent kernels
On Tue, Nov 25, 2014 at 10:42 PM, Michel Dänzer wrote:
> On 20.11.2014 09:58, Andy Lutomirski wrote:
>> On Wed, Nov 19, 2014 at 4:07 PM, Andy Lutomirski wrote:
>>> [...]
>>> The cycles were with -e cycles:pp, so I think that iret would have
>>> shown up if there were enough IRQs to cause the problem.
>>>
>>> I'll build a kernel with latencytop.
>>
>> I just caught call_rwsem_down_write_failed for 5379 ms in khugepaged
>> (holy crap) and radeon_fence_default_wait for 489.2ms in Xorg.
>>
>> Turning off THP gets rid of the khugepaged thing. The 489.2ms is
Long radeon stalls on recent kernels
On Wed, Nov 19, 2014 at 4:07 PM, Andy Lutomirski wrote:
> On Tue, Nov 18, 2014 at 11:19 PM, Michel Dänzer wrote:
>> [...]
>> I can only think of two things offhand that could cause such extremely long
>> stalls: Swap thrashing or IRQ storms.
>>
>> With a setup where you can easily trigger long stalls, can you try getting a
>> CPU profile for a stall with sysprof or perf?
>
> Got one with perf:
>
>   16.82%  Xorg     libc-2.18.so        [.] __memcpy_sse2_unaligned
>    9.20%  swapper  [kernel.kallsyms]   [k] intel_idle
>    1.00%  Xorg     [kernel.kallsyms]   [k] evergreen_irq_set
>    0.83%  firefox  libxul.so           [.] 0x01d93281
>    0.69%  firefox  libxul.so           [.] 0x01d932ad
>    0.62%  firefox  [kernel.kallsyms]   [k] copy_user_generic_string
>    0.55%  swapper  [kernel.kallsyms]   [k] evergreen_irq_ack
>    0.54%  firefox  libpthread-2.18.so  [.] pthread_mutex_lock
>    0.52%  firefox  libpthread-2.18.so  [.] pthread_mutex_unlock
>    0.45%  Xorg     [kernel.kallsyms]   [k] drm_mm_insert_node_in_range_generic
>    0.41%  Xorg     [kernel.kallsyms]   [k] lock_release
>    0.40%  Xorg     [kernel.kallsyms]   [k] lock_acquire
>    0.35%  firefox  firefox             [.] 0x0001245d
>    0.33%  Xorg     [kernel.kallsyms]   [k] __module_address
>    0.31%  firefox  [kernel.kallsyms]   [k] clear_page_c
>    0.29%  Xorg     [kernel.kallsyms]   [k] copy_user_generic_string
>    0.28%  firefox  firefox             [.] 0x00013159
>
> and:
>
> Samples: 11K of event 'irq:irq_handler_entry', Event count (approx.): 11802
>   87.43%  swapper          [kernel.kallsyms]  [k] handle_irq_event_percpu
>    7.52%  firefox          [kernel.kallsyms]  [k] handle_irq_event_percpu
>    1.84%  irq/36-ahci      [kernel.kallsyms]  [k] handle_irq_event_percpu
>    1.14%  Xorg             [kernel.kallsyms]  [k] handle_irq_event_percpu
>    0.75%  kworker/5:0      [kernel.kallsyms]  [k] handle_irq_event_percpu
>    0.32%  gnome-shell      [kernel.kallsyms]  [k] handle_irq_event_percpu
>    0.25%  kworker/5:1H     [kernel.kallsyms]  [k] handle_irq_event_percpu
>    0.25%  Media D~ode #10  [kernel.kallsyms]  [k] handle_irq_event_percpu
>    0.19%  ImageDe~er #330  [kernel.kallsyms]  [k] handle_irq_event_percpu
>    0.07%  pulseaudio       [kernel.kallsyms]  [k] handle_irq_event_percpu
>
> The cycles were with -e cycles:pp, so I think that iret would have
> shown up if there were enough IRQs to cause the problem.
>
> I'll build a kernel with latencytop.

I just caught call_rwsem_down_write_failed for 5379 ms in khugepaged
(holy crap) and radeon_fence_default_wait for 489.2ms in Xorg.

Turning off THP gets rid of the khugepaged thing. The 489.2ms in
radeon_fence_default_wait is amazingly reproducible -- I've seen that
exact number three times now.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC
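The message does not say how THP was turned off. One common way to check the current setting is the `enabled` file under `/sys/kernel/mm/transparent_hugepage/`, which lists the available modes and brackets the active one. A small sketch under that assumption follows; the constant and function names are illustrative.

```python
import re

# Standard sysfs location of the THP mode switch (assumed present on the system).
THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(text):
    """Return the bracketed mode from a string such as 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode marked in {text!r}")
    return match.group(1)
```

On a live system, `active_thp_mode(open(THP_ENABLED).read())` would report the mode in effect. Writing `never` to the same file as root disables THP until the next boot, which would be one way to reproduce the "turning off THP" step described above.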
Long radeon stalls on recent kernels
On 19.11.2014 09:21, Andy Lutomirski wrote:
> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer wrote:
>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>
>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>> graphics intensive seems to cause my system to become unusable for
>>> tens of seconds. Pointing Firefox at Google Maps is a big offender --
>>> it can take several minutes for me to move my mouse far enough to
>>> close the tab and get my computer back.
>>>
>>> On bootup, I get this warning:
>>> [drm:btc_dpm_set_power_state] *ERROR*
>>> rv770_restrict_performance_levels_before_switch failed
>>>
>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>> giving me rather slow graphics.
>>>
>>> Are there known issues here?
>>
>>
>> Can you bisect the kernel, or at least isolate which kernel version first
>> introduced the problem?
>
> With whatever userspace I'm running, I'm seeing it on 3.13, 3.14, 3.15,
> 3.16, and 3.18-rc4+. I haven't tried other versions.
>
> With radeon.dpm=0, I can still trigger short stalls (around one
> second), but I seem unable to trigger long stalls easily. (I say
> easily because, just as I was typing this email, my system stalled for
> about a minute.)

I can only think of two things offhand that could cause such extremely
long stalls: Swap thrashing or IRQ storms.

With a setup where you can easily trigger long stalls, can you try
getting a CPU profile for a stall with sysprof or perf?

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
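As a lightweight complement to a perf profile, the IRQ-storm hypothesis can also be checked by diffing two snapshots of `/proc/interrupts` taken around a stall. The sketch below assumes the usual layout of that file: a header naming the CPUs, then one row per interrupt with per-CPU counts. Both function names are hypothetical.

```python
def parse_interrupts(text):
    """Map each IRQ label to its count summed over all CPUs."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        # Rows such as ERR carry fewer columns; stop at the first non-number.
        for token in rest.split()[:ncpus]:
            if not token.isdigit():
                break
            counts.append(int(token))
        totals[label.strip()] = sum(counts)
    return totals


def busiest(before, after, top=3):
    """Return the IRQs whose counts grew the most between two snapshots."""
    deltas = {irq: after[irq] - before.get(irq, 0) for irq in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Reading the file once before and once after a stall, then calling `busiest(parse_interrupts(a), parse_interrupts(b))`, gives a quick view of which interrupt lines were active. What rate counts as a storm depends on the device, so no threshold is suggested here. Swap thrashing could be checked in a similar way from the `pswpin` and `pswpout` counters in `/proc/vmstat`.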
Long radeon stalls on recent kernels
On Tue, Nov 18, 2014 at 11:19 PM, Michel Dänzer wrote:
> On 19.11.2014 09:21, Andy Lutomirski wrote:
>>
>> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer wrote:
>>>
>>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>>
>>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>>> graphics intensive seems to cause my system to become unusable for
>>>> tens of seconds. Pointing Firefox at Google Maps is a big offender --
>>>> it can take several minutes for me to move my mouse far enough to
>>>> close the tab and get my computer back.
>>>>
>>>> On bootup, I get this warning:
>>>> [drm:btc_dpm_set_power_state] *ERROR*
>>>> rv770_restrict_performance_levels_before_switch failed
>>>>
>>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>>> giving my rather slow graphics.
>>>>
>>>> Are there known issues here?
>>>
>>>
>>>
>>> Can you bisect the kernel, or at least isolate which kernel version first
>>> introduced the problem?
>>
>>
>> With whatever userspace I'm running, I'm seeing it 3.13, 3.14, 3.15,
>> 3.16, and 3.18-rc4+. I haven't tried other versions.
>>
>> With radeon.dpm=0, I can still trigger short stalls (around one
>> second), but I seem unable to trigger long stalls easily. (I say
>> easily because, just as I was typing this email, my system stalled for
>> about a minute.)
>
>
> I can only think of two things offhand that could cause such extremely long
> stalls: Swap thrashing or IRQ storms.
>
> With a setup where you can easily trigger long stalls, can you try getting a
> CPU profile for a stall with sysprof or perf?
>
>

Got one with perf:

 16.82%  Xorg     libc-2.18.so        [.] __memcpy_sse2_unaligned
  9.20%  swapper  [kernel.kallsyms]   [k] intel_idle
  1.00%  Xorg     [kernel.kallsyms]   [k] evergreen_irq_set
  0.83%  firefox  libxul.so           [.] 0x01d93281
  0.69%  firefox  libxul.so           [.] 0x01d932ad
  0.62%  firefox  [kernel.kallsyms]   [k] copy_user_generic_string
  0.55%  swapper  [kernel.kallsyms]   [k] evergreen_irq_ack
  0.54%  firefox  libpthread-2.18.so  [.] pthread_mutex_lock
  0.52%  firefox  libpthread-2.18.so  [.] pthread_mutex_unlock
  0.45%  Xorg     [kernel.kallsyms]   [k] drm_mm_insert_node_in_range_generic
  0.41%  Xorg     [kernel.kallsyms]   [k] lock_release
  0.40%  Xorg     [kernel.kallsyms]   [k] lock_acquire
  0.35%  firefox  firefox             [.] 0x0001245d
  0.33%  Xorg     [kernel.kallsyms]   [k] __module_address
  0.31%  firefox  [kernel.kallsyms]   [k] clear_page_c
  0.29%  Xorg     [kernel.kallsyms]   [k] copy_user_generic_string
  0.28%  firefox  firefox             [.] 0x00013159

and:

Samples: 11K of event 'irq:irq_handler_entry', Event count (approx.): 11802
 87.43%  swapper          [kernel.kallsyms]  [k] handle_irq_event_percpu
  7.52%  firefox          [kernel.kallsyms]  [k] handle_irq_event_percpu
  1.84%  irq/36-ahci      [kernel.kallsyms]  [k] handle_irq_event_percpu
  1.14%  Xorg             [kernel.kallsyms]  [k] handle_irq_event_percpu
  0.75%  kworker/5:0      [kernel.kallsyms]  [k] handle_irq_event_percpu
  0.32%  gnome-shell      [kernel.kallsyms]  [k] handle_irq_event_percpu
  0.25%  kworker/5:1H     [kernel.kallsyms]  [k] handle_irq_event_percpu
  0.25%  Media D~ode #10  [kernel.kallsyms]  [k] handle_irq_event_percpu
  0.19%  ImageDe~er #330  [kernel.kallsyms]  [k] handle_irq_event_percpu
  0.07%  pulseaudio       [kernel.kallsyms]  [k] handle_irq_event_percpu

The cycles were with -e cycles:pp, so I think that iret would have
shown up if there were enough IRQs to cause the problem.

I'll build a kernel with latencytop.

--Andy
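To see which process dominates the cycles profile above, the rows can be summed per command. The following is only a rough sketch: the row layout (overhead, command, shared object, symbol type, symbol) is assumed from the listing in this message, and the names ROW and overhead_by_command are made up for illustration. Commands containing spaces would need a stricter parser.

```python
import re
from collections import defaultdict

# Rows copied from the cycles profile quoted in this message.
PERF_ROWS = """\
 16.82%  Xorg     libc-2.18.so        [.] __memcpy_sse2_unaligned
  9.20%  swapper  [kernel.kallsyms]   [k] intel_idle
  1.00%  Xorg     [kernel.kallsyms]   [k] evergreen_irq_set
  0.83%  firefox  libxul.so           [.] 0x01d93281
  0.69%  firefox  libxul.so           [.] 0x01d932ad
  0.62%  firefox  [kernel.kallsyms]   [k] copy_user_generic_string
  0.55%  swapper  [kernel.kallsyms]   [k] evergreen_irq_ack
  0.54%  firefox  libpthread-2.18.so  [.] pthread_mutex_lock
  0.52%  firefox  libpthread-2.18.so  [.] pthread_mutex_unlock
  0.45%  Xorg     [kernel.kallsyms]   [k] drm_mm_insert_node_in_range_generic
  0.41%  Xorg     [kernel.kallsyms]   [k] lock_release
  0.40%  Xorg     [kernel.kallsyms]   [k] lock_acquire
  0.35%  firefox  firefox             [.] 0x0001245d
  0.33%  Xorg     [kernel.kallsyms]   [k] __module_address
  0.31%  firefox  [kernel.kallsyms]   [k] clear_page_c
  0.29%  Xorg     [kernel.kallsyms]   [k] copy_user_generic_string
  0.28%  firefox  firefox             [.] 0x00013159
"""

# Assumed layout: overhead%, command (no spaces), object, [.] or [k], symbol.
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[([.k])\]\s+(\S+)\s*$")


def overhead_by_command(text):
    """Sum the overhead percentages per command; skip lines that do not parse."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        totals[match.group(2)] += float(match.group(1))
    return {command: round(total, 2) for command, total in totals.items()}


if __name__ == "__main__":
    totals = overhead_by_command(PERF_ROWS)
    for command, total in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{total:6.2f}%  {command}")
```

The listed rows add up to only about a third of all samples, so the totals describe the quoted excerpt, not the whole profile.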
Long radeon stalls on recent kernels
On Tue, Nov 18, 2014 at 4:34 PM, Andy Lutomirski wrote:
> On Tue, Nov 18, 2014 at 4:21 PM, Andy Lutomirski wrote:
>> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer wrote:
>>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>>
>>>> I have a Caicos card, like this:
>>>>
>>>> [3.077260] [drm] radeon kernel modesetting enabled.
>>>> [3.077338] checking generic (e000 60) vs hw (e000 1000)
>>>> [3.077339] fb: switching to radeondrmfb from EFI VGA
>>>> [3.077377] Console: switching to colour dummy device 80x25
>>>> [3.078881] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xE164).
>>>> [3.078903] [drm] register mmio base: 0xF4A2
>>>> [3.078904] [drm] register mmio size: 131072
>>>> [3.078982] ATOM BIOS: C26401
>>>> [3.079572] radeon :09:00.0: VRAM: 1024M 0x - 0x3FFF (1024M used)
>>>> [3.079574] radeon :09:00.0: GTT: 1024M 0x4000 - 0x7FFF
>>>> [3.079576] [drm] Detected VRAM RAM=1024M, BAR=256M
>>>> [3.079577] [drm] RAM width 64bits DDR
>>>> [3.079755] [TTM] Zone kernel: Available graphics memory: 8186568 kiB
>>>> [3.079757] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
>>>> [3.079757] [TTM] Initializing pool allocator
>>>> [3.079773] [TTM] Initializing DMA pool allocator
>>>> [3.080011] [drm] radeon: 1024M of VRAM memory ready
>>>> [3.080012] [drm] radeon: 1024M of GTT memory ready.
>>>> [3.080049] [drm] Loading CAICOS Microcode
>>>> [3.080330] [drm] Internal thermal controller without fan control
>>>> [3.081425] [drm] radeon: power management initialized
>>>> [3.081551] [drm] GART: num cpu pages 262144, num gpu pages 262144
>>>> [3.082589] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
>>>> [3.085030] [drm] PCIE GART of 1024M enabled (table at 0x00274000).
>>>> [3.085221] radeon :09:00.0: WB enabled
>>>> [3.085224] radeon :09:00.0: fence driver on ring 0 use gpu addr 0x4c00 and cpu addr 0x88043d914c00
>>>> [3.085225] radeon :09:00.0: fence driver on ring 3 use gpu addr 0x4c0c and cpu addr 0x88043d914c0c
>>>> [3.097438] radeon :09:00.0: fence driver on ring 5 use gpu addr 0x00072118 and cpu addr 0xc900128b2118
>>>> [3.097441] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>>>> [3.097442] [drm] Driver supports precise vblank timestamp query.
>>>> [3.097514] radeon :09:00.0: irq 56 for MSI/MSI-X
>>>> [3.097544] radeon :09:00.0: radeon: using MSI.
>>>> [3.097614] [drm] radeon: irq initialized.
>>>>
>>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>>> graphics intensive seems to cause my system to become unusable for
>>>> tens of seconds. Pointing Firefox at Google Maps is a big offender --
>>>> it can take several minutes for me to move my mouse far enough to
>>>> close the tab and get my computer back.
>>>>
>>>> On bootup, I get this warning:
>>>> [drm:btc_dpm_set_power_state] *ERROR*
>>>> rv770_restrict_performance_levels_before_switch failed
>>>>
>>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>>> giving my rather slow graphics.
>>>>
>>>> Are there known issues here?
>>>
>>>
>>> Can you bisect the kernel, or at least isolate which kernel version first
>>> introduced the problem?
>>
>> With whatever userspace I'm running, I'm seeing it 3.13, 3.14, 3.15,
>> 3.16, and 3.18-rc4+. I haven't tried other versions.
>>
>> With radeon.dpm=0, I can still trigger short stalls (around one
>> second), but I seem unable to trigger long stalls easily. (I say
>> easily because, just as I was typing this email, my system stalled for
>> about a minute.)
>
> I could be wrong here, but I think that radeon.dpm=0,
> power_profile=default is okay, but radeon.dpm=0, power_profile=high is
> bad.

I'm wrong again. power_profile=default is also bad. Grr.

--Andy
Long radeon stalls on recent kernels
On Tue, Nov 18, 2014 at 4:21 PM, Andy Lutomirski wrote:
> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer wrote:
>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>
>>> I have a Caicos card, like this:
>>>
>>> [3.077260] [drm] radeon kernel modesetting enabled.
>>> [3.077338] checking generic (e000 60) vs hw (e000 1000)
>>> [3.077339] fb: switching to radeondrmfb from EFI VGA
>>> [3.077377] Console: switching to colour dummy device 80x25
>>> [3.078881] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xE164).
>>> [3.078903] [drm] register mmio base: 0xF4A2
>>> [3.078904] [drm] register mmio size: 131072
>>> [3.078982] ATOM BIOS: C26401
>>> [3.079572] radeon :09:00.0: VRAM: 1024M 0x - 0x3FFF (1024M used)
>>> [3.079574] radeon :09:00.0: GTT: 1024M 0x4000 - 0x7FFF
>>> [3.079576] [drm] Detected VRAM RAM=1024M, BAR=256M
>>> [3.079577] [drm] RAM width 64bits DDR
>>> [3.079755] [TTM] Zone kernel: Available graphics memory: 8186568 kiB
>>> [3.079757] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
>>> [3.079757] [TTM] Initializing pool allocator
>>> [3.079773] [TTM] Initializing DMA pool allocator
>>> [3.080011] [drm] radeon: 1024M of VRAM memory ready
>>> [3.080012] [drm] radeon: 1024M of GTT memory ready.
>>> [3.080049] [drm] Loading CAICOS Microcode
>>> [3.080330] [drm] Internal thermal controller without fan control
>>> [3.081425] [drm] radeon: power management initialized
>>> [3.081551] [drm] GART: num cpu pages 262144, num gpu pages 262144
>>> [3.082589] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
>>> [3.085030] [drm] PCIE GART of 1024M enabled (table at 0x00274000).
>>> [3.085221] radeon :09:00.0: WB enabled
>>> [3.085224] radeon :09:00.0: fence driver on ring 0 use gpu addr 0x4c00 and cpu addr 0x88043d914c00
>>> [3.085225] radeon :09:00.0: fence driver on ring 3 use gpu addr 0x4c0c and cpu addr 0x88043d914c0c
>>> [3.097438] radeon :09:00.0: fence driver on ring 5 use gpu addr 0x00072118 and cpu addr 0xc900128b2118
>>> [3.097441] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>>> [3.097442] [drm] Driver supports precise vblank timestamp query.
>>> [3.097514] radeon :09:00.0: irq 56 for MSI/MSI-X
>>> [3.097544] radeon :09:00.0: radeon: using MSI.
>>> [3.097614] [drm] radeon: irq initialized.
>>>
>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>> graphics intensive seems to cause my system to become unusable for
>>> tens of seconds. Pointing Firefox at Google Maps is a big offender --
>>> it can take several minutes for me to move my mouse far enough to
>>> close the tab and get my computer back.
>>>
>>> On bootup, I get this warning:
>>> [drm:btc_dpm_set_power_state] *ERROR*
>>> rv770_restrict_performance_levels_before_switch failed
>>>
>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>> giving my rather slow graphics.
>>>
>>> Are there known issues here?
>>
>>
>> Can you bisect the kernel, or at least isolate which kernel version first
>> introduced the problem?
>
> With whatever userspace I'm running, I'm seeing it 3.13, 3.14, 3.15,
> 3.16, and 3.18-rc4+. I haven't tried other versions.
>
> With radeon.dpm=0, I can still trigger short stalls (around one
> second), but I seem unable to trigger long stalls easily. (I say
> easily because, just as I was typing this email, my system stalled for
> about a minute.)

I could be wrong here, but I think that radeon.dpm=0,
power_profile=default is okay, but radeon.dpm=0, power_profile=high is
bad.

--Andy

>
> --Andy
>
>>
>>
>> --
>> Earthling Michel Dänzer | http://www.amd.com
>> Libre software enthusiast | Mesa and X developer
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC

--
Andy Lutomirski
AMA Capital Management, LLC
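When comparing combinations such as radeon.dpm=0 with power_profile=default or high, it helps to record what the driver actually reports at the time of a stall. Below is a small sketch that reads the settings back from sysfs. The paths are assumptions about a typical radeon setup (the dpm module parameter under /sys/module/radeon/parameters and the power_method and power_profile attributes under the DRM device directory), and the helper names are invented for this example. Files that are missing are reported as None.

```python
from pathlib import Path


def read_setting(path):
    """Return the stripped contents of a sysfs file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def radeon_power_settings(sysfs_root="/sys", card="card0"):
    """Collect the radeon power-management settings discussed in this thread.

    The paths below are assumed, not taken from the thread; adjust the card
    name if the radeon device is not card0.
    """
    root = Path(sysfs_root)
    device = root / "class" / "drm" / card / "device"
    return {
        "dpm": read_setting(root / "module" / "radeon" / "parameters" / "dpm"),
        "power_method": read_setting(device / "power_method"),
        "power_profile": read_setting(device / "power_profile"),
    }


if __name__ == "__main__":
    for name, value in radeon_power_settings().items():
        print(f"{name}: {value if value is not None else 'not available'}")
```

Logging this output next to each test run makes it easier to tell later which combination was really active.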
Long radeon stalls on recent kernels
On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer wrote:
> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>
>> I have a Caicos card, like this:
>>
>> [3.077260] [drm] radeon kernel modesetting enabled.
>> [3.077338] checking generic (e000 60) vs hw (e000 1000)
>> [3.077339] fb: switching to radeondrmfb from EFI VGA
>> [3.077377] Console: switching to colour dummy device 80x25
>> [3.078881] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xE164).
>> [3.078903] [drm] register mmio base: 0xF4A2
>> [3.078904] [drm] register mmio size: 131072
>> [3.078982] ATOM BIOS: C26401
>> [3.079572] radeon :09:00.0: VRAM: 1024M 0x - 0x3FFF (1024M used)
>> [3.079574] radeon :09:00.0: GTT: 1024M 0x4000 - 0x7FFF
>> [3.079576] [drm] Detected VRAM RAM=1024M, BAR=256M
>> [3.079577] [drm] RAM width 64bits DDR
>> [3.079755] [TTM] Zone kernel: Available graphics memory: 8186568 kiB
>> [3.079757] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
>> [3.079757] [TTM] Initializing pool allocator
>> [3.079773] [TTM] Initializing DMA pool allocator
>> [3.080011] [drm] radeon: 1024M of VRAM memory ready
>> [3.080012] [drm] radeon: 1024M of GTT memory ready.
>> [3.080049] [drm] Loading CAICOS Microcode
>> [3.080330] [drm] Internal thermal controller without fan control
>> [3.081425] [drm] radeon: power management initialized
>> [3.081551] [drm] GART: num cpu pages 262144, num gpu pages 262144
>> [3.082589] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
>> [3.085030] [drm] PCIE GART of 1024M enabled (table at 0x00274000).
>> [3.085221] radeon :09:00.0: WB enabled
>> [3.085224] radeon :09:00.0: fence driver on ring 0 use gpu addr 0x4c00 and cpu addr 0x88043d914c00
>> [3.085225] radeon :09:00.0: fence driver on ring 3 use gpu addr 0x4c0c and cpu addr 0x88043d914c0c
>> [3.097438] radeon :09:00.0: fence driver on ring 5 use gpu addr 0x00072118 and cpu addr 0xc900128b2118
>> [3.097441] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>> [3.097442] [drm] Driver supports precise vblank timestamp query.
>> [3.097514] radeon :09:00.0: irq 56 for MSI/MSI-X
>> [3.097544] radeon :09:00.0: radeon: using MSI.
>> [3.097614] [drm] radeon: irq initialized.
>>
>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>> graphics intensive seems to cause my system to become unusable for
>> tens of seconds. Pointing Firefox at Google Maps is a big offender --
>> it can take several minutes for me to move my mouse far enough to
>> close the tab and get my computer back.
>>
>> On bootup, I get this warning:
>> [drm:btc_dpm_set_power_state] *ERROR*
>> rv770_restrict_performance_levels_before_switch failed
>>
>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>> giving my rather slow graphics.
>>
>> Are there known issues here?
>
>
> Can you bisect the kernel, or at least isolate which kernel version first
> introduced the problem?

With whatever userspace I'm running, I'm seeing it on 3.13, 3.14, 3.15,
3.16, and 3.18-rc4+. I haven't tried other versions.

With radeon.dpm=0, I can still trigger short stalls (around one
second), but I seem unable to trigger long stalls easily. (I say
easily because, just as I was typing this email, my system stalled for
about a minute.)

--Andy

>
>
> --
> Earthling Michel Dänzer | http://www.amd.com
> Libre software enthusiast | Mesa and X developer

--
Andy Lutomirski
AMA Capital Management, LLC
Long radeon stalls on recent kernels
On 15.11.2014 07:21, Andy Lutomirski wrote:
> I have a Caicos card, like this:
>
> [3.077260] [drm] radeon kernel modesetting enabled.
> [3.077338] checking generic (e000 60) vs hw (e000 1000)
> [3.077339] fb: switching to radeondrmfb from EFI VGA
> [3.077377] Console: switching to colour dummy device 80x25
> [3.078881] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xE164).
> [3.078903] [drm] register mmio base: 0xF4A2
> [3.078904] [drm] register mmio size: 131072
> [3.078982] ATOM BIOS: C26401
> [3.079572] radeon :09:00.0: VRAM: 1024M 0x - 0x3FFF (1024M used)
> [3.079574] radeon :09:00.0: GTT: 1024M 0x4000 - 0x7FFF
> [3.079576] [drm] Detected VRAM RAM=1024M, BAR=256M
> [3.079577] [drm] RAM width 64bits DDR
> [3.079755] [TTM] Zone kernel: Available graphics memory: 8186568 kiB
> [3.079757] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
> [3.079757] [TTM] Initializing pool allocator
> [3.079773] [TTM] Initializing DMA pool allocator
> [3.080011] [drm] radeon: 1024M of VRAM memory ready
> [3.080012] [drm] radeon: 1024M of GTT memory ready.
> [3.080049] [drm] Loading CAICOS Microcode
> [3.080330] [drm] Internal thermal controller without fan control
> [3.081425] [drm] radeon: power management initialized
> [3.081551] [drm] GART: num cpu pages 262144, num gpu pages 262144
> [3.082589] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
> [3.085030] [drm] PCIE GART of 1024M enabled (table at 0x00274000).
> [3.085221] radeon :09:00.0: WB enabled
> [3.085224] radeon :09:00.0: fence driver on ring 0 use gpu addr 0x4c00 and cpu addr 0x88043d914c00
> [3.085225] radeon :09:00.0: fence driver on ring 3 use gpu addr 0x4c0c and cpu addr 0x88043d914c0c
> [3.097438] radeon :09:00.0: fence driver on ring 5 use gpu addr 0x00072118 and cpu addr 0xc900128b2118
> [3.097441] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [3.097442] [drm] Driver supports precise vblank timestamp query.
> [3.097514] radeon :09:00.0: irq 56 for MSI/MSI-X
> [3.097544] radeon :09:00.0: radeon: using MSI.
> [3.097614] [drm] radeon: irq initialized.
>
> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
> graphics intensive seems to cause my system to become unusable for
> tens of seconds. Pointing Firefox at Google Maps is a big offender --
> it can take several minutes for me to move my mouse far enough to
> close the tab and get my computer back.
>
> On bootup, I get this warning:
> [drm:btc_dpm_set_power_state] *ERROR*
> rv770_restrict_performance_levels_before_switch failed
>
> Setting radeon.dpm=0 seems to work around this problem at the cost of
> giving my rather slow graphics.
>
> Are there known issues here?

Can you bisect the kernel, or at least isolate which kernel version first
introduced the problem?

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
Long radeon stalls on recent kernels
I have a Caicos card, like this:

[3.077260] [drm] radeon kernel modesetting enabled.
[3.077338] checking generic (e000 60) vs hw (e000 1000)
[3.077339] fb: switching to radeondrmfb from EFI VGA
[3.077377] Console: switching to colour dummy device 80x25
[3.078881] [drm] initializing kernel modesetting (CAICOS 0x1002:0x6779 0x174B:0xE164).
[3.078903] [drm] register mmio base: 0xF4A2
[3.078904] [drm] register mmio size: 131072
[3.078982] ATOM BIOS: C26401
[3.079572] radeon :09:00.0: VRAM: 1024M 0x - 0x3FFF (1024M used)
[3.079574] radeon :09:00.0: GTT: 1024M 0x4000 - 0x7FFF
[3.079576] [drm] Detected VRAM RAM=1024M, BAR=256M
[3.079577] [drm] RAM width 64bits DDR
[3.079755] [TTM] Zone kernel: Available graphics memory: 8186568 kiB
[3.079757] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[3.079757] [TTM] Initializing pool allocator
[3.079773] [TTM] Initializing DMA pool allocator
[3.080011] [drm] radeon: 1024M of VRAM memory ready
[3.080012] [drm] radeon: 1024M of GTT memory ready.
[3.080049] [drm] Loading CAICOS Microcode
[3.080330] [drm] Internal thermal controller without fan control
[3.081425] [drm] radeon: power management initialized
[3.081551] [drm] GART: num cpu pages 262144, num gpu pages 262144
[3.082589] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[3.085030] [drm] PCIE GART of 1024M enabled (table at 0x00274000).
[3.085221] radeon :09:00.0: WB enabled
[3.085224] radeon :09:00.0: fence driver on ring 0 use gpu addr 0x4c00 and cpu addr 0x88043d914c00
[3.085225] radeon :09:00.0: fence driver on ring 3 use gpu addr 0x4c0c and cpu addr 0x88043d914c0c
[3.097438] radeon :09:00.0: fence driver on ring 5 use gpu addr 0x00072118 and cpu addr 0xc900128b2118
[3.097441] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[3.097442] [drm] Driver supports precise vblank timestamp query.
[3.097514] radeon :09:00.0: irq 56 for MSI/MSI-X
[3.097544] radeon :09:00.0: radeon: using MSI.
[3.097614] [drm] radeon: irq initialized.

On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
graphics intensive seems to cause my system to become unusable for
tens of seconds. Pointing Firefox at Google Maps is a big offender --
it can take several minutes for me to move my mouse far enough to
close the tab and get my computer back.

On bootup, I get this warning:
[drm:btc_dpm_set_power_state] *ERROR*
rv770_restrict_performance_levels_before_switch failed

Setting radeon.dpm=0 seems to work around this problem at the cost of
giving me rather slow graphics.

Are there known issues here?

Thanks,
Andy
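The memory figures in the log above can be cross-checked with a little arithmetic. This is only a sketch: it assumes 4 KiB pages for the GART page counts, and it assumes that "BAR=256M" is the CPU-visible window into the 1024M of VRAM, which matches the later explanation in this thread that part of VRAM is not directly CPU accessible. All variable names are made up for the example.

```python
MIB = 1024 * 1024
PAGE_SIZE = 4096        # assumption: 4 KiB pages, not stated in the log

# Values copied from the boot log above.
gart_pages = 262144     # "GART: num cpu pages 262144, num gpu pages 262144"
vram_mib = 1024         # "Detected VRAM RAM=1024M"
bar_mib = 256           # "BAR=256M" (assumed to be the CPU-visible aperture)

# 262144 pages of 4 KiB should match "PCIE GART of 1024M enabled".
gart_mib = gart_pages * PAGE_SIZE // MIB

# Share of VRAM the CPU can map directly, and the remainder that cannot
# be touched by the CPU without first moving the buffer.
cpu_visible_fraction = bar_mib / vram_mib
cpu_invisible_mib = vram_mib - bar_mib

print(f"GART size:             {gart_mib} MiB")
print(f"CPU-visible VRAM:      {bar_mib} MiB ({cpu_visible_fraction:.0%})")
print(f"CPU-inaccessible VRAM: {cpu_invisible_mib} MiB")
```

Under these assumptions only a quarter of VRAM is directly CPU accessible, so a buffer placed in the remaining 768 MiB has to be migrated before a CPU access can proceed.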