Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:
> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> >
> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> > too much trouble, a bisect would be pretty useful.
>
> Bisection seemingly went fine, but the result is odd.
>
> e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit

But it really really is bad. Looking at the gitk fork in the road leading to it...

52d9d38c183b drm/sti: fix spelling mistake: "compoment" -> "component" - good
e4e818cc2d7c drm: make drm_panel.h self-contained - good
9cf8f5802f39 drm: add missing declaration to drm_blend.h - good

Before the git highway splits, all is well. The lane with these commits works fine at both ends, but e98c58e55f68 is busted. Merge artifact?

	-Mike
Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
>
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.

Bisection seemingly went fine, but the result is odd.

e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit

	-Mike
Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Tue, 2017-07-11 at 20:53 +0200, Mike Galbraith wrote:
> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> >
> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> > too much trouble, a bisect would be pretty useful.
>
> Vacation -> back to work happens in the very early AM, so bisection
> will have to wait a bit.

Hm, my backup workstation (old GeForce 8600 GT box) has the same issue, so perhaps I can bisect it as I work on backlog (multitasking: screw up multiple tasks concurrently).

	-Mike
Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
>
> OK, thanks. So in other words, a fairly standard desktop with a PCIe
> board plugged in. No funny business. (Laptops can create a ton of
> additional weirdness, which I assumed you had since you were talking
> about STR.)

Yup, garden variety deskside box.

> My best guess is that gf119_head_vblank_put either has a bogus head id
> (should be in the 0..3 range) which causes it to do an out-of-bounds
> read on MMIO space, or that the MMIO mapping has already been removed
> by the time nouveau_display_suspend runs. Adding Ben Skeggs for
> additional insight.
>
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.

Vacation -> back to work happens in the very early AM, so bisection will have to wait a bit.

	-Mike
Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote:
> Some details that may be useful in analysis of the bug:
>
> 1. lspci -nn -d 10de:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)

> 2. What displays, if any, you have plugged into the NVIDIA board when
> this happens?

A Philips 273V, via DVI.

> 3. Any boot parameters, esp relating to ACPI, PM, or related?

None for those; what's there that will be unfamiliar to you is for patches that aren't applied.

nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0 nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60 ignore_loglevel crashkernel=256M,high

	-Mike
[regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
Greetings,

I met $subject in master-rt post drm merge, but taking the config (attached) to virgin v4.12-10624-g9967468c0a10, it's reproducible.

      KERNEL: vmlinux-4.12.0.g9967468-preempt.gz
    DUMPFILE: vmcore
        CPUS: 8
        DATE: Tue Jul 11 18:55:28 2017
      UPTIME: 00:02:03
LOAD AVERAGE: 3.43, 1.39, 0.52
       TASKS: 467
    NODENAME: homer
     RELEASE: 4.12.0.g9967468-preempt
     VERSION: #155 SMP PREEMPT Tue Jul 11 18:18:11 CEST 2017
     MACHINE: x86_64  (3591 Mhz)
      MEMORY: 16 GB
       PANIC: "BUG: unable to handle kernel paging request at a022990f"
         PID: 4658
     COMMAND: "kworker/u16:26"
        TASK: 8803c6068f80  [THREAD_INFO: 8803c6068f80]
         CPU: 7
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 4658   TASK: 8803c6068f80  CPU: 7   COMMAND: "kworker/u16:26"
 #0 [c900039f76a0] machine_kexec at 810481fc
 #1 [c900039f76f0] __crash_kexec at 81109e3a
 #2 [c900039f77b0] crash_kexec at 8110adc9
 #3 [c900039f77c8] oops_end at 8101d059
 #4 [c900039f77e8] no_context at 81055ce5
 #5 [c900039f7838] do_page_fault at 81056c5b
 #6 [c900039f7860] page_fault at 81690a88
    [exception RIP: report_bug+93]
    RIP: 8167227d  RSP: c900039f7918  RFLAGS: 00010002
    RAX: a0229905  RBX: a020af0f  RCX: 0001
    RDX: 0907  RSI: a020af11  RDI: 98f6
    RBP: c900039f7a58  R8: 0001  R9: 03fc
    R10: 81a01906  R11: 8803f84711f8  R12: a02231fb
    R13: 0260  R14: 0004  R15: 0006
    ORIG_RAX:   CS: 0010  SS: 0018
 #7 [c900039f7910] report_bug at 81672248
 #8 [c900039f7938] fixup_bug at 8101af85
 #9 [c900039f7950] do_trap at 8101b0d9
#10 [c900039f79a0] do_error_trap at 8101b190
#11 [c900039f7a50] invalid_op at 8169063e
    [exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335]
    RIP: a020af0f  RSP: c900039f7b00  RFLAGS: 00010086
    RAX: a04fa100  RBX: 8803f9550800  RCX: 0001
    RDX: a0228a58  RSI: 0001  RDI: a022321b
    RBP: c900039f7b80  R8:   R9: a020adc0
    R10: a048a1b0  R11: 8803f84711f8  R12: 0001
    R13: 8803f8471000  R14: c900039f7b94  R15: c900039f7bd0
    ORIG_RAX:   CS: 0010  SS: 0018
#12 [c900039f7b18] gf119_head_vblank_put at a04422f9 [nouveau]
#13 [c900039f7b88] drm_get_last_vbltimestamp at a020ad91 [drm]
#14 [c900039f7ba8] drm_update_vblank_count at a020b3e1 [drm]
#15 [c900039f7c10] drm_vblank_disable_and_save at a020bbe9 [drm]
#16 [c900039f7c40] drm_crtc_vblank_off at a020c3c0 [drm]
#17 [c900039f7cb0] nouveau_display_fini at a048a4d6 [nouveau]
#18 [c900039f7ce0] nouveau_display_suspend at a048ac4f [nouveau]
#19 [c900039f7d00] nouveau_do_suspend at a047e5ec [nouveau]
#20 [c900039f7d38] nouveau_pmops_suspend at a047e77d [nouveau]
#21 [c900039f7d50] pci_pm_suspend at 813b1ff0
#22 [c900039f7d80] dpm_run_callback at 814c4dbd
#23 [c900039f7db8] __device_suspend at 814c5a61
#24 [c900039f7e30] async_suspend at 814c5cfa
#25 [c900039f7e48] async_run_entry_fn at 81091683
#26 [c900039f7e70] process_one_work at 810882bc
#27 [c900039f7eb0] worker_thread at 8108854a
#28 [c900039f7f10] kthread at 8108e387
#29 [c900039f7f50] ret_from_fork at 8168fa85

crash> gdb list *drm_calc_vbltimestamp_from_scanoutpos+335
0xa020af0f is in drm_calc_vbltimestamp_from_scanoutpos (drivers/gpu/drm/drm_vblank.c:608).
603		/* If mode timing undefined, just return as no-op:
604		 * Happens during initial modesetting of a crtc.
605		 */
606		if (mode->crtc_clock == 0) {
607			DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
608			WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
609
610			return false;
611		}
612
crash> gdb list *report_bug+93
0x8167227d is in report_bug (lib/bug.c:177).
172			return BUG_TRAP_TYPE_WARN;
173
174		/*
175		 * Since this is the only store, concurrency is not an issue.
176		 */
177		bug->flags |= BUGFLAG_DONE;
178		}
179	}
180
181	if (warning) {
crash>

config.xz
Description: application/xz
arch/x86/kernel/ftrace.o: warning: objtool: ftrace_modify_code_direct()+0x2d: sibling call from callable instruction with modified stack frame
Greetings,

I'm seeing $subject in master as of this merge window. I don't see it with my usual stripped-down PREEMPT_VOLUNTARY or an enterprise-ish NOPREEMPT config, but do see it with the attached, derived from my enterprise-ish PREEMPT_RT config after running make oldconfig on it, and reproducing the warning in virgin source.

	-Mike

config.xz
Description: application/xz
Re: wake_wide mechanism clarification
On Fri, 2017-06-30 at 10:28 -0400, Josef Bacik wrote:
> On Thu, Jun 29, 2017 at 08:04:59PM -0700, Joel Fernandes wrote:
> >
> > That makes sense that we multiply slave's flips by a factor because
> > its low, but I still didn't get why the factor is chosen to be
> > llc_size instead of something else for the multiplication with slave
> > (slave * factor).
>
> Yeah I don't know why llc_size was chosen...

static void update_top_cache_domain(int cpu)
{
	struct sched_domain_shared *sds = NULL;
	struct sched_domain *sd;
	int id = cpu;
	int size = 1;

	sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
	if (sd) {
		id = cpumask_first(sched_domain_span(sd));
		size = cpumask_weight(sched_domain_span(sd));
		sds = sd->shared;
	}

	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
	per_cpu(sd_llc_size, cpu) = size;

The goal of wake_wide() was to approximate when pulling would be a futile consolidation effort and counterproductive to scaling. 'course with ever increasing socket size, any 1:N waker is ever more likely to run out of CPU for its one and only self (slamming into the scaling wall) before needing to turn its minions loose to conquer the world.

Something else to consider: network interrupts waking multiple workers at high frequency. If the waking CPU is idle, do you really want to place a worker directly in front of a tattoo artist, or is it better off nearly anywhere but there? If the box is virtual, with no topology exposed (or real but ancient) to let select_idle_sibling() come to the rescue, two workers can even get tattooed simultaneously (see sync wakeup).

	-Mike
Re: wake_wide mechanism clarification
On Thu, 2017-06-29 at 20:49 -0400, Josef Bacik wrote:
> On Thu, Jun 29, 2017 at 05:19:14PM -0700, Joel Fernandes wrote:
> >
> > Why are wanting the master's flip frequency to be higher than the
> > slaves by the factor?
>
> (Responding from my personal email as my work email is outlook shit and
> impossible to use)
>
> Because we are trying to detect the case that the master is waking many
> different processes, and the 'slave' processes are only waking up the
> master/some other specific processes to determine if we don't care about cache
> locality.

Yes, the heuristic (large delta implies waker/wakee are NOT 1:1, ie filter out high frequency communication where eating misses doesn't merely sting, it hurts like hell) just became bidirectional.

> Actually I think both are wrong, but I need Mike to weigh in.

My weigh-in is this: if you have ideas to improve or replace that heuristic, by all means go for it, just make damn sure it's dirt cheap. Heuristics all suck one way or another; the problem is that nasty old "perfect is the enemy of good" adage. Make it perfect, it'll hurt.

	-Mike
Re: [ANNOUNCE] v4.11.7-rt3
On Thu, 2017-06-29 at 04:55 +0200, Mike Galbraith wrote:
>
> cpus_allowed became cpus_mask. Anything (crash.. hohum, yet again)
> that rummages around in the kernels gizzard will have to adapt.

(wrt crash: nope, it doesn't care for a change)
Re: [ANNOUNCE] v4.11.7-rt3
On Thu, 2017-06-29 at 02:30 +0200, Bernhard Landauer wrote:
> I'm unable to compile the ndiswrapper extramodule against 4.11.7-rt3
>
> I get lots of errors of the kind
>
> 'struct task_struct' has no member named 'cpus_allowed'; did you mean
> 'nr_cpus_allowed'?
>
> What can I do?

cpus_allowed became cpus_mask. Anything (crash.. hohum, yet again) that rummages around in the kernel's gizzard will have to adapt.

	-Mike
Re: [patch-rt v2] rtmutex: Fix lock stealing logic
On Fri, 2017-06-23 at 09:33 -0400, Steven Rostedt wrote:
> >				  struct task_struct *task,
> > @@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct
> >  	 */
> >  	if (waiter) {
> >  		/*
> > -		 * If waiter is not the highest priority waiter of
> > -		 * @lock, give up.
> > +		 * If waiter is not the highest priority waiter of @lock,
> > +		 * or its peer when lateral steal is allowed, give up.
> >  		 */
> > -		if (waiter != rt_mutex_top_waiter(lock)) {
> > -			/* XXX rt_mutex_waiter_less() ? */
> > +		if (!rt_mutex_steal(lock, waiter, mode))
> >  			return 0;
> > -		}
> > -
> >  		/*
> >  		 * We can acquire the lock. Remove the waiter from the
> >  		 * lock waiters tree.
> >  		 */
> >  		rt_mutex_dequeue(lock, waiter);
> > -
>
> I liked that space.

I like minus signs in diffstat; that one was a freebie. Maintainers can revive it if they like, or I can post a V3 with it revived as well as s/rt_mutex_steal/rt_mutex_claim.

	-Mike
Re: [patch-rt v2] rtmutex: Fix lock stealing logic
On Fri, 2017-06-23 at 09:37 +0200, Mike Galbraith wrote:
> V2 changes:
>   - beautification (ymmv)
>   - enable lock stealing when waiter is queued

(V3 s/steal/claim)
[patch-rt v2] rtmutex: Fix lock stealing logic
V2 changes:
  - beautification (ymmv)
  - enable lock stealing when waiter is queued

rtmutex: Fix lock stealing logic

1. When trying to acquire an rtmutex, we first try to grab it without
queueing the waiter, and explicitly check for that initial attempt in the
!waiter path of __try_to_take_rt_mutex().  Checking whether the lock
taker is top waiter before allowing a steal attempt in that path is a
thinko: the lock taker has not yet blocked.

2. It seems wrong to change the definition of rt_mutex_waiter_less() to
mean less or perhaps equal when we have an rt_mutex_waiter_equal().

Remove the thinko, restore rt_mutex_waiter_less(), implement and use
rt_mutex_steal() based upon rt_mutex_waiter_less/equal(), moving all
qualification criteria into the function itself.

Signed-off-by: Mike Galbraith <efa...@gmx.de>
---
 kernel/locking/rtmutex.c | 76 ++-
 1 file changed, 37 insertions(+), 39 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -236,26 +236,19 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif
 
-#define STEAL_NORMAL  0
-#define STEAL_LATERAL 1
-
 /*
  * Only use with rt_mutex_waiter_{less,equal}()
  */
-#define task_to_waiter(p)	\
-	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+#define task_to_waiter(p) &(struct rt_mutex_waiter) \
+	{ .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) }
 
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
-		     struct rt_mutex_waiter *right, int mode)
+		     struct rt_mutex_waiter *right)
 {
-	if (mode == STEAL_NORMAL) {
-		if (left->prio < right->prio)
-			return 1;
-	} else {
-		if (left->prio <= right->prio)
-			return 1;
-	}
+	if (left->prio < right->prio)
+		return 1;
+
 	/*
 	 * If both waiters have dl_prio(), we check the deadlines of the
 	 * associated tasks.
@@ -287,6 +280,27 @@ rt_mutex_waiter_equal(struct rt_mutex_wa
 	return 1;
 }
 
+#define STEAL_NORMAL  0
+#define STEAL_LATERAL 1
+
+static inline int
+rt_mutex_steal(struct rt_mutex *lock, struct rt_mutex_waiter *waiter, int mode)
+{
+	struct rt_mutex_waiter *top_waiter = rt_mutex_top_waiter(lock);
+
+	if (waiter == top_waiter || rt_mutex_waiter_less(waiter, top_waiter))
+		return 1;
+
+	/*
+	 * Note that RT tasks are excluded from lateral-steals
+	 * to prevent the introduction of an unbounded latency.
+	 */
+	if (mode == STEAL_NORMAL || rt_task(waiter->task))
+		return 0;
+
+	return rt_mutex_waiter_equal(waiter, top_waiter);
+}
+
 static void
 rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 {
@@ -298,7 +312,7 @@ rt_mutex_enqueue(struct rt_mutex *lock,
 	while (*link) {
 		parent = *link;
 		entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry);
-		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+		if (rt_mutex_waiter_less(waiter, entry)) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -337,7 +351,7 @@ rt_mutex_enqueue_pi(struct task_struct *
 	while (*link) {
 		parent = *link;
 		entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry);
-		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+		if (rt_mutex_waiter_less(waiter, entry)) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -847,6 +861,7 @@ static int rt_mutex_adjust_prio_chain(st
  * @task:   The task which wants to acquire the lock
  * @waiter: The waiter that is queued to the lock's wait tree if the
  *	    callsite called task_blocked_on_lock(), otherwise NULL
+ * @mode:   Lock steal mode (STEAL_NORMAL, STEAL_LATERAL)
  */
 static int __try_to_take_rt_mutex(struct rt_mutex *lock,
 				  struct task_struct *task,
@@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct
 	 */
 	if (waiter) {
 		/*
-		 * If waiter is not the highest priority waiter of
-		 * @lock, give up.
+		 * If waiter is not the highest priority waiter of @lock,
+		 * or its peer when lateral steal is allowed, give up.
 		 */
-		if (waiter != rt_mutex_top_waiter(lock)) {
-			/* XXX rt_mutex_waiter_less() ? */
+		if (!rt_mutex_steal(lock, waiter, mode))
 			return 0;
-		}
-
 		/*
 		 * We can acquire the lock. Remove the waiter from the
 		 * lock waiters tree.
 		 */
[patch-rt v2] rtmutex: Fix lock stealing logic
V2 changes: - beautification (ymmv) - enable lock stealing when waiter is queued rtmutex: Fix lock stealing logic 1. When trying to acquire an rtmutex, we first try to grab it without queueing the waiter, and explicitly check for that initial attempt in the !waiter path of __try_to_take_rt_mutex(). Checking whether the lock taker is top waiter before allowing a steal attempt in that path is a thinko: the lock taker has not yet blocked. 2. It seems wrong to change the definition of rt_mutex_waiter_less() to mean less or perhaps equal when we have an rt_mutex_waiter_equal(). Remove the thinko, restore rt_mutex_waiter_less(), implement and use rt_mutex_steal() based upon rt_mutex_waiter_less/equal(), moving all qualification criteria into the function itself. Signed-off-by: Mike Galbraith --- kernel/locking/rtmutex.c | 76 ++- 1 file changed, 37 insertions(+), 39 deletions(-) --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -236,26 +236,19 @@ static inline bool unlock_rt_mutex_safe( } #endif -#define STEAL_NORMAL 0 -#define STEAL_LATERAL 1 - /* * Only use with rt_mutex_waiter_{less,equal}() */ -#define task_to_waiter(p) \ - &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline } +#define task_to_waiter(p) &(struct rt_mutex_waiter) \ + { .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) } static inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left, -struct rt_mutex_waiter *right, int mode) +struct rt_mutex_waiter *right) { - if (mode == STEAL_NORMAL) { - if (left->prio < right->prio) - return 1; - } else { - if (left->prio <= right->prio) - return 1; - } + if (left->prio < right->prio) + return 1; + /* * If both waiters have dl_prio(), we check the deadlines of the * associated tasks. 
@@ -287,6 +280,27 @@ rt_mutex_waiter_equal(struct rt_mutex_wa return 1; } +#define STEAL_NORMAL 0 +#define STEAL_LATERAL 1 + +static inline int +rt_mutex_steal(struct rt_mutex *lock, struct rt_mutex_waiter *waiter, int mode) +{ + struct rt_mutex_waiter *top_waiter = rt_mutex_top_waiter(lock); + + if (waiter == top_waiter || rt_mutex_waiter_less(waiter, top_waiter)) + return 1; + + /* +* Note that RT tasks are excluded from lateral-steals +* to prevent the introduction of an unbounded latency. +*/ + if (mode == STEAL_NORMAL || rt_task(waiter->task)) + return 0; + + return rt_mutex_waiter_equal(waiter, top_waiter); +} + static void rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter) { @@ -298,7 +312,7 @@ rt_mutex_enqueue(struct rt_mutex *lock, while (*link) { parent = *link; entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry); - if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) { + if (rt_mutex_waiter_less(waiter, entry)) { link = >rb_left; } else { link = >rb_right; @@ -337,7 +351,7 @@ rt_mutex_enqueue_pi(struct task_struct * while (*link) { parent = *link; entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry); - if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) { + if (rt_mutex_waiter_less(waiter, entry)) { link = >rb_left; } else { link = >rb_right; @@ -847,6 +861,7 @@ static int rt_mutex_adjust_prio_chain(st * @task: The task which wants to acquire the lock * @waiter: The waiter that is queued to the lock's wait tree if the * callsite called task_blocked_on_lock(), otherwise NULL + * @mode: Lock steal mode (STEAL_NORMAL, STEAL_LATERAL) */ static int __try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, @@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct */ if (waiter) { /* -* If waiter is not the highest priority waiter of -* @lock, give up. +* If waiter is not the highest priority waiter of @lock, +* or its peer when lateral steal is allowed, give up. 
 		 */
-		if (waiter != rt_mutex_top_waiter(lock)) {
-			/* XXX rt_mutex_waiter_less() ? */
+		if (!rt_mutex_steal(lock, waiter, mode))
 			return 0;
-		}
-
 		/*
 		 * We can acquire the lock. Remove the waiter from the
 		 * lock waiters tree.
 		 */
Re: [ANNOUNCE] v4.11.5-rt1
On Thu, 2017-06-22 at 19:30 +0200, Mike Galbraith wrote: > On Thu, 2017-06-22 at 18:34 +0200, Sebastian Andrzej Siewior wrote: > > On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote: > > > See ! and ? > > > > See see. > > What about this: > > I'll give it a go, likely during the weekend. What the heck, after so much time expended squabbling with this bugger, beans/biscuits work will hardly notice a mere hour or so... Yup, it's confirmed dead, and no doubt my GUI goblins will be too. -Mike
Re: [ANNOUNCE] v4.11.5-rt1
On Thu, 2017-06-22 at 18:34 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> > See ! and ?
>
> See see.

> What about this:

I'll give it a go, likely during the weekend.

I moved 4.11-rt today (also repros nicely) due to ftrace annoying me.
After yet more staring at ever more huge traces (opposite of goal;),
then taking a break to stare at source again, I decided that the dual
wake_q business should die.. and the stall died with it.

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1014,8 +1014,20 @@ struct wake_q_head {
>  #define WAKE_Q(name)	\
>  	struct wake_q_head name = { WAKE_Q_TAIL, }
>  
> -extern void wake_q_add(struct wake_q_head *head,
> -		       struct task_struct *task);
> +extern void __wake_q_add(struct wake_q_head *head,
> +			 struct task_struct *task, bool sleeper);
> +static inline void wake_q_add(struct wake_q_head *head,
> +			      struct task_struct *task)
> +{
> +	__wake_q_add(head, task, false);
> +}
> +
> +static inline void wake_q_add_sleeper(struct wake_q_head *head,
> +				      struct task_struct *task)
> +{
> +	__wake_q_add(head, task, true);
> +}
> +
>  extern void __wake_up_q(struct wake_q_head *head, bool sleeper);
>  
>  static inline void wake_up_q(struct wake_q_head *head)
> @@ -1745,6 +1757,7 @@ struct task_struct {
>  	raw_spinlock_t pi_lock;
>  
>  	struct wake_q_node wake_q;
> +	struct wake_q_node wake_q_sleeper;
>  
>  #ifdef CONFIG_RT_MUTEXES
>  	/* PI waiters blocked on a rt_mutex held by this task */
> diff --git a/kernel/fork.c b/kernel/fork.c
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -558,6 +558,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	tsk->splice_pipe = NULL;
>  	tsk->task_frag.page = NULL;
>  	tsk->wake_q.next = NULL;
> +	tsk->wake_q_sleeper.next = NULL;
>  
>  	account_kernel_stack(tsk, 1);
>  
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1506,7 +1506,7 @@ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q,
>  	 */
>  	preempt_disable();
>  	if (waiter->savestate)
> -		wake_q_add(wake_sleeper_q, waiter->task);
> +		wake_q_add_sleeper(wake_sleeper_q, waiter->task);
>  	else
>  		wake_q_add(wake_q, waiter->task);
>  	raw_spin_unlock(&current->pi_lock);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -430,9 +430,15 @@ static bool set_nr_if_polling(struct task_struct *p)
>  #endif
>  #endif
>  
> -void wake_q_add(struct wake_q_head *head, struct task_struct *task)
> +void __wake_q_add(struct wake_q_head *head, struct task_struct *task,
> +		  bool sleeper)
>  {
> -	struct wake_q_node *node = &task->wake_q;
> +	struct wake_q_node *node;
> +
> +	if (sleeper)
> +		node = &task->wake_q_sleeper;
> +	else
> +		node = &task->wake_q;
>  
>  	/*
>  	 * Atomically grab the task, if ->wake_q is !nil already it means
> @@ -461,11 +467,17 @@ void __wake_up_q(struct wake_q_head *head, bool sleeper)
>  	while (node != WAKE_Q_TAIL) {
>  		struct task_struct *task;
>  
> -		task = container_of(node, struct task_struct, wake_q);
> +		if (sleeper)
> +			task = container_of(node, struct task_struct, wake_q_sleeper);
> +		else
> +			task = container_of(node, struct task_struct, wake_q);
>  		BUG_ON(!task);
>  		/* task can safely be re-inserted now */
>  		node = node->next;
> -		task->wake_q.next = NULL;
> +		if (sleeper)
> +			task->wake_q_sleeper.next = NULL;
> +		else
> +			task->wake_q.next = NULL;
>  
>  		/*
>  		 * wake_up_process() implies a wmb() to pair with the queueing
Re: [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs
On Tue, 2017-06-20 at 13:42 -0400, Rik van Riel wrote:
> On Mon, 2017-06-19 at 04:12 +0200, Frederic Weisbecker wrote:
> > Although idle load balancing obviously only concern idle CPUs, it can
> > be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get rid
> > of an idle load balancing duty once a tick fires while it runs a task
> > and this can take a while in a nohz_full CPU.
> > 
> > We could fix that and escape the idle load balancing duty from the very
> > idle exit path but that would bring unecessary overhead. Lets just not
> > bother and leave that job to housekeeping CPUs (those outside nohz_full
> > range). The nohz_full CPUs simply don't want any disturbance.
> > 
> > Signed-off-by: Frederic Weisbecker
> > Cc: Thomas Gleixner
> > Cc: Ingo Molnar
> > Cc: Rik van Riel
> > Cc: Peter Zijlstra
> > ---
> >  kernel/sched/fair.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d711093..cfca960 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8659,6 +8659,10 @@ void nohz_balance_enter_idle(int cpu)
> >  	if (!cpu_active(cpu))
> >  		return;
> >  
> > +	/* Spare idle load balancing on CPUs that don't want to be disturbed */
> > +	if (!is_housekeeping_cpu(cpu))
> > +		return;
> > +
> >  	if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
> >  		return;
> 
> I am not entirely convinced on this one.
> 
> Doesn't the if (on_null_domain(cpu_rq(cpu)) test
> a few lines down take care of this already?
> 
> Do we want nohz_full to always automatically
> imply that no idle balancing will happen, like
> on isolated CPUs?

IMO, nohz_full capable CPUs that are not isolated should automatically
become housekeepers, and nohz_full _active_ upon becoming isolated.
When used as a housekeeper, you still pay a price for having the
nohz_full capability available, but it doesn't have to be as high.
In my kernels, I use cpusets to turn nohz on/off set wise, so CPUs can be ticking, dyntick, nohz_full or housekeeper, RT load balancing and cpupri on/off as well if you want to assume full responsibility. It's a tad (from box of xxl tads) ugly, but more flexible. -Mike
Re: [ANNOUNCE] v4.11.5-rt1
On Mon, 2017-06-19 at 18:29 +0200, Mike Galbraith wrote:
> On Mon, 2017-06-19 at 10:41 -0400, Steven Rostedt wrote:
> > On Mon, 19 Jun 2017 16:13:41 +0200
> > Sebastian Andrzej Siewior <bige...@linutronix.de> wrote:
> > 
> > > > Hmm, it shouldn't affect futexes, as it's only called by rtmutex when
> > > > waiter->savestate is true. And that should always be false for futex.
> > > 
> > > you still have sleeping locks like the hb-lock (which might matter in
> > > case the task blocks on the lock and does not continue for some reason).
> > 
> > But then I'd expect this to be an issue with any spinlock in the kernel.
> 
> Thing to do is set a trap for the bugger.

See ! and ?

vogelweide:~/:[1]# grep MIKE trace
MIKE	913.728104: start
MIKE	futex_wait-7496 [031] ...	913.729160: sched_process_fork: comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7497
MIKE	futex_wait-7496 [031] ...	913.729253: sched_process_fork: comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7499
MIKE	futex_wait-7496 [031] ...	913.729307: sched_process_fork: comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7500
MIKE	futex_wait-7496 [031] ...	913.729348: sched_process_fork: comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7501
MIKE	futex_wait-7496 [031] d...2..	913.729430: sched_switch: prev_comm=futex_wait prev_pid=7496 prev_prio=120 prev_state=S ==> next_comm=swapper/31 next_pid=0 next_prio=120
MIKE--	futex_wait-7500 [005] d...2..	920.040320: sched_switch: prev_comm=futex_wait prev_pid=7500 prev_prio=120 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
MIKE	futex_wait-7497 [058] d...211	920.060009: sched_waking: comm=futex_wait pid=7501 prio=120 target_cpu=044
MIKE	futex_wait-7497 [058] d...311	920.060012: sched_wake_idle_without_ipi: cpu=44
MIKE	futex_wait-7497 [058] d...311	920.060012: sched_wakeup: comm=futex_wait pid=7501 prio=120 target_cpu=044
MIKE--	futex_wait-7497 [058] d...2..	920.060035: sched_switch: prev_comm=futex_wait prev_pid=7497 prev_prio=120 prev_state=S ==> next_comm=swapper/58 next_pid=0 next_prio=120
MIKE	futex_wait-7499 [043] dN..411	920.060035: sched_wakeup: comm=ktimersoftd/43 pid=438 prio=0 target_cpu=043
MIKE!	futex_wait-7499 [043] .N..111	920.060037: wake_up_lock_sleeper: futex_wait:7497 state:1 saved_state:0
MIKE!	futex_wait-7499 [043] dN..211	920.060037: try_to_wake_up: futex_wait:7497 WF_LOCK_SLEEPER !(p->state:1 & state:2) bailing
MIKE	920.060037: futex_wait-7497 is never awakened again until ^C cleanup, 7499/7501 still standing
MIKE	futex_wait-7499 [043] 111	920.060066: wake_up_lock_sleeper: futex_wait:7501 state:2 saved_state:0
MIKE	futex_wait-7499 [043] d...211	920.060066: try_to_wake_up: futex_wait:7501 WF_LOCK_SLEEPER !(p->state:0 & state:2) bailing
MIKE	futex_wait-7499 [043] 111	920.060606: wake_up_lock_sleeper: futex_wait:7501 state:2 saved_state:0
MIKE	920.355048: huh? wake_up_lock_sleeper sees state=2 but try_to_wake_up then sees state=0 and bails ?!?
MIKE	futex_wait-7501 [044] 111	920.355048: wake_up_lock_sleeper: futex_wait:7499 state:2 saved_state:0
MIKE	futex_wait-7501 [044] d...211	920.355049: try_to_wake_up: futex_wait:7499 WF_LOCK_SLEEPER !(p->state:0 & state:2) bailing
MIKE	futex_wait-7499 [048] d..h311	920.355060: sched_stat_runtime: comm=futex_wait pid=7499 runtime=168537 [ns] vruntime=4850005377 [ns]
MIKE	futex_wait-7499 [048] d...311	920.355061: sched_waking: comm=ksoftirqd/48 pid=489 prio=120 target_cpu=048
MIKE	futex_wait-7499 [048] d...411	920.355062: sched_stat_runtime: comm=futex_wait pid=7499 runtime=2972 [ns] vruntime=4850008349 [ns]
MIKE	futex_wait-7499 [048] d.L.411	920.355063: sched_wakeup: comm=ksoftirqd/48 pid=489 prio=120 target_cpu=048
MIKE	futex_wait-7499 [048] d.L.311	920.355064: sched_waking: comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE	futex_wait-7499 [048] dNL.411	920.355065: sched_wakeup: comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE	futex_wait-7499 [048] .NL.111	920.355066: wake_up_lock_sleeper: futex_wait:7501 state:0 saved_state:0
MIKE	futex_wait-7499 [048] dNL.211	920.355067: try_to_wake_up: futex_wait:7501 WF_LOCK_SLEEPER !(p->state:0 & state:2) bailing
MIKE	futex_wait-7499 [048] dNL.211	920.355067: sched_stat_runtime: comm=futex_wait pid=7499 runtime=2011 [ns] vruntime=4850010360 [ns]
MIKE	futex_wait-7499 [048] d...211	920.355068: sched_switch: prev_comm=futex_wait prev_pid=7499 prev_prio=120 prev_state=R+ ==> next_comm=ktimersoftd/48 next_pid=488 next_prio=0
MIKE	<...>-488 [048] d...2..	920.355070: sched_switch: prev_comm=ktimersoftd/48 prev_pid=488 prev_prio=0 prev_state=S ==> next_comm=ksoftirqd/48 next_pid=489 next_prio=120
MIKE	<...>-489 [048] d
Re: [ANNOUNCE] v4.11.5-rt1
On Mon, 2017-06-19 at 18:27 +0200, Sebastian Andrzej Siewior wrote: > On 2017-06-19 18:14:52 [+0200], Mike Galbraith wrote: > > > > BTW back, I reran virgin 4.9-rt21 on desktop box while off doing the > > have something resembling a life thing, and it did not stall in 50 > > iterations of performance/run.sh (bloody fickle thing). Hohum, take it > > for whatever you think it's worth. > > that sounds good because I tried your config on three boxes with the > same result. > So the issue remains on the DLxxx box which is 8socket NUMA box with > only one memory node, right? Yes. Now _I_ think (having spent days chasing) that it is in fact generic, but yes, all else aside, my highly utilized and very reliable up until now (and crippled) DL980 G7 very clearly states that we have a bug lurking. -Mike
Re: [ANNOUNCE] v4.11.5-rt1
On Mon, 2017-06-19 at 10:41 -0400, Steven Rostedt wrote:
> On Mon, 19 Jun 2017 16:13:41 +0200
> Sebastian Andrzej Siewior wrote:
> 
> > > Hmm, it shouldn't affect futexes, as it's only called by rtmutex when
> > > waiter->savestate is true. And that should always be false for futex.
> > 
> > you still have sleeping locks like the hb-lock (which might matter in
> > case the task blocks on the lock and does not continue for some reason).
> 
> But then I'd expect this to be an issue with any spinlock in the kernel.

Thing to do is set a trap for the bugger.  I'll give that a go after I
get some beans/biscuits work done.  For the nonce, what I've reported
is all I know.

	-Mike
Re: [ANNOUNCE] v4.11.5-rt1
On Mon, 2017-06-19 at 17:03 +0200, Sebastian Andrzej Siewior wrote: > On 2017-06-19 16:36:22 [+0200], Mike Galbraith wrote: > > On Mon, 2017-06-19 at 16:06 +0200, Sebastian Andrzej Siewior wrote: > > > > > > I am suppressed that your desktop shows any symptoms on rt21. I tried my > > > smaller AMD box (A10), an Intel two sockets and a four socket box. Each > > > of them was fine with the run.sh and manual futex_wait invocation. > > > Could you please send the config of your desktop box? > > > > (sent offline) > > btw: I tried booting the two and four sockets box with less memory so > that the box is using just one memory node but nothing changed. Let me > try with the config of yours. BTW back, I reran virgin 4.9-rt21 on desktop box while off doing the have something resembling a life thing, and it did not stall in 50 iterations of performance/run.sh (bloody fickle thing). Hohum, take it for whatever you think it's worth. -Mike
Re: [ANNOUNCE] v4.11.5-rt1
On Mon, 2017-06-19 at 16:06 +0200, Sebastian Andrzej Siewior wrote: > > I am suppressed that your desktop shows any symptoms on rt21. I tried my > smaller AMD box (A10), an Intel two sockets and a four socket box. Each > of them was fine with the run.sh and manual futex_wait invocation. > Could you please send the config of your desktop box? (sent offline)
Re: [ANNOUNCE] v4.11.5-rt1
On Mon, 2017-06-19 at 13:50 +0200, Sebastian Andrzej Siewior wrote:
> 
> rt20…rt21 is just
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/sched-Prevent-task-state-corruption-by-spurious-lock.patch?h=linux-4.9.y-rt-patches

Yup.  I got there via git checkout v4.9.x-rtx, build/test.

> Let me verify that here and fire maybe the four socket box.

One thing perhaps interesting about the DL980 is that it is 8 socket,
but with 7 memoryless nodes (poor thing).  I wouldn't be surprised if
you had to try a lot harder than I.

DL980 now seems happy with master-rt21 fwiw.  It's not virgin though,
not even considering continuous merge 4.9..today.  Hopefully desktop
will cease with the comatose widget business too, I really don't want
to go there.  These things give tglx headaches, might well blow the
point clean off of my head.

	-Mike
Re: [ANNOUNCE] v4.11.5-rt1
On Mon, 2017-06-19 at 12:44 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-19 12:14:51 [+0200], Mike Galbraith wrote:
> > Ok, doesn't matter for RT testing.  What does matter, is that...
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 30b24f774198..10e832da70b6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
> >   */
> >  int wake_up_lock_sleeper(struct task_struct *p)
> >  {
> > -	return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
> > +	return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
> >  }
> > 
> > ...appears to be inducing lost futex wakeups.
> 
> has this something to do with "rtmutex: Fix lock stealing logic" ?

Nope.  The above is fallout of me being inspired to stare, that
inspiration having come initially from seeing lost wakeup symptoms in
my desktop, telling me something has gone sour in rt-land, so a hunting
I did go.  I expected to find I had made a booboo in my trees, but
maybe not, as I found a suspiciously similar symptom to what I was
looking for in virgin source.

> > Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
> > running otherwise absolutely pristine 4.9-rt21, after having double
> > verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
> > make sure that the little bugger really really REALLY ain't fscking
> > with me for the sheer fun of it, futexes being made of pure evil :)
> 
> So v4.9-rt20 works fine but -rt21 starts to lose wakeups on DL980 in
> general or just with "futex_wait -n 4" ?

-rt20 is verified to work fine, -rt21 starts hanging with futextest.
The futex_wait -n 4 testcase was distilled out of seeing the full
futextest/run.sh hanging.  The only symptom I've _seen_ on the DL980 is
futextest hanging.  On the desktop, I've seen more, and may still, I'll
know when I see or don't see desktop gizmos occasionally go comatose.
> > My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
> > thing is that it only reproduces on the DL980 if I let it use multiple
> > sockets, pin it to one, and all is peachy, (rather seems to be given)
> > whereas on desktop box, the hang is far more intermittent, but there.
> 
> do I parse it right, as v4.9-rt21 (without the change above) works with
> the testcase mentioned if you pin it to one socket but does not work if
> you let it use multiple sockets.
> And your desktop box hangs no matter what?

No no, desktop box will reproduce, but not nearly as reliably as the 8
socket box does, but yes, it seems to work fine on the DL980 when
pinned to one socket.  I was testing 4.9-rt because hunt was in
progress when 4.11-rt was born.

	-Mike
Re: [ANNOUNCE] v4.11.5-rt1
On Mon, 2017-06-19 at 12:44 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-19 12:14:51 [+0200], Mike Galbraith wrote:
> > Ok, doesn't matter for RT testing.  What does matter, is that...
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 30b24f774198..10e832da70b6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
> >   */
> >  int wake_up_lock_sleeper(struct task_struct *p)
> >  {
> > -	return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
> > +	return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
> >  }
> >
> > ...appears to be inducing lost futex wakeups.
>
> has this something to do with "rtmutex: Fix lock stealing logic" ?

Nope.  The above is fallout of me being inspired to stare, that
inspiration having come initially from seeing lost wakeup symptoms on my
desktop, telling me something had gone sour in rt-land, so a-hunting I
did go.  I expected to find I had made a booboo in my trees, but maybe
not, as I found a suspiciously similar symptom to what I was looking for
in virgin source.

> > Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
> > running otherwise absolutely pristine 4.9-rt21, after having double
> > verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
> > make sure that the little bugger really really REALLY ain't fscking
> > with me for the sheer fun of it, futexes being made of pure evil :)
>
> So v4.9-rt20 works fine but -rt21 starts to lose wakeups on DL980 in
> general or just with "futex_wait -n 4" ?

-rt20 is verified to work fine, -rt21 starts hanging with futextest.
The futex_wait -n 4 testcase was distilled out of seeing the full
futextest/run.sh hanging.  The only symptom I've _seen_ on the DL980 is
futextest hanging.  On the desktop, I've seen more, and may still; I'll
know when I see or don't see desktop gizmos occasionally go comatose.
> > My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
> > thing is that it only reproduces on the DL980 if I let it use multiple
> > sockets, pin it to one, and all is peachy, (rather seems to be given)
> > whereas on desktop box, the hang is far more intermittent, but there.
>
> do I parse it right, as v4.9-rt21 (without the change above) works with
> the testcase mentioned if you pin it to one socket but does not work if
> you let it use multiple sockets.
> And your desktop box hangs no matter what?

No no, the desktop box will reproduce, just not nearly as reliably as
the 8-socket box does; and yes, it seems to work fine on the DL980 when
pinned to one socket.  I was testing 4.9-rt because the hunt was in
progress when 4.11-rt was born.

	-Mike
Re: [ANNOUNCE] v4.11.5-rt1
On Mon, 2017-06-19 at 10:52 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-17 10:14:37 [+0200], Mike Galbraith wrote:
>
> > During that rebase, migrate_disable() was changed to no longer map to
> > preempt_disable() for nonrt, but some patches still assume it does.  It
> > now depends upon PREEMPT_COUNT, the RT workaround in futex.c induces
> > grumbling in nonrt builds with PREEMPT_COUNT enabled.
>
> argh, right. It was planned to get it merged upstream but due to
> $reasons we never got that far. For that reason I would simply revert
> that change and let migrate_disable() map to preempt_disable() as it did
> earlier.

Ok, doesn't matter for RT testing.  What does matter, is that...

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 30b24f774198..10e832da70b6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
  */
 int wake_up_lock_sleeper(struct task_struct *p)
 {
-	return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
+	return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
 }

...appears to be inducing lost futex wakeups.

Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
running otherwise absolutely pristine 4.9-rt21, after having double
verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
make sure that the little bugger really really REALLY ain't fscking
with me for the sheer fun of it, futexes being made of pure evil :)

My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
thing is that it only reproduces on the DL980 if I let it use multiple
sockets, pin it to one, and all is peachy, (rather seems to be given)
whereas on desktop box, the hang is far more intermittent, but there.

	-Mike
[patch-rt] rtmutex: Fix lock stealing logic
1. When trying to acquire an rtmutex, we first try to grab it without
queueing the waiter, and explicitly check for that initial attempt in the
!waiter path of __try_to_take_rt_mutex().  Checking whether the lock taker
is top waiter before allowing a steal attempt in that path is a thinko:
the lock taker has not yet blocked.

2. It seems wrong to change the definition of rt_mutex_waiter_less() to
mean less or perhaps equal when we have an rt_mutex_waiter_equal().

Remove the thinko, and implement rt_mutex_stealable(steal_mode) using a
restored rt_mutex_waiter_less(), and rt_mutex_waiter_equal().

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Fixes: f34d077c6f28 ("rt: Add the preempt-rt lock replacement APIs")
---
 kernel/locking/rtmutex.c | 69 +++
 1 file changed, 35 insertions(+), 34 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -236,26 +236,19 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif

-#define STEAL_NORMAL  0
-#define STEAL_LATERAL 1
-
 /*
  * Only use with rt_mutex_waiter_{less,equal}()
  */
-#define task_to_waiter(p) \
-	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+#define task_to_waiter(p) &(struct rt_mutex_waiter) \
+	{ .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) }

 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
-		     struct rt_mutex_waiter *right, int mode)
+		     struct rt_mutex_waiter *right)
 {
-	if (mode == STEAL_NORMAL) {
-		if (left->prio < right->prio)
-			return 1;
-	} else {
-		if (left->prio <= right->prio)
-			return 1;
-	}
+	if (left->prio < right->prio)
+		return 1;
+
 	/*
 	 * If both waiters have dl_prio(), we check the deadlines of the
 	 * associated tasks.
@@ -287,6 +280,25 @@ rt_mutex_waiter_equal(struct rt_mutex_wa
 	return 1;
 }

+#define STEAL_NORMAL  0
+#define STEAL_LATERAL 1
+
+static inline int
+rt_mutex_stealable(struct rt_mutex_waiter *thief,
+		   struct rt_mutex_waiter *victim, int mode)
+{
+	if (rt_mutex_waiter_less(thief, victim))
+		return 1;
+
+	/*
+	 * Note that RT tasks are excluded from lateral-steals
+	 * to prevent the introduction of an unbounded latency.
+	 */
+	if (mode == STEAL_NORMAL || rt_task(thief->task))
+		return 0;
+
+	return rt_mutex_waiter_equal(thief, victim);
+}
+
 static void
 rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 {
@@ -298,7 +310,7 @@ rt_mutex_enqueue(struct rt_mutex *lock,
 	while (*link) {
 		parent = *link;
 		entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry);
-		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+		if (rt_mutex_waiter_less(waiter, entry)) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -337,7 +349,7 @@ rt_mutex_enqueue_pi(struct task_struct *
 	while (*link) {
 		parent = *link;
 		entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry);
-		if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+		if (rt_mutex_waiter_less(waiter, entry)) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -847,6 +859,7 @@ static int rt_mutex_adjust_prio_chain(st
  * @task:   The task which wants to acquire the lock
  * @waiter: The waiter that is queued to the lock's wait tree if the
  *	    callsite called task_blocked_on_lock(), otherwise NULL
+ * @mode:   Lock steal mode (STEAL_NORMAL, STEAL_LATERAL)
  */
 static int __try_to_take_rt_mutex(struct rt_mutex *lock,
 				  struct task_struct *task,
@@ -890,10 +903,9 @@ static int __try_to_take_rt_mutex(struct
 		 * @lock, give up.
 		 */
 		if (waiter != rt_mutex_top_waiter(lock)) {
-			/* XXX rt_mutex_waiter_less() ? */
+			/* XXX rt_mutex_stealable() ? */
 			return 0;
 		}
-
 		/*
 		 * We can acquire the lock. Remove the waiter from the
 		 * lock waiters tree.
@@ -910,25 +922,14 @@ static int __try_to_take_rt_mutex(struct
 		 * not need to be dequeued.
 		 */
 		if (rt_mutex_has_waiters(lock)) {
-			struct task_struct *pown = rt_mutex_top_waiter(lock)->task;
-
-			if (task != pown)
-				return 0;
-
-			/*
-			 * Note that RT tasks are ex
Re: [ANNOUNCE] v4.11.5-rt1
On Sat, 2017-06-17 at 10:14 +0200, Mike Galbraith wrote:
>
> ... the RT workaround in futex.c induces
> grumbling in nonrt builds with PREEMPT_COUNT enabled.

A trivial way to fix it up is to...

futex: Fix migrate_disable/enable workaround for !PREEMPT_RT_FULL

The imbalance fixed by aed0f50e58eb only exists for PREEMPT_RT_FULL, and
creates one for other PREEMPT_COUNT configs.  Create/use _rt variants of
migrate_disable/enable() which are compiled away when not needed.

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Fixes: aed0f50e58eb ("futex: workaround migrate_disable/enable in different context")
---
 include/linux/preempt.h | 11 +++
 kernel/futex.c          |  8
 2 files changed, 15 insertions(+), 4 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -227,12 +227,21 @@ do { \
 extern void migrate_disable(void);
 extern void migrate_enable(void);

+#ifdef CONFIG_PREEMPT_RT_FULL
+#define migrate_disable_rt()	migrate_disable()
+#define migrate_enable_rt()	migrate_enable()
+#else
+static inline void migrate_disable_rt(void) { }
+static inline void migrate_enable_rt(void) { }
+#endif
 int __migrate_disabled(struct task_struct *p);

 #else

 #define migrate_disable()	barrier()
 #define migrate_enable()	barrier()
+static inline void migrate_disable_rt(void) { }
+static inline void migrate_enable_rt(void) { }

 static inline int __migrate_disabled(struct task_struct *p)
 {
 	return 0;
@@ -323,6 +332,8 @@ do { \

 #define migrate_disable()	barrier()
 #define migrate_enable()	barrier()
+static inline void migrate_disable_rt(void) { }
+static inline void migrate_enable_rt(void) { }

 static inline int __migrate_disabled(struct task_struct *p)
 {
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2690,12 +2690,12 @@ static int futex_lock_pi(u32 __user *uad
 	 * one migrate_disable() pending in the slow-path which is reversed
 	 * after the raw_spin_unlock_irq() where we leave the atomic context.
 	 */
-	migrate_disable();
+	migrate_disable_rt();
 	spin_unlock(q.lock_ptr);

 	ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
 	raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
-	migrate_enable();
+	migrate_enable_rt();

 	if (ret) {
 		if (ret == 1)
@@ -2846,13 +2846,13 @@ static int futex_unlock_pi(u32 __user *u
 	 * won't undo the migrate_disable() which was issued when
 	 * locking hb->lock.
 	 */
-	migrate_disable();
+	migrate_disable_rt();
 	spin_unlock(&hb->lock);

 	/* Drops pi_state->pi_mutex.wait_lock */
 	ret = wake_futex_pi(uaddr, uval, pi_state);

-	migrate_enable();
+	migrate_enable_rt();

 	put_pi_state(pi_state);
Re: [ANNOUNCE] v4.11.5-rt1
On Fri, 2017-06-16 at 12:56 +0200, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v4.11.5-rt1 patch set.
> The release has been delayed due to the hotplug rework that was started
> before the final v4.11 release. However the new code has not been
> stabilized yet and it was decided to bring back the old patches before
> delaying the v4.11-RT release any longer. We will try to complete the
> hotplug rework (for RT) in the v4.11 cycle.
>
> Changes since v4.9.39-rt21:
>
>   - rebase to v4.11.5

Hi Sebastian,

I noticed a couple of things wrt the migrate_disable() changes...

During that rebase, migrate_disable() was changed to no longer map to
preempt_disable() for nonrt, but some patches still assume it does.  It
now depends upon PREEMPT_COUNT, and the RT workaround in futex.c induces
grumbling in nonrt builds with PREEMPT_COUNT enabled.

	-Mike
Re: kernel BUG at kernel/locking/rtmutex.c:1027
On Thu, 2017-06-08 at 07:01 +, Feng Feng24 Liu wrote:
>
> Our kernel version is: kernel4.4.6-rt14

Latest 4.4-rt is 4.4.70-rt83...
Re: [PATCH] Fix loop device flush before configure v2
On Thu, 2017-06-08 at 10:17 +0800, James Wang wrote:
> This condition check was exist at before commit b5dd2f6047ca ("block: loop:
> improve performance via blk-mq") When add MQ support to loop device, it be
> removed because the member of '->lo_thread' be removed. And then upstream
> add '->worker_task', I think they forget add it to here.
>
> When I install SLES-12 product is base on 4.4 kernel I found installer will
> hang +60 second at scan disks. and I found LVM tools would take this action.
> finally I found this problem is more obvious on AMD platform. This problem
> will impact all scenarios that scan loop devcie.
>
> When the loop device didn't configure backing file or Request Queue, we
> shouldn't to cost a lot of time to flush it.

The changelog sounds odd to me, perhaps reword/condense a bit?...

While installing SLES-12 (based on v4.4), I found that the installer will
stall for 60+ seconds during LVM disk scan.  The root cause was determined
to be the removal of a bound device check in loop_flush() by commit
b5dd2f6047ca ("block: loop: improve performance via blk-mq").  Restoring
this check, examining ->lo_state as set by loop_set_fd() eliminates the
bad behavior.

Test method:
modprobe loop max_loop=64
dd if=/dev/zero of=disk bs=512 count=200K
for((i=0;i<4;i++))do losetup -f disk; done
mkfs.ext4 -F /dev/loop0
for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
for f in `ls /dev/loop[0-9]*|sort`; do \
	echo $f; dd if=$f of=/dev/null bs=512 count=1; \
done

Test output:
             stock         patched
/dev/loop0   8.1217e-05    8.3842e-05
/dev/loop1   6.1114e-05    0.000147979
/dev/loop10  0.414701      0.000116564
/dev/loop11  0.7474        6.7942e-05
/dev/loop12  0.747986      8.9082e-05
/dev/loop13  0.746532      7.4799e-05
/dev/loop14  0.480041      9.3926e-05
/dev/loop15  1.26453       7.2522e-05

Note that from loop10 onward, the device is not mounted, yet the stock
kernel consumes several orders of magnitude more wall time than it does
for a mounted device.
Reviewed-by: Hannes Reinecke
Signed-off-by: James Wang
Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq")
---
> drivers/block/loop.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 48f6fa6f810e..2e5b8538760c 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -625,6 +625,9 @@ static int loop_switch(struct loop_device *lo, struct file *file)
>   */
>  static int loop_flush(struct loop_device *lo)
>  {
> +	/* loop not yet configured, no running thread, nothing to flush */
> +	if (lo->lo_state != Lo_bound)
> +		return 0;
>  	return loop_switch(lo, NULL);
>  }
>
Re: [lkp-robot] [tty] 52b03e644f: calltrace:flush_to_ldisc
On Wed, 2017-06-07 at 10:07 +0800, kernel test robot wrote:
> FYI, we noticed the following commit:
>
> commit: 52b03e644f702dbcb252f6aec92fc0f0d6e29f78 ("tty: fix port buffer locking V2")
> url: https://github.com/0day-ci/linux/commits/Mike-Galbraith/tty-fix-port-buffer-locking-V2/20170605-134432
> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing

? Patchlet has been properly vaporized, no longer exists.

	-Mike
Re: [bisected] Re: tty lockdep trace
On Mon, 2017-06-05 at 16:09 +0100, Alan Cox wrote:
> On Sun, 04 Jun 2017 13:34:31 +0200
> Mike Galbraith <efa...@gmx.de> wrote:
>
> > On Sun, 2017-06-04 at 12:00 +0200, Vegard Nossum wrote:
> > >
> > > I don't know how you did it, but this passes my testing (reproducers for
> > > both the original issue and the lockdep splat/hang).
> >
> > I suppose I can sign it off, see if that inspires anyone to come up
> > with something better.
> >
> > drivers/tty: Fix 925bb1ce47f4 circular locking dependency
>
> This is still completely broken and has the same underlying flaw as the
> original.

Thanks for looking.  It didn't look the least bit lovely, so it being
equally busted (well, ever so slightly less; lockdep did stfu ;) is not
a shocker.  Off to the bin ya go, little patchlet.

	-Mike
Re: [patch] tty: fix port buffer locking V2
On Mon, 2017-06-05 at 08:48 +0200, Greg Kroah-Hartman wrote:
>
> Can you redo this against 4.12-rc4 which has the original tty locking
> patch reverted?

It's against 4.12-rc4.

	-Mike
[patch] tty: fix port buffer locking V2
This is just in case.  While it works, I consider it to be diagnostic
data for those unfortunate enough to be intimate with tty locking :)

---

V1 (925bb1ce47f4) changelog:

tty_insert_flip_string_fixed_flag() is racy against itself when called
from the ioctl(TCXONC, TCION/TCIOFF) path [1] and the flush_to_ldisc()
workqueue path [2].

The problem is that port->buf.tail->used is modified without consistent
locking; the ioctl path takes tty->atomic_write_lock, whereas the
workqueue path takes ldata->output_lock.

We cannot simply take ldata->output_lock, since that is specific to the
N_TTY line discipline.

It might seem natural to try to take port->buf.lock inside
tty_insert_flip_string_fixed_flag() and friends (where port->buf is
actually used/modified), but this creates problems for flush_to_ldisc(),
which takes it before grabbing tty->ldisc_sem, o_tty->termios_rwsem, and
ldata->output_lock.

Therefore, the simplest solution for now seems to be to take
tty->atomic_write_lock inside tty_port_default_receive_buf().  This lock
is also used in the write path [3] with a consistent ordering.
[1]:
Call Trace:
 tty_insert_flip_string_fixed_flag
 pty_write
 tty_send_xchar			// down_read(&o_tty->termios_rwsem)
				// mutex_lock(&tty->atomic_write_lock)
 n_tty_ioctl_helper
 n_tty_ioctl
 tty_ioctl			// down_read(&tty->ldisc_sem)
 do_vfs_ioctl
 SyS_ioctl

[2]:
Workqueue: events_unbound flush_to_ldisc
Call Trace:
 tty_insert_flip_string_fixed_flag
 pty_write
 tty_put_char
 __process_echoes
 commit_echoes			// mutex_lock(&ldata->output_lock)
 n_tty_receive_buf_common
 n_tty_receive_buf2
 tty_ldisc_receive_buf		// down_read(&o_tty->termios_rwsem)
 tty_port_default_receive_buf	// down_read(&tty->ldisc_sem)
 flush_to_ldisc			// mutex_lock(&port->buf.lock)
 process_one_work

[3]:
Call Trace:
 tty_insert_flip_string_fixed_flag
 pty_write
 n_tty_write			// mutex_lock(&ldata->output_lock)
				// down_read(&tty->termios_rwsem)
 do_tty_write (inline)		// mutex_lock(&tty->atomic_write_lock)
 tty_write			// down_read(&tty->ldisc_sem)
 __vfs_write
 vfs_write
 SyS_write

The bug can result in about a dozen different crashes depending on what
exactly gets corrupted when port->buf.tail->used points outside the
buffer.

The patch passes my LOCKDEP/PROVE_LOCKING testing but more testing is
always welcome.

Found using syzkaller.

V2: The V1 solution induced an ordering issue, holding buf->lock while
acquiring tty->atomic_write_lock.  Resolve it by moving acquisition to
flush_to_ldisc(), prior to acquisition of buf->lock.  Credit to Vegard
Nossum for problem analysis/resolution, blame to me for trivial
adaptation thereof.
Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Vegard Nossum <vegard.nos...@oracle.com>
Cc: <sta...@vger.kernel.org>
---
 drivers/tty/tty_buffer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -465,7 +465,13 @@ static void flush_to_ldisc(struct work_s
 {
 	struct tty_port *port = container_of(work, struct tty_port, buf.work);
 	struct tty_bufhead *buf = &port->buf;
+	struct tty_struct *tty = READ_ONCE(port->itty);
+	struct tty_ldisc *disc = NULL;
 
+	if (tty)
+		disc = tty_ldisc_ref(tty);
+	if (disc)
+		mutex_lock(&tty->atomic_write_lock);
 	mutex_lock(&buf->lock);
 
 	while (1) {
@@ -501,6 +507,10 @@ static void flush_to_ldisc(struct work_s
 	}
 
 	mutex_unlock(&buf->lock);
+	if (disc) {
+		mutex_unlock(&tty->atomic_write_lock);
+		tty_ldisc_deref(disc);
+	}
 }
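For readers not steeped in lockdep, the ordering problem this V2 resolves can be sketched in plain userspace C. Below is a toy lock-order tracker (all names invented; this is not kernel code and not lockdep itself): it records "already held -> newly taken" edges roughly the way lockdep does, and flags a pair of locks as inverted once both orderings have been observed. The V1 acquisition order produces the inversion; V2's consistent order does not.

```c
#include <string.h>

#define MAX_EDGES 32

struct order_tracker {
	const char *from[MAX_EDGES];
	const char *to[MAX_EDGES];
	int n_edges;
	const char *held[8];
	int n_held;
};

/* Record "already held -> newly taken" edges, roughly what lockdep does. */
void tr_lock(struct order_tracker *t, const char *name)
{
	int i;

	for (i = 0; i < t->n_held; i++) {
		t->from[t->n_edges] = t->held[i];
		t->to[t->n_edges] = name;
		t->n_edges++;
	}
	t->held[t->n_held++] = name;
}

void tr_unlock(struct order_tracker *t)
{
	t->n_held--;	/* sketch assumes LIFO release */
}

int tr_has_edge(struct order_tracker *t, const char *a, const char *b)
{
	int i;

	for (i = 0; i < t->n_edges; i++)
		if (!strcmp(t->from[i], a) && !strcmp(t->to[i], b))
			return 1;
	return 0;
}

/* Two locks are inverted (deadlock-capable) if both A->B and B->A occur. */
int tr_inverted(struct order_tracker *t, const char *a, const char *b)
{
	return tr_has_edge(t, a, b) && tr_has_edge(t, b, a);
}

/* V1: flush_to_ldisc() held buf->lock when the receive path took
 * atomic_write_lock, while the write path takes them the other way. */
int v1_orders_conflict(void)
{
	struct order_tracker t = { .n_edges = 0 };

	tr_lock(&t, "buf.lock");		/* flush_to_ldisc() */
	tr_lock(&t, "atomic_write_lock");	/* tty_port_default_receive_buf() */
	tr_unlock(&t);
	tr_unlock(&t);

	tr_lock(&t, "atomic_write_lock");	/* do_tty_write() */
	tr_lock(&t, "buf.lock");		/* pty_write() -> flip buffer */
	tr_unlock(&t);
	tr_unlock(&t);

	return tr_inverted(&t, "buf.lock", "atomic_write_lock");
}

/* V2: atomic_write_lock moves ahead of buf->lock in flush_to_ldisc(),
 * matching the write path, so only one ordering is ever seen. */
int v2_orders_conflict(void)
{
	struct order_tracker t = { .n_edges = 0 };

	tr_lock(&t, "atomic_write_lock");	/* flush_to_ldisc(), moved up */
	tr_lock(&t, "buf.lock");
	tr_unlock(&t);
	tr_unlock(&t);

	tr_lock(&t, "atomic_write_lock");	/* do_tty_write() */
	tr_lock(&t, "buf.lock");
	tr_unlock(&t);
	tr_unlock(&t);

	return tr_inverted(&t, "buf.lock", "atomic_write_lock");
}
```

The "costs a reference" note in the changelog refers to the tty_ldisc_ref()/tty_ldisc_deref() pair the real patch adds around the widened critical section; the sketch only models the ordering.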
Re: [bisected] Re: tty lockdep trace
On Sun, 2017-06-04 at 12:00 +0200, Vegard Nossum wrote:
>
> I don't know how you did it, but this passes my testing (reproducers for
> both the original issue and the lockdep splat/hang).

I suppose I can sign it off, see if that inspires anyone to come up with something better.

drivers/tty: Fix 925bb1ce47f4 circular locking dependency

925bb1ce47f4 (tty: fix port buffer locking) upset lockdep by holding buf->lock while acquiring tty->atomic_write_lock.  Move acquisition to flush_to_ldisc(), taking it prior to taking buf->lock.  Costs a reference, but appeases lockdep.

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Fixes: 925bb1ce47f4 ("tty: fix port buffer locking")
---
 drivers/tty/tty_buffer.c | 10 ++++++++++
 drivers/tty/tty_port.c   |  2 --
 2 files changed, 10 insertions(+), 2 deletions(-)

--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -465,7 +465,13 @@ static void flush_to_ldisc(struct work_s
 {
 	struct tty_port *port = container_of(work, struct tty_port, buf.work);
 	struct tty_bufhead *buf = &port->buf;
+	struct tty_struct *tty = READ_ONCE(port->itty);
+	struct tty_ldisc *disc = NULL;
 
+	if (tty)
+		disc = tty_ldisc_ref(tty);
+	if (disc)
+		mutex_lock(&tty->atomic_write_lock);
 	mutex_lock(&buf->lock);
 
 	while (1) {
@@ -501,6 +507,10 @@ static void flush_to_ldisc(struct work_s
 	}
 
 	mutex_unlock(&buf->lock);
+	if (disc) {
+		mutex_unlock(&tty->atomic_write_lock);
+		tty_ldisc_deref(disc);
+	}
 }
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -34,9 +34,7 @@ static int tty_port_default_receive_buf(
 	if (!disc)
 		return 0;
 
-	mutex_lock(&tty->atomic_write_lock);
 	ret = tty_ldisc_receive_buf(disc, p, (char *)f, count);
-	mutex_unlock(&tty->atomic_write_lock);
 
 	tty_ldisc_deref(disc);
Re: [bisected] Re: tty lockdep trace
On Sun, 2017-06-04 at 10:32 +0200, Greg Kroah-Hartman wrote:
> On Sat, Jun 03, 2017 at 08:33:52AM +0200, Mike Galbraith wrote:
> > On Wed, 2017-05-31 at 13:21 -0400, Dave Jones wrote:
> > > Just hit this during a trinity run.
> >
> > 925bb1ce47f429f69aad35876df7ecd8c53deb7e is the first bad commit
> > commit 925bb1ce47f429f69aad35876df7ecd8c53deb7e
> > Author: Vegard Nossum <vegard.nos...@oracle.com>
> > Date:   Thu May 11 12:18:52 2017 +0200
> >
> >     tty: fix port buffer locking
>
> Now reverting this.  Oops, sorry, forgot to add Dave and your names to
> the patch revert.  The list of people who reported this was really long,
> many thanks for this.

If flush_to_ldisc() is the problem, and taking atomic_write_lock in that path is an acceptable solution, how about doing that a bit differently instead.  Lockdep stopped grumbling, vbox seems happy.

925bb1ce47f4 (tty: fix port buffer locking) upset lockdep by holding buf->lock while acquiring tty->atomic_write_lock.  Move acquisition to flush_to_ldisc(), taking it prior to taking buf->lock.  Costs a reference, but appeases lockdep.
Not-so-signed-off-by: /me
---
 drivers/tty/tty_buffer.c | 10 ++++++++++
 drivers/tty/tty_port.c   |  2 --
 2 files changed, 10 insertions(+), 2 deletions(-)

--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -465,7 +465,13 @@ static void flush_to_ldisc(struct work_s
 {
 	struct tty_port *port = container_of(work, struct tty_port, buf.work);
 	struct tty_bufhead *buf = &port->buf;
+	struct tty_struct *tty = READ_ONCE(port->itty);
+	struct tty_ldisc *disc = NULL;
 
+	if (tty)
+		disc = tty_ldisc_ref(tty);
+	if (disc)
+		mutex_lock(&tty->atomic_write_lock);
 	mutex_lock(&buf->lock);
 
 	while (1) {
@@ -501,6 +507,10 @@ static void flush_to_ldisc(struct work_s
 	}
 
 	mutex_unlock(&buf->lock);
+	if (disc) {
+		mutex_unlock(&tty->atomic_write_lock);
+		tty_ldisc_deref(disc);
+	}
 }
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -34,9 +34,7 @@ static int tty_port_default_receive_buf(
 	if (!disc)
 		return 0;
 
-	mutex_lock(&tty->atomic_write_lock);
 	ret = tty_ldisc_receive_buf(disc, p, (char *)f, count);
-	mutex_unlock(&tty->atomic_write_lock);
 
 	tty_ldisc_deref(disc);
Re: [bisected] Re: tty lockdep trace
On Sun, 2017-06-04 at 10:32 +0200, Greg Kroah-Hartman wrote: > On Sat, Jun 03, 2017 at 08:33:52AM +0200, Mike Galbraith wrote: > > On Wed, 2017-05-31 at 13:21 -0400, Dave Jones wrote: > > > Just hit this during a trinity run. > > > > 925bb1ce47f429f69aad35876df7ecd8c53deb7e is the first bad commit > > commit 925bb1ce47f429f69aad35876df7ecd8c53deb7e > > Author: Vegard Nossum > > Date: Thu May 11 12:18:52 2017 +0200 > > > > tty: fix port buffer locking > > Now reverting this. Oops, sorry, forgot to add Dave and your names to > the patch revert. The list of people who reported this was really long, > many thanks for this. If flush_to_ldisc() is the problem, and taking atomic_write_lock in that path an acceptable solution, how about do that a bit differently instead. Lockdep stopped grumbling, vbox seems happy. 925bb1ce47f4 (tty: fix port buffer locking) upset lockdep by holding buf->lock while acquiring tty->atomic_write_lock. Move acquisition to flush_to_ldisc(), taking it prior to taking buf->lock. Costs a reference, but appeases lockdep. 
Not-so-signed-off-by: /me --- drivers/tty/tty_buffer.c | 10 ++ drivers/tty/tty_port.c |2 -- 2 files changed, 10 insertions(+), 2 deletions(-) --- a/drivers/tty/tty_buffer.c +++ b/drivers/tty/tty_buffer.c @@ -465,7 +465,13 @@ static void flush_to_ldisc(struct work_s { struct tty_port *port = container_of(work, struct tty_port, buf.work); struct tty_bufhead *buf = >buf; + struct tty_struct *tty = READ_ONCE(port->itty); + struct tty_ldisc *disc = NULL; + if (tty) + disc = tty_ldisc_ref(tty); + if (disc) + mutex_lock(>atomic_write_lock); mutex_lock(>lock); while (1) { @@ -501,6 +507,10 @@ static void flush_to_ldisc(struct work_s } mutex_unlock(>lock); + if (disc) { + mutex_unlock(>atomic_write_lock); + tty_ldisc_deref(disc); + } } --- a/drivers/tty/tty_port.c +++ b/drivers/tty/tty_port.c @@ -34,9 +34,7 @@ static int tty_port_default_receive_buf( if (!disc) return 0; - mutex_lock(>atomic_write_lock); ret = tty_ldisc_receive_buf(disc, p, (char *)f, count); - mutex_unlock(>atomic_write_lock); tty_ldisc_deref(disc);
[bisected] Re: tty lockdep trace
On Wed, 2017-05-31 at 13:21 -0400, Dave Jones wrote:
> Just hit this during a trinity run.

925bb1ce47f429f69aad35876df7ecd8c53deb7e is the first bad commit
commit 925bb1ce47f429f69aad35876df7ecd8c53deb7e
Author: Vegard Nossum
Date:   Thu May 11 12:18:52 2017 +0200

    tty: fix port buffer locking

tty_insert_flip_string_fixed_flag() is racy against itself when called from the ioctl(TCXONC, TCION/TCIOFF) path [1] and the flush_to_ldisc() workqueue path [2].

The problem is that port->buf.tail->used is modified without consistent locking; the ioctl path takes tty->atomic_write_lock, whereas the workqueue path takes ldata->output_lock.

We cannot simply take ldata->output_lock, since that is specific to the N_TTY line discipline.

It might seem natural to try to take port->buf.lock inside tty_insert_flip_string_fixed_flag() and friends (where port->buf is actually used/modified), but this creates problems for flush_to_ldisc(), which takes it before grabbing tty->ldisc_sem, o_tty->termios_rwsem, and ldata->output_lock.

Therefore, the simplest solution for now seems to be to take tty->atomic_write_lock inside tty_port_default_receive_buf(). This lock is also used in the write path [3] with a consistent ordering.
[1]:
    Call Trace:
     tty_insert_flip_string_fixed_flag
     pty_write
     tty_send_xchar               // down_read(&o_tty->termios_rwsem)
                                  // mutex_lock(&tty->atomic_write_lock)
     n_tty_ioctl_helper
     n_tty_ioctl
     tty_ioctl                    // down_read(&tty->ldisc_sem)
     do_vfs_ioctl
     SyS_ioctl

[2]:
    Workqueue: events_unbound flush_to_ldisc
    Call Trace:
     tty_insert_flip_string_fixed_flag
     pty_write
     tty_put_char
     __process_echoes
     commit_echoes                // mutex_lock(&ldata->output_lock)
     n_tty_receive_buf_common
     n_tty_receive_buf2
     tty_ldisc_receive_buf        // down_read(&o_tty->termios_rwsem)
     tty_port_default_receive_buf // down_read(&tty->ldisc_sem)
     flush_to_ldisc               // mutex_lock(&port->buf.lock)
     process_one_work

[3]:
    Call Trace:
     tty_insert_flip_string_fixed_flag
     pty_write
     n_tty_write                  // mutex_lock(&ldata->output_lock)
                                  // down_read(&tty->termios_rwsem)
     do_tty_write (inline)        // mutex_lock(&tty->atomic_write_lock)
     tty_write                    // down_read(&tty->ldisc_sem)
     __vfs_write
     vfs_write
     SyS_write

The bug can result in about a dozen different crashes depending on what exactly gets corrupted when port->buf.tail->used points outside the buffer.

The patch passes my LOCKDEP/PROVE_LOCKING testing, but more testing is always welcome.

Found using syzkaller.
Cc:
Signed-off-by: Vegard Nossum
Signed-off-by: Greg Kroah-Hartman

:04 04 6df9d7d2f22ff6ac1ad265869c1cb0d15621a0eb 56e5e66a5e451bf55857025899f4a092b51dca6e M	drivers

git bisect start
# good: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
git bisect good a351e9b9fc24e982ec2f0e76379a49826036da12
# bad: [104c08ba8e921ef97abfdc10408d54921a6d9003] Merge tag 'acpi-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 104c08ba8e921ef97abfdc10408d54921a6d9003
# good: [16a12fa9aed176444fc795b09e796be41902bb08] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 16a12fa9aed176444fc795b09e796be41902bb08
# good: [2d3e4866dea96b0506395b47bfefb234f2088dac] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 2d3e4866dea96b0506395b47bfefb234f2088dac
# good: [f94c128eefcce2e3448d543f13cd7d7b8aa660a5] Merge tag 'metag-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
git bisect good f94c128eefcce2e3448d543f13cd7d7b8aa660a5
# good: [18365225f0440d09708ad9daade2ec11275c3df9] hwpoison, memcg: forcibly uncharge LRU pages
git bisect good 18365225f0440d09708ad9daade2ec11275c3df9
# good: [5bbecdbc8e7ffaaf47ac1f02014bf3bedda3fd11] nvme_fc: Support ctrl_loss_tmo
git bisect good 5bbecdbc8e7ffaaf47ac1f02014bf3bedda3fd11
# good: [d024baa58a4a7e5eb6058017771d15b9e47b56db] Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d024baa58a4a7e5eb6058017771d15b9e47b56db
# bad: [393bcfaeb8be7f46a4cd7d673e33541ebee76b12] Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
git bisect bad 393bcfaeb8be7f46a4cd7d673e33541ebee76b12
# bad: [b0f5a8f32e8bbdaae1abb8abe2d3cbafaba57e08] kthread: fix boot hang (regression) on MIPS/OpenRISC
git bisect bad b0f5a8f32e8bbdaae1abb8abe2d3cbafaba57e08
# good: [6f68a6ae1f0ea2fd87e26dc345866e928b5850a8] Merge tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good
Re: [RFC PATCH v2 12/17] cgroup: Remove cgroup v2 no internal process constraint
On Fri, 2017-05-19 at 16:38 -0400, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, May 15, 2017 at 09:34:11AM -0400, Waiman Long wrote:
> > The rationale behind the cgroup v2 no internal process constraint is
> > to avoid resource competition between internal processes and child
> > cgroups. However, not all controllers have a problem with internal
> > process competition. Enforcing this rule may lead to unnatural process
> > hierarchy and unneeded levels for those controllers.
>
> This isn't necessarily something we can determine by looking at the
> current state of controllers.  It's true that some controllers - pid
> and perf - inherently only care about membership of each task but at
> the same time neither really suffers from the constraint either.  CPU
> which is the problematic one here...

(+ cpuacct + cpuset)
Re: [drm:qxl] BUG: sleeping function called from invalid context - qxl_bo_kmap_atomic_page()...splat
On Thu, 2017-05-18 at 17:40 -0300, Gabriel Krisman Bertazi wrote:
>
> Mike Galbraith <efa...@gmx.de> writes:
>
> > On Mon, 2017-05-08 at 16:48 -0300, Gabriel Krisman Bertazi wrote:
> > >
> > > Thanks for reporting this.  Can you confirm the following patch prevents
> > > the issue?
> >
> > Nope, it still gripes.
>
> Oops... Thanks for checking.  The following patch fixes the issue for
> me.  Can you please test that one?

Yup, all better.

	-Mike
Re: [drm:qxl] BUG: sleeping function called from invalid context - qxl_bo_kmap_atomic_page()...splat
On Tue, 2017-05-09 at 04:37 +0200, Mike Galbraith wrote:
> On Mon, 2017-05-08 at 16:48 -0300, Gabriel Krisman Bertazi wrote:
>
> > Thanks for reporting this.  Can you confirm the following patch prevents
> > the issue?
>
> Nope, it still gripes.

The reason for this gripe is that we find that we don't have memory reserved.. a tad too late.

 Xorg-2252  [000]       135.409756: qxl_release_map <-qxl_cursor_atomic_update
 Xorg-2252  [000]       135.409756: qxl_bo_kmap_atomic_page <-qxl_release_map
 Xorg-2252  [000]       135.409757: qxl_bo_kmap_atomic_page: ENTER
 Xorg-2252  [000]       135.409757: ttm_mem_io_lock <-qxl_bo_kmap_atomic_page
 Xorg-2252  [000]       135.409757: ttm_mem_io_reserve <-qxl_bo_kmap_atomic_page
 Xorg-2252  [000]       135.409757: qxl_ttm_io_mem_reserve <-ttm_mem_io_reserve
 Xorg-2252  [000]       135.409757: ttm_mem_io_unlock <-qxl_bo_kmap_atomic_page
 Xorg-2252  [000]       135.409757: qxl_bo_kmap_atomic_page: PREEMPTION DISABLED
 Xorg-2252  [000] ...1  135.409758: qxl_bo_kmap <-qxl_cursor_atomic_update
 Xorg-2252  [000] ...1  135.409758: ttm_bo_kmap <-qxl_bo_kmap             <== too late
 Xorg-2252  [000] ...1  135.409758: ttm_mem_io_reserve <-ttm_bo_kmap
 Xorg-2252  [000] ...1  135.409758: qxl_ttm_io_mem_reserve <-ttm_mem_io_reserve
 Xorg-2252  [000] ...1  135.409759: ioremap_nocache <-ttm_bo_kmap         <== game over
 Xorg-2252  [000] ...1  135.409759: __ioremap_caller <-ioremap_nocache
 Xorg-2252  [000] ...1  135.409759: walk_system_ram_range <-__ioremap_caller
 Xorg-2252  [000] ...1  135.409759: find_next_iomem_res <-walk_system_ram_range
 Xorg-2252  [000] ...1  135.409759: _raw_read_lock <-find_next_iomem_res
 Xorg-2252  [000] ...1  135.409760: reserve_memtype <-__ioremap_caller
 Xorg-2252  [000] ...1  135.409760: pat_pagerange_is_ram <-reserve_memtype
 Xorg-2252  [000] ...1  135.409761: walk_system_ram_range <-pat_pagerange_is_ram
 Xorg-2252  [000] ...1  135.409761: find_next_iomem_res <-walk_system_ram_range
 Xorg-2252  [000] ...1  135.409761: _raw_read_lock <-find_next_iomem_res
 Xorg-2252  [000] ...1  135.409761: kmem_cache_alloc_trace <-reserve_memtype
 Xorg-2252  [000] ...1  135.409761: __might_sleep <-kmem_cache_alloc_trace
 Xorg-2252  [000] ...1  135.409762: ___might_sleep <-__might_sleep
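The diagnostic itself is simple enough to model in userspace: ___might_sleep() fires whenever a function that may sleep (here the allocation reached via ioremap) runs with preemption disabled. A toy sketch of the sequence in the trace above (all model_* names invented; this is not the kernel implementation):

```c
int preempt_count_model;	/* nonzero == atomic context */

void model_preempt_disable(void) { preempt_count_model++; }
void model_preempt_enable(void)  { preempt_count_model--; }

/* ___might_sleep(): complains iff called while preemption is off. */
int might_sleep_would_splat(void)
{
	return preempt_count_model != 0;
}

/* Shape of the trace: qxl_bo_kmap_atomic_page() returns with preemption
 * disabled, then qxl_bo_kmap() -> ttm_bo_kmap() reaches ioremap_nocache(),
 * whose reserve_memtype() performs a may-sleep allocation -- in what is
 * now atomic context. */
int model_cursor_atomic_update(void)
{
	int splat;

	model_preempt_disable();		/* qxl_bo_kmap_atomic_page() */
	splat = might_sleep_would_splat();	/* kmem_cache_alloc_trace() */
	model_preempt_enable();
	return splat;
}
```

The fix direction discussed in the thread amounts to making sure the reservation happens before the preemption-disabled window, so the may-sleep work never lands inside it.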
Re: get/put_cpu() usage in block/blk-mq.c
On Tue, 2017-05-09 at 10:32 -0600, Jens Axboe wrote:
> > Sure, but s/get_cpu_var/get_cpu_ptr first.
>
> Doh.

Ditto, box OTOH spotted the booboo instantly :)

	-Mike
Re: get/put_cpu() usage in block/blk-mq.c
On Tue, 2017-05-09 at 09:45 -0600, Jens Axboe wrote:
> > > > > > Is it from this_cpu_ptr() in blk_stat_add()?
> > > > >
> > > > > Yeah.
> > > >
> > > > So why is this complaining, doesn't rcu_read_lock() disable
> > > > preemption?
> > >
> > > Ah, I guess it doesn't if PREEMPT_RCU is set. How about the
> > > below?
> >
> > Should do it.  I was about to run LTP (where it turned up) again
> > anyway, I'll add this.  No news is good news, what you should hear.
>
> Thanks, let me know, so I can add your tested-by (or whatever you
> prefer).

Sure, but s/get_cpu_var/get_cpu_ptr first.

	-Mike
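The distinction behind the s/get_cpu_var/get_cpu_ptr remark: get_cpu_var() expands the *name* of a statically declared per-CPU variable, while get_cpu_ptr() takes a *pointer* to per-CPU storage, which is what the dynamically allocated cb->cpu_stat is; both disable preemption around the access. A toy userspace model of the pointer flavor (all toy_* names invented; this is not the kernel's per-CPU machinery, which a single array indexed by a fake CPU id only caricatures):

```c
#define TOY_NR_CPUS 4

int toy_preempt_count;
int toy_this_cpu = 2;		/* pretend we're running on CPU 2 */
int toy_stats[TOY_NR_CPUS];	/* stands in for the cb->cpu_stat allocation */

void toy_preempt_disable(void) { toy_preempt_count++; }
void toy_preempt_enable(void)  { toy_preempt_count--; }

/* get_cpu_ptr() flavor: given a pointer to per-CPU storage, pin the task
 * by disabling preemption and hand back this CPU's slot. */
int *toy_get_cpu_ptr(int *pcpu_base)
{
	toy_preempt_disable();
	return &pcpu_base[toy_this_cpu];
}

void toy_put_cpu_ptr(void)
{
	toy_preempt_enable();
}

/* blk_stat_add()-shaped user: the per-CPU bump happens with preemption
 * off, so the "using smp_processor_id() in preemptible code" check
 * would stay quiet. */
void toy_stat_add(int value)
{
	int *stat = toy_get_cpu_ptr(toy_stats);

	*stat += value;
	toy_put_cpu_ptr();
}
```

In the real patch below, get_cpu_var(cb->cpu_stat) would not even compile sensibly, since cb->cpu_stat is a pointer rather than a per-CPU variable name, hence the one-line follow-up fix.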
Re: get/put_cpu() usage in block/blk-mq.c
On Tue, 2017-05-09 at 09:15 -0600, Jens Axboe wrote:
> On 05/09/2017 09:13 AM, Jens Axboe wrote:
> > On 05/09/2017 09:04 AM, Mike Galbraith wrote:
> > > On Tue, 2017-05-09 at 08:53 -0600, Jens Axboe wrote:
> > > > On 05/09/2017 12:07 AM, Mike Galbraith wrote:
> > > > > Hi Jens,
> > > > >
> > > > > I was about to fix up this splat..
> > > > >
> > > > > [  445.022141] loop: module loaded
> > > > > [  445.078116] BUG: using smp_processor_id() in preemptible [] code: loop0/3801
> > > > > [  445.085873] caller is debug_smp_processor_id+0x17/0x20
> > > > > [  445.091016] CPU: 7 PID: 3801 Comm: loop0 Tainted: G E 4.12.0-default #40
> > > > > [  445.098838] Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698, BIOS -[D6E150AUS-1.10]- 12/15/2010
> > > > > [  445.108564] Call Trace:
> > > > > [  445.111016]  dump_stack+0x65/0x89
> > > > > [  445.114330]  check_preemption_disabled+0xde/0xf0
> > > > > [  445.118945]  debug_smp_processor_id+0x17/0x20
> > > > > [  445.123300]  blk_stat_add+0xb0/0x130
> > > > > [  445.126876]  __blk_mq_complete_request+0xb5/0x150
> > > > > [  445.131575]  blk_mq_complete_request+0x16/0x20
> > > > > [  445.136020]  loop_queue_work+0x5f/0xaa0 [loop]
> > > > > [  445.140461]  ? _raw_spin_unlock_irq+0x21/0x40
> > > > > [  445.144816]  ? finish_task_switch+0x85/0x270
> > > > > [  445.149085]  ? __schedule+0x291/0x8c0
> > > > > [  445.152747]  kthread_worker_fn+0xc2/0x1d0
> > > > > [  445.156754]  kthread+0x114/0x150
> > > > > [  445.159983]  ? __kthread_init_worker+0xb0/0xb0
> > > > > [  445.164424]  ? kthread_park+0x60/0x60
> > > > > [  445.168085]  ret_from_fork+0x2c/0x40
> > > >
> > > > Is it from this_cpu_ptr() in blk_stat_add()?
> > >
> > > Yeah.
> >
> > So why is this complaining, doesn't rcu_read_lock() disable
> > preemption?
>
> Ah, I guess it doesn't if PREEMPT_RCU is set. How about the below?

Should do it.  I was about to run LTP (where it turned up) again anyway, I'll add this.  No news is good news, what you should hear.
> diff --git a/block/blk-stat.c b/block/blk-stat.c
> index 6c2f40940439..022f5925a427 100644
> --- a/block/blk-stat.c
> +++ b/block/blk-stat.c
> @@ -96,13 +96,16 @@ void blk_stat_add(struct request *rq)
>  
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(cb, &q->stats->callbacks, list) {
> -		if (blk_stat_is_active(cb)) {
> -			bucket = cb->bucket_fn(rq);
> -			if (bucket < 0)
> -				continue;
> -			stat = &this_cpu_ptr(cb->cpu_stat)[bucket];
> -			__blk_stat_add(stat, value);
> -		}
> +		if (!blk_stat_is_active(cb))
> +			continue;
> +
> +		bucket = cb->bucket_fn(rq);
> +		if (bucket < 0)
> +			continue;
> +
> +		stat = &get_cpu_var(cb->cpu_stat)[bucket];
> +		__blk_stat_add(stat, value);
> +		put_cpu_var(cb->cpu_stat);
>  	}
>  	rcu_read_unlock();
> }
>
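The config subtlety Jens ran into can be modeled directly: with classic (non-preemptible) RCU, rcu_read_lock() maps to preempt_disable(), so this_cpu_ptr() inside the read-side section is safe; with CONFIG_PREEMPT_RCU the read side merely tracks nesting and stays preemptible, so debug_smp_processor_id() rightly complains. A toy userspace model (all names invented, not the kernel implementation):

```c
int cfg_preempt_rcu;		/* 1: CONFIG_PREEMPT_RCU, 0: classic RCU */
int rcu_model_preempt_count;	/* nonzero == preemption disabled */
int rcu_model_nesting;

void rcu_read_lock_model(void)
{
	if (cfg_preempt_rcu)
		rcu_model_nesting++;		/* read side stays preemptible */
	else
		rcu_model_preempt_count++;	/* maps to preempt_disable() */
}

void rcu_read_unlock_model(void)
{
	if (cfg_preempt_rcu)
		rcu_model_nesting--;
	else
		rcu_model_preempt_count--;
}

/* debug_smp_processor_id(): gripes when the caller is preemptible. */
int processor_id_would_splat(void)
{
	return rcu_model_preempt_count == 0;
}

/* blk_stat_add()'s original shape: a this_cpu_ptr() dereference inside
 * an RCU read-side section.  Safe under classic RCU, splats under
 * PREEMPT_RCU -- which is why the fix pins the CPU explicitly. */
int stat_section_splats(int preempt_rcu)
{
	int splat;

	cfg_preempt_rcu = preempt_rcu;
	rcu_read_lock_model();
	splat = processor_id_would_splat();	/* this_cpu_ptr() deref */
	rcu_read_unlock_model();
	return splat;
}
```

That is why the patch above adds an explicit get/put pair inside the RCU section instead of relying on the read lock for preemption protection.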
Re: get/put_cpu() usage in block/blk-mq.c
On Tue, 2017-05-09 at 08:53 -0600, Jens Axboe wrote: > On 05/09/2017 12:07 AM, Mike Galbraith wrote: > > Hi Jens, > > > > I was about to fix up this splat.. > > > > [ 445.022141] loop: module loaded > > [ 445.078116] BUG: using smp_processor_id() in preemptible [] > > code: loop0/3801 > > [ 445.085873] caller is debug_smp_processor_id+0x17/0x20 > > [ 445.091016] CPU: 7 PID: 3801 Comm: loop0 Tainted: GE > > 4.12.0-default #40 > > [ 445.098838] Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698 , > > BIOS -[D6E150AUS-1.10]- 12/15/2010 > > [ 445.108564] Call Trace: > > [ 445.111016] dump_stack+0x65/0x89 > > [ 445.114330] check_preemption_disabled+0xde/0xf0 > > [ 445.118945] debug_smp_processor_id+0x17/0x20 > > [ 445.123300] blk_stat_add+0xb0/0x130 > > [ 445.126876] __blk_mq_complete_request+0xb5/0x150 > > [ 445.131575] blk_mq_complete_request+0x16/0x20 > > [ 445.136020] loop_queue_work+0x5f/0xaa0 [loop] > > [ 445.140461] ? _raw_spin_unlock_irq+0x21/0x40 > > [ 445.144816] ? finish_task_switch+0x85/0x270 > > [ 445.149085] ? __schedule+0x291/0x8c0 > > [ 445.152747] kthread_worker_fn+0xc2/0x1d0 > > [ 445.156754] kthread+0x114/0x150 > > [ 445.159983] ? __kthread_init_worker+0xb0/0xb0 > > [ 445.164424] ? kthread_park+0x60/0x60 > > [ 445.168085] ret_from_fork+0x2c/0x40 > > Is it from this_cpu_ptr() in blk_stat_add()? Yeah. > > ..but then noticed that __blk_mq_run_hw_queue() is also called under > > get_cpu(), but recently grew a might_sleep(), so maybe wants some block > > layer eyeballs. > > That part is fine. The might sleep depends on BLOCKING being set in > the hardware queue flags. And if that is set, then we don't call > it under get_cpu(). Ah, right. -Mike
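The gating Jens describes — might_sleep() only applying to queues that are allowed to block, which are never dispatched under get_cpu() — can be sketched roughly like this (reconstructed from the discussion, not verbatim blk-mq code; BLK_MQ_F_BLOCKING is the real flag, the call sites are illustrative):

```c
/* Sketch of the dispatch-path split described above: */
if (hctx->flags & BLK_MQ_F_BLOCKING) {
	/* Driver may sleep in ->queue_rq(): always punt to a
	 * workqueue/kthread context, never run under get_cpu(), so
	 * the might_sleep() in __blk_mq_run_hw_queue() is fine. */
	kblockd_schedule_work(&hctx->run_work);	/* illustrative call site */
} else {
	/* Non-blocking queue: safe to run inline with preemption
	 * disabled via get_cpu()/put_cpu(). */
	cpu = get_cpu();
	__blk_mq_run_hw_queue(hctx);
	put_cpu();
}
```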
get/put_cpu() usage in block/blk-mq.c
Hi Jens, I was about to fix up this splat.. [ 445.022141] loop: module loaded [ 445.078116] BUG: using smp_processor_id() in preemptible [] code: loop0/3801 [ 445.085873] caller is debug_smp_processor_id+0x17/0x20 [ 445.091016] CPU: 7 PID: 3801 Comm: loop0 Tainted: GE 4.12.0-default #40 [ 445.098838] Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698 , BIOS -[D6E150AUS-1.10]- 12/15/2010 [ 445.108564] Call Trace: [ 445.111016] dump_stack+0x65/0x89 [ 445.114330] check_preemption_disabled+0xde/0xf0 [ 445.118945] debug_smp_processor_id+0x17/0x20 [ 445.123300] blk_stat_add+0xb0/0x130 [ 445.126876] __blk_mq_complete_request+0xb5/0x150 [ 445.131575] blk_mq_complete_request+0x16/0x20 [ 445.136020] loop_queue_work+0x5f/0xaa0 [loop] [ 445.140461] ? _raw_spin_unlock_irq+0x21/0x40 [ 445.144816] ? finish_task_switch+0x85/0x270 [ 445.149085] ? __schedule+0x291/0x8c0 [ 445.152747] kthread_worker_fn+0xc2/0x1d0 [ 445.156754] kthread+0x114/0x150 [ 445.159983] ? __kthread_init_worker+0xb0/0xb0 [ 445.164424] ? kthread_park+0x60/0x60 [ 445.168085] ret_from_fork+0x2c/0x40 ..but then noticed that __blk_mq_run_hw_queue() is also called under get_cpu(), but recently grew a might_sleep(), so maybe wants some block layer eyeballs. -Mike
Re: [drm:qxl] BUG: sleeping function called from invalid context - qxl_bo_kmap_atomic_page()...splat
On Mon, 2017-05-08 at 16:48 -0300, Gabriel Krisman Bertazi wrote: > Thanks for reporting this. Can you confirm the following patch prevents > the issue? Nope, it still gripes. [ 43.910362] BUG: sleeping function called from invalid context at mm/slab.h:432 [ 43.910955] in_atomic(): 1, irqs_disabled(): 0, pid: 2077, name: Xorg [ 43.911472] Preemption disabled at: [ 43.911478] [] qxl_bo_kmap_atomic_page+0xa5/0x100 [qxl] [ 43.912103] CPU: 0 PID: 2077 Comm: Xorg Tainted: GE 4.12.0-master #38 [ 43.912550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014 [ 43.913202] Call Trace: [ 43.913371] dump_stack+0x65/0x89 [ 43.913581] ? qxl_bo_kmap_atomic_page+0xa5/0x100 [qxl] [ 43.913876] ___might_sleep+0x11a/0x190 [ 43.914095] __might_sleep+0x4a/0x80 [ 43.914319] ? qxl_bo_create+0x50/0x190 [qxl] [ 43.914565] kmem_cache_alloc_trace+0x46/0x180 [ 43.914836] qxl_bo_create+0x50/0x190 [qxl] [ 43.915082] ? refcount_dec_and_test+0x11/0x20 [ 43.915332] ? ttm_mem_io_reserve+0x41/0xe0 [ttm] [ 43.915595] qxl_alloc_bo_reserved+0x37/0xb0 [qxl] [ 43.915884] qxl_cursor_atomic_update+0x8f/0x260 [qxl] [ 43.916172] ? drm_atomic_helper_update_legacy_modeset_state+0x1d6/0x210 [drm_kms_helper] [ 43.916623] drm_atomic_helper_commit_planes+0xec/0x230 [drm_kms_helper] [ 43.916995] drm_atomic_helper_commit_tail+0x2b/0x60 [drm_kms_helper] [ 43.917398] commit_tail+0x65/0x70 [drm_kms_helper] [ 43.917693] drm_atomic_helper_commit+0xa9/0x100 [drm_kms_helper] [ 43.918039] drm_atomic_commit+0x4b/0x50 [drm] [ 43.918334] drm_atomic_helper_update_plane+0xf1/0x110 [drm_kms_helper] [ 43.918902] __setplane_internal+0x19f/0x280 [drm] [ 43.919240] drm_mode_cursor_universal+0x101/0x1c0 [drm] [ 43.919541] drm_mode_cursor_common+0x15b/0x1d0 [drm] [ 43.919858] drm_mode_cursor2_ioctl+0xe/0x10 [drm] [ 43.920157] drm_ioctl+0x211/0x460 [drm] [ 43.920383] ? drm_mode_cursor_ioctl+0x50/0x50 [drm] [ 43.920664] ? 
handle_mm_fault+0x93/0x160 [ 43.920893] do_vfs_ioctl+0x96/0x6e0 [ 43.921117] ? __fget+0x73/0xa0 [ 43.921322] SyS_ioctl+0x41/0x70 [ 43.921545] entry_SYSCALL_64_fastpath+0x1a/0xa5 [ 43.922188] RIP: 0033:0x7f1145804bc7 [ 43.922526] RSP: 002b:7ffcd3e50508 EFLAGS: 3246 ORIG_RAX: 0010 [ 43.923367] RAX: ffda RBX: 0040 RCX: 7f1145804bc7 [ 43.923852] RDX: 7ffcd3e50540 RSI: c02464bb RDI: 000b [ 43.924299] RBP: 0040 R08: 0040 R09: 000c [ 43.924694] R10: 7ffcd3e50340 R11: 3246 R12: 0018 [ 43.925128] R13: 022bc390 R14: 0040 R15: 7ffcd3e5062c
[drm:qxl] BUG: sleeping function called from invalid context - qxl_bo_kmap_atomic_page()...splat
Greetings, I'm meeting this splat in master, call io_mapping_map_atomic_wc(), then do something sleepy. In master-rt, I meet a variant of this doing read_lock() in find_next_iomem_res(), due to rwlocks being sleepy in RT. [ 53.286236] BUG: sleeping function called from invalid context at mm/slab.h:432 [ 53.286809] in_atomic(): 1, irqs_disabled(): 0, pid: 2078, name: Xorg [ 53.287354] Preemption disabled at: [ 53.287360] [] qxl_bo_kmap_atomic_page+0xa5/0x100 [qxl] [ 53.288028] CPU: 0 PID: 2078 Comm: Xorg Tainted: GE 4.12.0-master #35 [ 53.288530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014 [ 53.289337] Call Trace: [ 53.289513] dump_stack+0x65/0x89 [ 53.289739] ? qxl_bo_kmap_atomic_page+0xa5/0x100 [qxl] [ 53.290084] ___might_sleep+0x11a/0x190 [ 53.290336] __might_sleep+0x4a/0x80 [ 53.290575] ? reserve_memtype+0xa5/0x390 [ 53.290841] kmem_cache_alloc_trace+0x46/0x180 [ 53.291185] reserve_memtype+0xa5/0x390 [ 53.291442] ? ttm_bo_kmap+0x1f5/0x230 [ttm] [ 53.291724] __ioremap_caller+0xf2/0x330 [ 53.291987] ? preempt_count_add+0x47/0xb0 [ 53.292284] ioremap_nocache+0x17/0x20 [ 53.292536] ttm_bo_kmap+0x1f5/0x230 [ttm] [ 53.292816] qxl_bo_kmap+0x43/0x70 [qxl] [ 53.293076] qxl_cursor_atomic_update+0x8d/0x260 [qxl] [ 53.293427] ? 
drm_atomic_helper_update_legacy_modeset_state+0x1d6/0x210 [drm_kms_helper] [ 53.293966] drm_atomic_helper_commit_planes+0xec/0x230 [drm_kms_helper] [ 53.294403] drm_atomic_helper_commit_tail+0x2b/0x60 [drm_kms_helper] [ 53.294824] commit_tail+0x65/0x70 [drm_kms_helper] [ 53.295145] drm_atomic_helper_commit+0xa9/0x100 [drm_kms_helper] [ 53.295549] drm_atomic_commit+0x4b/0x50 [drm] [ 53.295843] drm_atomic_helper_update_plane+0xf1/0x110 [drm_kms_helper] [ 53.296407] __setplane_internal+0x19f/0x280 [drm] [ 53.296737] drm_mode_cursor_universal+0x101/0x1c0 [drm] [ 53.297091] drm_mode_cursor_common+0x15b/0x1d0 [drm] [ 53.297434] drm_mode_cursor2_ioctl+0xe/0x10 [drm] [ 53.297758] drm_ioctl+0x211/0x460 [drm] [ 53.298205] ? drm_mode_cursor_ioctl+0x50/0x50 [drm] [ 53.298638] ? handle_mm_fault+0x93/0x160 [ 53.298960] do_vfs_ioctl+0x96/0x6e0 [ 53.299200] ? __fget+0x73/0xa0 [ 53.299410] SyS_ioctl+0x41/0x70 [ 53.299630] entry_SYSCALL_64_fastpath+0x1a/0xa5 [ 53.299934] RIP: 0033:0x7fcdedfb0bc7 [ 53.300170] RSP: 002b:7fffbde2ed48 EFLAGS: 3246 ORIG_RAX: 0010 [ 53.300660] RAX: ffda RBX: 0040 RCX: 7fcdedfb0bc7 [ 53.301130] RDX: 7fffbde2ed80 RSI: c02464bb RDI: 000b [ 53.301609] RBP: 0040 R08: 0040 R09: 000c [ 53.302124] R10: 7fffbde2eb80 R11: 3246 R12: 0018 [ 53.302614] R13: 0275b390 R14: 0040 R15: 7fffbde2ee6c
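The failure mode in both qxl traces has the same shape: something that can sleep (a GFP_KERNEL allocation, ioremap, or a sleeping rwlock on RT) is reached between io_mapping_map_atomic_wc() and the matching unmap, i.e. with preemption disabled. A generic sketch of the anti-pattern and the usual ways out (illustrative only, not the actual qxl fix):

```c
/* Anti-pattern: io_mapping_map_atomic_wc() has kmap_atomic()
 * semantics and disables preemption, so everything until the
 * unmap runs in atomic context. */
ptr = io_mapping_map_atomic_wc(mapping, offset);
obj = kmalloc(sizeof(*obj), GFP_KERNEL);	/* BAD: may sleep */
io_mapping_unmap_atomic(ptr);

/* Way out 1: do the sleeping work before the atomic section. */
obj = kmalloc(sizeof(*obj), GFP_KERNEL);
ptr = io_mapping_map_atomic_wc(mapping, offset);
/* ... touch ptr, no sleeping calls ... */
io_mapping_unmap_atomic(ptr);

/* Way out 2: allocate with GFP_ATOMIC (and handle failure), or
 * switch to the non-atomic io_mapping_map_wc()/io_mapping_unmap()
 * pair in contexts where sleeping is allowed. */
```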
[patch] device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
ERROR: "devm_create_dev_dax" [drivers/dax/dax_pmem.ko] undefined! ERROR: "alloc_dax_region" [drivers/dax/dax_pmem.ko] undefined! ERROR: "dax_region_put" [drivers/dax/dax_pmem.ko] undefined! Signed-off-by: Mike Galbraith <efa...@gmx.de> --- drivers/dax/Kconfig |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/dax/Kconfig +++ b/drivers/dax/Kconfig @@ -19,7 +19,7 @@ config DEV_DAX config DEV_DAX_PMEM tristate "PMEM DAX: direct access to persistent memory" - depends on LIBNVDIMM && NVDIMM_DAX + depends on LIBNVDIMM && NVDIMM_DAX && DEV_DAX default DEV_DAX help Support raw access to persistent memory. Note that this
Re: [PATCH tip/sched/core v2] sched/rt: Simplify the IPI rt balancing logic
On Fri, 2017-05-05 at 06:26 +0200, Mike Galbraith wrote: > > To get rteval: > > > > $ git clone > > git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rteval.git > > $ cd rteval > > $ git checkout origin/v2/master > > ImportError: No module named ethtool > > Where does one find the ethtool bits it seems to depend upon? Nevermind, google found a tarball. -Mike
Re: [PATCH tip/sched/core v2] sched/rt: Simplify the IPI rt balancing logic
> To get rteval: > > $ git clone > git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rteval.git > $ cd rteval > $ git checkout origin/v2/master ImportError: No module named ethtool Where does one find the ethtool bits it seems to depend upon? -Mike