Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-12 Thread Mike Galbraith
On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:
> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> > 
> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> > too much trouble, a bisect would be pretty useful.
> 
> Bisection seemingly went fine, but the result is odd.
> 
> e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit

But it really really is bad.  Looking at gitk fork in the road leading
to it...

52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
e4e818cc2d7c drm: make drm_panel.h self-contained - good
9cf8f5802f39 drm: add missing declaration to drm_blend.h  - good

Before the git highway splits, all is well.  The lane with commits
works fine at both ends, but e98c58e55f68 is busted.  Merge artifact?

-Mike


Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-12 Thread Mike Galbraith
On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> 
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.

Bisection seemingly went fine, but the result is odd.

e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit

-Mike


Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-12 Thread Mike Galbraith
On Tue, 2017-07-11 at 20:53 +0200, Mike Galbraith wrote:
> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> 
> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> > too much trouble, a bisect would be pretty useful.
> 
> Vacation -> back to work happens in the very early AM, so bisection
> will have to wait a bit.

Hm, my backup workstation (old GeForce 8600 GT box) has the same issue,
so perhaps I can bisect it as I work on backlog (multitasking: screw up
multiple tasks concurrently).

-Mike


Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-11 Thread Mike Galbraith
On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> 
> OK, thanks. So in other words, a fairly standard desktop with a PCIe
> board plugged in. No funny business. (Laptops can create a ton of
> additional weirdness, which I assumed you had since you were talking
> about STR.)

Yup, garden variety deskside box.

> My best guess is that gf119_head_vblank_put either has a bogus head id
> (should be in the 0..3 range) which causes it to do an out-of-bounds
> read on MMIO space, or that the MMIO mapping has already been removed
> by the time nouveau_display_suspend runs. Adding Ben Skeggs for
> additional insight.
> 
> Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> too much trouble, a bisect would be pretty useful.

Vacation -> back to work happens in the very early AM, so bisection
will have to wait a bit.

-Mike
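[Editorial note: Ilia's first guess above, a bogus head id driving an out-of-bounds MMIO read, amounts to a missing range check. A standalone sketch of that failure mode and its guard; NR_HEADS matches the 0..3 range mentioned above, while HEAD_BASE and HEAD_STRIDE are invented illustration values, not the real GF119 register layout:]

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: a per-head MMIO offset computed from an unvalidated head id
 * walks off the mapped register window.  HEAD_BASE/HEAD_STRIDE are
 * invented values for illustration, not the real GF119 layout. */
#define NR_HEADS    4
#define HEAD_BASE   0x640000u
#define HEAD_STRIDE 0x300u

static int head_reg_offset(int head, uint32_t *off)
{
    if (head < 0 || head >= NR_HEADS)
        return -1;      /* reject bogus head ids instead of faulting */
    *off = HEAD_BASE + (uint32_t)head * HEAD_STRIDE;
    return 0;
}
```

With such a check, a stray head id returns an error instead of touching unmapped (or already-unmapped) MMIO space.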


Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-11 Thread Mike Galbraith
On Tue, 2017-07-11 at 13:51 -0400, Ilia Mirkin wrote:
> Some details that may be useful in analysis of the bug:
> 
> 1. lspci -nn -d 10de:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 
980] [10de:13c0] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio 
Controller [10de:0fbb] (rev a1)

> 2. What displays, if any, you have plugged into the NVIDIA board when
> this happens?

A Philips 273V, via DVI.

> 3. Any boot parameters, esp relating to ACPI, PM, or related?

None for those; the entries that will be unfamiliar to you are for
patches that aren't applied.

nortsched hpc_cpusets skew_tick=1 ftrace_dump_on_oops audit=0
nodelayacct cgroup_disable=memory rtkthreads=1 rtworkqueues=2 panic=60
ignore_loglevel crashkernel=256M,high

-Mike


[regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-11 Thread Mike Galbraith
Greetings,

I met $subject in master-rt post drm merge, but taking the config
(attached) to virgin v4.12-10624-g9967468c0a10, it's reproducible.

  KERNEL: vmlinux-4.12.0.g9967468-preempt.gz
DUMPFILE: vmcore
CPUS: 8
DATE: Tue Jul 11 18:55:28 2017
  UPTIME: 00:02:03
LOAD AVERAGE: 3.43, 1.39, 0.52
   TASKS: 467
NODENAME: homer
 RELEASE: 4.12.0.g9967468-preempt
 VERSION: #155 SMP PREEMPT Tue Jul 11 18:18:11 CEST 2017
 MACHINE: x86_64  (3591 Mhz)
  MEMORY: 16 GB
   PANIC: "BUG: unable to handle kernel paging request at a022990f"
 PID: 4658
 COMMAND: "kworker/u16:26"
TASK: 8803c6068f80  [THREAD_INFO: 8803c6068f80]
 CPU: 7
   STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 4658   TASK: 8803c6068f80  CPU: 7   COMMAND: "kworker/u16:26"
 #0 [c900039f76a0] machine_kexec at 810481fc
 #1 [c900039f76f0] __crash_kexec at 81109e3a
 #2 [c900039f77b0] crash_kexec at 8110adc9
 #3 [c900039f77c8] oops_end at 8101d059
 #4 [c900039f77e8] no_context at 81055ce5
 #5 [c900039f7838] do_page_fault at 81056c5b
 #6 [c900039f7860] page_fault at 81690a88
[exception RIP: report_bug+93]
RIP: 8167227d  RSP: c900039f7918  RFLAGS: 00010002
RAX: a0229905  RBX: a020af0f  RCX: 0001
RDX: 0907  RSI: a020af11  RDI: 98f6
RBP: c900039f7a58   R8: 0001   R9: 03fc
R10: 81a01906  R11: 8803f84711f8  R12: a02231fb
R13: 0260  R14: 0004  R15: 0006
ORIG_RAX:   CS: 0010  SS: 0018
 #7 [c900039f7910] report_bug at 81672248
 #8 [c900039f7938] fixup_bug at 8101af85
 #9 [c900039f7950] do_trap at 8101b0d9
#10 [c900039f79a0] do_error_trap at 8101b190
#11 [c900039f7a50] invalid_op at 8169063e
[exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335]
RIP: a020af0f  RSP: c900039f7b00  RFLAGS: 00010086
RAX: a04fa100  RBX: 8803f9550800  RCX: 0001
RDX: a0228a58  RSI: 0001  RDI: a022321b
RBP: c900039f7b80   R8:    R9: a020adc0
R10: a048a1b0  R11: 8803f84711f8  R12: 0001
R13: 8803f8471000  R14: c900039f7b94  R15: c900039f7bd0
ORIG_RAX:   CS: 0010  SS: 0018
#12 [c900039f7b18] gf119_head_vblank_put at a04422f9 [nouveau]
#13 [c900039f7b88] drm_get_last_vbltimestamp at a020ad91 [drm]
#14 [c900039f7ba8] drm_update_vblank_count at a020b3e1 [drm]
#15 [c900039f7c10] drm_vblank_disable_and_save at a020bbe9 [drm]
#16 [c900039f7c40] drm_crtc_vblank_off at a020c3c0 [drm]
#17 [c900039f7cb0] nouveau_display_fini at a048a4d6 [nouveau]
#18 [c900039f7ce0] nouveau_display_suspend at a048ac4f [nouveau]
#19 [c900039f7d00] nouveau_do_suspend at a047e5ec [nouveau]
#20 [c900039f7d38] nouveau_pmops_suspend at a047e77d [nouveau]
#21 [c900039f7d50] pci_pm_suspend at 813b1ff0
#22 [c900039f7d80] dpm_run_callback at 814c4dbd
#23 [c900039f7db8] __device_suspend at 814c5a61
#24 [c900039f7e30] async_suspend at 814c5cfa
#25 [c900039f7e48] async_run_entry_fn at 81091683
#26 [c900039f7e70] process_one_work at 810882bc
#27 [c900039f7eb0] worker_thread at 8108854a
#28 [c900039f7f10] kthread at 8108e387
#29 [c900039f7f50] ret_from_fork at 8168fa85
crash> gdb list *drm_calc_vbltimestamp_from_scanoutpos+335
0xa020af0f is in drm_calc_vbltimestamp_from_scanoutpos (drivers/gpu/drm/drm_vblank.c:608).
603 /* If mode timing undefined, just return as no-op:
604  * Happens during initial modesetting of a crtc.
605  */
606 if (mode->crtc_clock == 0) {
607 DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
608 WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
609 
610 return false;
611 }
612 
crash> gdb list *report_bug+93
0x8167227d is in report_bug (lib/bug.c:177).
172 return BUG_TRAP_TYPE_WARN;
173 
174 /*
175  * Since this is the only store, concurrency is not an issue.
176  */
177 bug->flags |= BUGFLAG_DONE;
178 }
179 }
180 
181 if (warning) {
crash>

config.xz
Description: application/xz


arch/x86/kernel/ftrace.o: warning: objtool: ftrace_modify_code_direct()+0x2d: sibling call from callable instruction with modified stack frame

2017-07-06 Thread Mike Galbraith
Greetings,

I'm seeing $subject in master as of this merge window.  I don't see it
with my usual stripped down PREEMPT_VOLUNTARY or an enterprise-ish
NOPREEMPT config, but do see it with the attached, derived from my
enterprise-ish PREEMPT_RT config after running make oldconfig on it,
and reproducing the warning in virgin source.

-Mike

config.xz
Description: application/xz


Re: wake_wide mechanism clarification

2017-06-30 Thread Mike Galbraith
On Fri, 2017-06-30 at 10:28 -0400, Josef Bacik wrote:
> On Thu, Jun 29, 2017 at 08:04:59PM -0700, Joel Fernandes wrote:
> 
> > That makes sense that we multiply slave's flips by a factor because
> > its low, but I still didn't get why the factor is chosen to be
> > llc_size instead of something else for the multiplication with slave
> > (slave * factor).

> Yeah I don't know why llc_size was chosen...

static void update_top_cache_domain(int cpu)
{
struct sched_domain_shared *sds = NULL;
struct sched_domain *sd;
int id = cpu;
int size = 1;

sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
if (sd) {
id = cpumask_first(sched_domain_span(sd));
size = cpumask_weight(sched_domain_span(sd));
sds = sd->shared;
}

rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
per_cpu(sd_llc_size, cpu) = size;

The goal of wake wide was to approximate when pulling would be a futile
consolidation effort and counterproductive to scaling.  'course with
ever increasing socket size, any 1:N waker is ever more likely to run
out of CPU for its one and only self (slamming into scaling wall)
before it needs to turn its minions loose to conquer the world.
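[Editorial note: the heuristic being discussed is short enough to sketch in user-space C. This approximates the v4.12-era wake_wide() logic; the llc_size factor that update_top_cache_domain() stores in sd_llc_size is passed in explicitly here, and the wakee-flip bookkeeping itself is omitted:]

```c
#include <assert.h>

/* Standalone approximation of the wake_wide() heuristic: returns nonzero
 * when the wakeup should go "wide" (skip the affine pull).  master/slave
 * are the waker's and wakee's flip counts; llc_size is the factor whose
 * choice is being questioned in this thread. */
static int wake_wide_sketch(unsigned int master, unsigned int slave,
                            unsigned int llc_size)
{
    if (master < slave) {       /* order so master is the heavier flipper */
        unsigned int tmp = master;
        master = slave;
        slave = tmp;
    }
    /* Stay affine unless the slave flips often enough (scaled by llc_size)
     * and the master's flip rate dwarfs the slave's by the same factor. */
    if (slave < llc_size || master < slave * llc_size)
        return 0;
    return 1;
}
```

So with llc_size = 4, a master flipping at 100 against a slave at 20 goes wide, while a 50/20 pair stays affine: the factor sets how lopsided a 1:N relationship must be before consolidation is judged futile.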

Something else to consider: network interrupt waking multiple workers
at high frequency.  If the waking CPU is idle, do you really want to
place a worker directly in front of a tattoo artist, or is it better
off nearly anywhere but there?

If the box is virtual, with no topology exposed (or real but ancient)
to let select_idle_sibling() come to the rescue, two workers can even
get tattooed simultaneously (see sync wakeup). 

-Mike


Re: wake_wide mechanism clarification

2017-06-29 Thread Mike Galbraith
On Thu, 2017-06-29 at 20:49 -0400, Josef Bacik wrote:
> On Thu, Jun 29, 2017 at 05:19:14PM -0700, Joel Fernandes wrote:
> 
> > Why are wanting the master's flip frequency to be higher than the
> > slaves by the factor?
> 
> (Responding from my personal email as my work email is outlook shit and
> impossible to use)
> 
> Because we are trying to detect the case that the master is waking many
> different processes, and the 'slave' processes are only waking up the
> master/some other specific processes to determine if we don't care about cache
> locality.

Yes, the heuristic (large delta implies waker/wakee are NOT 1:1, ie
filter out high frequency communication where eating misses doesn't
merely sting, it hurts like hell) just became bidirectional.

> Actually I think both are wrong, but I need Mike to weigh in.

My weigh in is this: if you have ideas to improve or replace that
heuristic, by all means go for it, just make damn sure it's dirt cheap.
Heuristics all suck one way or another; the problem is that nasty old
"perfect is the enemy of good" adage.  Make it perfect, it'll hurt.

-Mike


Re: [ANNOUNCE] v4.11.7-rt3

2017-06-28 Thread Mike Galbraith
On Thu, 2017-06-29 at 04:55 +0200, Mike Galbraith wrote:
> 
> cpus_allowed became cpus_mask.  Anything (crash.. hohum, yet again)
> that rummages around in the kernels gizzard will have to adapt.

(wrt crash: nope, it doesn't care for a change)


Re: [ANNOUNCE] v4.11.7-rt3

2017-06-28 Thread Mike Galbraith
On Thu, 2017-06-29 at 02:30 +0200, Bernhard Landauer wrote:
> I'm unable to compile the ndiswrapper extramodule against 4.11.7-rt3
> 
> I get lots of errors of the kind
> 
> 'struct task_struct' has no member named 'cpus_allowed'; did you mean 
> 'nr_cpus_allowed'?
> 
> What can I do?

cpus_allowed became cpus_mask.  Anything (crash.. hohum, yet again)
that rummages around in the kernels gizzard will have to adapt.
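[Editorial note: out-of-tree code that must build against both field names can paper over the rename with a compile-time shim. A sketch only: the mock struct stands in for the kernel's task_struct, and HAVE_CPUS_MASK is a hypothetical switch standing in for whatever version/config test fits your build:]

```c
#include <assert.h>

/* Mock stand-in for the relevant task_struct member; in a real module
 * this would be the kernel's struct task_struct. */
struct mock_task {
    unsigned long cpus_mask;    /* renamed from cpus_allowed in -rt */
};

/* HAVE_CPUS_MASK is a hypothetical knob: define it to 1 when building
 * against a tree where the field is cpus_mask (e.g. 4.11.7-rt3+). */
#define HAVE_CPUS_MASK 1

#if HAVE_CPUS_MASK
#define TASK_CPUS(tsk) ((tsk)->cpus_mask)
#else
#define TASK_CPUS(tsk) ((tsk)->cpus_allowed)
#endif

static unsigned long task_cpus(struct mock_task *t)
{
    return TASK_CPUS(t);
}
```

The same pattern keeps a module like ndiswrapper building against both stock and -rt kernels without scattering #ifdefs through every use site.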

-Mike


Re: [patch-rt v2] rtmutex: Fix lock stealing logic

2017-06-23 Thread Mike Galbraith
On Fri, 2017-06-23 at 09:33 -0400, Steven Rostedt wrote:
> struct task_struct *task,
> > @@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct
> >  */
> > if (waiter) {
> > /*
> > -* If waiter is not the highest priority waiter of
> > -* @lock, give up.
> > +* If waiter is not the highest priority waiter of @lock,
> > +* or its peer when lateral steal is allowed, give up.
> >  */
> > -   if (waiter != rt_mutex_top_waiter(lock)) {
> > -   /* XXX rt_mutex_waiter_less() ? */
> > +   if (!rt_mutex_steal(lock, waiter, mode))
> > return 0;
> > -   }
> > -
> > /*
> >  * We can acquire the lock. Remove the waiter from the
> >  * lock waiters tree.
> >  */
> > rt_mutex_dequeue(lock, waiter);
> > -
> 
> I liked that space.

I like minus signs in diffstat, that one was a freebie.  Maintainers
can revive it if they like, or I can post a V3 with it revived as well
as s/rt_mutex_steal/rt_mutex_claim.

-Mike


Re: [patch-rt v2] rtmutex: Fix lock stealing logic

2017-06-23 Thread Mike Galbraith
On Fri, 2017-06-23 at 09:37 +0200, Mike Galbraith wrote:
> V2 changes:
>   - beautification (ymmv)
>   - enable lock stealing when waiter is queued

(V3 s/steal/claim)


[patch-rt v2] rtmutex: Fix lock stealing logic

2017-06-23 Thread Mike Galbraith
V2 changes:
  - beautification (ymmv)
  - enable lock stealing when waiter is queued

rtmutex: Fix lock stealing logic

1. When trying to acquire an rtmutex, we first try to grab it without
queueing the waiter, and explicitly check for that initial attempt
in the !waiter path of __try_to_take_rt_mutex().  Checking whether
the lock taker is top waiter before allowing a steal attempt in that
path is a thinko: the lock taker has not yet blocked.

2. It seems wrong to change the definition of rt_mutex_waiter_less()
to mean less or perhaps equal when we have an rt_mutex_waiter_equal().

Remove the thinko, restore rt_mutex_waiter_less(), implement and use
rt_mutex_steal() based upon rt_mutex_waiter_less/equal(), moving all
qualification criteria into the function itself.

Signed-off-by: Mike Galbraith <efa...@gmx.de>
---
 kernel/locking/rtmutex.c |   76 ++-
 1 file changed, 37 insertions(+), 39 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -236,26 +236,19 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif
 
-#define STEAL_NORMAL  0
-#define STEAL_LATERAL 1
-
 /*
  * Only use with rt_mutex_waiter_{less,equal}()
  */
-#define task_to_waiter(p)  \
-   &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+#define task_to_waiter(p) &(struct rt_mutex_waiter) \
+   { .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) }
 
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
-struct rt_mutex_waiter *right, int mode)
+struct rt_mutex_waiter *right)
 {
-   if (mode == STEAL_NORMAL) {
-   if (left->prio < right->prio)
-   return 1;
-   } else {
-   if (left->prio <= right->prio)
-   return 1;
-   }
+   if (left->prio < right->prio)
+   return 1;
+
/*
 * If both waiters have dl_prio(), we check the deadlines of the
 * associated tasks.
@@ -287,6 +280,27 @@ rt_mutex_waiter_equal(struct rt_mutex_wa
return 1;
 }
 
+#define STEAL_NORMAL  0
+#define STEAL_LATERAL 1
+
+static inline int
+rt_mutex_steal(struct rt_mutex *lock, struct rt_mutex_waiter *waiter, int mode)
+{
+   struct rt_mutex_waiter *top_waiter = rt_mutex_top_waiter(lock);
+
+   if (waiter == top_waiter || rt_mutex_waiter_less(waiter, top_waiter))
+   return 1;
+
+   /*
+* Note that RT tasks are excluded from lateral-steals
+* to prevent the introduction of an unbounded latency.
+*/
+   if (mode == STEAL_NORMAL || rt_task(waiter->task))
+   return 0;
+
+   return rt_mutex_waiter_equal(waiter, top_waiter);
+}
+
 static void
 rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 {
@@ -298,7 +312,7 @@ rt_mutex_enqueue(struct rt_mutex *lock,
while (*link) {
parent = *link;
entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry);
-   if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+   if (rt_mutex_waiter_less(waiter, entry)) {
link = &parent->rb_left;
} else {
link = &parent->rb_right;
@@ -337,7 +351,7 @@ rt_mutex_enqueue_pi(struct task_struct *
while (*link) {
parent = *link;
entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry);
-   if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+   if (rt_mutex_waiter_less(waiter, entry)) {
link = &parent->rb_left;
} else {
link = &parent->rb_right;
@@ -847,6 +861,7 @@ static int rt_mutex_adjust_prio_chain(st
  * @task:   The task which wants to acquire the lock
  * @waiter: The waiter that is queued to the lock's wait tree if the
  * callsite called task_blocked_on_lock(), otherwise NULL
+ * @mode:   Lock steal mode (STEAL_NORMAL, STEAL_LATERAL)
  */
 static int __try_to_take_rt_mutex(struct rt_mutex *lock,
  struct task_struct *task,
@@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct
 */
if (waiter) {
/*
-* If waiter is not the highest priority waiter of
-* @lock, give up.
+* If waiter is not the highest priority waiter of @lock,
+* or its peer when lateral steal is allowed, give up.
 */
-   if (waiter != rt_mutex_top_waiter(lock)) {
-   /* XXX rt_mutex_waiter_less() ? */
+   if (!rt_mutex_steal(lock, waiter, mode))
return 0;
-   }
-
/*
 * We can acquire the lock. Remove the waiter from the
 * lock waiters tree.

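[Editorial note: the steal policy the patch introduces can be exercised standalone. A sketch under simplifying assumptions: prio follows the kernel's lower-numeric-is-higher convention, the dl-deadline tiebreak is omitted, and an is_rt flag stands in for rt_task(waiter->task):]

```c
#include <assert.h>

#define STEAL_NORMAL  0
#define STEAL_LATERAL 1

/* Minimal stand-in for rt_mutex_waiter: lower prio value = higher priority. */
struct waiter {
    int prio;
    int is_rt;  /* stands in for rt_task(waiter->task) */
};

static int waiter_less(const struct waiter *l, const struct waiter *r)
{
    return l->prio < r->prio;   /* dl-deadline tiebreak omitted */
}

static int waiter_equal(const struct waiter *l, const struct waiter *r)
{
    return l->prio == r->prio;
}

/* Mirrors the patch's rt_mutex_steal(): the taker wins outright if it is
 * the top waiter or strictly higher priority; an equal-priority peer only
 * qualifies for lateral steals, and RT tasks are excluded from those to
 * avoid introducing unbounded latency. */
static int steal(const struct waiter *w, const struct waiter *top, int mode)
{
    if (w == top || waiter_less(w, top))
        return 1;
    if (mode == STEAL_NORMAL || w->is_rt)
        return 0;
    return waiter_equal(w, top);
}
```

An equal-priority non-RT peer succeeds only in STEAL_LATERAL mode; a lower-priority waiter never steals, and an RT peer never steals laterally.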
[patch-rt v2] rtmutex: Fix lock stealing logic

2017-06-23 Thread Mike Galbraith
V2 changes:
  - beautification (ymmv)
  - enable lock stealing when waiter is queued

rtmutex: Fix lock stealing logic

1. When trying to acquire an rtmutex, we first try to grab it without
queueing the waiter, and explicitly check for that initial attempt
in the !waiter path of __try_to_take_rt_mutex().  Checking whether
the lock taker is top waiter before allowing a steal attempt in that
path is a thinko: the lock taker has not yet blocked.

2. It seems wrong to change the definition of rt_mutex_waiter_less()
to mean less or perhaps equal when we have an rt_mutex_waiter_equal().

Remove the thinko, restore rt_mutex_waiter_less(), implement and use
rt_mutex_steal() based upon rt_mutex_waiter_less/equal(), moving all
qualification criteria into the function itself.

Signed-off-by: Mike Galbraith 
---
 kernel/locking/rtmutex.c |   76 ++-
 1 file changed, 37 insertions(+), 39 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -236,26 +236,19 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif
 
-#define STEAL_NORMAL  0
-#define STEAL_LATERAL 1
-
 /*
  * Only use with rt_mutex_waiter_{less,equal}()
  */
-#define task_to_waiter(p)  \
-   &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+#define task_to_waiter(p) &(struct rt_mutex_waiter) \
+   { .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) }
 
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
-struct rt_mutex_waiter *right, int mode)
+struct rt_mutex_waiter *right)
 {
-   if (mode == STEAL_NORMAL) {
-   if (left->prio < right->prio)
-   return 1;
-   } else {
-   if (left->prio <= right->prio)
-   return 1;
-   }
+   if (left->prio < right->prio)
+   return 1;
+
/*
 * If both waiters have dl_prio(), we check the deadlines of the
 * associated tasks.
@@ -287,6 +280,27 @@ rt_mutex_waiter_equal(struct rt_mutex_wa
return 1;
 }
 
+#define STEAL_NORMAL  0
+#define STEAL_LATERAL 1
+
+static inline int
+rt_mutex_steal(struct rt_mutex *lock, struct rt_mutex_waiter *waiter, int mode)
+{
+   struct rt_mutex_waiter *top_waiter = rt_mutex_top_waiter(lock);
+
+   if (waiter == top_waiter || rt_mutex_waiter_less(waiter, top_waiter))
+   return 1;
+
+   /*
+* Note that RT tasks are excluded from lateral-steals
+* to prevent the introduction of an unbounded latency.
+*/
+   if (mode == STEAL_NORMAL || rt_task(waiter->task))
+   return 0;
+
+   return rt_mutex_waiter_equal(waiter, top_waiter);
+}
+
 static void
 rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 {
@@ -298,7 +312,7 @@ rt_mutex_enqueue(struct rt_mutex *lock,
while (*link) {
parent = *link;
entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry);
-   if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+   if (rt_mutex_waiter_less(waiter, entry)) {
link = &parent->rb_left;
} else {
link = &parent->rb_right;
@@ -337,7 +351,7 @@ rt_mutex_enqueue_pi(struct task_struct *
while (*link) {
parent = *link;
entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry);
-   if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+   if (rt_mutex_waiter_less(waiter, entry)) {
link = &parent->rb_left;
} else {
link = &parent->rb_right;
@@ -847,6 +861,7 @@ static int rt_mutex_adjust_prio_chain(st
  * @task:   The task which wants to acquire the lock
  * @waiter: The waiter that is queued to the lock's wait tree if the
  * callsite called task_blocked_on_lock(), otherwise NULL
+ * @mode:   Lock steal mode (STEAL_NORMAL, STEAL_LATERAL)
  */
 static int __try_to_take_rt_mutex(struct rt_mutex *lock,
  struct task_struct *task,
@@ -886,20 +901,16 @@ static int __try_to_take_rt_mutex(struct
 */
if (waiter) {
/*
-* If waiter is not the highest priority waiter of
-* @lock, give up.
+* If waiter is not the highest priority waiter of @lock,
+* or its peer when lateral steal is allowed, give up.
 */
-   if (waiter != rt_mutex_top_waiter(lock)) {
-   /* XXX rt_mutex_waiter_less() ? */
+   if (!rt_mutex_steal(lock, waiter, mode))
return 0;
-   }
-
/*
 * We can acquire the lock. Remove the waiter from the
 * lock waiters tree.
 */
  

Re: [ANNOUNCE] v4.11.5-rt1

2017-06-22 Thread Mike Galbraith
On Thu, 2017-06-22 at 19:30 +0200, Mike Galbraith wrote:
> On Thu, 2017-06-22 at 18:34 +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> > > See ! and ?
> > 
> > See see.
> > What about this:
> 
> I'll give it a go, likely during the weekend.

What the heck, after so much time expended squabbling with this bugger,
beans/biscuits work will hardly notice a mere hour or so...

Yup, it's confirmed dead, and no doubt my GUI goblins will be too.

-Mike



Re: [ANNOUNCE] v4.11.5-rt1

2017-06-22 Thread Mike Galbraith
On Thu, 2017-06-22 at 18:34 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> > See ! and ?
> 
> See see.
> What about this:

I'll give it a go, likely during the weekend.

I moved to 4.11-rt today (also repros nicely) due to ftrace annoying me.
After yet more staring at ever more huge traces (opposite of goal;),
then taking a break to stare at source again, I decided that the dual
wake_q business should die.. and the stall died with it.

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1014,8 +1014,20 @@ struct wake_q_head {
>  #define WAKE_Q(name) \
>   struct wake_q_head name = { WAKE_Q_TAIL, &name.first }
>  
> -extern void wake_q_add(struct wake_q_head *head,
> -   struct task_struct *task);
> +extern void __wake_q_add(struct wake_q_head *head,
> +  struct task_struct *task, bool sleeper);
> +static inline void wake_q_add(struct wake_q_head *head,
> +   struct task_struct *task)
> +{
> + __wake_q_add(head, task, false);
> +}
> +
> +static inline void wake_q_add_sleeper(struct wake_q_head *head,
> +   struct task_struct *task)
> +{
> + __wake_q_add(head, task, true);
> +}
> +
>  extern void __wake_up_q(struct wake_q_head *head, bool sleeper);
>  
>  static inline void wake_up_q(struct wake_q_head *head)
> @@ -1745,6 +1757,7 @@ struct task_struct {
>   raw_spinlock_t pi_lock;
>  
>   struct wake_q_node wake_q;
> + struct wake_q_node wake_q_sleeper;
>  
>  #ifdef CONFIG_RT_MUTEXES
>   /* PI waiters blocked on a rt_mutex held by this task */
> diff --git a/kernel/fork.c b/kernel/fork.c
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -558,6 +558,7 @@ static struct task_struct *dup_task_struct(struct 
> task_struct *orig, int node)
>   tsk->splice_pipe = NULL;
>   tsk->task_frag.page = NULL;
>   tsk->wake_q.next = NULL;
> + tsk->wake_q_sleeper.next = NULL;
>  
>   account_kernel_stack(tsk, 1);
>  
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1506,7 +1506,7 @@ static void mark_wakeup_next_waiter(struct wake_q_head 
> *wake_q,
>*/
>   preempt_disable();
>   if (waiter->savestate)
> - wake_q_add(wake_sleeper_q, waiter->task);
> + wake_q_add_sleeper(wake_sleeper_q, waiter->task);
>   else
>   wake_q_add(wake_q, waiter->task);
>   raw_spin_unlock(&current->pi_lock);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -430,9 +430,15 @@ static bool set_nr_if_polling(struct task_struct *p)
>  #endif
>  #endif
>  
> -void wake_q_add(struct wake_q_head *head, struct task_struct *task)
> +void __wake_q_add(struct wake_q_head *head, struct task_struct *task,
> +   bool sleeper)
>  {
> - struct wake_q_node *node = &task->wake_q;
> + struct wake_q_node *node;
> +
> + if (sleeper)
> + node = &task->wake_q_sleeper;
> + else
> + node = &task->wake_q;
>  
>   /*
>* Atomically grab the task, if ->wake_q is !nil already it means
> @@ -461,11 +467,17 @@ void __wake_up_q(struct wake_q_head *head, bool sleeper)
>   while (node != WAKE_Q_TAIL) {
>   struct task_struct *task;
>  
> - task = container_of(node, struct task_struct, wake_q);
> + if (sleeper)
> + task = container_of(node, struct task_struct, 
> wake_q_sleeper);
> + else
> + task = container_of(node, struct task_struct, wake_q);
>   BUG_ON(!task);
>   /* task can safely be re-inserted now */
>   node = node->next;
> - task->wake_q.next = NULL;
> + if (sleeper)
> + task->wake_q_sleeper.next = NULL;
> + else
> + task->wake_q.next = NULL;
>  
>   /*
>* wake_up_process() implies a wmb() to pair with the queueing



Re: [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs

2017-06-20 Thread Mike Galbraith
On Tue, 2017-06-20 at 13:42 -0400, Rik van Riel wrote:
> On Mon, 2017-06-19 at 04:12 +0200, Frederic Weisbecker wrote:
> > Although idle load balancing obviously only concern idle CPUs, it can
> > be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get
> > rid
> > of an idle load balancing duty once a tick fires while it runs a task
> > and this can take a while in a nohz_full CPU.
> > 
> > We could fix that and escape the idle load balancing duty from the
> > very
> > idle exit path but that would bring unecessary overhead. Lets just
> > not
> > bother and leave that job to housekeeping CPUs (those outside
> > nohz_full
> > range). The nohz_full CPUs simply don't want any disturbance.
> > 
> > Signed-off-by: Frederic Weisbecker 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: Rik van Riel 
> > Cc: Peter Zijlstra 
> > ---
> >  kernel/sched/fair.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d711093..cfca960 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8659,6 +8659,10 @@ void nohz_balance_enter_idle(int cpu)
> >     if (!cpu_active(cpu))
> >     return;
> >  
> > +   /* Spare idle load balancing on CPUs that don't want to be
> > disturbed */
> > +   if (!is_housekeeping_cpu(cpu))
> > +   return;
> > +
> >     if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
> >     return;
> 
> I am not entirely convinced on this one.
> 
> Doesn't the if (on_null_domain(cpu_rq(cpu)) test
> a few lines down take care of this already?
> 
> Do we want nohz_full to always automatically
> imply that no idle balancing will happen, like
> on isolated CPUs?

IMO, nohz_full capable CPUs that are not isolated should automatically
become housekeepers, and nohz_full _active_ upon becoming isolated.
When used as a housekeeper, you still pay a price for having the
nohz_full capability available, but it doesn't have to be as high. 

In my kernels, I use cpusets to turn nohz on/off set wise, so CPUs can
be ticking, dyntick, nohz_full or housekeeper, RT load balancing and
cpupri on/off as well if you want to assume full responsibility.  It's
a tad (from box of xxl tads) ugly, but more flexible.

-Mike



Re: [ANNOUNCE] v4.11.5-rt1

2017-06-20 Thread Mike Galbraith
On Mon, 2017-06-19 at 18:29 +0200, Mike Galbraith wrote:
> On Mon, 2017-06-19 at 10:41 -0400, Steven Rostedt wrote:
> > On Mon, 19 Jun 2017 16:13:41 +0200
> > Sebastian Andrzej Siewior <bige...@linutronix.de> wrote:
> > 
> > 
> > > > Hmm, it shouldn't affect futexes, as it's only called by rtmutex when
> > > > waiter->savestate is true. And that should always be false for futex.  
> > > 
> > > you still have sleeping locks like the hb-lock (which might matter in
> > > case the task blocks on the lock and does not continue for some reason).
> > 
> > But then I'd expect this to be an issue with any spinlock in the kernel.
> 
> Thing to do is set a trap for the bugger.

See ! and ?

vogelweide:~/:[1]# grep MIKE trace 
MIKE 913.728104: start
MIKE  futex_wait-7496  [031] ...   913.729160: sched_process_fork: 
comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7497
MIKE  futex_wait-7496  [031] ...   913.729253: sched_process_fork: 
comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7499
MIKE  futex_wait-7496  [031] ...   913.729307: sched_process_fork: 
comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7500
MIKE  futex_wait-7496  [031] ...   913.729348: sched_process_fork: 
comm=futex_wait pid=7496 child_comm=futex_wait child_pid=7501
MIKE  futex_wait-7496  [031] d...2..   913.729430: sched_switch: 
prev_comm=futex_wait prev_pid=7496 prev_prio=120 prev_state=S ==> 
next_comm=swapper/31 next_pid=0 next_prio=120
MIKE--futex_wait-7500  [005] d...2..   920.040320: sched_switch: 
prev_comm=futex_wait prev_pid=7500 prev_prio=120 prev_state=S ==> 
next_comm=swapper/5 next_pid=0 next_prio=120
MIKE  futex_wait-7497  [058] d...211   920.060009: sched_waking: 
comm=futex_wait pid=7501 prio=120 target_cpu=044
MIKE  futex_wait-7497  [058] d...311   920.060012: sched_wake_idle_without_ipi: 
cpu=44
MIKE  futex_wait-7497  [058] d...311   920.060012: sched_wakeup: 
comm=futex_wait pid=7501 prio=120 target_cpu=044
MIKE--futex_wait-7497  [058] d...2..   920.060035: sched_switch: 
prev_comm=futex_wait prev_pid=7497 prev_prio=120 prev_state=S ==> 
next_comm=swapper/58 next_pid=0 next_prio=120
MIKE  futex_wait-7499  [043] dN..411   920.060035: sched_wakeup: 
comm=ktimersoftd/43 pid=438 prio=0 target_cpu=043
MIKE! futex_wait-7499  [043] .N..111   920.060037: wake_up_lock_sleeper: 
futex_wait:7497 state:1 saved_state:0
MIKE! futex_wait-7499  [043] dN..211   920.060037: try_to_wake_up: 
futex_wait:7497 WF_LOCK_SLEEPER !(p->state:1 & state:2) bailing
MIKE 920.060037: futex_wait-7497 is never awakened again until ^C cleanup, 
7499/7501 still standing
MIKE  futex_wait-7499  [043] 111   920.060066: wake_up_lock_sleeper: 
futex_wait:7501 state:2 saved_state:0
MIKE  futex_wait-7499  [043] d...211   920.060066: try_to_wake_up: 
futex_wait:7501 WF_LOCK_SLEEPER !(p->state:0 & state:2) bailing
MIKE  futex_wait-7499  [043] 111   920.060606: wake_up_lock_sleeper: 
futex_wait:7501 state:2 saved_state:0
MIKE 920.355048: huh? wake_up_lock_sleeper sees state=2 but try_to_wake_up then 
sees state=0 and bails ?!?
MIKE  futex_wait-7501  [044] 111   920.355048: wake_up_lock_sleeper: 
futex_wait:7499 state:2 saved_state:0
MIKE  futex_wait-7501  [044] d...211   920.355049: try_to_wake_up: 
futex_wait:7499 WF_LOCK_SLEEPER !(p->state:0 & state:2) bailing
MIKE  futex_wait-7499  [048] d..h311   920.355060: sched_stat_runtime: 
comm=futex_wait pid=7499 runtime=168537 [ns] vruntime=4850005377 [ns]
MIKE  futex_wait-7499  [048] d...311   920.355061: sched_waking: 
comm=ksoftirqd/48 pid=489 prio=120 target_cpu=048
MIKE  futex_wait-7499  [048] d...411   920.355062: sched_stat_runtime: 
comm=futex_wait pid=7499 runtime=2972 [ns] vruntime=4850008349 [ns]
MIKE  futex_wait-7499  [048] d.L.411   920.355063: sched_wakeup: 
comm=ksoftirqd/48 pid=489 prio=120 target_cpu=048
MIKE  futex_wait-7499  [048] d.L.311   920.355064: sched_waking: 
comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE  futex_wait-7499  [048] dNL.411   920.355065: sched_wakeup: 
comm=ktimersoftd/48 pid=488 prio=0 target_cpu=048
MIKE  futex_wait-7499  [048] .NL.111   920.355066: wake_up_lock_sleeper: 
futex_wait:7501 state:0 saved_state:0
MIKE  futex_wait-7499  [048] dNL.211   920.355067: try_to_wake_up: 
futex_wait:7501 WF_LOCK_SLEEPER !(p->state:0 & state:2) bailing
MIKE  futex_wait-7499  [048] dNL.211   920.355067: sched_stat_runtime: 
comm=futex_wait pid=7499 runtime=2011 [ns] vruntime=4850010360 [ns]
MIKE  futex_wait-7499  [048] d...211   920.355068: sched_switch: 
prev_comm=futex_wait prev_pid=7499 prev_prio=120 prev_state=R+ ==> 
next_comm=ktimersoftd/48 next_pid=488 next_prio=0
MIKE   <...>-488   [048] d...2..   920.355070: sched_switch: 
prev_comm=ktimersoftd/48 prev_pid=488 prev_prio=0 prev_state=S ==> 
next_comm=ksoftirqd/48 next_pid=489 next_prio=120
MIKE   <...>-489   [048] d...2..   920.355071: sched_stat_


Re: [ANNOUNCE] v4.11.5-rt1

2017-06-19 Thread Mike Galbraith
On Mon, 2017-06-19 at 18:27 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-19 18:14:52 [+0200], Mike Galbraith wrote:
> > 
> > BTW back, I reran virgin 4.9-rt21 on desktop box while off doing the
> > have something resembling a life thing, and it did not stall in 50
> > iterations of performance/run.sh (bloody fickle thing).  Hohum, take it
> > for whatever you think it's worth.
> 
> that sounds good because I tried your config on three boxes with the
> same result.
> So the issue remains on the DLxxx box which is 8socket NUMA box with
> only one memory node, right?

Yes.  Now _I_ think (having spent days chasing) that it is in fact
generic, but yes, all else aside, my highly utilized and very reliable
up until now (and crippled) DL980 G7 very clearly states that we have a
bug lurking.

-Mike 



Re: [ANNOUNCE] v4.11.5-rt1

2017-06-19 Thread Mike Galbraith
On Mon, 2017-06-19 at 10:41 -0400, Steven Rostedt wrote:
> On Mon, 19 Jun 2017 16:13:41 +0200
> Sebastian Andrzej Siewior  wrote:
> 
> 
> > > Hmm, it shouldn't affect futexes, as it's only called by rtmutex when
> > > waiter->savestate is true. And that should always be false for futex.  
> > 
> > you still have sleeping locks like the hb-lock (which might matter in
> > case the task blocks on the lock and does not continue for some reason).
> 
> But then I'd expect this to be an issue with any spinlock in the kernel.

Thing to do is set a trap for the bugger.  I'll give that a go after I
get some beans/biscuits work done.  For the nonce, what I've reported
is all I know.

-Mike



Re: [ANNOUNCE] v4.11.5-rt1

2017-06-19 Thread Mike Galbraith
On Mon, 2017-06-19 at 17:03 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-19 16:36:22 [+0200], Mike Galbraith wrote:
> > On Mon, 2017-06-19 at 16:06 +0200, Sebastian Andrzej Siewior wrote:
> > > 
> > > I am surprised that your desktop shows any symptoms on rt21. I tried my
> > > smaller AMD box (A10), an Intel two sockets and a four socket box. Each
> > > of them was fine with the run.sh and manual futex_wait invocation.
> > > Could you please send the config of your desktop box?
> > 
> > (sent offline)
> 
> btw: I tried booting the two and four sockets box with less memory so
> that the box is using just one memory node but nothing changed. Let me
> try with the config of yours.

BTW back, I reran virgin 4.9-rt21 on desktop box while off doing the
have something resembling a life thing, and it did not stall in 50
iterations of performance/run.sh (bloody fickle thing).  Hohum, take it
for whatever you think it's worth.

-Mike  



Re: [ANNOUNCE] v4.11.5-rt1

2017-06-19 Thread Mike Galbraith
On Mon, 2017-06-19 at 16:06 +0200, Sebastian Andrzej Siewior wrote:
> 
> I am surprised that your desktop shows any symptoms on rt21. I tried my
> smaller AMD box (A10), an Intel two sockets and a four socket box. Each
> of them was fine with the run.sh and manual futex_wait invocation.
> Could you please send the config of your desktop box?

(sent offline)



Re: [ANNOUNCE] v4.11.5-rt1

2017-06-19 Thread Mike Galbraith
On Mon, 2017-06-19 at 13:50 +0200, Sebastian Andrzej Siewior wrote:
> 
> rt20…rt21 is just
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/sched-Prevent-task-state-corruption-by-spurious-lock.patch?h=linux-4.9.y-rt-patches

Yup.  I got there via git checkout v4.9.x-rtx, build/test.

> Let me verify that here and fire maybe the four socket box.

One thing perhaps interesting about the DL980 is that it is 8 socket,
but with 7 memoryless nodes (poor thing).  I wouldn't be surprised if
you had to try a lot harder than I.

DL980 now seems happy with master-rt21 fwiw.  It's not virgin though,
not even considering continuous merge 4.9..today.

Hopefully desktop will cease with the comatose widget business too, I
really don't want to go there.  These things give tglx headaches, might
well blow the point clean off of my head. 

-Mike


Re: [ANNOUNCE] v4.11.5-rt1

2017-06-19 Thread Mike Galbraith
On Mon, 2017-06-19 at 12:44 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-19 12:14:51 [+0200], Mike Galbraith wrote:
> > Ok, doesn't matter for RT testing.  What does matter, is that...
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 30b24f774198..10e832da70b6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
> >   */
> >  int wake_up_lock_sleeper(struct task_struct *p)
> >  {
> > -   return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
> > +   return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
> >  }
> > 
> > ...appears to be inducing lost futex wakeups.
> 
> has this something to do with "rtmutex: Fix lock stealing logic" ?

Nope.  The above is fallout of me being inspired to stare, that
inspiration having come initially from seeing lost wakeup symptoms in
my desktop, telling me something has gone sour in rt-land, so a hunting
I did go.  I expected to find I had made a booboo in my trees, but
maybe not, as I found a suspiciously similar symptom to what I was
looking for in virgin source.

> > Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
> > running otherwise absolutely pristine 4.9-rt21, after having double
> > verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
> > make sure that the little bugger really really REALLY ain't fscking
> > with me for the sheer fun of it, futexes being made of pure evil :)
> 
> So v4.9-rt20 works fine but -rt21 starts to lose wakeups on DL980 in
> general or just with "futex_wait -n 4" ?

-rt20 is verified to work fine, -rt21 starts hanging with futextest.
 The futex_wait -n 4 testcase was distilled out of seeing the full
futextest/run.sh hanging.  The only symptom I've _seen_ on the DL980 is
futextest hanging.  On the desktop, I've seen more, and may still, I'll
know when I see or don't see desktop gizmos occasionally go comatose.

> > My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
> > thing is that it only reproduces on the DL980 if I let it use multiple
> > sockets, pin it to one, and all is peachy, (rather seems to be given)
> > whereas on desktop box, the hang is far more intermittent, but there.
> 
> do I parse it right, as v4.9-rt21 (without the change above) works with
> the testcase mentioned if you pin it to one socket but does not work if
> you let it use multiple sockets.
> And your desktop box hangs no matter what?

No no, desktop box will reproduce, but not nearly as reliably as the 8
socket box does, but yes, it seems to work fine on the DL980 when
pinned to one socket.  I was testing 4.9-rt because hunt was in
progress when 4.11-rt was born.

-Mike


Re: [ANNOUNCE] v4.11.5-rt1

2017-06-19 Thread Mike Galbraith
On Mon, 2017-06-19 at 10:52 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-17 10:14:37 [+0200], Mike Galbraith wrote:
> 
> > During that rebase, migrate_disable() was changed to no longer map to
> > preempt_disable() for nonrt, but some patches still assume it does.  It
> > now depends upon PREEMPT_COUNT, the RT workaround in futex.c induces
> > grumbling in nonrt builds with PREEMPT_COUNT enabled.
> 
> argh, right. It was planned to get it merged upstream but due to
> $reasons we never got that far. For that reason I would simply revert
> that change and let migrate_disable() map to preempt_disable() as it did
> earlier.

Ok, doesn't matter for RT testing.  What does matter, is that...

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 30b24f774198..10e832da70b6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2284,7 +2284,7 @@ EXPORT_SYMBOL(wake_up_process);
  */
 int wake_up_lock_sleeper(struct task_struct *p)
 {
-   return try_to_wake_up(p, TASK_ALL, WF_LOCK_SLEEPER);
+   return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER);
 }

...appears to be inducing lost futex wakeups.

Scratch that "appears", changing it to TASK_NORMAL just fixed my DL980
running otherwise absolutely pristine 4.9-rt21, after having double
verified that rt20 works fine.  Now to go back to 4.11/master/tip-rt,
make sure that the little bugger really really REALLY ain't fscking
with me for the sheer fun of it, futexes being made of pure evil :)

My testcase is to run futex_wait -n 4 in a modest sized loop.  Odd
thing is that it only reproduces on the DL980 if I let it use multiple
sockets, pin it to one, and all is peachy, (rather seems to be given)
whereas on desktop box, the hang is far more intermittent, but there.

-Mike


[patch-rt] rtmutex: Fix lock stealing logic

2017-06-18 Thread Mike Galbraith

1. When trying to acquire an rtmutex, we first try to grab it without
queueing the waiter, and explicitly check for that initial attempt
in the !waiter path of __try_to_take_rt_mutex().  Checking whether
the lock taker is top waiter before allowing a steal attempt in that
path is a thinko: the lock taker has not yet blocked.

2. It seems wrong to change the definition of rt_mutex_waiter_less()
to mean less or perhaps equal when we have an rt_mutex_waiter_equal().

Remove the thinko, and implement rt_mutex_stealable(steal_mode) using
a restored rt_mutex_waiter_less(), and rt_mutex_waiter_equal().

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Fixes: f34d077c6f28 ("rt: Add the preempt-rt lock replacement APIs")
---
 kernel/locking/rtmutex.c |   69 +++
 1 file changed, 35 insertions(+), 34 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -236,26 +236,19 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif
 
-#define STEAL_NORMAL  0
-#define STEAL_LATERAL 1
-
 /*
  * Only use with rt_mutex_waiter_{less,equal}()
  */
-#define task_to_waiter(p)  \
-   &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+#define task_to_waiter(p) &(struct rt_mutex_waiter) \
+   { .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) }
 
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
-struct rt_mutex_waiter *right, int mode)
+struct rt_mutex_waiter *right)
 {
-   if (mode == STEAL_NORMAL) {
-   if (left->prio < right->prio)
-   return 1;
-   } else {
-   if (left->prio <= right->prio)
-   return 1;
-   }
+   if (left->prio < right->prio)
+   return 1;
+
/*
 * If both waiters have dl_prio(), we check the deadlines of the
 * associated tasks.
@@ -287,6 +280,25 @@ rt_mutex_waiter_equal(struct rt_mutex_wa
return 1;
 }
 
+#define STEAL_NORMAL  0
+#define STEAL_LATERAL 1
+
+static inline int rt_mutex_stealable(struct rt_mutex_waiter *thief,
+struct rt_mutex_waiter *victim, int mode)
+{
+   if (rt_mutex_waiter_less(thief, victim))
+   return 1;
+
+   /*
+* Note that RT tasks are excluded from lateral-steals
+* to prevent the introduction of an unbounded latency.
+*/
+   if (mode == STEAL_NORMAL || rt_task(thief->task))
+   return 0;
+
+   return rt_mutex_waiter_equal(thief, victim);
+}
+
 static void
 rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 {
@@ -298,7 +310,7 @@ rt_mutex_enqueue(struct rt_mutex *lock,
while (*link) {
parent = *link;
entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry);
-   if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+   if (rt_mutex_waiter_less(waiter, entry)) {
link = &parent->rb_left;
} else {
link = &parent->rb_right;
@@ -337,7 +349,7 @@ rt_mutex_enqueue_pi(struct task_struct *
while (*link) {
parent = *link;
entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry);
-   if (rt_mutex_waiter_less(waiter, entry, STEAL_NORMAL)) {
+   if (rt_mutex_waiter_less(waiter, entry)) {
link = &parent->rb_left;
} else {
link = &parent->rb_right;
@@ -847,6 +859,7 @@ static int rt_mutex_adjust_prio_chain(st
  * @task:   The task which wants to acquire the lock
  * @waiter: The waiter that is queued to the lock's wait tree if the
  * callsite called task_blocked_on_lock(), otherwise NULL
+ * @mode:   Lock steal mode (STEAL_NORMAL, STEAL_LATERAL)
  */
 static int __try_to_take_rt_mutex(struct rt_mutex *lock,
  struct task_struct *task,
@@ -890,10 +903,9 @@ static int __try_to_take_rt_mutex(struct
 * @lock, give up.
 */
if (waiter != rt_mutex_top_waiter(lock)) {
-   /* XXX rt_mutex_waiter_less() ? */
+   /* XXX rt_mutex_stealable() ? */
return 0;
}
-
/*
 * We can acquire the lock. Remove the waiter from the
 * lock waiters tree.
@@ -910,25 +922,14 @@ static int __try_to_take_rt_mutex(struct
 * not need to be dequeued.
 */
if (rt_mutex_has_waiters(lock)) {
-   struct task_struct *pown = rt_mutex_top_waiter(lock)->task;
-
-   if (task != pown)
-   return 0;
-
-   /*
-* Note that RT tasks are excluded from lateral-steals


Re: [ANNOUNCE] v4.11.5-rt1

2017-06-18 Thread Mike Galbraith
On Sat, 2017-06-17 at 10:14 +0200, Mike Galbraith wrote:
> 
>... the RT workaround in futex.c induces
> grumbling in nonrt builds with PREEMPT_COUNT enabled.

A trivial way to fix it up is to...

futex: Fix migrate_disable/enable workaround for !PREEMPT_RT_FULL

The imbalance fixed by aed0f50e58eb only exists for PREEMPT_RT_FULL,
and creates one for other PREEMPT_COUNT configs.  Create/use _rt variants
of migrate_disable/enable() which are compiled away when not needed.

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Fixes: aed0f50e58eb ("futex: workaround migrate_disable/enable in different context")
---
 include/linux/preempt.h |   11 +++
 kernel/futex.c  |8 
 2 files changed, 15 insertions(+), 4 deletions(-)

--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -227,12 +227,21 @@ do { \
 
 extern void migrate_disable(void);
 extern void migrate_enable(void);
+#ifdef CONFIG_PREEMPT_RT_FULL
+#define migrate_disable_rt()   migrate_disable()
+#define migrate_enable_rt()    migrate_enable()
+#else
+static inline void migrate_disable_rt(void) { }
+static inline void migrate_enable_rt(void) { }
+#endif
 
 int __migrate_disabled(struct task_struct *p);
 
 #else
 #define migrate_disable()  barrier()
 #define migrate_enable()   barrier()
+static inline void migrate_disable_rt(void) { }
+static inline void migrate_enable_rt(void) { }
 static inline int __migrate_disabled(struct task_struct *p)
 {
return 0;
@@ -323,6 +332,8 @@ do { \
 
 #define migrate_disable()  barrier()
 #define migrate_enable()   barrier()
+static inline void migrate_disable_rt(void) { }
+static inline void migrate_enable_rt(void) { }
 
 static inline int __migrate_disabled(struct task_struct *p)
 {
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2690,12 +2690,12 @@ static int futex_lock_pi(u32 __user *uad
 * one migrate_disable() pending in the slow-path which is reversed
 * after the raw_spin_unlock_irq() where we leave the atomic context.
 */
-   migrate_disable();
+   migrate_disable_rt();
 
spin_unlock(q.lock_ptr);
ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
-   migrate_enable();
+   migrate_enable_rt();
 
if (ret) {
if (ret == 1)
@@ -2846,13 +2846,13 @@ static int futex_unlock_pi(u32 __user *u
 * won't undo the migrate_disable() which was issued when
 * locking hb->lock.
 */
-   migrate_disable();
+   migrate_disable_rt();
spin_unlock(&hb->lock);
 
/* Drops pi_state->pi_mutex.wait_lock */
ret = wake_futex_pi(uaddr, uval, pi_state);
 
-   migrate_enable();
+   migrate_enable_rt();
 
put_pi_state(pi_state);
 


Re: [ANNOUNCE] v4.11.5-rt1

2017-06-17 Thread Mike Galbraith
On Fri, 2017-06-16 at 12:56 +0200, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
> 
> I'm pleased to announce the v4.11.5-rt1 patch set. 
> The release has been delayed due to the hotplug rework that was started
> before the final v4.11 release. However the new code has not been
> stabilized yet and it was decided to bring back the old patches before
> delaying the v4.11-RT release any longer. We will try to complete the
> hotplug rework (for RT) in the v4.11 cycle.
> 
> Changes since v4.9.39-rt21:
> 
>   - rebase to v4.11.5

Hi Sebastian,

I noticed a couple things wrt migrate_disable() changes...

During that rebase, migrate_disable() was changed to no longer map to
preempt_disable() for nonrt, but some patches still assume it does.  It
now depends upon PREEMPT_COUNT, the RT workaround in futex.c induces
grumbling in nonrt builds with PREEMPT_COUNT enabled.

-Mike


Re: kernel BUG at kernel/locking/rtmutex.c:1027

2017-06-08 Thread Mike Galbraith
On Thu, 2017-06-08 at 07:01 +, Feng Feng24 Liu wrote:
>  
>   Our kernel version is:  kernel4.4.6-rt14
> 

Latest 4.4-rt is 4.4.70-rt83...


Re: [PATCH] Fix loop device flush before configure v2

2017-06-07 Thread Mike Galbraith
On Thu, 2017-06-08 at 10:17 +0800, James Wang wrote:
> This condition check was exist at before commit b5dd2f6047ca ("block: loop:
> improve performance via blk-mq") When add MQ support to loop device, it be
> removed because the member of '->lo_thread' be removed. And then upstream
> add '->worker_task', I think they forget add it to here.
> 
> When I install SLES-12 product is base on 4.4 kernel I found installer will
> hang +60 second at scan disks. and I found LVM tools would take this action.
> finally I found this problem is more obvious on AMD platform. This problem
> will impact all scenarios that scan loop devcie.
> 
> When the loop device didn't configure backing file or Request Queue, we
> shouldn't to cost a lot of time to flush it.

The changelog sounds odd to me, perhaps reword/condense a bit?...

While installing SLES-12 (based on v4.4), I found that the installer
will stall for 60+ seconds during LVM disk scan.  The root cause was
determined to be the removal of a bound device check in loop_flush()
by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq").

Restoring this check, examining ->lo_state as set by loop_set_fd()
eliminates the bad behavior.

Test method:
modprobe loop max_loop=64
dd if=/dev/zero of=disk bs=512 count=200K
for ((i=0;i<4;i++)); do losetup -f disk; done
mkfs.ext4 -F /dev/loop0
for ((i=0;i<4;i++)); do mkdir t$i; mount /dev/loop$i t$i; done
for f in `ls /dev/loop[0-9]*|sort`; do \
echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
done

Test output:     stock         patched
/dev/loop0       8.1217e-05    8.3842e-05
/dev/loop1       6.1114e-05    0.000147979
/dev/loop10      0.414701      0.000116564
/dev/loop11      0.7474        6.7942e-05
/dev/loop12      0.747986      8.9082e-05
/dev/loop13      0.746532      7.4799e-05
/dev/loop14      0.480041      9.3926e-05
/dev/loop15      1.26453       7.2522e-05

Note that from loop10 onward, the device is not mounted, yet the
stock kernel consumes several orders of magnitude more wall time
than it does for a mounted device.

Reviewed-by: Hannes Reinecke 
Signed-off-by: James Wang 
Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq")
---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 48f6fa6f810e..2e5b8538760c 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -625,6 +625,9 @@ static int loop_switch(struct loop_device *lo, struct 
> file *file)
>   */
>  static int loop_flush(struct loop_device *lo)
>  {
> + /* loop not yet configured, no running thread, nothing to flush */
> + if (lo->lo_state != Lo_bound)
> + return 0;
>   return loop_switch(lo, NULL);
>  }
>  


Re: [lkp-robot] [tty] 52b03e644f: calltrace:flush_to_ldisc

2017-06-06 Thread Mike Galbraith
On Wed, 2017-06-07 at 10:07 +0800, kernel test robot wrote:
> FYI, we noticed the following commit:
> 
> commit: 52b03e644f702dbcb252f6aec92fc0f0d6e29f78 ("tty: fix port buffer 
> locking V2")
> url: 
> https://github.com/0day-ci/linux/commits/Mike-Galbraith/tty-fix-port-buffer-locking-V2/20170605-134432
> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing

?  Patchlet has been properly vaporized, no longer exists.

-Mike


Re: [bisected] Re: tty lockdep trace

2017-06-05 Thread Mike Galbraith
On Mon, 2017-06-05 at 16:09 +0100, Alan Cox wrote:
> On Sun, 04 Jun 2017 13:34:31 +0200
> Mike Galbraith <efa...@gmx.de> wrote:
> 
> > On Sun, 2017-06-04 at 12:00 +0200, Vegard Nossum wrote:
> > > 
> > > I don't know how you did it, but this passes my testing (reproducers for
> > > both the original issue and the lockdep splat/hang).  
> > 
> > I suppose I can sign it off, see if that inspires anyone to come up
> > with something better.
> > 
> > drivers/tty: Fix 925bb1ce47f4 circular locking dependency
> 
> This is still completely broken and has the same underlying flaw as the
> original.

Thanks for looking.  It didn't look the least bit lovely, it being
equally busted (well, ever so slightly less, lockdep did sthu;) is not
a shocker.  Off to the bin ya go little patchlet.

-Mike


Re: [patch] tty: fix port buffer locking V2

2017-06-05 Thread Mike Galbraith
On Mon, 2017-06-05 at 08:48 +0200, Greg Kroah-Hartman wrote:
> 
> Can you redo this against 4.12-rc4 which has the original tty locking
> patch reverted?

It's against 4.12-rc4.

-Mike


[patch] tty: fix port buffer locking V2

2017-06-04 Thread Mike Galbraith
This is just in case.  While it works, I consider it to be diagnostic
data for those unfortunate enough to be intimate with tty locking :)
---

V1 (925bb1ce47f4) changelog:
tty_insert_flip_string_fixed_flag() is racy against itself when called
from the ioctl(TCXONC, TCION/TCIOFF) path [1] and the flush_to_ldisc()
workqueue path [2].

The problem is that port->buf.tail->used is modified without consistent
locking; the ioctl path takes tty->atomic_write_lock, whereas the workqueue
path takes ldata->output_lock.

We cannot simply take ldata->output_lock, since that is specific to the
N_TTY line discipline.

It might seem natural to try to take port->buf.lock inside
tty_insert_flip_string_fixed_flag() and friends (where port->buf is
actually used/modified), but this creates problems for flush_to_ldisc()
which takes it before grabbing tty->ldisc_sem, o_tty->termios_rwsem,
and ldata->output_lock.

Therefore, the simplest solution for now seems to be to take
tty->atomic_write_lock inside tty_port_default_receive_buf(). This lock
is also used in the write path [3] with a consistent ordering.

[1]: Call Trace:
 tty_insert_flip_string_fixed_flag
 pty_write
 tty_send_xchar                 // down_read(&o_tty->termios_rwsem)
                                // mutex_lock(&tty->atomic_write_lock)
 n_tty_ioctl_helper
 n_tty_ioctl
 tty_ioctl                      // down_read(&tty->ldisc_sem)
 do_vfs_ioctl
 SyS_ioctl

[2]: Workqueue: events_unbound flush_to_ldisc
Call Trace:
 tty_insert_flip_string_fixed_flag
 pty_write
 tty_put_char
 __process_echoes
 commit_echoes                  // mutex_lock(&ldata->output_lock)
 n_tty_receive_buf_common
 n_tty_receive_buf2
 tty_ldisc_receive_buf          // down_read(&o_tty->termios_rwsem)
 tty_port_default_receive_buf   // down_read(&tty->ldisc_sem)
 flush_to_ldisc                 // mutex_lock(&port->buf.lock)
 process_one_work

[3]: Call Trace:
 tty_insert_flip_string_fixed_flag
 pty_write
 n_tty_write                    // mutex_lock(&ldata->output_lock)
                                // down_read(&tty->termios_rwsem)
 do_tty_write (inline)          // mutex_lock(&tty->atomic_write_lock)
 tty_write                      // down_read(&tty->ldisc_sem)
 __vfs_write
 vfs_write
 SyS_write

The bug can result in about a dozen different crashes depending on what
exactly gets corrupted when port->buf.tail->used points outside the
buffer.

The patch passes my LOCKDEP/PROVE_LOCKING testing but more testing is
always welcome.

Found using syzkaller.

V2: The V1 solution induced an ordering issue, holding buf->lock while
acquiring tty->atomic_write_lock.  Resolve it by moving acquisition to
flush_to_ldisc(), prior to acquisition of buf->lock.

Credit to Vegard Nossum for problem analysis/resolution, blame to me for
trivial adaptation thereof.

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Vegard Nossum <vegard.nos...@oracle.com>
Cc: <sta...@vger.kernel.org>
---
 drivers/tty/tty_buffer.c |   10 ++
 1 file changed, 10 insertions(+)

--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -465,7 +465,13 @@ static void flush_to_ldisc(struct work_s
 {
 	struct tty_port *port = container_of(work, struct tty_port, buf.work);
 	struct tty_bufhead *buf = &port->buf;
+	struct tty_struct *tty = READ_ONCE(port->itty);
+	struct tty_ldisc *disc = NULL;
 
+	if (tty)
+		disc = tty_ldisc_ref(tty);
+	if (disc)
+		mutex_lock(&tty->atomic_write_lock);
 	mutex_lock(&buf->lock);
 
 	while (1) {
@@ -501,6 +507,10 @@ static void flush_to_ldisc(struct work_s
 	}
 
 	mutex_unlock(&buf->lock);
+	if (disc) {
+		mutex_unlock(&tty->atomic_write_lock);
+		tty_ldisc_deref(disc);
+	}
 
 }
 


Re: [bisected] Re: tty lockdep trace

2017-06-04 Thread Mike Galbraith
On Sun, 2017-06-04 at 12:00 +0200, Vegard Nossum wrote:
> 
> I don't know how you did it, but this passes my testing (reproducers for
> both the original issue and the lockdep splat/hang).

I suppose I can sign it off, see if that inspires anyone to come up
with something better.

drivers/tty: Fix 925bb1ce47f4 circular locking dependency

925bb1ce47f4 (tty: fix port buffer locking) upset lockdep by holding buf->lock
while acquiring tty->atomic_write_lock.  Move acquisition to flush_to_ldisc(),
taking it prior to taking buf->lock.  Costs a reference, but appeases lockdep.

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Fixes: 925bb1ce47f4 ("tty: fix port buffer locking")
---
 drivers/tty/tty_buffer.c |   10 ++
 drivers/tty/tty_port.c   |2 --
 2 files changed, 10 insertions(+), 2 deletions(-)

--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -465,7 +465,13 @@ static void flush_to_ldisc(struct work_s
 {
 	struct tty_port *port = container_of(work, struct tty_port, buf.work);
 	struct tty_bufhead *buf = &port->buf;
+	struct tty_struct *tty = READ_ONCE(port->itty);
+	struct tty_ldisc *disc = NULL;
 
+	if (tty)
+		disc = tty_ldisc_ref(tty);
+	if (disc)
+		mutex_lock(&tty->atomic_write_lock);
 	mutex_lock(&buf->lock);
 
 	while (1) {
@@ -501,6 +507,10 @@ static void flush_to_ldisc(struct work_s
 	}
 
 	mutex_unlock(&buf->lock);
+	if (disc) {
+		mutex_unlock(&tty->atomic_write_lock);
+		tty_ldisc_deref(disc);
+	}
 
 }
 
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -34,9 +34,7 @@ static int tty_port_default_receive_buf(
 	if (!disc)
 		return 0;
 
-	mutex_lock(&tty->atomic_write_lock);
 	ret = tty_ldisc_receive_buf(disc, p, (char *)f, count);
-	mutex_unlock(&tty->atomic_write_lock);
 
 	tty_ldisc_deref(disc);
 


Re: [bisected] Re: tty lockdep trace

2017-06-04 Thread Mike Galbraith
On Sun, 2017-06-04 at 10:32 +0200, Greg Kroah-Hartman wrote:
> On Sat, Jun 03, 2017 at 08:33:52AM +0200, Mike Galbraith wrote:
> > On Wed, 2017-05-31 at 13:21 -0400, Dave Jones wrote:
> > > Just hit this during a trinity run.
> > 
> > 925bb1ce47f429f69aad35876df7ecd8c53deb7e is the first bad commit
> > commit 925bb1ce47f429f69aad35876df7ecd8c53deb7e
> > Author: Vegard Nossum <vegard.nos...@oracle.com>
> > Date:   Thu May 11 12:18:52 2017 +0200
> > 
> > tty: fix port buffer locking
> 
> Now reverting this.  Oops, sorry, forgot to add Dave and your names to
> the patch revert.  The list of people who reported this was really long,
> many thanks for this.

If flush_to_ldisc() is the problem, and taking atomic_write_lock in
that path is an acceptable solution, how about doing that a bit
differently instead.  Lockdep stopped grumbling, vbox seems happy.

925bb1ce47f4 (tty: fix port buffer locking) upset lockdep by holding buf->lock
while acquiring tty->atomic_write_lock.  Move acquisition to flush_to_ldisc(),
taking it prior to taking buf->lock.  Costs a reference, but appeases lockdep.

Not-so-signed-off-by: /me
---
 drivers/tty/tty_buffer.c |   10 ++
 drivers/tty/tty_port.c   |2 --
 2 files changed, 10 insertions(+), 2 deletions(-)

--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -465,7 +465,13 @@ static void flush_to_ldisc(struct work_s
 {
 	struct tty_port *port = container_of(work, struct tty_port, buf.work);
 	struct tty_bufhead *buf = &port->buf;
+	struct tty_struct *tty = READ_ONCE(port->itty);
+	struct tty_ldisc *disc = NULL;
 
+	if (tty)
+		disc = tty_ldisc_ref(tty);
+	if (disc)
+		mutex_lock(&tty->atomic_write_lock);
 	mutex_lock(&buf->lock);
 
 	while (1) {
@@ -501,6 +507,10 @@ static void flush_to_ldisc(struct work_s
 	}
 
 	mutex_unlock(&buf->lock);
+	if (disc) {
+		mutex_unlock(&tty->atomic_write_lock);
+		tty_ldisc_deref(disc);
+	}
 
 }
 
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -34,9 +34,7 @@ static int tty_port_default_receive_buf(
 	if (!disc)
 		return 0;
 
-	mutex_lock(&tty->atomic_write_lock);
 	ret = tty_ldisc_receive_buf(disc, p, (char *)f, count);
-	mutex_unlock(&tty->atomic_write_lock);
 
 	tty_ldisc_deref(disc);
 


[bisected] Re: tty lockdep trace

2017-06-03 Thread Mike Galbraith
On Wed, 2017-05-31 at 13:21 -0400, Dave Jones wrote:
> Just hit this during a trinity run.

925bb1ce47f429f69aad35876df7ecd8c53deb7e is the first bad commit
commit 925bb1ce47f429f69aad35876df7ecd8c53deb7e
Author: Vegard Nossum 
Date:   Thu May 11 12:18:52 2017 +0200

tty: fix port buffer locking

tty_insert_flip_string_fixed_flag() is racy against itself when called
from the ioctl(TCXONC, TCION/TCIOFF) path [1] and the flush_to_ldisc()
workqueue path [2].

The problem is that port->buf.tail->used is modified without consistent
locking; the ioctl path takes tty->atomic_write_lock, whereas the workqueue
path takes ldata->output_lock.

We cannot simply take ldata->output_lock, since that is specific to the
N_TTY line discipline.

It might seem natural to try to take port->buf.lock inside
tty_insert_flip_string_fixed_flag() and friends (where port->buf is
actually used/modified), but this creates problems for flush_to_ldisc()
which takes it before grabbing tty->ldisc_sem, o_tty->termios_rwsem,
and ldata->output_lock.

Therefore, the simplest solution for now seems to be to take
tty->atomic_write_lock inside tty_port_default_receive_buf(). This lock
is also used in the write path [3] with a consistent ordering.

[1]: Call Trace:
 tty_insert_flip_string_fixed_flag
 pty_write
 tty_send_xchar                 // down_read(&o_tty->termios_rwsem)
                                // mutex_lock(&tty->atomic_write_lock)
 n_tty_ioctl_helper
 n_tty_ioctl
 tty_ioctl                      // down_read(&tty->ldisc_sem)
 do_vfs_ioctl
 SyS_ioctl

[2]: Workqueue: events_unbound flush_to_ldisc
Call Trace:
 tty_insert_flip_string_fixed_flag
 pty_write
 tty_put_char
 __process_echoes
 commit_echoes                  // mutex_lock(&ldata->output_lock)
 n_tty_receive_buf_common
 n_tty_receive_buf2
 tty_ldisc_receive_buf          // down_read(&o_tty->termios_rwsem)
 tty_port_default_receive_buf   // down_read(&tty->ldisc_sem)
 flush_to_ldisc                 // mutex_lock(&port->buf.lock)
 process_one_work

[3]: Call Trace:
 tty_insert_flip_string_fixed_flag
 pty_write
 n_tty_write                    // mutex_lock(&ldata->output_lock)
                                // down_read(&tty->termios_rwsem)
 do_tty_write (inline)          // mutex_lock(&tty->atomic_write_lock)
 tty_write                      // down_read(&tty->ldisc_sem)
 __vfs_write
 vfs_write
 SyS_write

The bug can result in about a dozen different crashes depending on what
exactly gets corrupted when port->buf.tail->used points outside the
buffer.

The patch passes my LOCKDEP/PROVE_LOCKING testing but more testing is
always welcome.

Found using syzkaller.

Cc: 
Signed-off-by: Vegard Nossum 
Signed-off-by: Greg Kroah-Hartman 

:040000 040000 6df9d7d2f22ff6ac1ad265869c1cb0d15621a0eb 56e5e66a5e451bf55857025899f4a092b51dca6e M	drivers

git bisect start
# good: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
git bisect good a351e9b9fc24e982ec2f0e76379a49826036da12
# bad: [104c08ba8e921ef97abfdc10408d54921a6d9003] Merge tag 'acpi-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 104c08ba8e921ef97abfdc10408d54921a6d9003
# good: [16a12fa9aed176444fc795b09e796be41902bb08] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 16a12fa9aed176444fc795b09e796be41902bb08
# good: [2d3e4866dea96b0506395b47bfefb234f2088dac] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 2d3e4866dea96b0506395b47bfefb234f2088dac
# good: [f94c128eefcce2e3448d543f13cd7d7b8aa660a5] Merge tag 'metag-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
git bisect good f94c128eefcce2e3448d543f13cd7d7b8aa660a5
# good: [18365225f0440d09708ad9daade2ec11275c3df9] hwpoison, memcg: forcibly uncharge LRU pages
git bisect good 18365225f0440d09708ad9daade2ec11275c3df9
# good: [5bbecdbc8e7ffaaf47ac1f02014bf3bedda3fd11] nvme_fc: Support ctrl_loss_tmo
git bisect good 5bbecdbc8e7ffaaf47ac1f02014bf3bedda3fd11
# good: [d024baa58a4a7e5eb6058017771d15b9e47b56db] Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d024baa58a4a7e5eb6058017771d15b9e47b56db
# bad: [393bcfaeb8be7f46a4cd7d673e33541ebee76b12] Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
git bisect bad 393bcfaeb8be7f46a4cd7d673e33541ebee76b12
# bad: [b0f5a8f32e8bbdaae1abb8abe2d3cbafaba57e08] kthread: fix boot hang (regression) on MIPS/OpenRISC
git bisect bad b0f5a8f32e8bbdaae1abb8abe2d3cbafaba57e08
# good: [6f68a6ae1f0ea2fd87e26dc345866e928b5850a8] Merge tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good


Re: [RFC PATCH v2 12/17] cgroup: Remove cgroup v2 no internal process constraint

2017-05-19 Thread Mike Galbraith
On Fri, 2017-05-19 at 16:38 -0400, Tejun Heo wrote:
> Hello, Waiman.
> 
> On Mon, May 15, 2017 at 09:34:11AM -0400, Waiman Long wrote:
> > The rationale behind the cgroup v2 no internal process constraint is
> > to avoid resouorce competition between internal processes and child
> > cgroups. However, not all controllers have problem with internal
> > process competiton. Enforcing this rule may lead to unnatural process
> > hierarchy and unneeded levels for those controllers.
> 
> This isn't necessarily something we can determine by looking at the
> current state of controllers.  It's true that some controllers - pid
> and perf - inherently only care about membership of each task but at
> the same time neither really suffers from the constraint either.  CPU
> which is the problematic one here...

(+ cpuacct + cpuset)


Re: [drm:qxl] BUG: sleeping function called from invalid context - qxl_bo_kmap_atomic_page()...splat

2017-05-18 Thread Mike Galbraith
On Thu, 2017-05-18 at 17:40 -0300, Gabriel Krisman Bertazi wrote:
> > Mike Galbraith <efa...@gmx.de> writes:
> 
> > 
> > On Mon, 2017-05-08 at 16:48 -0300, Gabriel Krisman Bertazi wrote:
> > 
> > > 
> > > Thanks for reporting this.  Can you confirm the following patch prevents
> > > the issue?
> > 
> > Nope, it still gripes.
> 
> Oops... Thanks for checking.  The following patch fixes the issue for
> me.  Can you please test that one?

Yup, all better.

-Mike


Re: [drm:qxl] BUG: sleeping function called from invalid context - qxl_bo_kmap_atomic_page()...splat

2017-05-11 Thread Mike Galbraith
On Tue, 2017-05-09 at 04:37 +0200, Mike Galbraith wrote:
> On Mon, 2017-05-08 at 16:48 -0300, Gabriel Krisman Bertazi wrote:
> 
> > Thanks for reporting this.  Can you confirm the following patch prevents
> > the issue?
> 
> Nope, it still gripes.

The reason for this gripe is that we find that we don't have memory
reserved... a tad too late.

Xorg-2252  [000]        135.409756: qxl_release_map <-qxl_cursor_atomic_update
Xorg-2252  [000]        135.409756: qxl_bo_kmap_atomic_page <-qxl_release_map
Xorg-2252  [000]        135.409757: qxl_bo_kmap_atomic_page: ENTER
Xorg-2252  [000]        135.409757: ttm_mem_io_lock <-qxl_bo_kmap_atomic_page
Xorg-2252  [000]        135.409757: ttm_mem_io_reserve <-qxl_bo_kmap_atomic_page
Xorg-2252  [000]        135.409757: qxl_ttm_io_mem_reserve <-ttm_mem_io_reserve
Xorg-2252  [000]        135.409757: ttm_mem_io_unlock <-qxl_bo_kmap_atomic_page
Xorg-2252  [000]        135.409757: qxl_bo_kmap_atomic_page: PREEMPTION DISABLED
Xorg-2252  [000] ...1   135.409758: qxl_bo_kmap <-qxl_cursor_atomic_update
Xorg-2252  [000] ...1   135.409758: ttm_bo_kmap <-qxl_bo_kmap  <== too late
Xorg-2252  [000] ...1   135.409758: ttm_mem_io_reserve <-ttm_bo_kmap
Xorg-2252  [000] ...1   135.409758: qxl_ttm_io_mem_reserve <-ttm_mem_io_reserve
Xorg-2252  [000] ...1   135.409759: ioremap_nocache <-ttm_bo_kmap  <== game over
Xorg-2252  [000] ...1   135.409759: __ioremap_caller <-ioremap_nocache
Xorg-2252  [000] ...1   135.409759: walk_system_ram_range <-__ioremap_caller
Xorg-2252  [000] ...1   135.409759: find_next_iomem_res <-walk_system_ram_range
Xorg-2252  [000] ...1   135.409759: _raw_read_lock <-find_next_iomem_res
Xorg-2252  [000] ...1   135.409760: reserve_memtype <-__ioremap_caller
Xorg-2252  [000] ...1   135.409760: pat_pagerange_is_ram <-reserve_memtype
Xorg-2252  [000] ...1   135.409761: walk_system_ram_range <-pat_pagerange_is_ram
Xorg-2252  [000] ...1   135.409761: find_next_iomem_res <-walk_system_ram_range
Xorg-2252  [000] ...1   135.409761: _raw_read_lock <-find_next_iomem_res
Xorg-2252  [000] ...1   135.409761: kmem_cache_alloc_trace <-reserve_memtype
Xorg-2252  [000] ...1   135.409761: __might_sleep <-kmem_cache_alloc_trace
Xorg-2252  [000] ...1   135.409762: ___might_sleep <-__might_sleep


Re: get/put_cpu() usage in block/blk-mq.c

2017-05-09 Thread Mike Galbraith
On Tue, 2017-05-09 at 10:32 -0600, Jens Axboe wrote:

> > Sure, but s/get_cpu_var/get_cpu_ptr first.
> 
> Doh.

Ditto, box OTOH spotted the booboo instantly :)

-Mike


Re: get/put_cpu() usage in block/blk-mq.c

2017-05-09 Thread Mike Galbraith
On Tue, 2017-05-09 at 09:45 -0600, Jens Axboe wrote:

> > > > > > Is it from this_cpu_ptr() in blk_stat_add()?
> > > > > 
> > > > > Yeah.
> > > > 
> > > > So why is this complaining, doesn't rcu_read_lock() disable
> > > > preemption?
> > > 
> > > Ah, I guess it doesn't if PREEMPT_RCU is set. How about the
> > > below?
> > 
> > Should do it.  I was about to run LTP (where it turned up) again
> > anyway, I'll add this.  No news is good news, which is what you should hear.
> 
> Thanks, let me know, so I can add your tested-by (or whatever you
> prefer).

Sure, but s/get_cpu_var/get_cpu_ptr first.

-Mike


Re: get/put_cpu() usage in block/blk-mq.c

2017-05-09 Thread Mike Galbraith
On Tue, 2017-05-09 at 09:15 -0600, Jens Axboe wrote:
> On 05/09/2017 09:13 AM, Jens Axboe wrote:
> > On 05/09/2017 09:04 AM, Mike Galbraith wrote:
> > > On Tue, 2017-05-09 at 08:53 -0600, Jens Axboe wrote:
> > > > On 05/09/2017 12:07 AM, Mike Galbraith wrote:
> > > > > Hi Jens,
> > > > > 
> > > > > I was about to fix up this splat..
> > > > > 
> > > > > [  445.022141] loop: module loaded
> > > > > [  445.078116] BUG: using smp_processor_id() in preemptible
> > > > > [] code: loop0/3801
> > > > > [  445.085873] caller is debug_smp_processor_id+0x17/0x20
> > > > > [  445.091016] CPU: 7 PID: 3801 Comm: loop0 Tainted: G   
> > > > >  E   4.12.0-default #40
> > > > > [  445.098838] Hardware name: IBM System x3550 M3 -[7944K3G]
> > > > > -/69Y5698 , BIOS -[D6E150AUS-1.10]- 12/15/2010
> > > > > [  445.108564] Call Trace:
> > > > > [  445.111016]  dump_stack+0x65/0x89
> > > > > [  445.114330]  check_preemption_disabled+0xde/0xf0
> > > > > [  445.118945]  debug_smp_processor_id+0x17/0x20
> > > > > [  445.123300]  blk_stat_add+0xb0/0x130
> > > > > [  445.126876]  __blk_mq_complete_request+0xb5/0x150
> > > > > [  445.131575]  blk_mq_complete_request+0x16/0x20
> > > > > [  445.136020]  loop_queue_work+0x5f/0xaa0 [loop]
> > > > > [  445.140461]  ? _raw_spin_unlock_irq+0x21/0x40
> > > > > [  445.144816]  ? finish_task_switch+0x85/0x270
> > > > > [  445.149085]  ? __schedule+0x291/0x8c0
> > > > > [  445.152747]  kthread_worker_fn+0xc2/0x1d0
> > > > > [  445.156754]  kthread+0x114/0x150
> > > > > [  445.159983]  ? __kthread_init_worker+0xb0/0xb0
> > > > > [  445.164424]  ? kthread_park+0x60/0x60
> > > > > [  445.168085]  ret_from_fork+0x2c/0x40
> > > > 
> > > > Is it from this_cpu_ptr() in blk_stat_add()?
> > > 
> > > Yeah.
> > 
> > So why is this complaining, doesn't rcu_read_lock() disable
> > preemption?
> 
> Ah, I guess it doesn't if PREEMPT_RCU is set. How about the below?

Should do it.  I was about to run LTP (where it turned up) again
anyway, I'll add this.  No news is good news, which is what you should hear.

> diff --git a/block/blk-stat.c b/block/blk-stat.c
> index 6c2f40940439..022f5925a427 100644
> --- a/block/blk-stat.c
> +++ b/block/blk-stat.c
> @@ -96,13 +96,16 @@ void blk_stat_add(struct request *rq)
>  
>   rcu_read_lock();
> 	list_for_each_entry_rcu(cb, &q->stats->callbacks, list) {
> - if (blk_stat_is_active(cb)) {
> - bucket = cb->bucket_fn(rq);
> - if (bucket < 0)
> - continue;
> -		stat = &this_cpu_ptr(cb->cpu_stat)[bucket];
> - __blk_stat_add(stat, value);
> - }
> + if (!blk_stat_is_active(cb))
> + continue;
> +
> + bucket = cb->bucket_fn(rq);
> + if (bucket < 0)
> + continue;
> +
> +	stat = &get_cpu_var(cb->cpu_stat)[bucket];
> + __blk_stat_add(stat, value);
> + put_cpu_var(cb->cpu_stat);
>   }
>   rcu_read_unlock();
>  }
> 


Re: get/put_cpu() usage in block/blk-mq.c

2017-05-09 Thread Mike Galbraith
On Tue, 2017-05-09 at 08:53 -0600, Jens Axboe wrote:
> On 05/09/2017 12:07 AM, Mike Galbraith wrote:
> > Hi Jens,
> > 
> > I was about to fix up this splat..
> > 
> > [  445.022141] loop: module loaded
> > [  445.078116] BUG: using smp_processor_id() in preemptible [] 
> > code: loop0/3801
> > [  445.085873] caller is debug_smp_processor_id+0x17/0x20
> > [  445.091016] CPU: 7 PID: 3801 Comm: loop0 Tainted: GE   
> > 4.12.0-default #40
> > [  445.098838] Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698 , 
> > BIOS -[D6E150AUS-1.10]- 12/15/2010
> > [  445.108564] Call Trace:
> > [  445.111016]  dump_stack+0x65/0x89
> > [  445.114330]  check_preemption_disabled+0xde/0xf0
> > [  445.118945]  debug_smp_processor_id+0x17/0x20
> > [  445.123300]  blk_stat_add+0xb0/0x130
> > [  445.126876]  __blk_mq_complete_request+0xb5/0x150
> > [  445.131575]  blk_mq_complete_request+0x16/0x20
> > [  445.136020]  loop_queue_work+0x5f/0xaa0 [loop]
> > [  445.140461]  ? _raw_spin_unlock_irq+0x21/0x40
> > [  445.144816]  ? finish_task_switch+0x85/0x270
> > [  445.149085]  ? __schedule+0x291/0x8c0
> > [  445.152747]  kthread_worker_fn+0xc2/0x1d0
> > [  445.156754]  kthread+0x114/0x150
> > [  445.159983]  ? __kthread_init_worker+0xb0/0xb0
> > [  445.164424]  ? kthread_park+0x60/0x60
> > [  445.168085]  ret_from_fork+0x2c/0x40
> 
> Is it from this_cpu_ptr() in blk_stat_add()?

Yeah.

> > ..but then noticed that __blk_mq_run_hw_queue() is also called under
> > get_cpu(), but recently grew a might_sleep(), so maybe wants some block
> > layer eyeballs.
> 
> That part is fine. The might sleep depends on BLOCKING being set in
> the hardware queue flags. And if that is set, then we don't call
> it under get_cpu().

Ah, right.

-Mike




get/put_cpu() usage in block/blk-mq.c

2017-05-09 Thread Mike Galbraith
Hi Jens,

I was about to fix up this splat..

[  445.022141] loop: module loaded
[  445.078116] BUG: using smp_processor_id() in preemptible [] code: loop0/3801
[  445.085873] caller is debug_smp_processor_id+0x17/0x20
[  445.091016] CPU: 7 PID: 3801 Comm: loop0 Tainted: G E 4.12.0-default #40
[  445.098838] Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698 , BIOS -[D6E150AUS-1.10]- 12/15/2010
[  445.108564] Call Trace:
[  445.111016]  dump_stack+0x65/0x89
[  445.114330]  check_preemption_disabled+0xde/0xf0
[  445.118945]  debug_smp_processor_id+0x17/0x20
[  445.123300]  blk_stat_add+0xb0/0x130
[  445.126876]  __blk_mq_complete_request+0xb5/0x150
[  445.131575]  blk_mq_complete_request+0x16/0x20
[  445.136020]  loop_queue_work+0x5f/0xaa0 [loop]
[  445.140461]  ? _raw_spin_unlock_irq+0x21/0x40
[  445.144816]  ? finish_task_switch+0x85/0x270
[  445.149085]  ? __schedule+0x291/0x8c0
[  445.152747]  kthread_worker_fn+0xc2/0x1d0
[  445.156754]  kthread+0x114/0x150
[  445.159983]  ? __kthread_init_worker+0xb0/0xb0
[  445.164424]  ? kthread_park+0x60/0x60
[  445.168085]  ret_from_fork+0x2c/0x40

..but then noticed that __blk_mq_run_hw_queue() is also called under
get_cpu(), but recently grew a might_sleep(), so maybe wants some block
layer eyeballs.

-Mike


Re: [drm:qxl] BUG: sleeping function called from invalid context - qxl_bo_kmap_atomic_page()...splat

2017-05-08 Thread Mike Galbraith
On Mon, 2017-05-08 at 16:48 -0300, Gabriel Krisman Bertazi wrote:

> Thanks for reporting this.  Can you confirm the following patch prevents
> the issue?

Nope, it still gripes.

[   43.910362] BUG: sleeping function called from invalid context at mm/slab.h:432
[   43.910955] in_atomic(): 1, irqs_disabled(): 0, pid: 2077, name: Xorg
[   43.911472] Preemption disabled at:
[   43.911478] [] qxl_bo_kmap_atomic_page+0xa5/0x100 [qxl]
[   43.912103] CPU: 0 PID: 2077 Comm: Xorg Tainted: G E 4.12.0-master #38
[   43.912550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[   43.913202] Call Trace:
[   43.913371]  dump_stack+0x65/0x89
[   43.913581]  ? qxl_bo_kmap_atomic_page+0xa5/0x100 [qxl]
[   43.913876]  ___might_sleep+0x11a/0x190
[   43.914095]  __might_sleep+0x4a/0x80
[   43.914319]  ? qxl_bo_create+0x50/0x190 [qxl]
[   43.914565]  kmem_cache_alloc_trace+0x46/0x180
[   43.914836]  qxl_bo_create+0x50/0x190 [qxl]
[   43.915082]  ? refcount_dec_and_test+0x11/0x20
[   43.915332]  ? ttm_mem_io_reserve+0x41/0xe0 [ttm]
[   43.915595]  qxl_alloc_bo_reserved+0x37/0xb0 [qxl]
[   43.915884]  qxl_cursor_atomic_update+0x8f/0x260 [qxl]
[   43.916172]  ? drm_atomic_helper_update_legacy_modeset_state+0x1d6/0x210 [drm_kms_helper]
[   43.916623]  drm_atomic_helper_commit_planes+0xec/0x230 [drm_kms_helper]
[   43.916995]  drm_atomic_helper_commit_tail+0x2b/0x60 [drm_kms_helper]
[   43.917398]  commit_tail+0x65/0x70 [drm_kms_helper]
[   43.917693]  drm_atomic_helper_commit+0xa9/0x100 [drm_kms_helper]
[   43.918039]  drm_atomic_commit+0x4b/0x50 [drm]
[   43.918334]  drm_atomic_helper_update_plane+0xf1/0x110 [drm_kms_helper]
[   43.918902]  __setplane_internal+0x19f/0x280 [drm]
[   43.919240]  drm_mode_cursor_universal+0x101/0x1c0 [drm]
[   43.919541]  drm_mode_cursor_common+0x15b/0x1d0 [drm]
[   43.919858]  drm_mode_cursor2_ioctl+0xe/0x10 [drm]
[   43.920157]  drm_ioctl+0x211/0x460 [drm]
[   43.920383]  ? drm_mode_cursor_ioctl+0x50/0x50 [drm]
[   43.920664]  ? handle_mm_fault+0x93/0x160
[   43.920893]  do_vfs_ioctl+0x96/0x6e0
[   43.921117]  ? __fget+0x73/0xa0
[   43.921322]  SyS_ioctl+0x41/0x70
[   43.921545]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[   43.922188] RIP: 0033:0x7f1145804bc7
[   43.922526] RSP: 002b:7ffcd3e50508 EFLAGS: 3246 ORIG_RAX: 0010
[   43.923367] RAX: ffda RBX: 0040 RCX: 7f1145804bc7
[   43.923852] RDX: 7ffcd3e50540 RSI: c02464bb RDI: 000b
[   43.924299] RBP: 0040 R08: 0040 R09: 000c
[   43.924694] R10: 7ffcd3e50340 R11: 3246 R12: 0018
[   43.925128] R13: 022bc390 R14: 0040 R15: 7ffcd3e5062c


[drm:qxl] BUG: sleeping function called from invalid context - qxl_bo_kmap_atomic_page()...splat

2017-05-08 Thread Mike Galbraith
Greetings,

I'm meeting this splat in master: call io_mapping_map_atomic_wc(),
then do something sleepy.  In master-rt, I meet a variant of this
doing read_lock() in find_next_iomem_res(), due to rwlocks being
sleepy in RT.

[   53.286236] BUG: sleeping function called from invalid context at mm/slab.h:432
[   53.286809] in_atomic(): 1, irqs_disabled(): 0, pid: 2078, name: Xorg
[   53.287354] Preemption disabled at:
[   53.287360] [] qxl_bo_kmap_atomic_page+0xa5/0x100 [qxl]
[   53.288028] CPU: 0 PID: 2078 Comm: Xorg Tainted: G E 4.12.0-master #35
[   53.288530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20161202_174313-build11a 04/01/2014
[   53.289337] Call Trace:
[   53.289513]  dump_stack+0x65/0x89
[   53.289739]  ? qxl_bo_kmap_atomic_page+0xa5/0x100 [qxl]
[   53.290084]  ___might_sleep+0x11a/0x190
[   53.290336]  __might_sleep+0x4a/0x80
[   53.290575]  ? reserve_memtype+0xa5/0x390
[   53.290841]  kmem_cache_alloc_trace+0x46/0x180
[   53.291185]  reserve_memtype+0xa5/0x390
[   53.291442]  ? ttm_bo_kmap+0x1f5/0x230 [ttm]
[   53.291724]  __ioremap_caller+0xf2/0x330
[   53.291987]  ? preempt_count_add+0x47/0xb0
[   53.292284]  ioremap_nocache+0x17/0x20
[   53.292536]  ttm_bo_kmap+0x1f5/0x230 [ttm]
[   53.292816]  qxl_bo_kmap+0x43/0x70 [qxl]
[   53.293076]  qxl_cursor_atomic_update+0x8d/0x260 [qxl]
[   53.293427]  ? drm_atomic_helper_update_legacy_modeset_state+0x1d6/0x210 [drm_kms_helper]
[   53.293966]  drm_atomic_helper_commit_planes+0xec/0x230 [drm_kms_helper]
[   53.294403]  drm_atomic_helper_commit_tail+0x2b/0x60 [drm_kms_helper]
[   53.294824]  commit_tail+0x65/0x70 [drm_kms_helper]
[   53.295145]  drm_atomic_helper_commit+0xa9/0x100 [drm_kms_helper]
[   53.295549]  drm_atomic_commit+0x4b/0x50 [drm]
[   53.295843]  drm_atomic_helper_update_plane+0xf1/0x110 [drm_kms_helper]
[   53.296407]  __setplane_internal+0x19f/0x280 [drm]
[   53.296737]  drm_mode_cursor_universal+0x101/0x1c0 [drm]
[   53.297091]  drm_mode_cursor_common+0x15b/0x1d0 [drm]
[   53.297434]  drm_mode_cursor2_ioctl+0xe/0x10 [drm]
[   53.297758]  drm_ioctl+0x211/0x460 [drm]
[   53.298205]  ? drm_mode_cursor_ioctl+0x50/0x50 [drm]
[   53.298638]  ? handle_mm_fault+0x93/0x160
[   53.298960]  do_vfs_ioctl+0x96/0x6e0
[   53.299200]  ? __fget+0x73/0xa0
[   53.299410]  SyS_ioctl+0x41/0x70
[   53.299630]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[   53.299934] RIP: 0033:0x7fcdedfb0bc7
[   53.300170] RSP: 002b:7fffbde2ed48 EFLAGS: 3246 ORIG_RAX: 0010
[   53.300660] RAX: ffda RBX: 0040 RCX: 7fcdedfb0bc7
[   53.301130] RDX: 7fffbde2ed80 RSI: c02464bb RDI: 000b
[   53.301609] RBP: 0040 R08: 0040 R09: 000c
[   53.302124] R10: 7fffbde2eb80 R11: 3246 R12: 0018
[   53.302614] R13: 0275b390 R14: 0040 R15: 7fffbde2ee6c



[patch] device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX

2017-05-05 Thread Mike Galbraith

ERROR: "devm_create_dev_dax" [drivers/dax/dax_pmem.ko] undefined!
ERROR: "alloc_dax_region" [drivers/dax/dax_pmem.ko] undefined!
ERROR: "dax_region_put" [drivers/dax/dax_pmem.ko] undefined!

Signed-off-by: Mike Galbraith <efa...@gmx.de>
---
 drivers/dax/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -19,7 +19,7 @@ config DEV_DAX
 
 config DEV_DAX_PMEM
tristate "PMEM DAX: direct access to persistent memory"
-   depends on LIBNVDIMM && NVDIMM_DAX
+   depends on LIBNVDIMM && NVDIMM_DAX && DEV_DAX
default DEV_DAX
help
  Support raw access to persistent memory.  Note that this


Re: [PATCH tip/sched/core v2] sched/rt: Simplify the IPI rt balancing logic

2017-05-04 Thread Mike Galbraith
On Fri, 2017-05-05 at 06:26 +0200, Mike Galbraith wrote:
> > To get rteval:
> > 
> >  $ git clone
> > git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rteval.git
> >  $ cd rteval
> >  $ git checkout origin/v2/master
> 
> ImportError: No module named ethtool
> 
> Where does one find the ethtool bits it seems to depend upon?

Nevermind, google found a tarball.

-Mike


Re: [PATCH tip/sched/core v2] sched/rt: Simplify the IPI rt balancing logic

2017-05-04 Thread Mike Galbraith

> To get rteval:
> 
>  $ git clone 
> git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rteval.git
>  $ cd rteval
>  $ git checkout origin/v2/master

ImportError: No module named ethtool

Where does one find the ethtool bits it seems to depend upon?

-Mike

