Re: [GIT PULL] scheduler changes for v4.3
* Markus Trippelsdorf wrote:

> On 2015.09.01 at 10:38 +0200, Ingo Molnar wrote:
> >
> > * Markus Trippelsdorf wrote:
> >
> > > Well, git show a1d8561172f369ba56d636df49a6b4d6d77e2123 :
> > >
> > > commit a1d8561172f369ba56d636df49a6b4d6d77e2123
> > > Merge: 3959df1dfb95 ff277d4250fe
> > > Author: Linus Torvalds
> > > Date:   Mon Aug 31 20:26:22 2015 -0700
> > >
> > > diff --cc kernel/cpu.c
> > > index 3c91a3fdfce5,664ce5299334..82cf9dff4295
> > > --- a/kernel/cpu.c
> > > +++ b/kernel/cpu.c
> > > @@@ -394,15 -392,10 +394,15 @@@ static int _cpu_down(unsigned int cpu,
> > >         smpboot_park_threads(cpu);
> > >
> > >         /*
> > > -        * So now all preempt/rcu users must observe !cpu_active().
> > > +        * Prevent irq alloc/free while the dying cpu reorganizes the
> > > +        * interrupt affinities.
> > >          */
> > > +       irq_lock_sparse();
> > >
> > > +       /*
> > > +        * So now all preempt/rcu users must observe !cpu_active().
> > > +        */
> > > -       err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> > > +       err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> > >         if (err) {
> > >                 /* CPU didn't die: tell everyone. Can't complain. */
> > >                 cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> >
> > So the irq_lock_sparse() change is from a commit that got merged in the last
> > merge window, which is part of v4.2:
> >
> >   ce0d3c0a6fb1 ("genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now")
> >
> > Could you please post the patch against Linus's latest that you have tested on
> > your system to make it boot fine?
> >
> > The one you posted cannot possibly build, because access to __stop_machine() is
> > gone from cpu.c:
>
> As I wrote in my other reply. The boot failure is nondeterministic (boot
> succeeds roughly every sixth time). So the bisection and the patch is
> just bogus (,but the boot failure is real).
>
> Sorry.

No problem. Please let us know if any of these commits does turn out to be the
culprit. (Which is always a possibility.)

Thanks,

        Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 10:38 +0200, Ingo Molnar wrote:
> The one you posted cannot possibly build, because access to __stop_machine() is
> gone from cpu.c:
>
> kernel/cpu.c: In function ‘_cpu_down’:
> kernel/cpu.c:404:2: error: implicit declaration of function ‘__stop_machine’ [-Werror=implicit-function-declaration]
>   err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));

Just to clear this up: It did build on my machine because it wasn't
compiled at all (CONFIG_HOTPLUG_CPU is not set).

--
Markus
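A minimal sketch of the effect described above: code guarded by a config option that is switched off never reaches the compiler, so a call to a function that no longer exists goes unnoticed. This is not kernel code. CONFIG_DEMO_HOTPLUG and demo_removed_helper() are made-up names standing in for CONFIG_HOTPLUG_CPU and __stop_machine(), and the fallback branch is invented purely so the example links.

```c
/*
 * Sketch only: CONFIG_DEMO_HOTPLUG and demo_removed_helper() are
 * hypothetical stand-ins for CONFIG_HOTPLUG_CPU and __stop_machine().
 * The macro is deliberately left undefined here.
 */

#ifdef CONFIG_DEMO_HOTPLUG
/*
 * The preprocessor drops this branch while the macro is undefined, so
 * the compiler never sees the call to the undeclared function and has
 * nothing to report.
 */
static int demo_cpu_down(void)
{
	return demo_removed_helper();
}
#else
/* Only this branch is compiled when the option is off. */
static int demo_cpu_down(void)
{
	return -1;
}
#endif

/* Returns -1 when the guarded path was skipped at build time. */
int demo_cpu_down_result(void)
{
	return demo_cpu_down();
}
```

Building the same file with -DCONFIG_DEMO_HOTPLUG should make the compiler reject the undeclared call, at least with a compiler or flag set that treats implicit declarations as errors, such as the -Werror=implicit-function-declaration shown in the quoted output.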
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 10:38 +0200, Ingo Molnar wrote:
>
> * Markus Trippelsdorf wrote:
>
> > Well, git show a1d8561172f369ba56d636df49a6b4d6d77e2123 :
> >
> > commit a1d8561172f369ba56d636df49a6b4d6d77e2123
> > Merge: 3959df1dfb95 ff277d4250fe
> > Author: Linus Torvalds
> > Date:   Mon Aug 31 20:26:22 2015 -0700
> >
> > diff --cc kernel/cpu.c
> > index 3c91a3fdfce5,664ce5299334..82cf9dff4295
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@@ -394,15 -392,10 +394,15 @@@ static int _cpu_down(unsigned int cpu,
> >         smpboot_park_threads(cpu);
> >
> >         /*
> > -        * So now all preempt/rcu users must observe !cpu_active().
> > +        * Prevent irq alloc/free while the dying cpu reorganizes the
> > +        * interrupt affinities.
> >          */
> > +       irq_lock_sparse();
> >
> > +       /*
> > +        * So now all preempt/rcu users must observe !cpu_active().
> > +        */
> > -       err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> > +       err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> >         if (err) {
> >                 /* CPU didn't die: tell everyone. Can't complain. */
> >                 cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
>
> So the irq_lock_sparse() change is from a commit that got merged in the last
> merge window, which is part of v4.2:
>
>   ce0d3c0a6fb1 ("genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now")
>
> Could you please post the patch against Linus's latest that you have tested on
> your system to make it boot fine?
>
> The one you posted cannot possibly build, because access to __stop_machine() is
> gone from cpu.c:

As I wrote in my other reply. The boot failure is nondeterministic (boot
succeeds roughly every sixth time). So the bisection and the patch is
just bogus (,but the boot failure is real).

Sorry.

--
Markus
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 10:18 +0200, Markus Trippelsdorf wrote:
> On 2015.09.01 at 09:27 +0200, Ingo Molnar wrote:
> >
> > * Markus Trippelsdorf wrote:
> >
> > > On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote:
> > > > Please pull the latest sched-core-for-linus git tree from:
> > > >
> > > >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-for-linus
> > > >
> > > >    # HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix comment in enqueue_task_dl()
> > >
> > > Linus,
> > >
> > > your merge (commit a1d8561172f369ba) breaks booting on my machine.
> >
> > So could you please double check your side?
>
> I just double checked and my git tree is clean.
> But it could be that the boot failure is nondeterministic, so git bisect
> got off the track.

Just to confirm. The boot failure _is_ nondeterministic. I just booted
a few times in a row and it failed five times and succeeded the sixth
time.
So please ignore my bisection result and the bogus patch that I've
posted.

--
Markus
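A rough worked example of why a single boot per bisect step is not enough here. If a broken kernel still boots about one time in six, a bad commit gets marked "good" whenever every test boot happens to succeed. The sketch below computes how many boots per step are needed to push that risk under a chosen bound. It assumes boots are independent and that the pass rate is constant, neither of which the thread establishes; boots_needed() is an illustrative helper and not part of git.

```c
/*
 * Sketch: smallest number of boots per bisect step such that a bad
 * commit passes all of them with probability at most max_error.
 *
 * Assumptions (not from the thread): boots are independent, and a bad
 * kernel boots successfully with fixed probability p_pass.
 * Returns -1 if either argument lies outside the open interval (0, 1).
 */
int boots_needed(double p_pass, double max_error)
{
	double p_all_pass = 1.0;	/* chance that n boots in a row succeed */
	int n = 0;

	if (p_pass <= 0.0 || p_pass >= 1.0 ||
	    max_error <= 0.0 || max_error >= 1.0)
		return -1;

	while (p_all_pass > max_error) {
		p_all_pass *= p_pass;
		n++;
	}
	return n;
}
```

With p_pass = 1/6 and a 1% bound, this yields 3: two boots leave a (1/6)^2 ≈ 2.8% chance of a false "good", while three boots bring it down to (1/6)^3 ≈ 0.46%.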
Re: [GIT PULL] scheduler changes for v4.3
* Markus Trippelsdorf wrote:

> Well, git show a1d8561172f369ba56d636df49a6b4d6d77e2123 :
>
> commit a1d8561172f369ba56d636df49a6b4d6d77e2123
> Merge: 3959df1dfb95 ff277d4250fe
> Author: Linus Torvalds
> Date:   Mon Aug 31 20:26:22 2015 -0700
>
> diff --cc kernel/cpu.c
> index 3c91a3fdfce5,664ce5299334..82cf9dff4295
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@@ -394,15 -392,10 +394,15 @@@ static int _cpu_down(unsigned int cpu,
>         smpboot_park_threads(cpu);
>
>         /*
> -        * So now all preempt/rcu users must observe !cpu_active().
> +        * Prevent irq alloc/free while the dying cpu reorganizes the
> +        * interrupt affinities.
>          */
> +       irq_lock_sparse();
>
> +       /*
> +        * So now all preempt/rcu users must observe !cpu_active().
> +        */
> -       err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> +       err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
>         if (err) {
>                 /* CPU didn't die: tell everyone. Can't complain. */
>                 cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);

So the irq_lock_sparse() change is from a commit that got merged in the last merge
window, which is part of v4.2:

  ce0d3c0a6fb1 ("genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now")

Could you please post the patch against Linus's latest that you have tested on
your system to make it boot fine?

The one you posted cannot possibly build, because access to __stop_machine() is
gone from cpu.c:

 kernel/cpu.c: In function ‘_cpu_down’:
 kernel/cpu.c:404:2: error: implicit declaration of function ‘__stop_machine’ [-Werror=implicit-function-declaration]
   err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
   ^

Thanks,

        Ingo
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 09:27 +0200, Ingo Molnar wrote:
>
> * Markus Trippelsdorf wrote:
>
> > On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote:
> > > Please pull the latest sched-core-for-linus git tree from:
> > >
> > >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-for-linus
> > >
> > >    # HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix comment in enqueue_task_dl()
> >
> > Linus,
> >
> > your merge (commit a1d8561172f369ba) breaks booting on my machine.
>
> So could you please double check your side?

I just double checked and my git tree is clean.
But it could be that the boot failure is nondeterministic, so git bisect
got off the track.

Anyway I don't have time to debug this further ATM.

--
Markus
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 09:27 +0200, Ingo Molnar wrote:
>
> * Markus Trippelsdorf wrote:
>
> > On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote:
> > > Please pull the latest sched-core-for-linus git tree from:
> > >
> > >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-for-linus
> > >
> > >    # HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix comment in enqueue_task_dl()
> >
> > Linus,
> >
> > your merge (commit a1d8561172f369ba) breaks booting on my machine.
>
> So I just double checked Linus's merge resolution, re-created it from scratch, and
> it looks correct. Furthermore, I resolved the conflict similarly in the past and
> this resolution had been in -tip and linux-next testing for some while.
>
> But I noticed something weird in your revert patch:
>
> >
> > I wrote down the backtrace:
> >
> > map_vsyscall
> > kvm_arch_hardware_setup
> > map_vsyscall
> > kvm_init
> > map_vsyscall
> > do_one_initcall
> > kernel_init_freeable
> > rest_init
> > kernel_init
> > ret_from_fork
> > rest_init
> >
> > RIP: svm_hardware_setup
> >
> > Reverting your merge resolution fixes the issue:
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 82cf9dff4295..873aa0757b04 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -397,12 +397,11 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen)
> >          * Prevent irq alloc/free while the dying cpu reorganizes the
> >          * interrupt affinities.
> >          */
> > -       irq_lock_sparse();
>
> So where does this chunk come from? None of the trees nor the merge resolution
> touches this code.
>
> Maybe you had other changes in your tree that interfered?
>
> That missing irq_lock_sparse() might indeed break the boot. But that's not
> something that got in there from Linus's tree AFAICS.
>
> >         /*
> >          * So now all preempt/rcu users must observe !cpu_active().
> >          */
> > -       err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> > +       err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> >         if (err) {
> >                 /* CPU didn't die: tell everyone. Can't complain. */
> >                 cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
>
> This change cannot possibly have built on Linus's tree, as __stop_machine() got
> unexported, it is now internal and static to kernel/stop_machine.c...
>
> So could you please double check your side?

Well, git show a1d8561172f369ba56d636df49a6b4d6d77e2123 :

commit a1d8561172f369ba56d636df49a6b4d6d77e2123
Merge: 3959df1dfb95 ff277d4250fe
Author: Linus Torvalds
Date:   Mon Aug 31 20:26:22 2015 -0700

    Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

    Pull scheduler updates from Ingo Molnar:
     "The biggest change in this cycle is the rewrite of the main SMP load
      balancing metric: the CPU load/utilization. The main goal was to
      make the metric more precise and more representative - see the
      changelog of this commit for the gory details:

        9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

      It is done in a way that significantly reduces complexity of the
      code:

        5 files changed, 249 insertions(+), 494 deletions(-)

      and the performance testing results are encouraging. Nevertheless
      we need to keep an eye on potential regressions, since this
      potentially affects every SMP workload in existence.

      This work comes from Yuyang Du.

      Other changes:

       - SCHED_DL updates. (Andrea Parri)

       - Simplify architecture callbacks by removing finish_arch_switch().
         (Peter Zijlstra et al)

       - cputime accounting: guarantee stime + utime == rtime. (Peter
         Zijlstra)

       - optimize idle CPU wakeups some more - inspired by Facebook server
         loads. (Mike Galbraith)

       - stop_machine fixes and updates. (Oleg Nesterov)

       - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra)

       - sched/numa tweaks. (Srikar Dronamraju)

       - misc fixes and small cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
      sched/deadline: Fix comment in enqueue_task_dl()
      sched/deadline: Fix comment in push_dl_tasks()
      sched: Change the sched_class::set_cpus_allowed() calling context
      sched: Make sched_class::set_cpus_allowed() unconditional
      sched: Fix a race between __kthread_bind() and sched_setaffinity()
      sched: Ensure a task has a non-normalized vruntime when returning back to CFS
      sched/numa: Fix NUMA_DIRECT topology identification
      tile: Reorganize _switch_to()
      sched, sparc32: Update scheduler comments in copy_thread()
      sched: Remove finish_arch_switch()
      sched, tile: Remove finish_arch_switch
      sched, sh: Fold finish_arch_switch() into
Re: [GIT PULL] scheduler changes for v4.3
* Markus Trippelsdorf wrote:

> On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote:
> > Please pull the latest sched-core-for-linus git tree from:
> >
> >    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-for-linus
> >
> >    # HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix comment in enqueue_task_dl()
>
> Linus,
>
> your merge (commit a1d8561172f369ba) breaks booting on my machine.

So I just double checked Linus's merge resolution, re-created it from scratch, and
it looks correct. Furthermore, I resolved the conflict similarly in the past and
this resolution had been in -tip and linux-next testing for some while.

But I noticed something weird in your revert patch:

>
> I wrote down the backtrace:
>
> map_vsyscall
> kvm_arch_hardware_setup
> map_vsyscall
> kvm_init
> map_vsyscall
> do_one_initcall
> kernel_init_freeable
> rest_init
> kernel_init
> ret_from_fork
> rest_init
>
> RIP: svm_hardware_setup
>
> Reverting your merge resolution fixes the issue:
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 82cf9dff4295..873aa0757b04 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -397,12 +397,11 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen)
>          * Prevent irq alloc/free while the dying cpu reorganizes the
>          * interrupt affinities.
>          */
> -       irq_lock_sparse();

So where does this chunk come from? None of the trees nor the merge resolution
touches this code.

Maybe you had other changes in your tree that interfered?

That missing irq_lock_sparse() might indeed break the boot. But that's not
something that got in there from Linus's tree AFAICS.

>         /*
>          * So now all preempt/rcu users must observe !cpu_active().
>          */
> -       err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
> +       err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
>         if (err) {
>                 /* CPU didn't die: tell everyone. Can't complain. */
>                 cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);

This change cannot possibly have built on Linus's tree, as __stop_machine() got
unexported, it is now internal and static to kernel/stop_machine.c...

So could you please double check your side?

Thanks,

        Ingo
Re: [GIT PULL] scheduler changes for v4.3
On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote:
> Please pull the latest sched-core-for-linus git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-for-linus
>
>    # HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix comment in enqueue_task_dl()

Linus,

your merge (commit a1d8561172f369ba) breaks booting on my machine.

I wrote down the backtrace:

map_vsyscall
kvm_arch_hardware_setup
map_vsyscall
kvm_init
map_vsyscall
do_one_initcall
kernel_init_freeable
rest_init
kernel_init
ret_from_fork
rest_init

RIP: svm_hardware_setup

Reverting your merge resolution fixes the issue:

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 82cf9dff4295..873aa0757b04 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -397,12 +397,11 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen)
         * Prevent irq alloc/free while the dying cpu reorganizes the
         * interrupt affinities.
         */
-       irq_lock_sparse();

        /*
         * So now all preempt/rcu users must observe !cpu_active().
         */
-       err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
+       err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
        if (err) {
                /* CPU didn't die: tell everyone. Can't complain. */
                cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);

--
Markus
Re: [GIT PULL] scheduler changes for v4.3
* Markus Trippelsdorfwrote: > On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote: > > Please pull the latest sched-core-for-linus git tree from: > > > >git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > > sched-core-for-linus > > > ># HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix > > comment in enqueue_task_dl() > > Linus, > > your merge (commit a1d8561172f369ba) breaks booting on my machine. So I just double checked Linus's merge resolution, re-created it from scratch, and it looks correct. Furthermore, I resolved the conflict similarly in the past and this resolution had been in -tip and linux-next testing for some while. But I noticed something weird in your revert patch: > > I wrote down the backtrace: > > map_vsyscall > kvm_arch_hardware_setup > map_vsyscall > kvm_init > map_vsyscall > do_one_initcall > kernel_init_freeable > rest_init > kernel_init > ret_from_fork > rest_init > > RIP: svm_hardware_setup > > Reverting your merge resolution fixes the issue: > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 82cf9dff4295..873aa0757b04 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -397,12 +397,11 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen) >* Prevent irq alloc/free while the dying cpu reorganizes the >* interrupt affinities. >*/ > - irq_lock_sparse(); So where does this chunk come from? None of the trees nor the merge resolution touches this code. Maybe you had other changes in your tree that interfered? That missing irq_lock_sparse() might indeed break the boot. But that's not something that got in there from Linus's tree AFAICS. > /* >* So now all preempt/rcu users must observe !cpu_active(). >*/ > - err = stop_machine(take_cpu_down, _param, cpumask_of(cpu)); > + err = __stop_machine(take_cpu_down, _param, cpumask_of(cpu)); > if (err) { > /* CPU didn't die: tell everyone. Can't complain. 
*/ > cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu); This change cannot possibly have built on Linus's tree, as __stop_machine() got unexported, it is now internal and static to kernel/stop_machine.c... So could you please double check your side? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] scheduler changes for v4.3
On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote: > Please pull the latest sched-core-for-linus git tree from: > >git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > sched-core-for-linus > ># HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix > comment in enqueue_task_dl() Linus, your merge (commit a1d8561172f369ba) breaks booting on my machine. I wrote down the backtrace: map_vsyscall kvm_arch_hardware_setup map_vsyscall kvm_init map_vsyscall do_one_initcall kernel_init_freeable rest_init kernel_init ret_from_fork rest_init RIP: svm_hardware_setup Reverting your merge resolution fixes the issue: diff --git a/kernel/cpu.c b/kernel/cpu.c index 82cf9dff4295..873aa0757b04 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -397,12 +397,11 @@ static int _cpu_down(unsigned int cpu, int tasks_frozen) * Prevent irq alloc/free while the dying cpu reorganizes the * interrupt affinities. */ - irq_lock_sparse(); /* * So now all preempt/rcu users must observe !cpu_active(). */ - err = stop_machine(take_cpu_down, _param, cpumask_of(cpu)); + err = __stop_machine(take_cpu_down, _param, cpumask_of(cpu)); if (err) { /* CPU didn't die: tell everyone. Can't complain. */ cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu); -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 09:27 +0200, Ingo Molnar wrote: > > * Markus Trippelsdorfwrote: > > > On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote: > > > Please pull the latest sched-core-for-linus git tree from: > > > > > >git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > > > sched-core-for-linus > > > > > ># HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix > > > comment in enqueue_task_dl() > > > > Linus, > > > > your merge (commit a1d8561172f369ba) breaks booting on my machine. > > So I just double checked Linus's merge resolution, re-created it from > scratch, and > it looks correct. Furthermore, I resolved the conflict similarly in the past > and > this resolution had been in -tip and linux-next testing for some while. > > But I noticed something weird in your revert patch: > > > > > I wrote down the backtrace: > > > > map_vsyscall > > kvm_arch_hardware_setup > > map_vsyscall > > kvm_init > > map_vsyscall > > do_one_initcall > > kernel_init_freeable > > rest_init > > kernel_init > > ret_from_fork > > rest_init > > > > RIP: svm_hardware_setup > > > > Reverting your merge resolution fixes the issue: > > > > diff --git a/kernel/cpu.c b/kernel/cpu.c > > index 82cf9dff4295..873aa0757b04 100644 > > --- a/kernel/cpu.c > > +++ b/kernel/cpu.c > > @@ -397,12 +397,11 @@ static int _cpu_down(unsigned int cpu, int > > tasks_frozen) > > * Prevent irq alloc/free while the dying cpu reorganizes the > > * interrupt affinities. > > */ > > - irq_lock_sparse(); > > So where does this chunk come from? None of the trees nor the merge > resolution > touches this code. > > Maybe you had other changes in your tree that interfered? > > That missing irq_lock_sparse() might indeed break the boot. But that's not > something that got in there from Linus's tree AFAICS. > > > /* > > * So now all preempt/rcu users must observe !cpu_active(). 
> > */ > > - err = stop_machine(take_cpu_down, _param, cpumask_of(cpu)); > > + err = __stop_machine(take_cpu_down, _param, cpumask_of(cpu)); > > if (err) { > > /* CPU didn't die: tell everyone. Can't complain. */ > > cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu); > > This change cannot possibly have built on Linus's tree, as __stop_machine() > got > unexported, it is now internal and static to kernel/stop_machine.c... > > So could you please double check your side? Well, git show a1d8561172f369ba56d636df49a6b4d6d77e2123 : commit a1d8561172f369ba56d636df49a6b4d6d77e2123 Merge: 3959df1dfb95 ff277d4250fe Author: Linus Torvalds Date: Mon Aug 31 20:26:22 2015 -0700 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest change in this cycle is the rewrite of the main SMP load balancing metric: the CPU load/utilization. The main goal was to make the metric more precise and more representative - see the changelog of this commit for the gory details: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") It is done in a way that significantly reduces complexity of the code: 5 files changed, 249 insertions(+), 494 deletions(-) and the performance testing results are encouraging. Nevertheless we need to keep an eye on potential regressions, since this potentially affects every SMP workload in existence. This work comes from Yuyang Du. Other changes: - SCHED_DL updates. (Andrea Parri) - Simplify architecture callbacks by removing finish_arch_switch(). (Peter Zijlstra et al) - cputime accounting: guarantee stime + utime == rtime. (Peter Zijlstra) - optimize idle CPU wakeups some more - inspired by Facebook server loads. (Mike Galbraith) - stop_machine fixes and updates. (Oleg Nesterov) - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra) - sched/numa tweaks. 
(Srikar Dronamraju) - misc fixes and small cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) sched/deadline: Fix comment in enqueue_task_dl() sched/deadline: Fix comment in push_dl_tasks() sched: Change the sched_class::set_cpus_allowed() calling context sched: Make sched_class::set_cpus_allowed() unconditional sched: Fix a race between __kthread_bind() and sched_setaffinity() sched: Ensure a task has a non-normalized vruntime when returning back to CFS sched/numa: Fix NUMA_DIRECT topology identification tile: Reorganize _switch_to() sched, sparc32: Update scheduler comments in copy_thread() sched: Remove finish_arch_switch() sched, tile: Remove finish_arch_switch
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 09:27 +0200, Ingo Molnar wrote: > > * Markus Trippelsdorfwrote: > > > On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote: > > > Please pull the latest sched-core-for-linus git tree from: > > > > > >git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > > > sched-core-for-linus > > > > > ># HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix > > > comment in enqueue_task_dl() > > > > Linus, > > > > your merge (commit a1d8561172f369ba) breaks booting on my machine. > > So could you please double check your side? I just double checked and my git tree is clean. But it could be that the boot failure is nondeterministic, so git bisect got off the track. Anyway I don't have time to debug this further ATM. -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 10:18 +0200, Markus Trippelsdorf wrote: > On 2015.09.01 at 09:27 +0200, Ingo Molnar wrote: > > > > * Markus Trippelsdorfwrote: > > > > > On 2015.08.31 at 19:24 +0200, Ingo Molnar wrote: > > > > Please pull the latest sched-core-for-linus git tree from: > > > > > > > >git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > > > > sched-core-for-linus > > > > > > > ># HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix > > > > comment in enqueue_task_dl() > > > > > > Linus, > > > > > > your merge (commit a1d8561172f369ba) breaks booting on my machine. > > > > So could you please double check your side? > > I just double checked and my git tree is clean. > But it could be that the boot failure is nondeterministic, so git bisect > got off the track. Just to confirm. The boot failure _is_ nondeterministic. I just booted a few times in a row and it failed five times and succeeded the sixth time. So please ignore my bisection result and the bogus patch that I've posted. -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 10:38 +0200, Ingo Molnar wrote: > > * Markus Trippelsdorfwrote: > > > Well, git show a1d8561172f369ba56d636df49a6b4d6d77e2123 : > > > > commit a1d8561172f369ba56d636df49a6b4d6d77e2123 > > Merge: 3959df1dfb95 ff277d4250fe > > Author: Linus Torvalds > > Date: Mon Aug 31 20:26:22 2015 -0700 > > > diff --cc kernel/cpu.c > > index 3c91a3fdfce5,664ce5299334..82cf9dff4295 > > --- a/kernel/cpu.c > > +++ b/kernel/cpu.c > > @@@ -394,15 -392,10 +394,15 @@@ static int _cpu_down(unsigned int cpu, > > smpboot_park_threads(cpu); > > > > /* > > - * So now all preempt/rcu users must observe !cpu_active(). > > + * Prevent irq alloc/free while the dying cpu reorganizes the > > + * interrupt affinities. > > */ > > + irq_lock_sparse(); > > > > + /* > > + * So now all preempt/rcu users must observe !cpu_active(). > > + */ > > - err = __stop_machine(take_cpu_down, _param, cpumask_of(cpu)); > > + err = stop_machine(take_cpu_down, _param, cpumask_of(cpu)); > > if (err) { > > /* CPU didn't die: tell everyone. Can't complain. */ > > cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu); > > So the irq_lock_sparse() change is from a commit that got merged in the last > merge > window, which is part of v4.2: > > ce0d3c0a6fb1 ("genirq: Revert sparse irq locking around __cpu_up() and move > it to x86 for now") > > Could you please post the patch against Linus's latest that you have tested > on > your system to make it boot fine? > > The one you posted cannot possibly build, because access to __stop_machine() > is > gone from cpu.c: As I wrote in my other reply. The boot failure is nondeterministic (boot succeeds roughly every sixth time). So the bisection and the patch is just bogus (,but the boot failure is real). Sorry. -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] scheduler changes for v4.3
On 2015.09.01 at 10:38 +0200, Ingo Molnar wrote: > The one you posted cannot possibly build, because access to __stop_machine() > is > gone from cpu.c: > > kernel/cpu.c: In function ‘_cpu_down’: > kernel/cpu.c:404:2: error: implicit declaration of function ‘__stop_machine’ > [-Werror=implicit-function-declaration] > err = __stop_machine(take_cpu_down, _param, cpumask_of(cpu)); Just to clear this up: It did build on my machine because it wasn't compiled at all (CONFIG_HOTPLUG_CPU is not set). -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] scheduler changes for v4.3
* Markus Trippelsdorfwrote: > Well, git show a1d8561172f369ba56d636df49a6b4d6d77e2123 : > > commit a1d8561172f369ba56d636df49a6b4d6d77e2123 > Merge: 3959df1dfb95 ff277d4250fe > Author: Linus Torvalds > Date: Mon Aug 31 20:26:22 2015 -0700 > diff --cc kernel/cpu.c > index 3c91a3fdfce5,664ce5299334..82cf9dff4295 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@@ -394,15 -392,10 +394,15 @@@ static int _cpu_down(unsigned int cpu, > smpboot_park_threads(cpu); > > /* > - * So now all preempt/rcu users must observe !cpu_active(). > + * Prevent irq alloc/free while the dying cpu reorganizes the > + * interrupt affinities. >*/ > +irq_lock_sparse(); > > +/* > + * So now all preempt/rcu users must observe !cpu_active(). > + */ > - err = __stop_machine(take_cpu_down, _param, cpumask_of(cpu)); > + err = stop_machine(take_cpu_down, _param, cpumask_of(cpu)); > if (err) { > /* CPU didn't die: tell everyone. Can't complain. */ > cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu); So the irq_lock_sparse() change is from a commit that got merged in the last merge window, which is part of v4.2: ce0d3c0a6fb1 ("genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now") Could you please post the patch against Linus's latest that you have tested on your system to make it boot fine? The one you posted cannot possibly build, because access to __stop_machine() is gone from cpu.c: kernel/cpu.c: In function ‘_cpu_down’: kernel/cpu.c:404:2: error: implicit declaration of function ‘__stop_machine’ [-Werror=implicit-function-declaration] err = __stop_machine(take_cpu_down, _param, cpumask_of(cpu)); ^ Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] scheduler changes for v4.3
Linus,

Please pull the latest sched-core-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-for-linus

   # HEAD: ff277d4250fe715b219b1a3423b863418794 sched/deadline: Fix comment in enqueue_task_dl()

The biggest change in this cycle is the rewrite of the main SMP load balancing
metric: the CPU load/utilization. The main goal was to make the metric more
precise and more representative - see the changelog of this commit for the gory
details:

   9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")

It is done in a way that significantly reduces complexity of the code:

   5 files changed, 249 insertions(+), 494 deletions(-)

and the performance testing results are encouraging. Nevertheless we need to keep
an eye on potential regressions, since this potentially affects every SMP workload
in existence.

This work comes from Yuyang Du.

Other changes:

 - SCHED_DL updates. (Andrea Parri)

 - Simplify architecture callbacks by removing finish_arch_switch().
   (Peter Zijlstra et al)

 - cputime accounting: guarantee stime + utime == rtime. (Peter Zijlstra)

 - optimize idle CPU wakeups some more - inspired by Facebook server loads.
   (Mike Galbraith)

 - stop_machine fixes and updates. (Oleg Nesterov)

 - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra)

 - sched/numa tweaks. (Srikar Dronamraju)

 - misc fixes and small cleanups

 Thanks,

    Ingo

-->

Andrea Parri (2):
      sched/deadline: Fix comment in push_dl_tasks()
      sched/deadline: Fix comment in enqueue_task_dl()

Aravind Gopalakrishnan (1):
      sched/numa: Fix NUMA_DIRECT topology identification

Boqun Feng (1):
      sched/fair: Clean up the __sched_period() code

Byungchul Park (2):
      sched/fair: Fix a comment reflecting function name change
      sched: Ensure a task has a non-normalized vruntime when returning back to CFS

Chris Metcalf (2):
      sched, tile: Remove finish_arch_switch
      tile: Reorganize _switch_to()

Ingo Molnar (1):
      sched, sparc32: Update scheduler comments in copy_thread()

Konstantin Khlebnikov (3):
      sched/preempt, xen: Use need_resched() instead of should_resched()
      sched/preempt, powerpc, kvm: Use need_resched() instead of should_resched()
      sched/preempt: Fix cond_resched_lock() and cond_resched_softirq()

Lucas Stach (1):
      sched/idle: Move latency tracing stop/start calls deeper inside the idle loop

Markus Elfring (1):
      sched, sysctl: Delete an unnecessary check before unregister_sysctl_table()

Mike Galbraith (1):
      sched/fair: Beef up wake_wide()

Oleg Nesterov (5):
      stop_machine: Move 'cpu_stopper_task' and 'stop_cpus_work' into 'struct cpu_stopper'
      stop_machine: Don't do for_each_cpu() twice in queue_stop_cpus_work()
      stop_machine: Unexport __stop_machine()
      stop_machine: Use 'cpu_stop_fn_t' where possible
      stop_machine: Remove cpu_stop_work's from list in cpu_stop_park()

Peter Zijlstra (9):
      sched/cputime: Guarantee stime + utime == rtime
      sched: Introduce the 'trace_sched_waking' tracepoint
      sched, avr32: Remove finish_arch_switch()
      sched, score: Remove finish_arch_switch()
      sched, sh: Fold finish_arch_switch() into switch_to()
      sched: Remove finish_arch_switch()
      sched: Fix a race between __kthread_bind() and sched_setaffinity()
      sched: Make sched_class::set_cpus_allowed() unconditional
      sched: Change the sched_class::set_cpus_allowed() calling context

Ralf Baechle (1):
      sched, MIPS: Get rid of finish_arch_switch()

Srikar Dronamraju (2):
      sched/numa: Prefer NUMA hotness over cache hotness
      sched/numa: Consider 'imbalance_pct' when comparing loads in numa_has_capacity()

Vincent Guittot (1):
      sched/fair: Implement update_blocked_averages() for CONFIG_FAIR_GROUP_SCHED=n

Will Deacon (1):
      sched, arm: Remove finish_arch_switch()

Xunlei Pang (2):
      sched/rt: Remove a redundant condition from task_woken_rt()
      sched/deadline: Remove a redundant condition from task_woken_dl()

Yuyang Du (7):
      sched/fair: Avoid pulling all tasks in idle balancing
      sched/fair: Remove rq's runnable avg
      sched/fair: Rewrite runnable load and utilization average tracking
      sched/fair: Init cfs_rq's sched_entity load average
      sched/fair: Remove task and group entity load when they are dead
      sched/fair: Provide runnable_load_avg back to cfs_rq
      sched/fair: Clean up load average references

bseg...@google.com (1):
      sched/numa: Check sched_feat(NUMA) in migrate_improves_locality()


 arch/arm/include/asm/switch_to.h   |  5 +-
 arch/avr32/include/asm/switch_to.h |  7 +-
 arch/mips/include/asm/switch_to.h  | 48 +-
 arch/powerpc/kvm/book3s_hv.c       |  2 +-
 arch/score/include/asm/switch_to.h |  2 -
 arch/sh/include/asm/switch_to_32.h |  8 +-
 arch/sparc/kernel/process_32.c     | 10 +-
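Editor's illustration of the "stime + utime == rtime" item in the pull request: one way to get that invariant is to derive one component by scaling and the other by subtraction, so that rounding can never make the two drift away from the total. This is a simplified sketch under assumptions, not the kernel's cputime code: `struct cputime_split` and `split_rtime()` are hypothetical names, the tick counts stand in for sampled user/system time, and overflow of the multiplication is only guarded by an assertion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the two reported components of the runtime. */
struct cputime_split {
	uint64_t utime;
	uint64_t stime;
};

/*
 * Split the precise runtime 'rtime' between system and user time in
 * proportion to the sampled tick counts. stime is computed by scaling;
 * utime is the remainder, so stime + utime == rtime holds by construction.
 */
static struct cputime_split split_rtime(uint64_t utime_ticks,
					uint64_t stime_ticks,
					uint64_t rtime)
{
	struct cputime_split out;
	uint64_t total = utime_ticks + stime_ticks;

	if (total == 0) {
		/* No samples: attribute everything to user time (arbitrary). */
		out.stime = 0;
	} else {
		/* The sketch does not handle overflow of rtime * stime_ticks. */
		assert(stime_ticks == 0 || rtime <= UINT64_MAX / stime_ticks);
		out.stime = rtime * stime_ticks / total;
	}
	out.utime = rtime - out.stime;
	return out;
}
```

For example, `split_rtime(1, 2, 10)` gives an stime of 6 and a utime of 4: the integer division rounds the system share down, and the subtraction puts the remainder into user time so the sum is still 10.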