Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
On 11.04.2018 at 19:00, Vincent Guittot wrote:
> Hi Heiner,
>
> On 9 April 2018 at 19:33, Heiner Kallweit wrote:
>> On 06.04.2018 at 18:03, Vincent Guittot wrote:
>>> Hi Heiner,
>>>
>>> On 30 March 2018 at 10:37, Heiner Kallweit wrote:
>>>> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>>>>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>>>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>>>>>
>>>>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>>>>> problem as I haven't been able to reproduce it since.
>>>>>>>
>>>>>>> Heiner,
>>>>>>>
>>>>>>> How fast does the problem happen on your board?
>>>>>>> Are you doing anything specific on the console that triggers the problem?
>>>>>>>
>>>>>> Hi Vincent,
>>>>>>
>>>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>>>> detected stalls" happens after several hours (so far always within 24h)
>>>>>> w/o any triggering event I would be aware of. It occurred also when the
>>>>>> system was idle at that point in time.
>>>>>
>>>>> Ok, so I don't have the problem on my hikey as the console never lags
>>>>> on my setup.
>>>>>
>>>>> Can you send me the config of your kernel? I'd like to check if you
>>>>> have enabled something that could trigger such a problem.
>>>>>
>>>> Sure, here we go. I also add a system log.
>>>
>>> Thanks for the config. I have used it for my setup but I can't
>>> reproduce your regression. My platforms stay stable so I'm probably
>>> missing something. Are you facing a similar problem with other platforms
>>> or only this Celeron-based platform?
>>>
>>> I have reviewed the code but don't see any obvious place in the patch
>>> that can generate the problem. Nevertheless, would you mind trying the
>>> patch below? It's a blind test to try to narrow down the problem.
>>>
>>> Thanks
>>>
>> Hi Vincent,
>>
>> I tried again with today's linux-next and it's much better. The lag isn't
>> completely gone but it's much less annoying. Every ~30 secs the console
>> hangs for about half a second, that's much less frequent than before.
>
> That's interesting because nothing related to commit
> 31e77c93e432dec79c7d90b888bbfc3652592741 has been merged recently
> AFAICT
>
>>
>> I saw some patches from Rafael have been merged in the last days.
>> Maybe they improved the situation.
>
> Yes, Peter mentions in another thread that Rafael's latest patches
> avoid stopping the tick when entering short idle, thus reducing the time to
> enter idle. commit 31e77 is adding some background activity when
> entering idle so it can be that we take too much time
>
> You also mentioned that the CPU was relatively slow on the platform.
> Can you try to use the cpufreq performance governor instead of ondemand?
>

The system uses the intel_pstate scaling driver, so only powersave and
performance are available. Min/max frequency are 600 MHz / 1100 MHz.
I didn't really notice a difference when switching between both modes.

Regards, Heiner

> I'm also going to prepare a patch for adding some trace in the code to
> highlight the problem
>
> Thanks,
> Vincent
>
>>
>> Regards, Heiner
>>
>
> [snip]
>
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
Hi Heiner,

On 9 April 2018 at 19:33, Heiner Kallweit wrote:
> On 06.04.2018 at 18:03, Vincent Guittot wrote:
>> Hi Heiner,
>>
>> On 30 March 2018 at 10:37, Heiner Kallweit wrote:
>>> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>>>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>>>>
>>>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>>>> problem as I haven't been able to reproduce it since.
>>>>>>
>>>>>> Heiner,
>>>>>>
>>>>>> How fast does the problem happen on your board?
>>>>>> Are you doing anything specific on the console that triggers the problem?
>>>>>>
>>>>> Hi Vincent,
>>>>>
>>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>>> detected stalls" happens after several hours (so far always within 24h)
>>>>> w/o any triggering event I would be aware of. It occurred also when the
>>>>> system was idle at that point in time.
>>>>
>>>> Ok, so I don't have the problem on my hikey as the console never lags
>>>> on my setup.
>>>>
>>>> Can you send me the config of your kernel? I'd like to check if you
>>>> have enabled something that could trigger such a problem.
>>>>
>>> Sure, here we go. I also add a system log.
>>
>> Thanks for the config. I have used it for my setup but I can't
>> reproduce your regression. My platforms stay stable so I'm probably
>> missing something. Are you facing a similar problem with other platforms
>> or only this Celeron-based platform?
>>
>> I have reviewed the code but don't see any obvious place in the patch
>> that can generate the problem. Nevertheless, would you mind trying the
>> patch below? It's a blind test to try to narrow down the problem.
>>
>> Thanks
>>
> Hi Vincent,
>
> I tried again with today's linux-next and it's much better. The lag isn't
> completely gone but it's much less annoying. Every ~30 secs the console
> hangs for about half a second, that's much less frequent than before.

That's interesting because nothing related to commit
31e77c93e432dec79c7d90b888bbfc3652592741 has been merged recently
AFAICT

>
> I saw some patches from Rafael have been merged in the last days.
> Maybe they improved the situation.

Yes, Peter mentions in another thread that Rafael's latest patches
avoid stopping the tick when entering short idle, thus reducing the time to
enter idle. commit 31e77 is adding some background activity when
entering idle so it can be that we take too much time

You also mentioned that the CPU was relatively slow on the platform.
Can you try to use the cpufreq performance governor instead of ondemand?

I'm also going to prepare a patch for adding some trace in the code to
highlight the problem

Thanks,
Vincent

>
> Regards, Heiner
>

[snip]
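Before switching governors it can help to see what the scaling driver actually exposes. The following is a minimal sketch, not part of the thread: it reads the standard cpufreq sysfs attributes for one CPU. The helper name `read_cpufreq` and the `sysfs_root` parameter are made up for illustration; the attribute file names are the usual cpufreq ones, and the frequency files report kHz.

```python
from pathlib import Path

# Standard cpufreq sysfs attributes; which ones exist depends on the driver.
CPUFREQ_FILES = (
    "scaling_driver",
    "scaling_governor",
    "scaling_available_governors",
    "scaling_min_freq",  # kHz
    "scaling_max_freq",  # kHz
)


def read_cpufreq(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the cpufreq attributes of one CPU as a dict of strings.

    read_cpufreq and sysfs_root are hypothetical names for this sketch.
    Attributes that are missing or unreadable are reported as None, so the
    function also works on machines without cpufreq support.
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in CPUFREQ_FILES:
        try:
            info[name] = (base / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in read_cpufreq(0).items():
        print(f"{key}: {value}")
```

On a system where `scaling_available_governors` lists only `performance` and `powersave`, asking for `ondemand` is not an option, so a check like this shows up front which governors can be compared.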
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
On 06.04.2018 at 18:03, Vincent Guittot wrote:
> Hi Heiner,
>
> On 30 March 2018 at 10:37, Heiner Kallweit wrote:
>> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>>>
>>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>>> problem as I haven't been able to reproduce it since.
>>>>>
>>>>> Heiner,
>>>>>
>>>>> How fast does the problem happen on your board?
>>>>> Are you doing anything specific on the console that triggers the problem?
>>>>>
>>>> Hi Vincent,
>>>>
>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>> detected stalls" happens after several hours (so far always within 24h)
>>>> w/o any triggering event I would be aware of. It occurred also when the
>>>> system was idle at that point in time.
>>>
>>> Ok, so I don't have the problem on my hikey as the console never lags
>>> on my setup.
>>>
>>> Can you send me the config of your kernel? I'd like to check if you
>>> have enabled something that could trigger such a problem.
>>>
>> Sure, here we go. I also add a system log.
>
> Thanks for the config. I have used it for my setup but I can't
> reproduce your regression. My platforms stay stable so I'm probably
> missing something. Are you facing a similar problem with other platforms
> or only this Celeron-based platform?
>
> I have reviewed the code but don't see any obvious place in the patch
> that can generate the problem. Nevertheless, would you mind trying the
> patch below? It's a blind test to try to narrow down the problem.
>
> Thanks
>

Hi Vincent,

I tried again with today's linux-next and it's much better. The lag isn't
completely gone but it's much less annoying. Every ~30 secs the console
hangs for about half a second, that's much less frequent than before.

I saw some patches from Rafael have been merged in the last days.
Maybe they improved the situation.

Regards, Heiner

> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0951d1c..e9835f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9794,9 +9794,9 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>  		sd = rcu_dereference_check_sched_domain(this_rq->sd);
>  		if (sd)
>  			update_next_balance(sd, &next_balance);
> -		rcu_read_unlock();
>
>  		nohz_newidle_balance(this_rq);
> +		rcu_read_unlock();
>
>  		goto out;
>  	}
>
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
On 06.04.2018 at 18:03, Vincent Guittot wrote:
> Hi Heiner,
>
> On 30 March 2018 at 10:37, Heiner Kallweit wrote:
>> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>>>
>>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>>> problem as I haven't been able to reproduce it since.
>>>>>
>>>>> Heiner,
>>>>>
>>>>> How fast does the problem happen on your board?
>>>>> Are you doing anything specific on the console that triggers the problem?
>>>>>
>>>> Hi Vincent,
>>>>
>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>> detected stalls" happens after several hours (so far always within 24h)
>>>> w/o any triggering event I would be aware of. It occurred also when the
>>>> system was idle at that point in time.
>>>
>>> Ok, so I don't have the problem on my hikey as the console never lags
>>> on my setup.
>>>
>>> Can you send me the config of your kernel? I'd like to check if you
>>> have enabled something that could trigger such a problem.
>>>
>> Sure, here we go. I also add a system log.
>
> Thanks for the config. I have used it for my setup but I can't
> reproduce your regression. My platforms stay stable so I'm probably
> missing something. Are you facing a similar problem with other platforms
> or only this Celeron-based platform?
>

Really appreciate your efforts. Latest linux-next works fine on an
Odroid-C2 (arm64, 4 cores) I have. So the issue may be dual-core
and/or platform-specific. Another possibility could be that it occurs
only on relatively slow CPUs like this Celeron.

> I have reviewed the code but don't see any obvious place in the patch
> that can generate the problem. Nevertheless, would you mind trying the
> patch below? It's a blind test to try to narrow down the problem.
>

I tried your patch, system behavior (with the lagging console) is as before.

Regards, Heiner

> Thanks
>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0951d1c..e9835f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9794,9 +9794,9 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>  		sd = rcu_dereference_check_sched_domain(this_rq->sd);
>  		if (sd)
>  			update_next_balance(sd, &next_balance);
> -		rcu_read_unlock();
>
>  		nohz_newidle_balance(this_rq);
> +		rcu_read_unlock();
>
>  		goto out;
>  	}
>
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
Hi Heiner,

On 30 March 2018 at 10:37, Heiner Kallweit wrote:
> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>>
>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>> problem as I haven't been able to reproduce it since.
>>>>
>>>> Heiner,
>>>>
>>>> How fast does the problem happen on your board?
>>>> Are you doing anything specific on the console that triggers the problem?
>>>>
>>> Hi Vincent,
>>>
>>> the lag when working on the console is constantly there, the "rcu_preempt
>>> detected stalls" happens after several hours (so far always within 24h)
>>> w/o any triggering event I would be aware of. It occurred also when the
>>> system was idle at that point in time.
>>
>> Ok, so I don't have the problem on my hikey as the console never lags
>> on my setup.
>>
>> Can you send me the config of your kernel? I'd like to check if you
>> have enabled something that could trigger such a problem.
>>
> Sure, here we go. I also add a system log.

Thanks for the config. I have used it for my setup but I can't
reproduce your regression. My platforms stay stable so I'm probably
missing something. Are you facing a similar problem with other platforms
or only this Celeron-based platform?

I have reviewed the code but don't see any obvious place in the patch
that can generate the problem. Nevertheless, would you mind trying the
patch below? It's a blind test to try to narrow down the problem.

Thanks

---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0951d1c..e9835f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9794,9 +9794,9 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
 		sd = rcu_dereference_check_sched_domain(this_rq->sd);
 		if (sd)
 			update_next_balance(sd, &next_balance);
-		rcu_read_unlock();

 		nohz_newidle_balance(this_rq);
+		rcu_read_unlock();

 		goto out;
 	}
-- 
2.7.4

>
> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86 4.16.0-rc7 Kernel Configuration
> #

[snip]
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
On 30.03.2018 at 08:50, Vincent Guittot wrote:
> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>
>>> I'm finally not so sure that I have the right setup to reproduce the
>>> problem as I haven't been able to reproduce it since.
>>>
>>> Heiner,
>>>
>>> How fast does the problem happen on your board?
>>> Are you doing anything specific on the console that triggers the problem?
>>>
>> Hi Vincent,
>>
>> the lag when working on the console is constantly there, the "rcu_preempt
>> detected stalls" happens after several hours (so far always within 24h)
>> w/o any triggering event I would be aware of. It occurred also when the
>> system was idle at that point in time.
>
> Ok, so I don't have the problem on my hikey as the console never lags
> on my setup.
>
> Can you send me the config of your kernel? I'd like to check if you
> have enabled something that could trigger such a problem.
>

Sure, here we go. I also add a system log.

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.16.0-rc7 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=28 CONFIG_ARCH_MMAP_RND_BITS_MAX=32 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_HAVE_INTEL_TXT=y CONFIG_X86_64_SMP=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=4 CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y 
CONFIG_USELIB=y # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_GENERIC_IRQ_MIGRATION=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ_FULL is not set # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set # CONFIG_IRQ_TIME_ACCOUNTING is not set CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # CONFIG_CPU_ISOLATION is not set # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y CONFIG_TREE_SRCU=y CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_NEED_SEGCBLIST=y CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=m CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=18 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_ARCH_SUPPORTS_INT128=y # CONFIG_NUMA_BALANCING is not set CONFIG_CGROUPS=y # CONFIG_MEMCG is not set # CONFIG_BLK_CGROUP is not set CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y #
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
On 29 March 2018 at 19:40, Heiner Kallweit wrote:
> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>
>> I'm finally not so sure that I have the right setup to reproduce the
>> problem, as I haven't been able to reproduce it since.
>>
>> Heiner,
>>
>> How fast does the problem happen on your board?
>> Are you doing anything specific on the console that triggers the problem?
>>
> Hi Vincent,
>
> the lag when working on the console is constantly there, the "rcu_preempt
> detected stalls" happens after several hours (so far always within 24h)
> w/o any triggering event I would be aware of. It occurred also when the
> system was idle at that point in time.

Ok, so I don't have the problem on my hikey as the console never lags
on my setup.

Can you send me the config of your kernel? I'd like to check if you
have enabled something that could trigger such a problem.

Thanks,
Vincent

>
> Rgds, Heiner
>
>> Regards,
>> Vincent
>>
>>>>>>> Bisecting the issue resulted in:
>>>>>>>
>>>>>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>>>>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>>>>>> Author: Vincent Guittot
>>>>>>> Date: Wed Feb 14 16:26:46 2018 +0100
>>>>>>>
>>>>>>>     sched/fair: Update blocked load when newly idle
>>>>>>>
>>>>>>>     When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>>>>>     blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>>>>>     updating blocked load or we can try to update them locally before entering
>>>>>>>     idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>>>>>
>>>>>>> After reverting this commit at least the issue with the freezing console
>>>>>>> is gone. The second one appeared only sporadically, I still have to see
>>>>>>> whether it pops up again.
>>>> [...]
>>
>
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
On 29.03.2018 at 09:41, Vincent Guittot wrote:
> On 28 March 2018 at 16:01, Vincent Guittot wrote:
>> Hi,
>>
>> On 28 March 2018 at 12:37, Dietmar Eggemann wrote:
>>> Hi,
>>>
>>> On 03/24/2018 01:47 PM, Heiner Kallweit wrote:
>>>> On 24.03.2018 at 07:46, Vincent Guittot wrote:
>>>>>
>>>>> Hi Heiner,
>>>>>
>>>>> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>>>>>>
>>>>>> Recently I started to get the following problems with linux-next:
>>>>>>
>>>>>> - When working via Putty/SSH on the system the console frequently
>>>>>>   freezes for a few seconds. Sometimes only opening a second console
>>>>>>   makes the first one react again.
>>>>>>
>>>>>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>>>>>   described in [1].
>>>>>>
>>>
>>> I can't catch this issue on my Juno r0 (arm64 big.LITTLE).
>>>
>>> root@juno:~# uname -r
>>> 4.16.0-rc4-00198-g31e77c93e432
>>>
>>> I'm using openssh-client and openssh-server though.
>>
>> I think that I have finally been able to reproduce it on my hikey
>> (octo cortex-A53) after unplugging 6 cores and waiting for almost 2
>> hours.
>> This seems to happen only on dual core systems, as I haven't faced that
>> before on the hikey which I have used for my tests.
>>
>
> I'm finally not so sure that I have the right setup to reproduce the
> problem, as I haven't been able to reproduce it since.
>
> Heiner,
>
> How fast does the problem happen on your board?
> Are you doing anything specific on the console that triggers the problem?
>
Hi Vincent,

the lag when working on the console is constantly there, the "rcu_preempt
detected stalls" happens after several hours (so far always within 24h)
w/o any triggering event I would be aware of. It occurred also when the
system was idle at that point in time.

Rgds, Heiner

> Regards,
> Vincent
>
>>>
>>>>>> Bisecting the issue resulted in:
>>>>>>
>>>>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>>>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>>>>> Author: Vincent Guittot
>>>>>> Date: Wed Feb 14 16:26:46 2018 +0100
>>>>>>
>>>>>>     sched/fair: Update blocked load when newly idle
>>>>>>
>>>>>>     When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>>>>     blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>>>>     updating blocked load or we can try to update them locally before entering
>>>>>>     idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>>>>
>>>>>> After reverting this commit at least the issue with the freezing console
>>>>>> is gone. The second one appeared only sporadically, I still have to see
>>>>>> whether it pops up again.
>>>
>>>
>>> [...]
>>>
>
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
On 28 March 2018 at 16:01, Vincent Guittot wrote:
> Hi,
>
> On 28 March 2018 at 12:37, Dietmar Eggemann wrote:
>> Hi,
>>
>> On 03/24/2018 01:47 PM, Heiner Kallweit wrote:
>>>
>>> On 24.03.2018 at 07:46, Vincent Guittot wrote:
>>>>
>>>> Hi Heiner,
>>>>
>>>> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>>>>>
>>>>> Recently I started to get the following problems with linux-next:
>>>>>
>>>>> - When working via Putty/SSH on the system the console frequently
>>>>>   freezes for a few seconds. Sometimes only opening a second console
>>>>>   makes the first one react again.
>>>>>
>>>>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>>>>   described in [1].
>>>>>
>>
>> I can't catch this issue on my Juno r0 (arm64 big.LITTLE).
>>
>> root@juno:~# uname -r
>> 4.16.0-rc4-00198-g31e77c93e432
>>
>> I'm using openssh-client and openssh-server though.
>
> I think that I have finally been able to reproduce it on my hikey
> (octo cortex-A53) after unplugging 6 cores and waiting for almost 2
> hours.
> This seems to happen only on dual core systems, as I haven't faced that
> before on the hikey which I have used for my tests.
>

I'm finally not so sure that I have the right setup to reproduce the
problem, as I haven't been able to reproduce it since.

Heiner,

How fast does the problem happen on your board?
Are you doing anything specific on the console that triggers the problem?

Regards,
Vincent

>>
>>>>> Bisecting the issue resulted in:
>>>>>
>>>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>>>> Author: Vincent Guittot
>>>>> Date: Wed Feb 14 16:26:46 2018 +0100
>>>>>
>>>>>     sched/fair: Update blocked load when newly idle
>>>>>
>>>>>     When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>>>     blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>>>     updating blocked load or we can try to update them locally before entering
>>>>>     idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>>>
>>>>> After reverting this commit at least the issue with the freezing console
>>>>> is gone. The second one appeared only sporadically, I still have to see
>>>>> whether it pops up again.
>>
>>
>> [...]
>>
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
Hi,

On 28 March 2018 at 12:37, Dietmar Eggemann wrote:
> Hi,
>
> On 03/24/2018 01:47 PM, Heiner Kallweit wrote:
>>
>> On 24.03.2018 at 07:46, Vincent Guittot wrote:
>>>
>>> Hi Heiner,
>>>
>>> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>>>>
>>>> Recently I started to get the following problems with linux-next:
>>>>
>>>> - When working via Putty/SSH on the system the console frequently
>>>>   freezes for a few seconds. Sometimes only opening a second console
>>>>   makes the first one react again.
>>>>
>>>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>>>   described in [1].
>>>>
>
> I can't catch this issue on my Juno r0 (arm64 big.LITTLE).
>
> root@juno:~# uname -r
> 4.16.0-rc4-00198-g31e77c93e432
>
> I'm using openssh-client and openssh-server though.

I think that I have finally been able to reproduce it on my hikey
(octo cortex-A53) after unplugging 6 cores and waiting for almost 2
hours.
This seems to happen only on dual core systems, as I haven't faced that
before on the hikey which I have used for my tests.

[ 191.365730] CPU2: shutdown
[ 191.368482] psci: CPU2 killed.
[ 195.601017] CPU3: shutdown
[ 195.603767] psci: CPU3 killed.
[ 199.037500] CPU4: shutdown
[ 199.040251] psci: CPU4 killed.
[ 201.813237] CPU5: shutdown
[ 201.815996] psci: CPU5 killed.
[ 204.624902] CPU6: shutdown
[ 204.627646] psci: CPU6 killed.
[ 207.652478] CPU7: shutdown
[ 207.655204] psci: CPU7 killed.
[ 6017.160463] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 6017.166151] 1-...!: (4 GPs behind) idle=e20/0/0 softirq=10820/10864 fqs=0
[ 6017.173113] (detected by 0, t=20705 jiffies, g=1389, c=1388, q=27)
[ 6017.179386] Task dump for CPU 1:
[ 6017.182612] swapper/1 R running task0 0 1 0x
[ 6017.189666] Call trace:
[ 6017.192120] __switch_to+0x8c/0xd0
[ 6017.195524] cpuidle_enter_state+0x64/0x360
[ 6017.199706] cpuidle_enter+0x18/0x20
[ 6017.203282] call_cpuidle+0x18/0x30
[ 6017.206771] do_idle+0x1a4/0x1e0
[ 6017.20] cpu_startup_entry+0x20/0x28
[ 6017.213923] secondary_start_kernel+0x188/0x1c8
[ 6017.218457] rcu_preempt kthread starved for 20705 jiffies! g1389 c1388 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
[ 6017.228985] rcu_preempt I0 8 2 0x
[ 6017.234474] Call trace:
[ 6017.236918] __switch_to+0x8c/0xd0
[ 6017.240322] __schedule+0x1b8/0x730
[ 6017.243810] schedule+0x38/0xa0
[ 6017.246952] schedule_timeout+0x194/0x428
[ 6017.250964] rcu_gp_kthread+0x4d4/0x780
[ 6017.254802] kthread+0xfc/0x128
[ 6017.257942] ret_from_fork+0x10/0x18
[ 6066.541736] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 6066.547428] 1-...!: (5 GPs behind) idle=e28/0/0 softirq=10820/10864 fqs=0
[ 6066.554392] (detected by 0, t=12345 jiffies, g=1390, c=1389, q=48)
[ 6066.560666] Task dump for CPU 1:
[ 6066.563893] swapper/1 R running task0 0 1 0x
[ 6066.570948] Call trace:
[ 6066.573404] __switch_to+0x8c/0xd0
[ 6066.576809] cpuidle_enter_state+0x64/0x360
[ 6066.580992] cpuidle_enter+0x18/0x20
[ 6066.584568] call_cpuidle+0x18/0x30
[ 6066.588056] do_idle+0x1a4/0x1e0
[ 6066.591284] cpu_startup_entry+0x20/0x28
[ 6066.595208] secondary_start_kernel+0x188/0x1c8
[ 6066.599742] rcu_preempt kthread starved for 12345 jiffies! g1390 c1389 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
[ 6066.610270] rcu_preempt I0 8 2 0x
[ 6066.615758] Call trace:
[ 6066.618203] __switch_to+0x8c/0xd0
[ 6066.621607] __schedule+0x1b8/0x730
[ 6066.625095] schedule+0x38/0xa0
[ 6066.628236] schedule_timeout+0x194/0x428
[ 6066.632249] rcu_gp_kthread+0x4d4/0x780
[ 6066.636087] kthread+0xfc/0x128

>
>>>> Bisecting the issue resulted in:
>>>>
>>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>>> Author: Vincent Guittot
>>>> Date: Wed Feb 14 16:26:46 2018 +0100
>>>>
>>>>     sched/fair: Update blocked load when newly idle
>>>>
>>>>     When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>>     blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>>     updating blocked load or we can try to update them locally before entering
>>>>     idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>>
>>>> After reverting this commit at least the issue with the freezing console
>>>> is gone. The second one appeared only sporadically, I still have to see
>>>> whether it pops up again.
>
>
> [...]
>
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
Hi,

On 03/24/2018 01:47 PM, Heiner Kallweit wrote:
> On 24.03.2018 at 07:46, Vincent Guittot wrote:
>>
>> Hi Heiner,
>>
>> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>>>
>>> Recently I started to get the following problems with linux-next:
>>>
>>> - When working via Putty/SSH on the system the console frequently
>>>   freezes for a few seconds. Sometimes only opening a second console
>>>   makes the first one react again.
>>>
>>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>>   described in [1].

I can't catch this issue on my Juno r0 (arm64 big.LITTLE).

root@juno:~# uname -r
4.16.0-rc4-00198-g31e77c93e432

I'm using openssh-client and openssh-server though.

>>> Bisecting the issue resulted in:
>>>
>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>> Author: Vincent Guittot
>>> Date: Wed Feb 14 16:26:46 2018 +0100
>>>
>>>     sched/fair: Update blocked load when newly idle
>>>
>>>     When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>     blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>     updating blocked load or we can try to update them locally before entering
>>>     idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>
>>> After reverting this commit at least the issue with the freezing console
>>> is gone. The second one appeared only sporadically, I still have to see
>>> whether it pops up again.

[...]
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
On 24.03.2018 at 07:46, Vincent Guittot wrote:
> Hi Heiner,
>
> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>> Recently I started to get the following problems with linux-next:
>>
>> - When working via Putty/SSH on the system the console frequently freezes
>>   for a few seconds. Sometimes only opening a second console makes the
>>   first one react again.
>>
>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>   described in [1].
>>
>> Bisecting the issue resulted in:
>>
>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>> Author: Vincent Guittot
>> Date: Wed Feb 14 16:26:46 2018 +0100
>>
>>     sched/fair: Update blocked load when newly idle
>>
>>     When NEWLY_IDLE load balance is not triggered, we might need to update the
>>     blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>     updating blocked load or we can try to update them locally before entering
>>     idle. In the latter case, we reuse part of the nohz_idle_balance.
>>
>> After reverting this commit at least the issue with the freezing console
>> is gone. The second one appeared only sporadically, I still have to see
>> whether it pops up again.
>>
>
> Can you check if the change below fixes the problem?
>
Thanks for the quick feedback. The change however didn't fix the problem,
I didn't notice any changed behavior.

> ---
>  kernel/sched/fair.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3582117..672f212 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9430,6 +9430,9 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
>
>  		has_blocked_load |= update_nohz_stats(rq, true);
>
> +		if (flags == NOHZ_STATS_KICK)
> +			continue;
> +
>  		/*
>  		 * If time for next balance is due,
>  		 * do the balance.
>
Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"
Hi Heiner,

On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
> Recently I started to get the following problems with linux-next:
>
> - When working via Putty/SSH on the system the console frequently freezes
>   for a few seconds. Sometimes only opening a second console makes the
>   first one react again.
>
> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>   described in [1].
>
> Bisecting the issue resulted in:
>
> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
> commit 31e77c93e432dec79c7d90b888bbfc3652592741
> Author: Vincent Guittot
> Date: Wed Feb 14 16:26:46 2018 +0100
>
>     sched/fair: Update blocked load when newly idle
>
>     When NEWLY_IDLE load balance is not triggered, we might need to update the
>     blocked load anyway. We can kick an ilb so an idle CPU will take care of
>     updating blocked load or we can try to update them locally before entering
>     idle. In the latter case, we reuse part of the nohz_idle_balance.
>
> After reversing this commit at least the issue with the freezing console
> is gone. The second one appeared only sporadically, I still have to see
> whether it pops up again.
>

Can you check if the change below fixes the problem?

---
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3582117..672f212 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9430,6 +9430,9 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,

 		has_blocked_load |= update_nohz_stats(rq, true);

+		if (flags == NOHZ_STATS_KICK)
+			continue;
+
 		/*
 		 * If time for next balance is due,
 		 * do the balance.
--

> System is a Zotac CI321 mini PC with Intel Celeron 2961Y CPU.
> If you need more details, please let me know.
>
> Regards, Heiner
>
> [1] https://lkml.org/lkml/2018/3/22/605