Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-04-11 Thread Heiner Kallweit
On 11.04.2018 at 19:00, Vincent Guittot wrote:
> Hi Heiner,
> 
> On 9 April 2018 at 19:33, Heiner Kallweit  wrote:
>> On 06.04.2018 at 18:03, Vincent Guittot wrote:
>>> Hi Heiner,
>>>
>>> On 30 March 2018 at 10:37, Heiner Kallweit wrote:
>>>> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>>>>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>>>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>>>
>>>>>>>
>>>>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>>>>> problem as I haven't been able to reproduce it since.
>>>>>>>
>>>>>>> Heiner,
>>>>>>>
>>>>>>> How fast does the problem happen on your board?
>>>>>>> Are you doing anything specific on the console that triggers the
>>>>>>> problem?
>>>>>>>
>>>>>> Hi Vincent,
>>>>>>
>>>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>>>> detected stalls" happens after several hours (so far always within 24h)
>>>>>> w/o any triggering event I would be aware of. It also occurred when the
>>>>>> system was idle at that point in time.
>>>>>
>>>>> Ok, so I don't have the problem on my hikey as the console never lags
>>>>> on my setup.
>>>>>
>>>>> Can you send me the config of your kernel? I'd like to check if you
>>>>> have enabled something that could trigger such a problem.
>>>>>
>>>> Sure, here we go. I also add a system log.
>>>
>>> Thanks for the config. I have used it for my setup but I can't
>>> reproduce your regression. My platforms stay stable so I am probably
>>> missing something. Are you facing a similar problem with other platforms
>>> or only this Celeron-based platform?
>>>
>>> I have reviewed the code but don't see any obvious place in the patch
>>> that can generate the problem. Nevertheless, would you mind trying the
>>> patch below? It's a blind test to try to narrow down the problem.
>>>
>>> Thanks
>>>
>> Hi Vincent,
>>
>> I tried again with today's linux-next and it's much better. The lag isn't
>> completely gone but it's much less annoying. Every ~30 secs the console
>> hangs for about half a second, that's much less frequent than before.
> 
> That's interesting because nothing related to commit
> 31e77c93e432dec79c7d90b888bbfc3652592741 has been merged recently
> AFAICT
> 
>>
>> I saw some patches from Rafael have been merged in the last days.
>> Maybe they improved the situation.
> 
> Yes, Peter mentions in another thread that Rafael's latest patches
> avoid stopping the tick when entering short idle, thus reducing the time
> to enter idle. Commit 31e77 adds some background activity when
> entering idle, so it may be that we take too much time.
> 
> You also mentioned that the CPU was relatively slow on the platform.
> Can you try to use cpufreq performance governor instead of ondemand ?
> 

The system uses the intel_pstate scaling driver, so only the powersave and
performance governors are available. The min/max frequencies are
600 MHz / 1100 MHz. I didn't really notice a difference when switching
between the two modes.
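
As an illustration only: the scaling driver, the active and available
governors, and the frequency limits mentioned above can be read from the
standard cpufreq sysfs files. The sketch below is a minimal user-space
helper under a few assumptions: the function name read_first_line() and the
CPUFREQ_SKETCH_MAIN build guard are made up for this example, and only cpu0
is queried. The frequency files report values in kHz.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs-style text file into buf, with the
 * trailing newline stripped. Returns 0 on success, -1 if the file cannot
 * be opened or is empty. (Hypothetical helper, for illustration only.)
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

#ifdef CPUFREQ_SKETCH_MAIN
int main(void)
{
	/* Standard cpufreq policy attributes exposed per CPU in sysfs. */
	static const char *const files[] = {
		"scaling_driver", "scaling_governor",
		"scaling_available_governors",
		"scaling_min_freq", "scaling_max_freq",
	};
	char path[256], value[256];
	size_t i;

	for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu0/cpufreq/%s", files[i]);
		if (read_first_line(path, value, sizeof(value)) == 0)
			printf("%-28s %s\n", files[i], value);
		else
			printf("%-28s (not readable)\n", files[i]);
	}
	return 0;
}
#endif
```

Built with -DCPUFREQ_SKETCH_MAIN, this prints one line per attribute, which
makes it easy to confirm which governor was active during a given test run.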

Regards, Heiner

> I'm also going to prepare a patch for adding some trace in the code to
> highlight the problem
> 
> Thanks,
> Vincent
> 
>>
>> Regards, Heiner
>>
> 
> [snip]
> 
>>
> 



Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-04-11 Thread Vincent Guittot
Hi Heiner,

On 9 April 2018 at 19:33, Heiner Kallweit  wrote:
> On 06.04.2018 at 18:03, Vincent Guittot wrote:
>> Hi Heiner,
>>
>> On 30 March 2018 at 10:37, Heiner Kallweit wrote:
>>> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>>>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>>
>>>>>>
>>>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>>>> problem as I haven't been able to reproduce it since.
>>>>>>
>>>>>> Heiner,
>>>>>>
>>>>>> How fast does the problem happen on your board?
>>>>>> Are you doing anything specific on the console that triggers the
>>>>>> problem?
>>>>>>
>>>>> Hi Vincent,
>>>>>
>>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>>> detected stalls" happens after several hours (so far always within 24h)
>>>>> w/o any triggering event I would be aware of. It also occurred when the
>>>>> system was idle at that point in time.
>>>>
>>>> Ok, so I don't have the problem on my hikey as the console never lags
>>>> on my setup.
>>>>
>>>> Can you send me the config of your kernel? I'd like to check if you
>>>> have enabled something that could trigger such a problem.
>>>>
>>> Sure, here we go. I also add a system log.
>>
>> Thanks for the config. I have used it for my setup but I can't
>> reproduce your regression. My platforms stay stable so I am probably
>> missing something. Are you facing a similar problem with other platforms
>> or only this Celeron-based platform?
>>
>> I have reviewed the code but don't see any obvious place in the patch
>> that can generate the problem. Nevertheless, would you mind trying the
>> patch below? It's a blind test to try to narrow down the problem.
>>
>> Thanks
>>
> Hi Vincent,
>
> I tried again with today's linux-next and it's much better. The lag isn't
> completely gone but it's much less annoying. Every ~30 secs the console
> hangs for about half a second, that's much less frequent than before.

That's interesting because nothing related to commit
31e77c93e432dec79c7d90b888bbfc3652592741 has been merged recently
AFAICT

>
> I saw some patches from Rafael have been merged in the last days.
> Maybe they improved the situation.

Yes, Peter mentions in another thread that Rafael's latest patches
avoid stopping the tick when entering short idle, thus reducing the time
to enter idle. Commit 31e77 adds some background activity when
entering idle, so it may be that we take too much time.

You also mentioned that the CPU was relatively slow on the platform.
Can you try to use cpufreq performance governor instead of ondemand ?

I'm also going to prepare a patch for adding some trace in the code to
highlight the problem

Thanks,
Vincent

>
> Regards, Heiner
>

[snip]

>


Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-04-09 Thread Heiner Kallweit
On 06.04.2018 at 18:03, Vincent Guittot wrote:
> Hi Heiner,
>
> On 30 March 2018 at 10:37, Heiner Kallweit wrote:
>> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>
>>>>>
>>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>>> problem as I haven't been able to reproduce it since.
>>>>>
>>>>> Heiner,
>>>>>
>>>>> How fast does the problem happen on your board?
>>>>> Are you doing anything specific on the console that triggers the
>>>>> problem?
>>>>>
>>>> Hi Vincent,
>>>>
>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>> detected stalls" happens after several hours (so far always within 24h)
>>>> w/o any triggering event I would be aware of. It also occurred when the
>>>> system was idle at that point in time.
>>>
>>> Ok, so I don't have the problem on my hikey as the console never lags
>>> on my setup.
>>>
>>> Can you send me the config of your kernel? I'd like to check if you
>>> have enabled something that could trigger such a problem.
>>>
>> Sure, here we go. I also add a system log.
>
> Thanks for the config. I have used it for my setup but I can't
> reproduce your regression. My platforms stay stable so I am probably
> missing something. Are you facing a similar problem with other platforms
> or only this Celeron-based platform?
>
> I have reviewed the code but don't see any obvious place in the patch
> that can generate the problem. Nevertheless, would you mind trying the
> patch below? It's a blind test to try to narrow down the problem.
>
> Thanks
> 
Hi Vincent,

I tried again with today's linux-next and it's much better. The lag isn't
completely gone but it's much less annoying. Every ~30 secs the console
hangs for about half a second, that's much less frequent than before.
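
One way to put numbers on hangs like these is a small user-space probe that
sleeps for a short fixed interval and records how late each wakeup is. The
following is only a sketch under stated assumptions: the function names, the
LATENCY_PROBE_MAIN build guard, the 10 ms interval and the 100 ms reporting
threshold are arbitrary choices for illustration. A late wakeup reflects
timer and scheduling latency in general, so it does not by itself point at
commit 31e77c93e432.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Nanoseconds elapsed from a to b (b is assumed not to be earlier than a). */
static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (b->tv_nsec - a->tv_nsec);
}

/*
 * Sleep for interval_ns `iterations` times and return the largest
 * overshoot (measured sleep time minus requested sleep time) in
 * nanoseconds. Sleeps that end early, e.g. due to a signal, are ignored.
 */
static int64_t max_wakeup_overshoot_ns(int iterations, long interval_ns)
{
	struct timespec req = {
		.tv_sec = interval_ns / 1000000000L,
		.tv_nsec = interval_ns % 1000000000L,
	};
	struct timespec before, after;
	int64_t worst = 0;
	int i;

	for (i = 0; i < iterations; i++) {
		int64_t over;

		clock_gettime(CLOCK_MONOTONIC, &before);
		nanosleep(&req, NULL);
		clock_gettime(CLOCK_MONOTONIC, &after);

		over = ts_diff_ns(&before, &after) - interval_ns;
		if (over > worst)
			worst = over;
	}
	return worst;
}

#ifdef LATENCY_PROBE_MAIN
int main(void)
{
	int window;

	/* 60 windows of 100 x 10 ms sleeps, i.e. roughly one minute. */
	for (window = 0; window < 60; window++) {
		int64_t worst = max_wakeup_overshoot_ns(100, 10000000L);

		if (worst > 100000000LL)	/* report stalls above 100 ms */
			printf("window %d: worst overshoot %lld ms\n",
			       window, (long long)(worst / 1000000));
	}
	return 0;
}
#endif
```

Built with -DLATENCY_PROBE_MAIN and left running on an otherwise idle
system, a probe of this kind would be expected to log a line whenever a
wakeup is delayed by more than the threshold, which gives a rough count of
stalls per minute that can be compared between kernels.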

I saw some patches from Rafael have been merged in the last days.
Maybe they improved the situation.

Regards, Heiner

> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0951d1c..e9835f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9794,9 +9794,9 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>  		sd = rcu_dereference_check_sched_domain(this_rq->sd);
>  		if (sd)
>  			update_next_balance(sd, &next_balance);
> -		rcu_read_unlock();
> 
>  		nohz_newidle_balance(this_rq);
> +		rcu_read_unlock();
> 
>  		goto out;
>  	}
> 



Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-04-06 Thread Heiner Kallweit
On 06.04.2018 at 18:03, Vincent Guittot wrote:
> Hi Heiner,
>
> On 30 March 2018 at 10:37, Heiner Kallweit wrote:
>> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>>
>>>>>
>>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>>> problem as I haven't been able to reproduce it since.
>>>>>
>>>>> Heiner,
>>>>>
>>>>> How fast does the problem happen on your board?
>>>>> Are you doing anything specific on the console that triggers the
>>>>> problem?
>>>>>
>>>> Hi Vincent,
>>>>
>>>> the lag when working on the console is constantly there, the "rcu_preempt
>>>> detected stalls" happens after several hours (so far always within 24h)
>>>> w/o any triggering event I would be aware of. It also occurred when the
>>>> system was idle at that point in time.
>>>
>>> Ok, so I don't have the problem on my hikey as the console never lags
>>> on my setup.
>>>
>>> Can you send me the config of your kernel? I'd like to check if you
>>> have enabled something that could trigger such a problem.
>>>
>> Sure, here we go. I also add a system log.
>
> Thanks for the config. I have used it for my setup but I can't
> reproduce your regression. My platforms stay stable so I am probably
> missing something. Are you facing a similar problem with other platforms
> or only this Celeron-based platform?
> 
Really appreciate your efforts.
The latest linux-next works fine on an Odroid-C2 (arm64, 4 cores) I have.
So the issue may be dual-core and/or platform-specific.
Another possibility could be that it occurs only on relatively slow
CPUs like this Celeron.

> I have reviewed the code but don't see any obvious place in the patch
> that can generate the problem. Nevertheless, would you mind trying the
> patch below? It's a blind test to try to narrow down the problem.
> 
I tried your patch, system behavior (with the lagging console) is as before.

Regards, Heiner

> Thanks
> 
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0951d1c..e9835f2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9794,9 +9794,9 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>  		sd = rcu_dereference_check_sched_domain(this_rq->sd);
>  		if (sd)
>  			update_next_balance(sd, &next_balance);
> -		rcu_read_unlock();
> 
>  		nohz_newidle_balance(this_rq);
> +		rcu_read_unlock();
> 
>  		goto out;
>  	}
> 



Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-04-06 Thread Vincent Guittot
Hi Heiner,

On 30 March 2018 at 10:37, Heiner Kallweit  wrote:
> On 30.03.2018 at 08:50, Vincent Guittot wrote:
>> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>>
>>>>
>>>> I'm finally not so sure that I have the right setup to reproduce the
>>>> problem as I haven't been able to reproduce it since.
>>>>
>>>> Heiner,
>>>>
>>>> How fast does the problem happen on your board?
>>>> Are you doing anything specific on the console that triggers the
>>>> problem?
>>>>
>>> Hi Vincent,
>>>
>>> the lag when working on the console is constantly there, the "rcu_preempt
>>> detected stalls" happens after several hours (so far always within 24h)
>>> w/o any triggering event I would be aware of. It also occurred when the
>>> system was idle at that point in time.
>>
>> Ok, so I don't have the problem on my hikey as the console never lags
>> on my setup.
>>
>> Can you send me the config of your kernel? I'd like to check if you
>> have enabled something that could trigger such a problem.
>>
> Sure, here we go. I also add a system log.

Thanks for the config. I have used it for my setup but I can't
reproduce your regression. My platforms stay stable so I am probably
missing something. Are you facing a similar problem with other platforms
or only this Celeron-based platform?

I have reviewed the code but don't see any obvious place in the patch
that can generate the problem. Nevertheless, would you mind trying the
patch below? It's a blind test to try to narrow down the problem.

Thanks

---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0951d1c..e9835f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9794,9 +9794,9 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
 		sd = rcu_dereference_check_sched_domain(this_rq->sd);
 		if (sd)
 			update_next_balance(sd, &next_balance);
-		rcu_read_unlock();
 
 		nohz_newidle_balance(this_rq);
+		rcu_read_unlock();
 
 		goto out;
 	}
-- 
2.7.4

>
> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86 4.16.0-rc7 Kernel Configuration
> #

[snip]


Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-03-30 Thread Heiner Kallweit
On 30.03.2018 at 08:50, Vincent Guittot wrote:
> On 29 March 2018 at 19:40, Heiner Kallweit wrote:
>> On 29.03.2018 at 09:41, Vincent Guittot wrote:
>
>>>
>>> I'm finally not so sure that I have the right setup to reproduce the
>>> problem as I haven't been able to reproduce it since.
>>>
>>> Heiner,
>>>
>>> How fast does the problem happen on your board?
>>> Are you doing anything specific on the console that triggers the problem?
>>>
>> Hi Vincent,
>>
>> the lag when working on the console is constantly there, the "rcu_preempt
>> detected stalls" happens after several hours (so far always within 24h)
>> w/o any triggering event I would be aware of. It also occurred when the
>> system was idle at that point in time.
>
> Ok, so I don't have the problem on my hikey as the console never lags
> on my setup.
>
> Can you send me the config of your kernel? I'd like to check if you
> have enabled something that could trigger such a problem.
>
Sure, here we go. I also add a system log.

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.16.0-rc7 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=m
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_ARCH_SUPPORTS_INT128=y
# CONFIG_NUMA_BALANCING is not set
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# 

Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-03-30 Thread Vincent Guittot
On 29 March 2018 at 19:40, Heiner Kallweit  wrote:
> On 29.03.2018 at 09:41, Vincent Guittot wrote:

>>
>> I'm no longer sure that I have the right setup to reproduce the
>> problem, as I haven't been able to reproduce it since.
>>
>> Heiner,
>>
>> How quickly does the problem happen on your board?
>> Are you doing anything specific on the console that triggers the problem?
>>
> Hi Vincent,
>
> the lag when working on the console is constantly there; the "rcu_preempt
> detected stalls" warning happens after several hours (so far always within
> 24h) w/o any triggering event I would be aware of. It also occurred when the
> system was idle at that point in time.

OK, so I don't see the problem on my hikey, as the console never lags
on my setup.

Can you send me your kernel config? I'd like to check whether you
have enabled something that could trigger such a problem.

Thanks,
Vincent

>
> Rgds, Heiner
>
>> Regards,
>> Vincent
>>

>>>>>>> Bisecting the issue resulted in:
>>>>>>>
>>>>>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>>>>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>>>>>> Author: Vincent Guittot
>>>>>>> Date:   Wed Feb 14 16:26:46 2018 +0100
>>>>>>>
>>>>>>>  sched/fair: Update blocked load when newly idle
>>>>>>>
>>>>>>>  When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>>>>>  blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>>>>>  updating blocked load or we can try to update them locally before entering
>>>>>>>  idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>>>>>
>>>>>>> After reversing this commit at least the issue with the freezing console
>>>>>>> is gone. The second one appeared only sporadically, I still have to see
>>>>>>> whether it pops up again.
>>>>
>>>>
>>>> [...]

>>
>


Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-03-29 Thread Heiner Kallweit
On 29.03.2018 at 09:41, Vincent Guittot wrote:
> On 28 March 2018 at 16:01, Vincent Guittot  wrote:
>> Hi,
>>
>> On 28 March 2018 at 12:37, Dietmar Eggemann  wrote:
>>> Hi,
>>>
>>> On 03/24/2018 01:47 PM, Heiner Kallweit wrote:

>>>> On 24.03.2018 at 07:46, Vincent Guittot wrote:
>>>>>
>>>>> Hi Heiner,
>>>>>
>>>>> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>>>>>>
>>>>>> Recently I started to get the following problems with linux-next:
>>>>>>
>>>>>> - When working via Putty/SSH on the system the console frequently freezes
>>>>>>   for a few seconds. Sometimes only opening a second console makes the
>>>>>>   first one react again.
>>>>>>
>>>>>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>>>>>   described in [1].
>>>>>>
>>>
>>> I can't catch this issue on my Juno r0 (arm64 big.Little).
>>>
>>> root@juno:~# uname -r
>>> 4.16.0-rc4-00198-g31e77c93e432
>>>
>>> I'm using openssh-client and openssh-server though.
>>
>> I think that I have finally been able to reproduce it on my hikey
>> (octa-core Cortex-A53) after unplugging 6 cores and waiting for almost
>> 2 hours. This seems to happen only on dual-core systems, as I hadn't
>> seen it before on the hikey that I have used for my tests.
>>
> 
> I'm no longer sure that I have the right setup to reproduce the
> problem, as I haven't been able to reproduce it since.
> 
> Heiner,
> 
> How quickly does the problem happen on your board?
> Are you doing anything specific on the console that triggers the problem?
> 
Hi Vincent,

the lag when working on the console is constantly there; the "rcu_preempt
detected stalls" warning happens after several hours (so far always within
24h) w/o any triggering event I would be aware of. It also occurred when the
system was idle at that point in time.

Rgds, Heiner

> Regards,
> Vincent
> 
>>>
>>>>>> Bisecting the issue resulted in:
>>>>>>
>>>>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>>>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>>>>> Author: Vincent Guittot
>>>>>> Date:   Wed Feb 14 16:26:46 2018 +0100
>>>>>>
>>>>>>  sched/fair: Update blocked load when newly idle
>>>>>>
>>>>>>  When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>>>>  blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>>>>  updating blocked load or we can try to update them locally before entering
>>>>>>  idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>>>>
>>>>>> After reversing this commit at least the issue with the freezing console
>>>>>> is gone. The second one appeared only sporadically, I still have to see
>>>>>> whether it pops up again.
>>>
>>>
>>> [...]
>>>
> 



Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-03-29 Thread Vincent Guittot
On 28 March 2018 at 16:01, Vincent Guittot  wrote:
> Hi,
>
> On 28 March 2018 at 12:37, Dietmar Eggemann  wrote:
>> Hi,
>>
>> On 03/24/2018 01:47 PM, Heiner Kallweit wrote:
>>>
>>> On 24.03.2018 at 07:46, Vincent Guittot wrote:
>>>>
>>>> Hi Heiner,
>>>>
>>>> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>>>>>
>>>>> Recently I started to get the following problems with linux-next:
>>>>>
>>>>> - When working via Putty/SSH on the system the console frequently freezes
>>>>>   for a few seconds. Sometimes only opening a second console makes the
>>>>>   first one react again.
>>>>>
>>>>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>>>>   described in [1].
>>>>>
>>
>> I can't catch this issue on my Juno r0 (arm64 big.Little).
>>
>> root@juno:~# uname -r
>> 4.16.0-rc4-00198-g31e77c93e432
>>
>> I'm using openssh-client and openssh-server though.
>
> I think that I have finally been able to reproduce it on my hikey
> (octa-core Cortex-A53) after unplugging 6 cores and waiting for almost
> 2 hours. This seems to happen only on dual-core systems, as I hadn't
> seen it before on the hikey that I have used for my tests.
>

I'm no longer sure that I have the right setup to reproduce the
problem, as I haven't been able to reproduce it since.

Heiner,

How quickly does the problem happen on your board?
Are you doing anything specific on the console that triggers the problem?

Regards,
Vincent

>>
>>>>> Bisecting the issue resulted in:
>>>>>
>>>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>>>> Author: Vincent Guittot
>>>>> Date:   Wed Feb 14 16:26:46 2018 +0100
>>>>>
>>>>>  sched/fair: Update blocked load when newly idle
>>>>>
>>>>>  When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>>>  blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>>>  updating blocked load or we can try to update them locally before entering
>>>>>  idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>>>
>>>>> After reversing this commit at least the issue with the freezing console
>>>>> is gone. The second one appeared only sporadically, I still have to see
>>>>> whether it pops up again.
>>
>>
>> [...]
>>


Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-03-28 Thread Vincent Guittot
Hi,

On 28 March 2018 at 12:37, Dietmar Eggemann  wrote:
> Hi,
>
> On 03/24/2018 01:47 PM, Heiner Kallweit wrote:
>>
>> On 24.03.2018 at 07:46, Vincent Guittot wrote:
>>>
>>> Hi Heiner,
>>>
>>> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>>>>
>>>> Recently I started to get the following problems with linux-next:
>>>>
>>>> - When working via Putty/SSH on the system the console frequently freezes
>>>>   for a few seconds. Sometimes only opening a second console makes the
>>>>   first one react again.
>>>>
>>>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>>>   described in [1].
>>>>
>
> I can't catch this issue on my Juno r0 (arm64 big.Little).
>
> root@juno:~# uname -r
> 4.16.0-rc4-00198-g31e77c93e432
>
> I'm using openssh-client and openssh-server though.

I think that I have finally been able to reproduce it on my hikey
(octa-core Cortex-A53) after unplugging 6 cores and waiting for almost
2 hours. This seems to happen only on dual-core systems, as I hadn't
seen it before on the hikey that I have used for my tests.

[  191.365730] CPU2: shutdown
[  191.368482] psci: CPU2 killed.
[  195.601017] CPU3: shutdown
[  195.603767] psci: CPU3 killed.
[  199.037500] CPU4: shutdown
[  199.040251] psci: CPU4 killed.
[  201.813237] CPU5: shutdown
[  201.815996] psci: CPU5 killed.
[  204.624902] CPU6: shutdown
[  204.627646] psci: CPU6 killed.
[  207.652478] CPU7: shutdown
[  207.655204] psci: CPU7 killed.
[ 6017.160463] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 6017.166151] 1-...!: (4 GPs behind) idle=e20/0/0 softirq=10820/10864 fqs=0
[ 6017.173113] (detected by 0, t=20705 jiffies, g=1389, c=1388, q=27)
[ 6017.179386] Task dump for CPU 1:
[ 6017.182612] swapper/1   R  running task0 0  1 0x
[ 6017.189666] Call trace:
[ 6017.192120]  __switch_to+0x8c/0xd0
[ 6017.195524]  cpuidle_enter_state+0x64/0x360
[ 6017.199706]  cpuidle_enter+0x18/0x20
[ 6017.203282]  call_cpuidle+0x18/0x30
[ 6017.206771]  do_idle+0x1a4/0x1e0
[ 6017.20]  cpu_startup_entry+0x20/0x28
[ 6017.213923]  secondary_start_kernel+0x188/0x1c8
[ 6017.218457] rcu_preempt kthread starved for 20705 jiffies! g1389 c1388 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
[ 6017.228985] rcu_preempt I0 8  2 0x
[ 6017.234474] Call trace:
[ 6017.236918]  __switch_to+0x8c/0xd0
[ 6017.240322]  __schedule+0x1b8/0x730
[ 6017.243810]  schedule+0x38/0xa0
[ 6017.246952]  schedule_timeout+0x194/0x428
[ 6017.250964]  rcu_gp_kthread+0x4d4/0x780
[ 6017.254802]  kthread+0xfc/0x128
[ 6017.257942]  ret_from_fork+0x10/0x18
[ 6066.541736] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 6066.547428] 1-...!: (5 GPs behind) idle=e28/0/0 softirq=10820/10864 fqs=0
[ 6066.554392] (detected by 0, t=12345 jiffies, g=1390, c=1389, q=48)
[ 6066.560666] Task dump for CPU 1:
[ 6066.563893] swapper/1   R  running task0 0  1 0x
[ 6066.570948] Call trace:
[ 6066.573404]  __switch_to+0x8c/0xd0
[ 6066.576809]  cpuidle_enter_state+0x64/0x360
[ 6066.580992]  cpuidle_enter+0x18/0x20
[ 6066.584568]  call_cpuidle+0x18/0x30
[ 6066.588056]  do_idle+0x1a4/0x1e0
[ 6066.591284]  cpu_startup_entry+0x20/0x28
[ 6066.595208]  secondary_start_kernel+0x188/0x1c8
[ 6066.599742] rcu_preempt kthread starved for 12345 jiffies! g1390 c1389 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
[ 6066.610270] rcu_preempt I0 8  2 0x
[ 6066.615758] Call trace:
[ 6066.618203]  __switch_to+0x8c/0xd0
[ 6066.621607]  __schedule+0x1b8/0x730
[ 6066.625095]  schedule+0x38/0xa0
[ 6066.628236]  schedule_timeout+0x194/0x428
[ 6066.632249]  rcu_gp_kthread+0x4d4/0x780
[ 6066.636087]  kthread+0xfc/0x128

>
>>>> Bisecting the issue resulted in:
>>>>
>>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>>> Author: Vincent Guittot
>>>> Date:   Wed Feb 14 16:26:46 2018 +0100
>>>>
>>>>  sched/fair: Update blocked load when newly idle
>>>>
>>>>  When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>>  blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>>  updating blocked load or we can try to update them locally before entering
>>>>  idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>>
>>>> After reversing this commit at least the issue with the freezing console
>>>> is gone. The second one appeared only sporadically, I still have to see
>>>> whether it pops up again.
>
>
> [...]
>


Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-03-28 Thread Dietmar Eggemann

Hi,

On 03/24/2018 01:47 PM, Heiner Kallweit wrote:

> On 24.03.2018 at 07:46, Vincent Guittot wrote:
>>
>> Hi Heiner,
>>
>> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>>>
>>> Recently I started to get the following problems with linux-next:
>>>
>>> - When working via Putty/SSH on the system the console frequently freezes
>>>   for a few seconds. Sometimes only opening a second console makes the
>>>   first one react again.
>>>
>>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>>   described in [1].
>>>

I can't catch this issue on my Juno r0 (arm64 big.Little).

root@juno:~# uname -r
4.16.0-rc4-00198-g31e77c93e432

I'm using openssh-client and openssh-server though.

>>> Bisecting the issue resulted in:
>>>
>>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>>> Author: Vincent Guittot
>>> Date:   Wed Feb 14 16:26:46 2018 +0100
>>>
>>>  sched/fair: Update blocked load when newly idle
>>>
>>>  When NEWLY_IDLE load balance is not triggered, we might need to update the
>>>  blocked load anyway. We can kick an ilb so an idle CPU will take care of
>>>  updating blocked load or we can try to update them locally before entering
>>>  idle. In the latter case, we reuse part of the nohz_idle_balance.
>>>
>>> After reversing this commit at least the issue with the freezing console
>>> is gone. The second one appeared only sporadically, I still have to see
>>> whether it pops up again.


[...]



Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-03-24 Thread Heiner Kallweit
On 24.03.2018 at 07:46, Vincent Guittot wrote:
> Hi Heiner,
> 
> On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
>> Recently I started to get the following problems with linux-next:
>>
>> - When working via Putty/SSH on the system the console frequently freezes
>>   for a few seconds. Sometimes only opening a second console makes the
>>   first one react again.
>>
>> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>>   described in [1].
>>
>> Bisecting the issue resulted in:
>>
>> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
>> commit 31e77c93e432dec79c7d90b888bbfc3652592741
>> Author: Vincent Guittot 
>> Date:   Wed Feb 14 16:26:46 2018 +0100
>>
>> sched/fair: Update blocked load when newly idle
>>
>> When NEWLY_IDLE load balance is not triggered, we might need to update the
>> blocked load anyway. We can kick an ilb so an idle CPU will take care of
>> updating blocked load or we can try to update them locally before entering
>> idle. In the latter case, we reuse part of the nohz_idle_balance.
>>
>> After reversing this commit at least the issue with the freezing console
>> is gone. The second one appeared only sporadically, I still have to see
>> whether it pops up again.
>>
> 
> Can you check whether the change below fixes the problem?
> 
Thanks for the quick feedback. However, the change didn't fix the problem;
I didn't notice any change in behavior.

> ---
>  kernel/sched/fair.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3582117..672f212 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9430,6 +9430,9 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
>  
>  		has_blocked_load |= update_nohz_stats(rq, true);
>  
> +		if (flags == NOHZ_STATS_KICK)
> +			continue;
> +
>  		/*
>  		 * If time for next balance is due,
>  		 * do the balance.
> 



Re: Problem with commit 31e77c93e432 "sched/fair: Update blocked load when newly idle"

2018-03-24 Thread Vincent Guittot
Hi Heiner,

On Friday 23 Mar 2018 at 22:28:09 (+0100), Heiner Kallweit wrote:
> Recently I started to get the following problems with linux-next:
> 
> - When working via PuTTY/SSH on the system, the console frequently freezes
>   for a few seconds. Sometimes only opening a second console makes the
>   first one react again.
> 
> - I get "INFO: rcu_sched detected stalls on CPUs/tasks:" warnings as
>   described in [1].
> 
> Bisecting the issue resulted in:
> 
> 31e77c93e432dec79c7d90b888bbfc3652592741 is the first bad commit
> commit 31e77c93e432dec79c7d90b888bbfc3652592741
> Author: Vincent Guittot 
> Date:   Wed Feb 14 16:26:46 2018 +0100
> 
> sched/fair: Update blocked load when newly idle
> 
> When NEWLY_IDLE load balance is not triggered, we might need to update the
> blocked load anyway. We can kick an ilb so an idle CPU will take care of
> updating blocked load or we can try to update them locally before entering
> idle. In the latter case, we reuse part of the nohz_idle_balance.
>
> After reverting this commit, at least the issue with the freezing console
> is gone. The second one appeared only sporadically, so I still have to see
> whether it pops up again.
>

Can you check if the change below fixes the problem?

---
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3582117..672f212 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9430,6 +9430,9 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
 
 		has_blocked_load |= update_nohz_stats(rq, true);
 
+		if (flags == NOHZ_STATS_KICK)
+			continue;
+
 		/*
 		 * If time for next balance is due,
 		 * do the balance.
-- 

> System is a Zotac CI321 mini PC with Intel Celeron 2961Y CPU.
> If you need more details, please let me know.
> 
> Regards, Heiner
> 
> [1] https://lkml.org/lkml/2018/3/22/605
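
The intent of the test patch in this message can be sketched in user space.
This is only a minimal sketch under stated assumptions, not the kernel code:
struct rq_sketch, update_stats_sketch() and idle_balance_sketch() are
hypothetical stand-ins, and the flag values are chosen for illustration rather
than copied from the kernel. It shows only the control flow: every runqueue
gets its stats refreshed, but the balance step is skipped when the caller
asked for a stats-only pass.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define NOHZ_BALANCE_KICK 0x1u
#define NOHZ_STATS_KICK   0x2u

/* Hypothetical stand-in for a per-CPU runqueue. */
struct rq_sketch {
	bool has_blocked_load;	/* what the stats update would report */
	int balance_runs;	/* how often the balance step ran */
};

/* Stand-in for update_nohz_stats(): reports whether blocked load remains. */
static bool update_stats_sketch(const struct rq_sketch *rq)
{
	return rq->has_blocked_load;
}

/*
 * Walk all runqueues and always refresh the stats, but skip the balance
 * step when the caller asked for a stats-only pass (the test added by
 * the patch).
 */
static bool idle_balance_sketch(struct rq_sketch *rqs, int nr,
				unsigned int flags)
{
	bool has_blocked_load = false;

	for (int cpu = 0; cpu < nr; cpu++) {
		has_blocked_load |= update_stats_sketch(&rqs[cpu]);

		if (flags == NOHZ_STATS_KICK)
			continue;

		rqs[cpu].balance_runs++;	/* placeholder for the balance work */
	}

	return has_blocked_load;
}
```

Calling idle_balance_sketch() with flags equal to NOHZ_STATS_KICK leaves every
balance_runs counter untouched, while passing both flags runs the placeholder
balance step once per runqueue. The blocked-load result is the same in both
cases.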

