Re: [PATCH v3] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

2019-04-17 Thread Nicholas Piggin
Thiago Jung Bauermann's on April 18, 2019 11:00 am:
> 
> Hello Nick,
> 
> Thank you very much for reviewing this patch!
> 
> Nicholas Piggin  writes:
> 
>> Thiago Jung Bauermann's on April 11, 2019 9:08 am:
>>>
>>> Thiago Jung Bauermann  writes:
>>>
 diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
 b/arch/powerpc/platforms/pseries/hotplug-cpu.c
 index 97feb6e79f1a..ac6dc35ab829 100644
 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
 +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
 @@ -214,13 +214,22 @@ static void pseries_cpu_die(unsigned int cpu)
msleep(1);
}
} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
 +  /*
 +   * If the current state is not offline yet, it means that the
 +   * dying CPU (which is either in pseries_mach_cpu_die() or in
 +   * the process of getting there) didn't have a chance yet to
 +   * call rtas_stop_self() and therefore it's too early to query
 +   * if the CPU is stopped.
 +   */
 +  spin_event_timeout(get_cpu_current_state(cpu) == 
 CPU_STATE_OFFLINE,
 + 10, 100);
>>
>> If the CPU state does not go to offline here, you should give up and
>> return online, right? Otherwise I think query-cpu-stopped-state can
>> get confused by CPUs in idle and you get a false positive.
> 
> Can it get confused? My impression from reading the definition for
> query-cpu-stopped-state in the PAPR is that it will simply return a
> CPU_status value of 2 in that case, meaning that "the processor thread
> is not in the RTAS stopped state", but I don't know much about this.

In QEMU (non-KVM) mode, qcss I think may get confused between H_CEDE
and rtas-stop-self. KVM mode may be okay because H_CEDE is handled in
the kernel.

>> That race can still happen, we would really need a sequence count check
>> over current CPU state to ensure we got a race-free qcss value, but at
>> least a check here should make the race implausible to hit.
> 
> Actually, since rtas_stop_self() panics if the processor fails to stop
> and also since callers of pseries_cpu_die()¹ already assume that it is
> going to succeed in stopping the CPU (given that the function returns
> void and can't signal an error), a more straightforward way of
> eliminating the race is to simply do this:
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 97feb6e79f1a..2331a609f48f 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -215,7 +215,7 @@ static void pseries_cpu_die(unsigned int cpu)
> }
> } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> 
> -   for (tries = 0; tries < 25; tries++) {
> +   while (true) {
> cpu_status = smp_query_cpu_stopped(pcpu);
> if (cpu_status == QCSS_STOPPED ||
> cpu_status == QCSS_HARDWARE_ERROR)
> 
> 
> What do you think?

Yeah I think that may be a good idea, just makes things much simpler.

Thanks,
Nick



Re: [PATCH v3] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

2019-04-17 Thread Thiago Jung Bauermann


Hello Nick,

Thank you very much for reviewing this patch!

Nicholas Piggin  writes:

> Thiago Jung Bauermann's on April 11, 2019 9:08 am:
>>
>> Thiago Jung Bauermann  writes:
>>
>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
>>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>>> index 97feb6e79f1a..ac6dc35ab829 100644
>>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>>> @@ -214,13 +214,22 @@ static void pseries_cpu_die(unsigned int cpu)
>>> msleep(1);
>>> }
>>> } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>>> +   /*
>>> +* If the current state is not offline yet, it means that the
>>> +* dying CPU (which is either in pseries_mach_cpu_die() or in
>>> +* the process of getting there) didn't have a chance yet to
>>> +* call rtas_stop_self() and therefore it's too early to query
>>> +* if the CPU is stopped.
>>> +*/
>>> +   spin_event_timeout(get_cpu_current_state(cpu) == 
>>> CPU_STATE_OFFLINE,
>>> +  10, 100);
>
> If the CPU state does not go to offline here, you should give up and
> return online, right? Otherwise I think query-cpu-stopped-state can
> get confused by CPUs in idle and you get a false positive.

Can it get confused? My impression from reading the definition for
query-cpu-stopped-state in the PAPR is that it will simply return a
CPU_status value of 2 in that case, meaning that "the processor thread
is not in the RTAS stopped state", but I don't know much about this.

> That race can still happen, we would really need a sequence count check
> over current CPU state to ensure we got a race-free qcss value, but at
> least a check here should make the race implausible to hit.

Actually, since rtas_stop_self() panics if the processor fails to stop
and also since callers of pseries_cpu_die()¹ already assume that it is
going to succeed in stopping the CPU (given that the function returns
void and can't signal an error), a more straightforward way of
eliminating the race is to simply do this:

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 97feb6e79f1a..2331a609f48f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -215,7 +215,7 @@ static void pseries_cpu_die(unsigned int cpu)
}
} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {

-   for (tries = 0; tries < 25; tries++) {
+   while (true) {
cpu_status = smp_query_cpu_stopped(pcpu);
if (cpu_status == QCSS_STOPPED ||
cpu_status == QCSS_HARDWARE_ERROR)


What do you think?

--
Thiago Jung Bauermann
IBM Linux Technology Center

¹ dlpar_offline_cpu() and takedown_cpu() in generic code



Re: [PATCH v3] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

2019-04-15 Thread Nicholas Piggin
Thiago Jung Bauermann's on April 11, 2019 9:08 am:
> 
> Hello,
> 
> Ping?
> 
> -- 
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 
> 
> Thiago Jung Bauermann  writes:
> 
>> When testing DLPAR CPU add/remove on a system under stress,
>> pseries_cpu_die() doesn't wait long enough for a CPU to die:
>>
>> [  446.983944] cpu 148 (hwid 148) Ready to die...
>> [  446.984062] cpu 149 (hwid 149) Ready to die...
>> [  446.993518] cpu 150 (hwid 150) Ready to die...
>> [  446.993543] Querying DEAD? cpu 150 (150) shows 2
>> [  446.994098] cpu 151 (hwid 151) Ready to die...
>> [  447.133726] cpu 136 (hwid 136) Ready to die...
>> [  447.403532] cpu 137 (hwid 137) Ready to die...
>> [  447.403772] cpu 138 (hwid 138) Ready to die...
>> [  447.403839] cpu 139 (hwid 139) Ready to die...
>> [  447.403887] cpu 140 (hwid 140) Ready to die...
>> [  447.403937] cpu 141 (hwid 141) Ready to die...
>> [  447.403979] cpu 142 (hwid 142) Ready to die...
>> [  447.404038] cpu 143 (hwid 143) Ready to die...
>> [  447.513546] cpu 128 (hwid 128) Ready to die...
>> [  447.693533] cpu 129 (hwid 129) Ready to die...
>> [  447.693999] cpu 130 (hwid 130) Ready to die...
>> [  447.703530] cpu 131 (hwid 131) Ready to die...
>> [  447.704087] Querying DEAD? cpu 132 (132) shows 2
>> [  447.704102] cpu 132 (hwid 132) Ready to die...
>> [  447.713534] cpu 133 (hwid 133) Ready to die...
>> [  447.714064] Querying DEAD? cpu 134 (134) shows 2
>>
>> This is a race between one CPU stopping and another one calling
>> pseries_cpu_die() to wait for it to stop. That function does a short busy
>> loop calling RTAS query-cpu-stopped-state on the stopping CPU to verify
>> that it is stopped, but I think there's a lot for the stopping CPU to do
>> which may take longer than this loop allows.
>>
>> As can be seen in the dmesg right before or after the "Querying DEAD?"
>> messages, if pseries_cpu_die() waited a little longer it would have seen
>> the CPU in the stopped state.
>>
>> What I think is going on is that CPU 134 was inactive at the time it was
>> unplugged. In that case, dlpar_offline_cpu() calls H_PROD on that CPU and
>> immediately calls pseries_cpu_die(). Meanwhile, the prodded CPU activates
>> and start the process of stopping itself. The busy loop is not long enough
>> to allow for the CPU to wake up and complete the stopping process.
>>
>> This can be a problem because if the busy loop finishes too early, then the
>> kernel may offline another CPU before the previous one finished dying,
>> which would lead to two concurrent calls to rtas-stop-self, which is
>> prohibited by the PAPR.
>>
>> We can make the race a lot more even if we only start querying if the CPU
>> is stopped when the stopping CPU is close to call rtas_stop_self(). Since
>> pseries_mach_cpu_die() sets the CPU current state to offline almost
>> immediately before calling rtas_stop_self(), we use that as a signal that
>> it is either already stopped or very close to that point, and we can start
>> the busy loop.
>>
>> As suggested by Michael Ellerman, this patch also changes the busy loop to
>> wait for a fixed amount of wall time. Based on the measurements that
>> Gautham did on a POWER9 system, in successful cases of
>> smp_query_cpu_stopped(cpu) returning affirmative, the maximum time spent
>> inside the loop was was 10 ms. This patch loops for 20 ms just be sure.
>>
>> Signed-off-by: Thiago Jung Bauermann 
>> Analyzed-by: Gautham R Shenoy 
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 +++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> I have seen this problem since v4.8. Should this patch go to stable as
>> well?
>>
>> Changes since v2:
>> - Increaded busy loop to 200 iterations so that it can last up to 20 ms
>>   (suggested by Gautham).
>> - Changed commit message to include Gautham's remarks.
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
>> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 97feb6e79f1a..ac6dc35ab829 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -214,13 +214,22 @@ static void pseries_cpu_die(unsigned int cpu)
>>  msleep(1);
>>  }
>>  } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> +/*
>> + * If the current state is not offline yet, it means that the
>> + * dying CPU (which is either in pseries_mach_cpu_die() or in
>> + * the process of getting there) didn't have a chance yet to
>> + * call rtas_stop_self() and therefore it's too early to query
>> + * if the CPU is stopped.
>> + */
>> +spin_event_timeout(get_cpu_current_state(cpu) == 
>> CPU_STATE_OFFLINE,
>> +   10, 100);

If the CPU state does not go to offline here, you should give up and
return online, right? Otherwise I think query-cpu-stopped-state can
get confused by CPUs in 

Re: [PATCH v3] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

2019-04-10 Thread Thiago Jung Bauermann


Hello,

Ping?

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Thiago Jung Bauermann  writes:

> When testing DLPAR CPU add/remove on a system under stress,
> pseries_cpu_die() doesn't wait long enough for a CPU to die:
>
> [  446.983944] cpu 148 (hwid 148) Ready to die...
> [  446.984062] cpu 149 (hwid 149) Ready to die...
> [  446.993518] cpu 150 (hwid 150) Ready to die...
> [  446.993543] Querying DEAD? cpu 150 (150) shows 2
> [  446.994098] cpu 151 (hwid 151) Ready to die...
> [  447.133726] cpu 136 (hwid 136) Ready to die...
> [  447.403532] cpu 137 (hwid 137) Ready to die...
> [  447.403772] cpu 138 (hwid 138) Ready to die...
> [  447.403839] cpu 139 (hwid 139) Ready to die...
> [  447.403887] cpu 140 (hwid 140) Ready to die...
> [  447.403937] cpu 141 (hwid 141) Ready to die...
> [  447.403979] cpu 142 (hwid 142) Ready to die...
> [  447.404038] cpu 143 (hwid 143) Ready to die...
> [  447.513546] cpu 128 (hwid 128) Ready to die...
> [  447.693533] cpu 129 (hwid 129) Ready to die...
> [  447.693999] cpu 130 (hwid 130) Ready to die...
> [  447.703530] cpu 131 (hwid 131) Ready to die...
> [  447.704087] Querying DEAD? cpu 132 (132) shows 2
> [  447.704102] cpu 132 (hwid 132) Ready to die...
> [  447.713534] cpu 133 (hwid 133) Ready to die...
> [  447.714064] Querying DEAD? cpu 134 (134) shows 2
>
> This is a race between one CPU stopping and another one calling
> pseries_cpu_die() to wait for it to stop. That function does a short busy
> loop calling RTAS query-cpu-stopped-state on the stopping CPU to verify
> that it is stopped, but I think there's a lot for the stopping CPU to do
> which may take longer than this loop allows.
>
> As can be seen in the dmesg right before or after the "Querying DEAD?"
> messages, if pseries_cpu_die() waited a little longer it would have seen
> the CPU in the stopped state.
>
> What I think is going on is that CPU 134 was inactive at the time it was
> unplugged. In that case, dlpar_offline_cpu() calls H_PROD on that CPU and
> immediately calls pseries_cpu_die(). Meanwhile, the prodded CPU activates
> and start the process of stopping itself. The busy loop is not long enough
> to allow for the CPU to wake up and complete the stopping process.
>
> This can be a problem because if the busy loop finishes too early, then the
> kernel may offline another CPU before the previous one finished dying,
> which would lead to two concurrent calls to rtas-stop-self, which is
> prohibited by the PAPR.
>
> We can make the race a lot more even if we only start querying if the CPU
> is stopped when the stopping CPU is close to call rtas_stop_self(). Since
> pseries_mach_cpu_die() sets the CPU current state to offline almost
> immediately before calling rtas_stop_self(), we use that as a signal that
> it is either already stopped or very close to that point, and we can start
> the busy loop.
>
> As suggested by Michael Ellerman, this patch also changes the busy loop to
> wait for a fixed amount of wall time. Based on the measurements that
> Gautham did on a POWER9 system, in successful cases of
> smp_query_cpu_stopped(cpu) returning affirmative, the maximum time spent
> inside the loop was was 10 ms. This patch loops for 20 ms just be sure.
>
> Signed-off-by: Thiago Jung Bauermann 
> Analyzed-by: Gautham R Shenoy 
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> I have seen this problem since v4.8. Should this patch go to stable as
> well?
>
> Changes since v2:
> - Increaded busy loop to 200 iterations so that it can last up to 20 ms
>   (suggested by Gautham).
> - Changed commit message to include Gautham's remarks.
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 97feb6e79f1a..ac6dc35ab829 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -214,13 +214,22 @@ static void pseries_cpu_die(unsigned int cpu)
>   msleep(1);
>   }
>   } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> + /*
> +  * If the current state is not offline yet, it means that the
> +  * dying CPU (which is either in pseries_mach_cpu_die() or in
> +  * the process of getting there) didn't have a chance yet to
> +  * call rtas_stop_self() and therefore it's too early to query
> +  * if the CPU is stopped.
> +  */
> + spin_event_timeout(get_cpu_current_state(cpu) == 
> CPU_STATE_OFFLINE,
> +10, 100);
>  
> - for (tries = 0; tries < 25; tries++) {
> + for (tries = 0; tries < 200; tries++) {
>   cpu_status = smp_query_cpu_stopped(pcpu);
>   if (cpu_status == QCSS_STOPPED ||
>   cpu_status == QCSS_HARDWARE_ERROR)
> 

Re: [PATCH v3] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

2019-03-12 Thread Thiago Jung Bauermann


Gautham R Shenoy  writes:

>> Signed-off-by: Thiago Jung Bauermann 
>
> Thanks for this version. I have tested the patch and we no longer see
> the "Querying DEAD? cpu X (Y) shows 2" message.
>
>
> Tested-and-Reviewed-by: Gautham R. Shenoy 

Thanks for reviewing and testing the patch!

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH v3] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

2019-03-12 Thread Gautham R Shenoy
Hello Thiago,

On Mon, Mar 11, 2019 at 04:35:17PM -0300, Thiago Jung Bauermann wrote:
> When testing DLPAR CPU add/remove on a system under stress,
> pseries_cpu_die() doesn't wait long enough for a CPU to die:
> 
> [  446.983944] cpu 148 (hwid 148) Ready to die...
> [  446.984062] cpu 149 (hwid 149) Ready to die...
> [  446.993518] cpu 150 (hwid 150) Ready to die...
> [  446.993543] Querying DEAD? cpu 150 (150) shows 2
> [  446.994098] cpu 151 (hwid 151) Ready to die...
> [  447.133726] cpu 136 (hwid 136) Ready to die...
> [  447.403532] cpu 137 (hwid 137) Ready to die...
> [  447.403772] cpu 138 (hwid 138) Ready to die...
> [  447.403839] cpu 139 (hwid 139) Ready to die...
> [  447.403887] cpu 140 (hwid 140) Ready to die...
> [  447.403937] cpu 141 (hwid 141) Ready to die...
> [  447.403979] cpu 142 (hwid 142) Ready to die...
> [  447.404038] cpu 143 (hwid 143) Ready to die...
> [  447.513546] cpu 128 (hwid 128) Ready to die...
> [  447.693533] cpu 129 (hwid 129) Ready to die...
> [  447.693999] cpu 130 (hwid 130) Ready to die...
> [  447.703530] cpu 131 (hwid 131) Ready to die...
> [  447.704087] Querying DEAD? cpu 132 (132) shows 2
> [  447.704102] cpu 132 (hwid 132) Ready to die...
> [  447.713534] cpu 133 (hwid 133) Ready to die...
> [  447.714064] Querying DEAD? cpu 134 (134) shows 2
> 
> This is a race between one CPU stopping and another one calling
> pseries_cpu_die() to wait for it to stop. That function does a short busy
> loop calling RTAS query-cpu-stopped-state on the stopping CPU to verify
> that it is stopped, but I think there's a lot for the stopping CPU to do
> which may take longer than this loop allows.
> 
> As can be seen in the dmesg right before or after the "Querying DEAD?"
> messages, if pseries_cpu_die() waited a little longer it would have seen
> the CPU in the stopped state.
> 
> What I think is going on is that CPU 134 was inactive at the time it was
> unplugged. In that case, dlpar_offline_cpu() calls H_PROD on that CPU and
> immediately calls pseries_cpu_die(). Meanwhile, the prodded CPU activates
> and start the process of stopping itself. The busy loop is not long enough
> to allow for the CPU to wake up and complete the stopping process.
> 
> This can be a problem because if the busy loop finishes too early, then the
> kernel may offline another CPU before the previous one finished dying,
> which would lead to two concurrent calls to rtas-stop-self, which is
> prohibited by the PAPR.
> 
> We can make the race a lot more even if we only start querying if the CPU
> is stopped when the stopping CPU is close to call rtas_stop_self(). Since
> pseries_mach_cpu_die() sets the CPU current state to offline almost
> immediately before calling rtas_stop_self(), we use that as a signal that
> it is either already stopped or very close to that point, and we can start
> the busy loop.
> 
> As suggested by Michael Ellerman, this patch also changes the busy loop to
> wait for a fixed amount of wall time. Based on the measurements that
> Gautham did on a POWER9 system, in successful cases of
> smp_query_cpu_stopped(cpu) returning affirmative, the maximum time spent
> inside the loop was was 10 ms. This patch loops for 20 ms just be sure.
> 
> Signed-off-by: Thiago Jung Bauermann 

Thanks for this version. I have tested the patch and we no longer see
the "Querying DEAD? cpu X (Y) shows 2" message.


Tested-and-Reviewed-by: Gautham R. Shenoy 


--
Thanks and Regards
gautham.