Re: [PATCH 2/2] x86, apic: Disable BSP if boot cpu is AP

2013-09-09 Thread HATAYAMA Daisuke

(2013/09/04 15:12), Borislav Petkov wrote:
> On Mon, Sep 02, 2013 at 06:42:44PM +0900, HATAYAMA Daisuke wrote:
>> The reason why I don't lookup BSP flag in MSR is that it's impossible.
>> To read MSR of some CPU, we need to use rdmsr instruction on the CPU.
>> However, in case of this issue, the BSP is halting or running in
>> the kdump 1st kernel.
>
> Yes, and on the AP, that flag would be cleared which makes it not a BSP.
>
>> A whole explanation is written in the patch description.
>
> Those tend to get lost in git history when a bunch of whitespace jerk
> offs appear ontop. So a nicely written comment in the code could be very
> helpful.
>
> Thanks.
>

Yes, that's what this patch series is missing. I'll write the explanation
up in Documentation/kexec/kexec.txt and refer to it from a comment in the
boot-up code.

--
Thanks.
HATAYAMA, Daisuke

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] x86, apic: Disable BSP if boot cpu is AP

2013-09-04 Thread Borislav Petkov
On Mon, Sep 02, 2013 at 06:42:44PM +0900, HATAYAMA Daisuke wrote:
> The reason why I don't lookup BSP flag in MSR is that it's impossible.
> To read MSR of some CPU, we need to use rdmsr instruction on the CPU.
> However, in case of this issue, the BSP is halting or running in
> the kdump 1st kernel.

Yes, and on the AP, that flag would be cleared which makes it not a BSP.

> A whole explanation is written in the patch description.

Those tend to get lost in git history when a bunch of whitespace jerk
offs appear ontop. So a nicely written comment in the code could be very
helpful.

Thanks.

-- 
Regards/Gruss,
Boris.


Re: [PATCH 2/2] x86, apic: Disable BSP if boot cpu is AP

2013-09-02 Thread HATAYAMA Daisuke

(2013/09/02 16:13), Borislav Petkov wrote:
> On Mon, Sep 02, 2013 at 11:32:59AM +0900, HATAYAMA Daisuke wrote:
>> As you suggest, boot_cpu seems more understandable also to me. BTW,
>> please notice that it doesn't denote that the CPU we're booting on
>> currently, but that the CPU with BSP flag set.
>
> Hmm, by "BSP flag set" you mean it is the first LAPIC entry in the MADT,
> correct? At least this is the case when you set isbsp to true. Because,
> there's also the BSC flag in APIC_BAR (MSR 0x1b) which denotes the
> bootstrapping core on node 0.
>

The reason I don't look up the BSP flag in the MSR is that it's
impossible: to read an MSR of some CPU, we need to execute the rdmsr
instruction on that CPU. However, in this issue, the BSP is halted or
still running in the kdump 1st kernel.

The whole explanation is in the patch description.

--
Thanks.
HATAYAMA, Daisuke



Re: [PATCH 2/2] x86, apic: Disable BSP if boot cpu is AP

2013-09-02 Thread Borislav Petkov
On Mon, Sep 02, 2013 at 11:32:59AM +0900, HATAYAMA Daisuke wrote:
> As you suggest, boot_cpu seems more understandable also to me. BTW,
> please notice that it doesn't denote that the CPU we're booting on
> currently, but that the CPU with BSP flag set.

Hmm, by "BSP flag set" you mean it is the first LAPIC entry in the MADT,
correct? At least this is the case when you set isbsp to true. Because,
there's also the BSC flag in APIC_BAR (MSR 0x1b) which denotes the
bootstrapping core on node 0.

> In general, current code uses many terms to denote the cpu that is run
> at kernel boot-up processing such as boot cpu, bsp, cpu0 and possibly
> others since in usual situation, boot cpu is always BSP and assigned
> to cpu0. But it is not the case in case of kexec. I'm using the word
> bsp purposely in the isbsp to mean the CPU with BSP flag set.
>
> So I think it's better to use bsp_cpu here to denote the CPU with BSP
> flag set.

Right.

> For the comment, how about the following one?
> 
> /*
>  * In this case, boot cpu is AP. This can happen on
>  * kexec/kdump. Consider the case that crash happens on some
>  * AP and enters kdump 2nd kernel with the AP.
>  *
>  * Then, there's issue that if we send INIT to BSP, due to x86
>  * hardware specification, it is forced to jump at BIOS init
>  * code and system hangs or resets immediately.
>  *
>  * To avoid the issue, we disable BSP. Then, there's no longer
>  * possbility to send INIT to BSP.
>  */

Yes, much better. Especially when looking down the road and people have
forgotten what the whole fuss was about, a nice detailed comment is
priceless.

Thanks.

-- 
Regards/Gruss,
Boris.


Re: [PATCH 2/2] x86, apic: Disable BSP if boot cpu is AP

2013-09-01 Thread HATAYAMA Daisuke

(2013/08/31 14:22), Borislav Petkov wrote:
> On Thu, Aug 29, 2013 at 06:28:04PM +0900, HATAYAMA Daisuke wrote:
>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>> index 66cab35..fd969d1 100644
>> --- a/arch/x86/kernel/apic/apic.c
>> +++ b/arch/x86/kernel/apic/apic.c
>> @@ -2113,13 +2113,29 @@ void disconnect_bsp_APIC(int virt_wire_setup)
>>  	apic_write(APIC_LVT1, value);
>>  }
>>  
>> -void generic_processor_info(int apicid, int version)
>> +void generic_processor_info(int apicid, bool isbsp, int version)
>>  {
>>  	int cpu, max = nr_cpu_ids;
>>  	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
>>  				phys_cpu_present_map);
>>  
>>  	/*
>> +	 * If boot cpu is AP, we now don't have any way to initialize
>> +	 * BSP. To save memory consumed, we disable BSP this case and
>
> I don't think we disable the BSP just so that we save memory and rather
> because we hang in the kdump kernel otherwise, right?

Thanks for reviewing.

Yes, the primary reason for disabling the BSP is to avoid a hang/reset in
the kdump 2nd kernel. Saving memory is only a secondary merit compared
with the user-space workaround of specifying nr_cpus=1 or maxcpus=1 and
later waking up all APs except the BSP. I will not mention it in the next
version.

>> +	 * use (N-1)-cpus.
>> +	 */
>> +	if (isbsp && !boot_cpu_is_bsp) {
>
> This variable naming looks confusing, IMHO. It would probably be more
> understandable if 'isbsp' was called 'boot_cpu' to denote that this is
> the CPU we're booting on currently. The comment above it then explains
> that it is an AP and it might also refer to the issue why we're doing
> that.

As you suggest, boot_cpu seems more understandable to me too. BTW, please
note that it doesn't denote the CPU we're currently booting on, but the
CPU with the BSP flag set.

In general, the current code uses many terms for the CPU that runs the
kernel boot-up processing, such as boot cpu, bsp, cpu0 and possibly
others, because in the usual situation the boot cpu is always the BSP and
is assigned to cpu0. But that is not the case with kexec. I'm using the
word bsp in isbsp on purpose, to mean the CPU with the BSP flag set.

So I think it's better to use bsp_cpu here to denote the CPU with the BSP
flag set.

For the comment, how about the following one?

/*
 * In this case, boot cpu is AP. This can happen on
 * kexec/kdump. Consider the case that crash happens on some
 * AP and enters kdump 2nd kernel with the AP.
 *
 * Then, there's issue that if we send INIT to BSP, due to x86
 * hardware specification, it is forced to jump at BIOS init
 * code and system hangs or resets immediately.
 *
 * To avoid the issue, we disable BSP. Then, there's no longer
 * possibility to send INIT to BSP.
 */

>> +		int thiscpu = num_processors + disabled_cpus;
>> +
>> +		pr_warning("ACPI: The boot cpu is not BSP. "
>> +			   "The BSP Processor %d/0x%x ignored.\n",
>> +			   thiscpu, apicid);
>
> Visible comment, so needs a bit of correcting:
>
> "ACPI: We're not booting on the BSP; BSP %d/0x%x ignored."

Yes, I'll use this message in the next patch.

>> +
>> +		disabled_cpus++;
>> +		return;
>> +	}
>> +
>> +	/*
>>  	 * If boot cpu has not been detected yet, then only allow upto
>>  	 * nr_cpu_ids - 1 processors and keep one slot free for boot cpu
>>  	 */
>
> Thanks.

--
Thanks.
HATAYAMA, Daisuke



Re: [PATCH 2/2] x86, apic: Disable BSP if boot cpu is AP

2013-08-30 Thread Borislav Petkov
On Thu, Aug 29, 2013 at 06:28:04PM +0900, HATAYAMA Daisuke wrote:
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 66cab35..fd969d1 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -2113,13 +2113,29 @@ void disconnect_bsp_APIC(int virt_wire_setup)
>  	apic_write(APIC_LVT1, value);
>  }
>  
> -void generic_processor_info(int apicid, int version)
> +void generic_processor_info(int apicid, bool isbsp, int version)
>  {
>  	int cpu, max = nr_cpu_ids;
>  	bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
>  				phys_cpu_present_map);
>  
>  	/*
> +	 * If boot cpu is AP, we now don't have any way to initialize
> +	 * BSP. To save memory consumed, we disable BSP this case and

I don't think we disable the BSP just so that we save memory and rather
because we hang in the kdump kernel otherwise, right?

> +	 * use (N-1)-cpus.
> +	 */
> +	if (isbsp && !boot_cpu_is_bsp) {

This variable naming looks confusing, IMHO. It would probably be more
understandable if 'isbsp' was called 'boot_cpu' to denote that this is
the CPU we're booting on currently. The comment above it then explains
that it is an AP and it might also refer to the issue why we're doing
that.

> +		int thiscpu = num_processors + disabled_cpus;
> +
> +		pr_warning("ACPI: The boot cpu is not BSP. "
> +			   "The BSP Processor %d/0x%x ignored.\n",
> +			   thiscpu, apicid);

Visible comment, so needs a bit of correcting:

"ACPI: We're not booting on the BSP; BSP %d/0x%x ignored."

> +
> +		disabled_cpus++;
> +		return;
> +	}
> +
> +	/*
>  	 * If boot cpu has not been detected yet, then only allow upto
>  	 * nr_cpu_ids - 1 processors and keep one slot free for boot cpu
>  	 */

Thanks.

-- 
Regards/Gruss,
Boris.


[PATCH 2/2] x86, apic: Disable BSP if boot cpu is AP

2013-08-29 Thread HATAYAMA Daisuke
Currently, on x86, if a crash happens on an AP in the kdump 1st kernel,
the 2nd kernel fails to wake up multiple CPUs. The typical behaviour we
actually see is an immediate system reset or hang.

This comes from the hardware specification: the processor with the BSP
flag set jumps to the BIOS init code when it receives INIT, so the
behaviour we then see depends on that init code.
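To make the asymmetry concrete, here is a toy model (not kernel code) of how a CPU reacts to INIT, following the behaviour described above and in the Intel SDM: an AP parks in the wait-for-SIPI state, while the processor holding the BSP flag restarts at the reset vector, i.e. in firmware. The enum and function names are invented for illustration.

```c
#include <stdbool.h>

/* Toy model only: the two ways a CPU can react to an INIT IPI. */
enum init_outcome {
	INIT_WAIT_FOR_SIPI,	/* AP: parks until a Startup IPI arrives */
	INIT_RUN_BIOS,		/* BSP: restarts at the reset vector (firmware) */
};

/* What the target CPU does on INIT, decided solely by its BSP flag. */
static enum init_outcome on_init(bool bsp_flag)
{
	return bsp_flag ? INIT_RUN_BIOS : INIT_WAIT_FOR_SIPI;
}

/*
 * A wake-up path that must survive kdump may only INIT CPUs that end up
 * waiting for SIPI; INIT to the BSP hands control back to the BIOS.
 */
static bool safe_to_send_init(bool bsp_flag)
{
	return on_init(bsp_flag) == INIT_WAIT_FOR_SIPI;
}
```

Nothing in this model can make INIT safe for the BSP, which is why the only lever left is never to target it.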

This never happens if we use only one CPU in the 2nd kernel, so we have
so far avoided the issue with the workaround of specifying maxcpus=1 or
nr_cpus=1 on the 2nd kernel's command line.

To address the issue, this patch disables the BSP if the boot CPU is an
AP. With the BSP disabled, there is no longer any possibility of sending
INIT to it.
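A simplified user-space sketch of the resulting CPU registration logic, modelled on the generic_processor_info() hunk in this patch plus the existing check that keeps one slot free for the boot CPU. The fixed NR_CPU_IDS, the globals, register_cpu() and boot_cpu_detected() are hypothetical stand-ins for kernel state, and boot_cpu_is_bsp is simply taken as an input; this is not the patch's actual code.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPU_IDS 4			/* stand-in for nr_cpu_ids */

static int present_apicid[NR_CPU_IDS];	/* logical cpu -> APIC ID */
static int num_processors;
static int disabled_cpus;
static int boot_cpu_physical_apicid = -1;
static bool boot_cpu_is_bsp = true;	/* assumed input for this sketch */

/* Stand-in for physid_isset(boot_cpu_physical_apicid, phys_cpu_present_map). */
static bool boot_cpu_detected(void)
{
	for (int i = 0; i < num_processors; i++)
		if (present_apicid[i] == boot_cpu_physical_apicid)
			return true;
	return false;
}

/* Returns true if the CPU was added to the present map. */
static bool register_cpu(int apicid, bool isbsp)
{
	int max = NR_CPU_IDS;

	/* Booted on an AP: drop the BSP so it is never sent INIT later. */
	if (isbsp && !boot_cpu_is_bsp) {
		printf("ACPI: We're not booting on the BSP; BSP %d/0x%x ignored.\n",
		       num_processors + disabled_cpus, (unsigned int)apicid);
		disabled_cpus++;
		return false;
	}

	/* Until the boot CPU has been seen, keep one slot free for it. */
	if (!boot_cpu_detected() && num_processors >= max - 1 &&
	    apicid != boot_cpu_physical_apicid) {
		disabled_cpus++;
		return false;
	}

	if (num_processors >= max) {
		disabled_cpus++;
		return false;
	}

	present_apicid[num_processors++] = apicid;
	return true;
}
```

For example, with boot_cpu_physical_apicid = 2 and boot_cpu_is_bsp = false, feeding entries for APIC IDs 0 (flagged as BSP), 1, 2 and 3 leaves IDs 1, 2 and 3 present and counts the BSP in disabled_cpus, so the later wake-up loop never aims INIT at APIC ID 0.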

Before this idea, we discussed the following two ideas, but could not
adopt either of them, for the reasons given:

  1. Switch CPU from AP to BSP via NMI IPI at crash time in the 1st kernel

     This would run in the kdump crash path, where the kernel is in an
     inconsistent state. Any part of memory can be corrupted, including
     hardware-related tables that are accessed, for example, during
     paging or interrupt handling.

  2. Unset the BSP flag of the boot CPU in the 1st kernel

     Unsetting the BSP flag can badly affect some real-world firmware.
     For example, Ma verified that some HP systems fail to reboot in
     this configuration. See:
     http://lkml.indiana.edu/hypermail/linux/kernel/1308.1/03574.html

Because idea 1 is ruled out, we have to address the issue in the 2nd
kernel, running on the AP. There, it's impossible to find out which CPU
is the BSP with the rdmsr instruction: rdmsr only reads the MSRs of the
CPU that executes it, and the BSP is one of the CPUs we are now trying to
wake up. For the same reason, it's also impossible to clear the BSP flag
of the BSP with the wrmsr instruction.
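For reference, the flag in question is bit 8 of MSR 0x1b (IA32_APIC_BASE in Intel's manuals; AMD calls the register APIC_BAR and the bit BSC). The sketch below only decodes a raw value, and its macro and function names are made up. It takes the value as a parameter precisely because rdmsr has no CPU operand: whoever executes it only ever learns about its own CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of IA32_APIC_BASE (MSR 0x1b); names are local to this sketch. */
#define SKETCH_MSR_IA32_APIC_BASE	0x1bu
#define SKETCH_APIC_BASE_BSP		(1ULL << 8)	/* this CPU is the BSP */
#define SKETCH_APIC_BASE_ENABLE		(1ULL << 11)	/* APIC global enable */

/*
 * `raw` must be what rdmsr returned on the CPU in question. A 2nd kernel
 * running on an AP can therefore only learn that *it* is not the BSP,
 * never which other CPU is.
 */
static bool apic_base_is_bsp(uint64_t raw)
{
	return (raw & SKETCH_APIC_BASE_BSP) != 0;
}

static bool apic_base_is_enabled(uint64_t raw)
{
	return (raw & SKETCH_APIC_BASE_ENABLE) != 0;
}
```

A typical value on the BSP is 0xfee00900 (default base 0xfee00000 with bits 11 and 8 set), while an AP would typically read 0xfee00800.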

Next, because idea 2 is ruled out, the BSP is halted in the 1st kernel
with its BSP flag still set (or could possibly be running somewhere in a
catastrophic state). In general, every CPU other than the boot CPU of the
2nd kernel (the CPU on which the crash happened) can be thought of as
left in an arbitrary inconsistent state by the 1st kernel. For APs, it's
possible to recover a sane state by sending them INIT; see section 3.7.3
"Processor-specific INIT" of the MultiProcessor Specification. However,
there's no such way for the BSP. Therefore, disabling the BSP is the only
option left.

My motivation is to generate crash dumps quickly on systems with huge
memory. We can assume such a system also has a large number N of CPUs,
so N-1 CPUs are still available.

To identify which CPU is the BSP, we look up the ACPI table or the MP
table. One concern is that the ACPI guideline says the BIOS *should* list
the BSP in the first MADT LAPIC entry, not that it *must*. In this sense,
this logic relies on the BIOS following ACPI's guideline. On the other
hand, we don't need to worry about this in the MP table case, because it
has an explicit BSP flag.
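A minimal sketch of the two ways of deciding isbsp described above, assuming a caller that walks the table entries in order. In the MP table, the processor entry's CPU FLAGS byte has EN in bit 0 and BP in bit 1 (the kernel names these CPU_ENABLED and CPU_BOOTPROCESSOR); the MADT has no such bit, so the first LAPIC entry is trusted. The SKETCH_* macros and helper names are hypothetical, and this is not the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CPU_ENABLED		0x01u	/* EN: processor usable */
#define SKETCH_CPU_BOOTPROCESSOR	0x02u	/* BP: this is the BSP */

/* MP table: an explicit per-entry flag, so no guessing is needed. */
static bool mp_entry_is_bsp(uint8_t cpuflag)
{
	return (cpuflag & SKETCH_CPU_BOOTPROCESSOR) != 0;
}

/*
 * MADT: there is no BSP bit. ACPI only says firmware *should* list the
 * BSP first, so the 0-based position of the LAPIC entry is all we have.
 * A BIOS that ignores the guideline makes this guess wrong.
 */
static bool madt_entry_is_bsp(size_t lapic_index)
{
	return lapic_index == 0;
}
```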

To avoid any undesirable behaviour caused by a broken BIOS that doesn't
conform to the guideline, it's enough to limit the number of CPUs to 1 by
specifying maxcpus=1 or nr_cpus=1, as the default kdump configuration
currently does. (Of course, maxcpus=1 is problematic if we later try to
wake up other CPUs from user space.)

SFI and devicetree don't provide BSP information, so there's no
functional change in their code; they simply pass false for all entries,
which keeps the interface uniform.

Signed-off-by: HATAYAMA Daisuke 
---
 arch/x86/include/asm/mpspec.h |    2 +-
 arch/x86/kernel/acpi/boot.c   |   11 ++-
 arch/x86/kernel/apic/apic.c   |   18 +-
 arch/x86/kernel/devicetree.c  |    1 +
 arch/x86/kernel/mpparse.c     |   15 +--
 arch/x86/platform/sfi/sfi.c   |    2 +-
 6 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index a8a4338..d96f409 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -97,7 +97,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
 #define default_get_smp_config x86_init_uint_noop
 #endif
 
-void generic_processor_info(int apicid, int version);
+void generic_processor_info(int apicid, bool isbsp, int version);
 #ifdef CONFIG_ACPI
 extern void mp_register_ioapic(int id, u32 address, u32 gsi_base);
 extern void mp_override_legacy_irq(u8 bus_irq, u8 polarity, u8 trigger,
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 2627a81..78d95ec 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -198,6 +198,7 @@ static int __init acpi_parse_madt(struct acpi_table_header *table)
 static void acpi_register_lapic(int id, u8 enabled)
 {
 	unsigned int ver = 0;
+	bool isbsp = false;
 
 	if (id >= (MAX_LOCAL_APIC-1)) {
 		printk(KERN_INFO PREFIX "skipped apicid that is too big\n");
@@ -212,7 +213,15 @@ static void acpi_register_lapic(int id, u8 enabled)
 	if (boot_cpu_physical_apicid != -1U)
 		ver = apic_version[boot_cpu_physical_apicid];
 
-	generic_processor_info(id, ver);
+	/*
+	 * 
