Re: [PATCH] x86: fix cpu-hotplug regression
On Fri, Nov 09, 2007 at 12:16:34PM -0800, Andrew Morton wrote: > On Wed, 7 Nov 2007 23:15:38 +0900 > "Akinobu Mita" <[EMAIL PROTECTED]> wrote: > > > > [PATCH] x86: fix cpu hotplug regression (don't call mce_create_device on > > > CPU_UP_PREPARE) > > > > > > Fix regression introduced with d435d862baca3e25e5eec236762a43251b1e7ffc > > > ("cpu hotplug: mce: fix cpu hotplug error handling"). > > > > > > For CPUs not brought up during boot (using maxcpus and additional_cpus > > > parameters) we don't know whether mce is supported or not at > > > "CPU_UP_PREPARE"-time. > > > Thus mce_cpu_callback should be called after the CPU is online. > > > > Thank you for finding and fixing the problem. > > I added two fixes to your patch: > > > > - Avoid mce_remove_device() for the CPU that is not correctly initialized > > by mce_create_device() failure. > > > > - make CPU_ONLINE callback always return NOTIFY_OK. > > Because CPU_ONLINE callback return value is always ignored. > > > > > Signed-off-by: Andreas Herrmann <[EMAIL PROTECTED]> > > > > [EMAIL PROTECTED]: make CPU_ONLINE callback always return NOTIFY_OK] > > [EMAIL PROTECTED]: avoid mce_remove_device() for not initialized device] > > > > Signed-off-by: Akinobu Mita <[EMAIL PROTECTED]> > > Andreas, could you please review and preferably runtime-test this new > version? I'll test it today ... > Assuming that all goes well could one of you please prepare a final patch > with a complete changelog? ... and can send a final patch after successful testing. Regards, Andreas -- Operating | AMD Saxony Limited Liability Company & Co. KG, System | Wilschdorfer Landstr. 101, 01109 Dresden, Germany Research | Register Court Dresden: HRA 4896, General Partner authorized Center | to represent: AMD Saxony LLC (Wilmington, Delaware, US) (OSRC) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix cpu-hotplug regression
On Fri, Nov 09, 2007 at 12:16:34PM -0800, Andrew Morton wrote: On Wed, 7 Nov 2007 23:15:38 +0900 Akinobu Mita [EMAIL PROTECTED] wrote: [PATCH] x86: fix cpu hotplug regression (don't call mce_create_device on CPU_UP_PREPARE) Fix regression introduced with d435d862baca3e25e5eec236762a43251b1e7ffc (cpu hotplug: mce: fix cpu hotplug error handling). For CPUs not brought up during boot (using maxcpus and additional_cpus parameters) we don't know whether mce is supported or not at CPU_UP_PREPARE-time. Thus mce_cpu_callback should be called after the CPU is online. Thank you for finding and fixing the problem. I added two fixes to your patch: - Avoid mce_remove_device() for the CPU that is not correctly initialized by mce_create_device() failure. - make CPU_ONLINE callback always return NOTIFY_OK. Because CPU_ONLINE callback return value is always ignored. Signed-off-by: Andreas Herrmann [EMAIL PROTECTED] [EMAIL PROTECTED]: make CPU_ONLINE callback always return NOTIFY_OK] [EMAIL PROTECTED]: avoid mce_remove_device() for not initialized device] Signed-off-by: Akinobu Mita [EMAIL PROTECTED] Andreas, could you please review and preferably runtime-test this new version? I'll test it today ... Assuming that all goes well could one of you please prepare a final patch with a complete changelog? ... and can send a final patch after successful testing. Regards, Andreas -- Operating | AMD Saxony Limited Liability Company Co. KG, System | Wilschdorfer Landstr. 101, 01109 Dresden, Germany Research | Register Court Dresden: HRA 4896, General Partner authorized Center | to represent: AMD Saxony LLC (Wilmington, Delaware, US) (OSRC) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix cpu-hotplug regression
On Wed, 7 Nov 2007 23:15:38 +0900 "Akinobu Mita" <[EMAIL PROTECTED]> wrote: > > [PATCH] x86: fix cpu hotplug regression (don't call mce_create_device on > > CPU_UP_PREPARE) > > > > Fix regression introduced with d435d862baca3e25e5eec236762a43251b1e7ffc > > ("cpu hotplug: mce: fix cpu hotplug error handling"). > > > > For CPUs not brought up during boot (using maxcpus and additional_cpus > > parameters) we don't know whether mce is supported or not at > > "CPU_UP_PREPARE"-time. > > Thus mce_cpu_callback should be called after the CPU is online. > > Thank you for finding and fixing the problem. > I added two fixes to your patch: > > - Avoid mce_remove_device() for the CPU that is not correctly initialized > by mce_create_device() failure. > > - make CPU_ONLINE callback always return NOTIFY_OK. > Because CPU_ONLINE callback return value is always ignored. > > > Signed-off-by: Andreas Herrmann <[EMAIL PROTECTED]> > > [EMAIL PROTECTED]: make CPU_ONLINE callback always return NOTIFY_OK] > [EMAIL PROTECTED]: avoid mce_remove_device() for not initialized device] > > Signed-off-by: Akinobu Mita <[EMAIL PROTECTED]> Andreas, could you please review and preferably runtime-test this new version? Assuming that all goes well could one of you please prepare a final patch with a complete changelog? Thanks. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix cpu-hotplug regression
On Wed, 7 Nov 2007 23:15:38 +0900 Akinobu Mita [EMAIL PROTECTED] wrote: [PATCH] x86: fix cpu hotplug regression (don't call mce_create_device on CPU_UP_PREPARE) Fix regression introduced with d435d862baca3e25e5eec236762a43251b1e7ffc (cpu hotplug: mce: fix cpu hotplug error handling). For CPUs not brought up during boot (using maxcpus and additional_cpus parameters) we don't know whether mce is supported or not at CPU_UP_PREPARE-time. Thus mce_cpu_callback should be called after the CPU is online. Thank you for finding and fixing the problem. I added two fixes to your patch: - Avoid mce_remove_device() for the CPU that is not correctly initialized by mce_create_device() failure. - make CPU_ONLINE callback always return NOTIFY_OK. Because CPU_ONLINE callback return value is always ignored. Signed-off-by: Andreas Herrmann [EMAIL PROTECTED] [EMAIL PROTECTED]: make CPU_ONLINE callback always return NOTIFY_OK] [EMAIL PROTECTED]: avoid mce_remove_device() for not initialized device] Signed-off-by: Akinobu Mita [EMAIL PROTECTED] Andreas, could you please review and preferably runtime-test this new version? Assuming that all goes well could one of you please prepare a final patch with a complete changelog? Thanks. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix cpu-hotplug regression
> [PATCH] x86: fix cpu hotplug regression (don't call mce_create_device on > CPU_UP_PREPARE) > > Fix regression introduced with d435d862baca3e25e5eec236762a43251b1e7ffc > ("cpu hotplug: mce: fix cpu hotplug error handling"). > > For CPUs not brought up during boot (using maxcpus and additional_cpus > parameters) we don't know whether mce is supported or not at > "CPU_UP_PREPARE"-time. > Thus mce_cpu_callback should be called after the CPU is online. Thank you for finding and fixing the problem. I added two fixes to your patch: - Avoid mce_remove_device() for the CPU that is not correctly initialized by mce_create_device() failure. - make CPU_ONLINE callback always return NOTIFY_OK. Because CPU_ONLINE callback return value is always ignored. > Signed-off-by: Andreas Herrmann <[EMAIL PROTECTED]> [EMAIL PROTECTED]: make CPU_ONLINE callback always return NOTIFY_OK] [EMAIL PROTECTED]: avoid mce_remove_device() for not initialized device] Signed-off-by: Akinobu Mita <[EMAIL PROTECTED]> --- arch/x86/kernel/cpu/mcheck/mce_64.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) Index: 2.6-git/arch/x86/kernel/cpu/mcheck/mce_64.c === --- 2.6-git.orig/arch/x86/kernel/cpu/mcheck/mce_64.c +++ 2.6-git/arch/x86/kernel/cpu/mcheck/mce_64.c @@ -802,6 +802,8 @@ static struct sysdev_attribute *mce_attr NULL }; +static cpumask_t mce_device_initialized = CPU_MASK_NONE; + /* Per cpu sysdev init. All of the cpus still share the same ctl bank */ static __cpuinit int mce_create_device(unsigned int cpu) { @@ -825,6 +827,7 @@ static __cpuinit int mce_create_device(u if (err) goto error; } + cpu_set(cpu, mce_device_initialized); return 0; error: @@ -841,10 +844,14 @@ static void mce_remove_device(unsigned i { int i; + if (!cpu_isset(cpu, mce_device_initialized)) + return; + for (i = 0; mce_attributes[i]; i++) sysdev_remove_file(_cpu(device_mce,cpu), mce_attributes[i]); sysdev_unregister(_cpu(device_mce,cpu)); + cpu_clear(cpu, mce_device_initialized); } /* Get notified when a cpu comes on/off. Be hotplug friendly. */ @@ -852,21 +859,18 @@ static int mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) { unsigned int cpu = (unsigned long)hcpu; - int err = 0; switch (action) { - case CPU_UP_PREPARE: - case CPU_UP_PREPARE_FROZEN: - err = mce_create_device(cpu); + case CPU_ONLINE: + case CPU_ONLINE_FROZEN: + mce_create_device(cpu); break; - case CPU_UP_CANCELED: - case CPU_UP_CANCELED_FROZEN: case CPU_DEAD: case CPU_DEAD_FROZEN: mce_remove_device(cpu); break; } - return err ? NOTIFY_BAD : NOTIFY_OK; + return NOTIFY_OK; } static struct notifier_block mce_cpu_notifier = { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix cpu-hotplug regression
On Wed, Nov 07, 2007 at 03:35:43AM +0100, Andi Kleen wrote: > On Wednesday 07 November 2007 02:12, Andreas Herrmann wrote: > > > In cases where not all CPUs are brought up during > > boot (e.g. using maxcpus and additional_cpus parameters) > > mce_cpu_callback now returns NOTFIY_BAD because > > for such CPUs cpu_data is not completely filled when > > the notifier is called. Thus mce_create_device fails right > > at its beginning: > > > > if (!mce_available(_data[cpu])) > > return -EIO; > > > > As a quick fix I suggest to check boot_cpu_data for MCE. > > I guess it would be better to just move the device creation > to after the CPU has booted. AKA call mce_create_dev() on CPU_ONLINE > instead. Yes, and this was the old behaviour. The mentioned patch changed it - ("do mce_create_device in CPU_UP_PREPARE instead of CPU_ONLINE"). Thinking twice about the problem it seems obvious that this part of the patch should just be reverted. Attached is a new fix (diff against 2.6.24-rc2). Regards, Andreas -- [PATCH] x86: fix cpu hotplug regression (don't call mce_create_device on CPU_UP_PREPARE) Fix regression introduced with d435d862baca3e25e5eec236762a43251b1e7ffc ("cpu hotplug: mce: fix cpu hotplug error handling"). For CPUs not brought up during boot (using maxcpus and additional_cpus parameters) we don't know whether mce is supported or not at "CPU_UP_PREPARE"-time. Thus mce_cpu_callback should be called after the CPU is online. Signed-off-by: Andreas Herrmann <[EMAIL PROTECTED]> --- arch/x86/kernel/cpu/mcheck/mce_64.c |6 ++ 1 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c index b9f802e..8e83070 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_64.c +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c @@ -855,12 +855,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) int err = 0; switch (action) { - case CPU_UP_PREPARE: - case CPU_UP_PREPARE_FROZEN: + case CPU_ONLINE: + case CPU_ONLINE_FROZEN: err = mce_create_device(cpu); break; - case CPU_UP_CANCELED: - case CPU_UP_CANCELED_FROZEN: case CPU_DEAD: case CPU_DEAD_FROZEN: mce_remove_device(cpu); -- 1.5.3.4 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix cpu-hotplug regression
[PATCH] x86: fix cpu hotplug regression (don't call mce_create_device on CPU_UP_PREPARE) Fix regression introduced with d435d862baca3e25e5eec236762a43251b1e7ffc (cpu hotplug: mce: fix cpu hotplug error handling). For CPUs not brought up during boot (using maxcpus and additional_cpus parameters) we don't know whether mce is supported or not at CPU_UP_PREPARE-time. Thus mce_cpu_callback should be called after the CPU is online. Thank you for finding and fixing the problem. I added two fixes to your patch: - Avoid mce_remove_device() for the CPU that is not correctly initialized by mce_create_device() failure. - make CPU_ONLINE callback always return NOTIFY_OK. Because CPU_ONLINE callback return value is always ignored. Signed-off-by: Andreas Herrmann [EMAIL PROTECTED] [EMAIL PROTECTED]: make CPU_ONLINE callback always return NOTIFY_OK] [EMAIL PROTECTED]: avoid mce_remove_device() for not initialized device] Signed-off-by: Akinobu Mita [EMAIL PROTECTED] --- arch/x86/kernel/cpu/mcheck/mce_64.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) Index: 2.6-git/arch/x86/kernel/cpu/mcheck/mce_64.c === --- 2.6-git.orig/arch/x86/kernel/cpu/mcheck/mce_64.c +++ 2.6-git/arch/x86/kernel/cpu/mcheck/mce_64.c @@ -802,6 +802,8 @@ static struct sysdev_attribute *mce_attr NULL }; +static cpumask_t mce_device_initialized = CPU_MASK_NONE; + /* Per cpu sysdev init. All of the cpus still share the same ctl bank */ static __cpuinit int mce_create_device(unsigned int cpu) { @@ -825,6 +827,7 @@ static __cpuinit int mce_create_device(u if (err) goto error; } + cpu_set(cpu, mce_device_initialized); return 0; error: @@ -841,10 +844,14 @@ static void mce_remove_device(unsigned i { int i; + if (!cpu_isset(cpu, mce_device_initialized)) + return; + for (i = 0; mce_attributes[i]; i++) sysdev_remove_file(per_cpu(device_mce,cpu), mce_attributes[i]); sysdev_unregister(per_cpu(device_mce,cpu)); + cpu_clear(cpu, mce_device_initialized); } /* Get notified when a cpu comes on/off. Be hotplug friendly. */ @@ -852,21 +859,18 @@ static int mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) { unsigned int cpu = (unsigned long)hcpu; - int err = 0; switch (action) { - case CPU_UP_PREPARE: - case CPU_UP_PREPARE_FROZEN: - err = mce_create_device(cpu); + case CPU_ONLINE: + case CPU_ONLINE_FROZEN: + mce_create_device(cpu); break; - case CPU_UP_CANCELED: - case CPU_UP_CANCELED_FROZEN: case CPU_DEAD: case CPU_DEAD_FROZEN: mce_remove_device(cpu); break; } - return err ? NOTIFY_BAD : NOTIFY_OK; + return NOTIFY_OK; } static struct notifier_block mce_cpu_notifier = { - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix cpu-hotplug regression
On Wed, Nov 07, 2007 at 03:35:43AM +0100, Andi Kleen wrote: On Wednesday 07 November 2007 02:12, Andreas Herrmann wrote: In cases where not all CPUs are brought up during boot (e.g. using maxcpus and additional_cpus parameters) mce_cpu_callback now returns NOTFIY_BAD because for such CPUs cpu_data is not completely filled when the notifier is called. Thus mce_create_device fails right at its beginning: if (!mce_available(cpu_data[cpu])) return -EIO; As a quick fix I suggest to check boot_cpu_data for MCE. I guess it would be better to just move the device creation to after the CPU has booted. AKA call mce_create_dev() on CPU_ONLINE instead. Yes, and this was the old behaviour. The mentioned patch changed it - (do mce_create_device in CPU_UP_PREPARE instead of CPU_ONLINE). Thinking twice about the problem it seems obvious that this part of the patch should just be reverted. Attached is a new fix (diff against 2.6.24-rc2). Regards, Andreas -- [PATCH] x86: fix cpu hotplug regression (don't call mce_create_device on CPU_UP_PREPARE) Fix regression introduced with d435d862baca3e25e5eec236762a43251b1e7ffc (cpu hotplug: mce: fix cpu hotplug error handling). For CPUs not brought up during boot (using maxcpus and additional_cpus parameters) we don't know whether mce is supported or not at CPU_UP_PREPARE-time. Thus mce_cpu_callback should be called after the CPU is online. Signed-off-by: Andreas Herrmann [EMAIL PROTECTED] --- arch/x86/kernel/cpu/mcheck/mce_64.c |6 ++ 1 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c index b9f802e..8e83070 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_64.c +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c @@ -855,12 +855,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) int err = 0; switch (action) { - case CPU_UP_PREPARE: - case CPU_UP_PREPARE_FROZEN: + case CPU_ONLINE: + case CPU_ONLINE_FROZEN: err = mce_create_device(cpu); break; - case CPU_UP_CANCELED: - case CPU_UP_CANCELED_FROZEN: case CPU_DEAD: case CPU_DEAD_FROZEN: mce_remove_device(cpu); -- 1.5.3.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix cpu-hotplug regression
On Wednesday 07 November 2007 02:12, Andreas Herrmann wrote: > In cases where not all CPUs are brought up during > boot (e.g. using maxcpus and additional_cpus parameters) > mce_cpu_callback now returns NOTFIY_BAD because > for such CPUs cpu_data is not completely filled when > the notifier is called. Thus mce_create_device fails right > at its beginning: > > if (!mce_available(_data[cpu])) > return -EIO; > > As a quick fix I suggest to check boot_cpu_data for MCE. I guess it would be better to just move the device creation to after the CPU has booted. AKA call mce_create_dev() on CPU_ONLINE instead. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: fix cpu-hotplug regression
On Wednesday 07 November 2007 02:12, Andreas Herrmann wrote: In cases where not all CPUs are brought up during boot (e.g. using maxcpus and additional_cpus parameters) mce_cpu_callback now returns NOTFIY_BAD because for such CPUs cpu_data is not completely filled when the notifier is called. Thus mce_create_device fails right at its beginning: if (!mce_available(cpu_data[cpu])) return -EIO; As a quick fix I suggest to check boot_cpu_data for MCE. I guess it would be better to just move the device creation to after the CPU has booted. AKA call mce_create_dev() on CPU_ONLINE instead. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/