RE: [PATCH]PCIE ASPM support - takes 3
>Hi!
>
>> v3->v2, fixed the issues Matthew Wilcox raised.
>>
>> PCI Express ASPM defines a protocol for PCI Express components in the D0
>> state to reduce Link power by placing their Links into a low power state
>> and instructing the other end of the Link to do likewise. This
>> capability allows hardware-autonomous, dynamic Link power reduction
>> beyond what is achievable by software-only controlled power management.
>> However, the device should be configured by software appropriately.
>> Enabling ASPM will save power, but will introduce device latency.
>
>How big is the latency? 1msec? 10msec? 100usec?
I don't have an accurate number, but one device claims an L0s latency of
< 128ns and an L1 latency of < 64us.

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
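Editorial sketch, not part of the thread: figures such as "< 128ns" and "< 64us" are the kind of bound a device advertises in its PCI Express Link Capabilities register, where the L0s Exit Latency field occupies bits 14:12 and the L1 Exit Latency field bits 17:15. The user-space code below decodes those two fields into an upper bound in nanoseconds. The function names, the table names, and the sentinel for the open-ended encoding are assumptions made for illustration; they are not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define LNKCAP_L0S_EXIT_SHIFT  12      /* L0s Exit Latency, bits 14:12 */
#define LNKCAP_L1_EXIT_SHIFT   15      /* L1 Exit Latency, bits 17:15 */
#define LNKCAP_EXIT_MASK       0x7u

/* Encoding 7 only states a lower bound ("more than ..."), so no maximum. */
#define ASPM_LATENCY_UNBOUNDED UINT32_MAX

/* Upper bounds in nanoseconds for encodings 0..6. */
static const uint32_t l0s_max_ns[7] = { 64, 128, 256, 512, 1000, 2000, 4000 };
static const uint32_t l1_max_ns[7]  = { 1000, 2000, 4000, 8000,
                                        16000, 32000, 64000 };

static uint32_t exit_latency_max_ns(const uint32_t table[7], uint32_t enc)
{
    if (enc == 7)
        return ASPM_LATENCY_UNBOUNDED;
    assert(enc < 7);                   /* 3-bit field, so this always holds */
    return table[enc];
}

/* Hypothetical helper: worst-case L0s exit latency advertised in lnkcap. */
uint32_t aspm_l0s_exit_max_ns(uint32_t lnkcap)
{
    uint32_t enc = (lnkcap >> LNKCAP_L0S_EXIT_SHIFT) & LNKCAP_EXIT_MASK;

    return exit_latency_max_ns(l0s_max_ns, enc);
}

/* Hypothetical helper: worst-case L1 exit latency advertised in lnkcap. */
uint32_t aspm_l1_exit_max_ns(uint32_t lnkcap)
{
    uint32_t enc = (lnkcap >> LNKCAP_L1_EXIT_SHIFT) & LNKCAP_EXIT_MASK;

    return exit_latency_max_ns(l1_max_ns, enc);
}
```

A made-up register value of 0x31000 (L0s encoding 1, L1 encoding 6) decodes to the two bounds quoted in the message: 128 ns and 64000 ns.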
RE: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64
>-----Original Message-----
>From: Robert Hancock [mailto:[EMAIL PROTECTED]
>Sent: Friday, June 29, 2007 8:59 AM
>To: Zan Lynx
>Cc: Andrew Morton; linux-kernel@vger.kernel.org; Raj, Ashok; Li, Shaohua;
>Keshavamurthy, Anil S
>Subject: Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64
>
>Zan Lynx wrote:
>> On Thu, 2007-06-28 at 03:43 -0700, Andrew Morton wrote:
>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/
>>
>>> +intel-iommu-dmar-detection-and-parsing-logic.patch
>>> +intel-iommu-pci-generic-helper-function.patch
>>> +intel-iommu-pci-generic-helper-function-fix.patch
>>> +intel-iommu-clflush_cache_range-now-takes-size-param.patch
>>> +intel-iommu-iova-allocation-and-management-routines.patch
>>> +intel-iommu-iova-allocation-and-management-routines-fix.patch
>>> +intel-iommu-iova-allocation-and-management-routines-fix-2.patch
>>> +intel-iommu-intel-iommu-driver.patch
>>> +intel-iommu-intel-iommu-driver-fix.patch
>>> +intel-iommu-intel-iommu-driver-fix-2.patch
>>> +intel-iommu-avoid-memory-allocation-failures-in-dma-map-api-calls.patch
>>> +intel-iommu-intel-iommu-cmdline-option-forcedac.patch
>>> +intel-iommu-dmar-fault-handling-support.patch
>>> +intel-iommu-iommu-gfx-workaround.patch
>>> +intel-iommu-iommu-floppy-workaround.patch
>>> +intel-iommu-iommu-floppy-workaround-fix.patch
>>> +intel-iommu-iommu-floppy-workaround-fix-fix.patch
>>>
>>> Intel IOMMU support
>>
>> I believe the above patch set is causing the problem. On my first try
>> with rc6-mm1 I said Yes to the CONFIG_DMAR options. (I'm nearly as good
>> as random option selection :-)
>>
>> The system panicked during boot, I believe it was trying to detect an
>> Intel IOMMU. Later when I have a camera, I will try to post a
>> screenshot of the backtrace. (I can't seem to get netconsole to work on
>> boot, only in a module).
>>
>> When I recompiled without DMAR set, things seem to be working great. I
>> seem to be getting better disk read throughput than rc3-mm1, by the way.
>>
>> This laptop is an AMD Athlon64 on a NForce3 running a 64-bit Gentoo
>> build.
>>
>> I'll provide more details on request, and when I get the chance. This
>> is a heads-up on the BUG in case someone has an "ah ha!" moment.
>
>I took a picture of it, looks like the backtrace is:
>
>NULL pointer dereference at 024
>EIP:dmar_table_init+0x11
>intel_iommu_init+0x30
>pci_iommu_init+0xe
>kernel_init+0x16e
>
>Presumably something is NULL in dmar_table_init that wasn't expected to
>be.. I would guess it likely crashes on any system without an Intel
>IOMMU in it.
How about something like below?

 int __init dmar_table_init(void)
 {
+       if (!dmar_tbl)
+               return -ENODEV;
        parse_dmar_table();
        if (list_empty(&dmar_drhd_units)) {
                printk(KERN_ERR PREFIX "No DMAR devices found\n");
                return -ENODEV;
        }
        return 0;
 }
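Editorial sketch, not part of the thread: the proposed guard can be tried out in user space. The code below mimics the shape of the suggested dmar_table_init() with stand-in types, so the behaviour on a machine without a DMAR table (table pointer left NULL) is easy to see. Everything here except the names dmar_tbl and parse_dmar_table is invented for illustration, and the real kernel structures look nothing like fake_dmar_table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the firmware-provided DMAR table (hypothetical layout). */
struct fake_dmar_table {
    int drhd_count;     /* number of remapping hardware units described */
};

/* Stays NULL on machines whose firmware provides no DMAR table. */
static struct fake_dmar_table *dmar_tbl;
static int drhd_units_found;

static void parse_dmar_table(void)
{
    /* Without the guard in the caller, this dereference is the crash. */
    assert(dmar_tbl != NULL);
    drhd_units_found = dmar_tbl->drhd_count;
}

/* Same control flow as the suggested fix: bail out before parsing. */
int dmar_table_init_sketch(void)
{
    if (!dmar_tbl)
        return -ENODEV;
    parse_dmar_table();
    if (drhd_units_found == 0)
        return -ENODEV;     /* table present but describes no units */
    return 0;
}
```

With dmar_tbl left NULL the function returns -ENODEV instead of dereferencing the pointer; once dmar_tbl points at a table with at least one unit, it returns 0.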
RE: [ACPI] S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
Hi,
>> > If you think it is a linux bug, can you produce small test case doing
>> > just the sigwait, and post it on l-k with big title "sigwait() breaks
>> > when straced, and on suspend"?
>> >
>> > That way it is going to get some attention, and you'll get either
>> > documentation or kernel fixed.
>> Looks like a linux bug to me. The refrigerator fake signal woke the
>> task up without a restart for the sigwait case. How about below
>> patch:
>
>Is there chance to fix strace case, too? sigwait() is broken in more
>than one way it seems...
Not sure about it. strace shows sigwait using sigtimedwait, which
doesn't say it can't return an error.

>>  linux-2.6.13-rc4-root/kernel/signal.c |   11 ++-
>>  1 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
>> --- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume 2005-08-01 14:00:39.089460688 +0800
>> +++ linux-2.6.13-rc4-root/kernel/signal.c 2005-08-01 14:30:13.821660384 +0800
>> @@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
>>          struct timespec ts;
>>          siginfo_t info;
>>          long timeout = 0;
>> +        int recover = 0;
>>
>>          /* XXX: Don't preclude handling different sized sigset_t's. */
>>          if (sigsetsize != sizeof(sigset_t))
>> @@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
>>                           * be awakened when they arrive. */
>>                          current->real_blocked = current->blocked;
>>                          sigandsets(&current->blocked, &current->blocked, &these);
>> +do_recover:
>>                          recalc_sigpending();
>>                          spin_unlock_irq(&current->sighand->siglock);
>>
>>                          current->state = TASK_INTERRUPTIBLE;
>>                          timeout = schedule_timeout(timeout);
>>
>> -                        try_to_freeze();
>> +                        if (try_to_freeze())
>> +                                recover = 1;
>
>Can't you just goto do_recover here?
Not sure again.

Thanks,
Shaohua
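Editorial sketch, not part of the thread: the reply observes that sigwait() is implemented with sigtimedwait, which is allowed to fail. One defensive pattern on the user-space side is to retry when the call reports EINTR. The helper below is an assumption-laden illustration of that pattern (the name wait_signal_retry is invented) and is not the kernel fix proposed in the patch; note that each retry restarts the full timeout.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/*
 * Wait for one of the signals in *set, retrying if the wait is
 * interrupted (EINTR). Returns the signal number on success, or -1 with
 * errno set (EAGAIN when the timeout expires). Passing timeout == NULL
 * waits indefinitely. Each retry starts the timeout over again.
 */
int wait_signal_retry(const sigset_t *set, const struct timespec *timeout)
{
    int sig;

    assert(set != NULL);
    do {
        sig = sigtimedwait(set, NULL, timeout);
    } while (sig == -1 && errno == EINTR);
    return sig;
}
```

The signals in the set must be blocked (for example with sigprocmask) before calling the helper, as with sigwait itself. A reproducer for the reported bug would still have to suspend the machine or attach strace while the process sits in the wait.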
Re: [PATCH 4/6]cpu state clean after hot remove
On Tue, 2005-04-12 at 13:31, Li Shaohua wrote:
> @@ -1052,7 +1086,7 @@ static void __init smp_boot_cpus(unsigne
>                  if (max_cpus <= cpucount+1)
>                          continue;
>
> -                if (do_boot_cpu(apicid))
> +                if ((cpu = alloc_cpu_id() > 0) && do_boot_cpu(apicid, cpu))
>                          printk("CPU #%d not responding - cannot use it.\n",
>                                                                  apicid);
>                  else
Oops, there is a typo in the patch. Andrew, please apply the patch below
on top of the patch above. Sorry for the inconvenience.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN arch/i386/kernel/smpboot.c~smpboot arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~smpboot 2005-04-21 11:27:53.913041424 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c 2005-04-21 11:28:44.103411328 +0800
@@ -1166,7 +1166,7 @@ static void __init smp_boot_cpus(unsigne
                 if (max_cpus <= cpucount+1)
                         continue;

-                if ((cpu = alloc_cpu_id() > 0) && do_boot_cpu(apicid, cpu))
+                if (((cpu = alloc_cpu_id()) <= 0) || do_boot_cpu(apicid, cpu))
                         printk("CPU #%d not responding - cannot use it.\n",
                                                                 apicid);
                 else
_
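Editorial sketch, not part of the thread: the typo is an operator-precedence slip. In C, '>' binds tighter than '=', so `cpu = alloc_cpu_id() > 0` stores the result of the comparison (0 or 1) in cpu rather than the allocated id. The follow-up patch parenthesizes the assignment; it also flips the test to `<= 0 ||`, since the branch taken is the one that prints the failure message. The stub below stands in for alloc_cpu_id() and is purely hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for alloc_cpu_id(): pretend CPU id 3 is free. */
static int alloc_cpu_id_stub(void)
{
    return 3;
}

/* As in the first patch: parsed as cpu = (alloc_cpu_id_stub() > 0). */
int cpu_from_typo(void)
{
    int cpu;

    cpu = alloc_cpu_id_stub() > 0;
    return cpu;             /* the comparison result, not the id */
}

/* As in the follow-up: assign first, then compare the assigned value. */
int cpu_from_fix(void)
{
    int cpu;

    if ((cpu = alloc_cpu_id_stub()) <= 0)
        return -1;          /* allocation failed */
    return cpu;             /* the allocated id */
}

/* Small self-check that can be called from a main() function. */
void check_precedence(void)
{
    assert(cpu_from_typo() == 1);
    assert(cpu_from_fix() == 3);
}
```

With the typo, do_boot_cpu() would have been handed 1 for every successful allocation, whatever id was actually reserved.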
Re: [PATCH 6/6]suspend/resume SMP support
On Thu, 2005-04-14 at 16:27, Li Shaohua wrote:
> On Wed, 2005-04-13 at 16:32, Pavel Machek wrote:
> > [EMAIL PROTECTED]:/sys/devices/system/cpu/cpu1# dmesg | tail -25
> >  [<c011d001>] activate_task+0x1/0xa0
> >  [<c011d128>] resched_task+0x68/0x90
> >  [<c011d8ba>] try_to_wake_up+0x2aa/0x2f0
> >  [<c025ed7a>] fbcon_cursor+0x19a/0x270
> >  [<c02c4958>] hide_cursor+0x18/0x30
> >  [<c02c758f>] vt_console_print+0x24f/0x260
> >  [<c02c7340>] vt_console_print+0x0/0x260
> >  [<c01247e7>] __call_console_drivers+0x57/0x60
> >  [<c01248e0>] call_console_drivers+0x80/0x110
> >  [<c0124d8e>] release_console_sem+0x4e/0xc0
> >  [<c0124c12>] vprintk+0x192/0x240
> >  [<c0528891>] preempt_schedule_irq+0x51/0x80
> >  [<c02adeca>] acpi_processor_idle+0x0/0x265
> >  [<c010325e>] need_resched+0x1f/0x21
> >  [<c02adeca>] acpi_processor_idle+0x0/0x265
> >  [<c0124a77>] printk+0x17/0x20
> >  [<c010b583>] cpu_init+0x73/0x360
> >  [<c0117bd6>] start_secondary+0x6/0x170
> > Code: d2 74 bd fc 8b 44 24 28 b9 0e 00 00 00 8b 74 24 14 01 c6 b8 0e
> > 00 00 00 89 74 24 1c 8b 74 24 30 89 44 24 10 8b 7c 24 1c 83 c6 10
> > <f3> a5 8b 74 24 24 8b 44 24 1c 89 4c 24 10 01 ee f7 d5 21 ee 89
> >  <0>Kernel panic - not syncing: Attempted to kill the idle task!
> >  Stuck ??
> > Inquiring remote APIC #0...
> > ... APIC #0 ID:
> > ... APIC #0 VERSION: 00040011
> > ... APIC #0 SPIV: 00ff
> > [EMAIL PROTECTED]:/sys/devices/system/cpu/cpu1#
> Andrew,
> The patch below fixes Pavel's oops. What is strange is that the
> 'system_state' check was added for CPU hotplug by Rusty, which really
> confuses me. Could you please look at it?
> The oops can be reproduced 100% of the time with the radeonfb driver
> loaded. Attached is the dmesg of an oops. It seems the 'objp' parameter
> for 'cache_alloc_debugcheck_after' is invalid.
It looks like the per-cpu array_cache isn't initialized. It is initialized
in a CPU hotplug callback, so until cpu_up is called for the CPU, every
kmalloc on it will fail. Is that right?

Thanks,
Shaohua
Re: [PATCH 6/6]suspend/resume SMP support
On Wed, 2005-04-13 at 16:32, Pavel Machek wrote:
> [EMAIL PROTECTED]:/sys/devices/system/cpu/cpu1# dmesg | tail -25
>  [<c011d001>] activate_task+0x1/0xa0
>  [<c011d128>] resched_task+0x68/0x90
>  [<c011d8ba>] try_to_wake_up+0x2aa/0x2f0
>  [<c025ed7a>] fbcon_cursor+0x19a/0x270
>  [<c02c4958>] hide_cursor+0x18/0x30
>  [<c02c758f>] vt_console_print+0x24f/0x260
>  [<c02c7340>] vt_console_print+0x0/0x260
>  [<c01247e7>] __call_console_drivers+0x57/0x60
>  [<c01248e0>] call_console_drivers+0x80/0x110
>  [<c0124d8e>] release_console_sem+0x4e/0xc0
>  [<c0124c12>] vprintk+0x192/0x240
>  [<c0528891>] preempt_schedule_irq+0x51/0x80
>  [<c02adeca>] acpi_processor_idle+0x0/0x265
>  [<c010325e>] need_resched+0x1f/0x21
>  [<c02adeca>] acpi_processor_idle+0x0/0x265
>  [<c0124a77>] printk+0x17/0x20
>  [<c010b583>] cpu_init+0x73/0x360
>  [<c0117bd6>] start_secondary+0x6/0x170
> Code: d2 74 bd fc 8b 44 24 28 b9 0e 00 00 00 8b 74 24 14 01 c6 b8 0e
> 00 00 00 89 74 24 1c 8b 74 24 30 89 44 24 10 8b 7c 24 1c 83 c6 10
> <f3> a5 8b 74 24 24 8b 44 24 1c 89 4c 24 10 01 ee f7 d5 21 ee 89
>  <0>Kernel panic - not syncing: Attempted to kill the idle task!
>  Stuck ??
> Inquiring remote APIC #0...
> ... APIC #0 ID:
> ... APIC #0 VERSION: 00040011
> ... APIC #0 SPIV: 00ff
> [EMAIL PROTECTED]:/sys/devices/system/cpu/cpu1#
Andrew,
The patch below fixes Pavel's oops. What is strange is that the
'system_state' check was added for CPU hotplug by Rusty, which really
confuses me. Could you please look at it?
The oops can be reproduced 100% of the time with the radeonfb driver
loaded. Attached is the dmesg of an oops. It seems the 'objp' parameter
for 'cache_alloc_debugcheck_after' is invalid.

Thanks,
Shaohua

--- a/kernel/printk.c 2005-04-12 10:12:19.0 +0800
+++ b/kernel/printk.c 2005-04-13 17:22:40.912897328 +0800
@@ -624,8 +624,7 @@ asmlinkage int vprintk(const char *fmt,
                         log_level_unknown = 1;
         }

-        if (!cpu_online(smp_processor_id()) &&
-            system_state != SYSTEM_RUNNING) {
+        if (!cpu_online(smp_processor_id())) {
                 /*
                  * Some console drivers may assume that per-cpu resources have
                  * been allocated. So don't allow them to be called by this

CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching NULL sched-domain.
Booting processor 1/1 eip 3000
Initializing CPU#1
masked ExtINT on CPU#1
Unable to handle kernel paging request at virtual address f000acb2
 printing eip:
c014e4cc
*pde =
Oops: [#1]
PREEMPT SMP
Modules linked in:
CPU:    1
EIP:    0060:[<c014e4cc>]    Not tainted VLI
EFLAGS: 00010097   (2.6.12-rc2-mm3)
EIP is at check_poison_obj+0x4c/0x1e0
eax: 006b   ebx: 005a   ecx: dff6e080   edx: dff6e480
esi:   edi: f000acb2   ebp: 0080   esp: c14fdcd4
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c14fc000 task=dff42560)
Stack: dff6e480 5a5a5a5a 5a5a5a5a 007f 5a5a5a5a 005a f000acae dff6e480
       c021192e c0150031 dff6e480 f000acae 5a5a5a5a 5a5a5a5a dff6e480 0046
       0020 0010 c015044b dff6e480 0020 f000acae c021192e
Call Trace:
 [<c021192e>] soft_cursor+0x5e/0x260
 [<c0150031>] cache_alloc_debugcheck_after+0x181/0x1a0
 [<c015044b>] __kmalloc+0x9b/0xd0
 [<c021192e>] soft_cursor+0x5e/0x260
 [<c021192e>] soft_cursor+0x5e/0x260
 [<c020a459>] bit_cursor+0x339/0x540
 [<c0118998>] recalc_task_prio+0x88/0x150
 [<c0205c32>] fbcon_cursor+0x1a2/0x270
 [<c0265475>] hide_cursor+0x25/0x40
 [<c026843a>] vt_console_print+0x2aa/0x2b0
 [<c0120d32>] __call_console_drivers+0x62/0x70
 [<c0120e66>] call_console_drivers+0x96/0x130
 [<c0121361>] release_console_sem+0x51/0xc0
 [<c01211df>] vprintk+0x19f/0x250
 [<c01264b6>] __do_softirq+0xd6/0xf0
 [<c0438f5b>] preempt_schedule_irq+0x4b/0x80
 [<c0121037>] printk+0x17/0x20
 [<c0114132>] setup_local_APIC+0xe2/0x1d0
 [<c01130ba>] smp_callin+0x7a/0x120
 [<c011316e>] start_secondary+0xe/0x190
Code: 24 30 89 14 24 01 c7 e8 13 f8 ff ff 39 44 24 14 89 c5 0f 8d b7 00
00 00 8d 40 ff 89 44 24 0c 3b 74 24 0c b0 6b 0f 84 8c 01 00 00 <38> 04
3e 74 46 8b 44 24 14 85 c0 0f 84 48 01 00 00 89 3c 24 83
 <0>Kernel panic - not syncing: Attempted to kill the idle task!
 Stuck ??
Inquiring remote APIC #1...
... APIC #1 ID: failed
... APIC #1 VERSION: failed
... APIC #1 SPIV: failed
Re: [PATCH 5/6]physical CPU hot add
On Tue, 2005-04-12 at 20:17, Zwane Mwaikambo wrote:
> On Tue, 12 Apr 2005, Li Shaohua wrote:
>
> >  #ifdef CONFIG_HOTPLUG_CPU
> > +int __attribute__ ((weak)) smp_prepare_cpu(int cpu)
> > +{
> > +        return 0;
> > +}
> > +
>
> Any way for you to avoid using weak attribute?
Replaced the weak attribute with a define, as suggested.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |  112 ---
 linux-2.6.11-root/drivers/base/cpu.c         |    7 +
 linux-2.6.11-root/include/asm-i386/smp.h     |    3
 3 files changed, 93 insertions(+), 29 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~warm_boot_cpu arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~warm_boot_cpu 2005-04-13 10:58:37.152081456 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c 2005-04-13 10:58:37.159080392 +0800
@@ -80,6 +80,12 @@ cpumask_t cpu_callin_map;
 cpumask_t cpu_callout_map;
 static cpumask_t smp_commenced_mask;

+/* TSC's upper 32 bits can't be written in eariler CPU (before prescott), there
+ * is no way to resync one AP against BP. TBD: for prescott and above, we
+ * should use IA64's algorithm
+ */
+static int __devinitdata tsc_sync_disabled;
+
 /* Per CPU bogomips and other parameters */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;

@@ -416,7 +422,7 @@ static void __devinit smp_callin(void)
         /*
          * Synchronize the TSC with the BP
          */
-        if (cpu_has_tsc && cpu_khz)
+        if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
                 synchronize_tsc_ap();
 }

@@ -809,6 +815,31 @@ static inline int alloc_cpu_id(void)
         return cpu;
 }

+#ifdef CONFIG_HOTPLUG_CPU
+static struct task_struct * __devinitdata cpu_idle_tasks[NR_CPUS];
+static inline struct task_struct * alloc_idle_task(int cpu)
+{
+        struct task_struct *idle;
+
+        if ((idle = cpu_idle_tasks[cpu]) != NULL) {
+                /* initialize thread_struct. we really want to avoid destroy
+                 * idle tread
+                 */
+                idle->thread.esp = (unsigned long)(((struct pt_regs *)
+                        (THREAD_SIZE + (unsigned long) idle->thread_info)) - 1);
+                init_idle(idle, cpu);
+                return idle;
+        }
+        idle = fork_idle(cpu);
+
+        if (!IS_ERR(idle))
+                cpu_idle_tasks[cpu] = idle;
+        return idle;
+}
+#else
+#define alloc_idle_task(cpu) fork_idle(cpu)
+#endif
+
 static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
@@ -828,7 +859,7 @@ static int __devinit do_boot_cpu(int api
          * We can't use kernel_thread since we must avoid to
          * reschedule the child.
          */
-        idle = fork_idle(cpu);
+        idle = alloc_idle_task(cpu);
         if (IS_ERR(idle))
                 panic("failed fork for CPU %d", cpu);
         idle->thread.eip = (unsigned long) start_secondary;
@@ -931,6 +962,55 @@ void cpu_exit_clear(void)
         cpu_clear(cpu, smp_commenced_mask);
         unmap_cpu_to_logical_apicid(cpu);
 }
+
+struct warm_boot_cpu_info {
+        struct completion *complete;
+        int apicid;
+        int cpu;
+};
+
+static void __devinit do_warm_boot_cpu(void *p)
+{
+        struct warm_boot_cpu_info *info = p;
+        do_boot_cpu(info->apicid, info->cpu);
+        complete(info->complete);
+}
+
+int __devinit smp_prepare_cpu(int cpu)
+{
+        DECLARE_COMPLETION(done);
+        struct warm_boot_cpu_info info;
+        struct work_struct task;
+        int apicid, ret;
+
+        lock_cpu_hotplug();
+        apicid = x86_cpu_to_apicid[cpu];
+        if (apicid == BAD_APICID) {
+                ret = -ENODEV;
+                goto exit;
+        }
+
+        info.complete = &done;
+        info.apicid = apicid;
+        info.cpu = cpu;
+        INIT_WORK(&task, do_warm_boot_cpu, &info);
+
+        tsc_sync_disabled = 1;
+
+        /* init low mem mapping */
+        memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+                        sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+        flush_tlb_all();
+        schedule_work(&task);
+        wait_for_completion(&done);
+
+        tsc_sync_disabled = 0;
+        zap_low_mappings();
+        ret = 0;
+exit:
+        unlock_cpu_hotplug();
+        return ret;
+}
 #endif

 static void smp_tune_scheduling (void)
@@ -1169,24 +1249,6 @@ void __devinit smp_prepare_boot_cpu(void
 }

 #ifdef CONFIG_HOTPLUG_CPU
-
-/* must be called with the cpucontrol mutex held */
-static int __devinit cpu_enable(unsigned int cpu)
-{
-        /* get the target out of its holding state */
-        per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
-        wmb();
-
-        /* wait for the processor to ack it. timeout? */
-        while (!cpu_online(cpu))
-                cpu_relax();
-
-        fixup_irqs(cpu_online_map);
-        /* counter the disable in fixup_irqs() */
-
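Editorial sketch, not part of the thread: the archived patch is cut off before the drivers/base/cpu.c and include/asm-i386/smp.h hunks, so the exact "define method" is not visible here. A common shape for it is shown below, purely as an assumption: the architecture header defines a feature macro when it supplies its own smp_prepare_cpu(), and generic code falls back to a no-op macro otherwise. The macro name ARCH_HAS_SMP_PREPARE_CPU and the caller bring_cpu_online_sketch() are invented for illustration.

```c
#include <assert.h>

/*
 * An architecture header would define this when it provides its own
 * hook. The name is made up for this sketch; leave it undefined to get
 * the generic fallback.
 */
/* #define ARCH_HAS_SMP_PREPARE_CPU */

#ifdef ARCH_HAS_SMP_PREPARE_CPU
int smp_prepare_cpu(int cpu);           /* supplied by arch code */
#else
#define smp_prepare_cpu(cpu) (0)        /* generic fallback: nothing to do */
#endif

/* Generic caller: compiles the same way with or without the arch hook. */
int bring_cpu_online_sketch(int cpu)
{
    int ret;

    assert(cpu >= 0);
    ret = smp_prepare_cpu(cpu);
    if (ret)
        return ret;         /* preparation failed, report the error */
    /* The real code would go on to bring the CPU up here. */
    return 0;
}
```

Compared with a weak symbol, the choice is made at compile time through the preprocessor, so nothing depends on how the linker resolves two definitions of the same function.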
Re: [PATCH 3/6]init call cleanup
On Tue, 2005-04-12 at 17:32, Rolf Eike Beer wrote:
> Li Shaohua wrote:
> > Trivial patch for CPU hotplug. In the CPU identify part, only did the
> > cleanup for Intel CPUs. Other CPUs need the same if they support S3 SMP.
> >
> > @@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
> >          apic_write_around(APIC_LVT1, value);
> >  }
> >
> > -void __init setup_local_APIC (void)
> > +void __devinit setup_local_APIC (void)
>                                    ^
> >  {
> >          unsigned long oldvalue, value, ver, maxlvt;
> >
>
> Please remove this space while you are at it.
>
> > @@ -556,7 +556,7 @@ void __init early_cpu_init(void)
> >   * and IDT. We reload them nevertheless, this function acts as a
> >   * 'CPU state barrier', nothing should get across.
> >   */
> > -void __init cpu_init (void)
> > +void __devinit cpu_init (void)
> >  {
> >          int cpu = smp_processor_id();
> >          struct tss_struct * t = &per_cpu(init_tss, cpu);
>
> This one too.
Removed the space in both places, as suggested.

Thanks,
Shaohua

Trivial patch for CPU hotplug. In the CPU identify part, only did the
cleanup for Intel CPUs. Other CPUs need the same if they support S3 SMP.

---

 linux-2.6.11-root/arch/i386/kernel/apic.c                |   14 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/common.c          |   30 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/intel.c           |   12 +++---
 linux-2.6.11-root/arch/i386/kernel/cpu/intel_cacheinfo.c |    4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/mce.c      |    4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p4.c       |    4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p5.c       |    2 -
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p6.c       |    2 -
 linux-2.6.11-root/arch/i386/kernel/process.c             |    2 -
 linux-2.6.11-root/arch/i386/kernel/setup.c               |    2 -
 linux-2.6.11-root/arch/i386/kernel/smpboot.c             |   18 -
 linux-2.6.11-root/arch/i386/kernel/timers/timer_tsc.c    |    2 -
 12 files changed, 48 insertions(+), 48 deletions(-)

diff -puN arch/i386/kernel/apic.c~init_call_cleanup arch/i386/kernel/apic.c
--- linux-2.6.11/arch/i386/kernel/apic.c~init_call_cleanup 2005-04-12 10:37:07.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/apic.c 2005-04-13 10:57:55.817365288 +0800
@@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
         apic_write_around(APIC_LVT1, value);
 }

-void __init setup_local_APIC (void)
+void __devinit setup_local_APIC(void)
 {
         unsigned long oldvalue, value, ver, maxlvt;

@@ -676,7 +676,7 @@ static struct sys_device device_lapic =
         .cls            = &lapic_sysclass,
 };

-static void __init apic_pm_activate(void)
+static void __devinit apic_pm_activate(void)
 {
         apic_pm_state.active = 1;
 }
@@ -877,7 +877,7 @@ fake_ioapic_page:
  * but we do not accept timer interrupts yet. We only allow the BP
  * to calibrate.
  */
-static unsigned int __init get_8254_timer_count(void)
+static unsigned int __devinit get_8254_timer_count(void)
 {
         extern spinlock_t i8253_lock;
         unsigned long flags;
@@ -896,7 +896,7 @@ static unsigned int __init get_8254_time
 }

 /* next tick in 8254 can be caught by catching timer wraparound */
-static void __init wait_8254_wraparound(void)
+static void __devinit wait_8254_wraparound(void)
 {
         unsigned int curr_count, prev_count;

@@ -916,7 +916,7 @@ static void __init wait_8254_wraparound(
  * Default initialization for 8254 timers. If we use other timers like HPET,
  * we override this later
  */
-void (*wait_timer_tick)(void) __initdata = wait_8254_wraparound;
+void (*wait_timer_tick)(void) __devinitdata = wait_8254_wraparound;

 /*
  * This function sets up the local APIC timer, with a timeout of
@@ -952,7 +952,7 @@ static void __setup_APIC_LVTT(unsigned i
         apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
 }

-static void __init setup_APIC_timer(unsigned int clocks)
+static void __devinit setup_APIC_timer(unsigned int clocks)
 {
         unsigned long flags;

@@ -1065,7 +1065,7 @@ void __init setup_boot_APIC_clock(void)
         local_irq_enable();
 }

-void __init setup_secondary_APIC_clock(void)
+void __devinit setup_secondary_APIC_clock(void)
 {
         setup_APIC_timer(calibration_result);
 }
diff -puN arch/i386/kernel/cpu/common.c~init_call_cleanup arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~init_call_cleanup 2005-04-12 10:37:07.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c 2005-04-13 10:58:25.777810608 +0800
@@ -24,9 +24,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
 DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
 EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);

-static int cachesize_override __initdata = -1;
-static int disable_x86_fxsr __initdata = 0;
-static
RE: [PATCH 1/6]sep initializing rework
On Wed, 2005-04-13 at 01:57, Protasevich, Natalie wrote:
> Hello,
> This is a hotplug CPU patch for i386, done against 2.6.12-rc2-mm3.
> Somewhat alternative to the one posted by Li Shaohua, but not really
> (and I didn't mean that :). If you look closer, our patches are
> different and can complement each other I think. Li did great job on
> sep, after-offline cleanup, __devinit etc., and I have some radical
> changes in the AP bringup mechanism. I left alone __init to __devinit
> part (I was going through it lately, but I think even though I had few
> more than Li did, he covered it sufficiently perhaps). I started having
> doubts in free_initmem() vs __devinit because look how many of __init's
> left! just a few :).

Looks quite smart, but people will argue that this way keeps all the
__init sections around. I'd like us to keep the default behavior of
__init.

> I got rid of do_boot_cpu loop in smpboot.c because the loop
> static void __init smp_init(void)
> {
>         unsigned int i;
>
>         /* FIXME: This should be done in userspace --RR */
>         for_each_present_cpu(i) {
>                 if (num_online_cpus() >= max_cpus)
>                         break;
>                 if (!cpu_online(i))
>                         cpu_up(i);
>         }
> ...
> does it again so why leave it in smpboot.c to boot AP's twice.

This is what IA64 does. If you go this way, you must clean up the
bogomips message and the TSC synchronization. Also, CPU_UP could be
called in user context, so fork_idle possibly should be done in a
workqueue. And please make sure it doesn't break other things, like
check_nmi_watchdog. I just picked an easy way (adding smp_prepare_cpu),
and it doesn't break anything.

> I also found that my system fails sooner or later when I try not to
> synch runtime booted processor with others, so I changed tsc
> synchronization to only sync between booting CPU and the one that
> boots it.

IA64 does the same. It synchronizes one AP's ITC against the BP's, one
at a time. But on IA32, the TSC's upper 32 bits can be written only on
Prescott and later. On earlier CPUs, the upper 32 bits become 0 after
any write.

> The patch works for me on Intel 8x generic box, and on ES7000. I was
> asked to separate my patch into smaller ones by the theme, but I'm
> posting the entire patch for now, because I think it is probably not
> the final one.
> I think (I hope) I will sync up with Li later on.
> My idea was that if we find a CPU core in ACPI (enabled or disabled),
> we encounter for it in sibling map and create a sysfs node accordingly,
> and cpu_possible_map will reflect that. We take processors up/down
> depending on physical presence using the existing node. That's the
> scenario implemented on ES7000 that reports all possible cores in ACPI
> marking absent processors as disabled. Runtime enablement/disablement
> depends on sysfs only and the driving agent can be anything (ACPI or
> user) that triggers sysfs node for this processor.

You could refer to IA64's implementation. The goal of my patches is to
support suspend/resume, which doesn't really hot-remove a CPU, so I just
ignored the sysfs/ACPI issues.

Thanks,
Shaohua

>
> -----Original Message-----
> From: Zwane Mwaikambo [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, April 12, 2005 6:08 AM
> To: Li Shaohua
> Cc: lkml; ACPI-DEV; Len Brown; Pavel Machek; Andrew Morton;
>     Protasevich, Natalie; Ryan Harper
> Subject: Re: [PATCH 1/6]sep initializing rework
>
> Hello Shaohua,
>
> On Tue, 12 Apr 2005, Li Shaohua wrote:
>
> > These patches (together with 5 patches followed this one) are updated
> > suspend/resume SMP patches. The patches fixed some bugs and do clean
> > up as suggested. Now they work for both suspend-to-ram and
> > suspend-to-disk.
> > Patches are against 2.6.12-rc2-mm3.
>
> These patches look good and i think we should go ahead with them. I've
> also cross checked with physical hotplug cpu patches for ES7xxx from
> Natalie (added to Cc) and it does indeed look like a lot of the code
> will work for her too, but i'd appreciate it if she also does a double
> check.
> Obviously this won't work for other upcoming users of hotplug cpu like
> Xen (Ryan added to Cc) but i think we can abstract things later on to
> cover other special users.
>
> Thanks Shaohua,
> Zwane
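
For readers following the thread, the generic smp_init() loop quoted in this message can be modeled in user space. This is only a minimal sketch under stated assumptions, not kernel code: demo_count_cpus() and demo_smp_init() are hypothetical names, CPU masks are plain 32-bit words, and setting a bit stands in for cpu_up().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits; stands in for num_online_cpus() in this sketch. */
unsigned demo_count_cpus(uint32_t mask)
{
	unsigned n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Hypothetical user-space model of the quoted loop: walk the present
 * CPUs in order, stop once max_cpus are online, and bring up every
 * present CPU that is not online yet. Returns how many CPUs this call
 * brought up.
 */
unsigned demo_smp_init(uint32_t present, uint32_t *online, unsigned max_cpus)
{
	unsigned booted = 0;

	assert(online != NULL);
	for (unsigned i = 0; i < 32; i++) {
		uint32_t bit = UINT32_C(1) << i;

		if (!(present & bit))
			continue;		/* for_each_present_cpu() */
		if (demo_count_cpus(*online) >= max_cpus)
			break;
		if (!(*online & bit)) {		/* !cpu_online(i) */
			*online |= bit;		/* stands in for cpu_up(i) */
			booted++;
		}
	}
	return booted;
}
```

Because CPUs that are already online are skipped, running the loop a second time brings up nothing, which is the basis for the argument that a separate arch-level boot loop is redundant once the generic loop exists.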
Re: [PATCH 5/6]physical CPU hot add
On Tue, 2005-04-12 at 20:17, Zwane Mwaikambo wrote:
> On Tue, 12 Apr 2005, Li Shaohua wrote:
>
> >  #ifdef CONFIG_HOTPLUG_CPU
> > +int __attribute__ ((weak)) smp_prepare_cpu(int cpu)
> > +{
> > +	return 0;
> > +}
> > +
>
> Any way for you to avoid using weak attribute?

I just want to avoid more 'ifdef' or 'define an empty routine for other
archs' stuff. Some people prefer the 'weak' attribute. Either way is OK
with me, but if you think the former is better, I'll change it.

Thanks,
Shaohua
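
The trade-off discussed here can be illustrated outside the kernel. The following is a minimal sketch, assuming GCC or Clang on an ELF target; smp_prepare_cpu() mirrors the quoted hunk, while demo_cpu_online() and its bitmask argument are hypothetical and exist only for illustration.

```c
#include <assert.h>

/*
 * Generic default, as in the quoted hunk. An architecture that needs no
 * preparation simply does not define smp_prepare_cpu(), and the linker
 * uses this weak copy. An architecture that does need it supplies a
 * normal (strong) definition in its own file, which wins at link time.
 * The weak attribute is a compiler extension, not standard C.
 */
int __attribute__((weak)) smp_prepare_cpu(int cpu)
{
	(void)cpu;
	return 0;
}

/* Hypothetical caller: prepare the CPU, then mark it online in a mask. */
int demo_cpu_online(int cpu, unsigned long *online_mask)
{
	int ret;

	assert(cpu >= 0 && cpu < (int)(8 * sizeof(*online_mask)));
	ret = smp_prepare_cpu(cpu);
	if (ret)
		return ret;	/* the architecture hook refused */
	*online_mask |= 1UL << cpu;
	return 0;
}
```

The alternative Zwane hints at is an empty inline stub behind an #ifdef in each architecture's header: that avoids the linker-level override but has to be repeated for every architecture that does not need the hook.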
Re: [PATCH 6/6]suspend/resume SMP support
On Tue, 2005-04-12 at 18:51, Pavel Machek wrote:
> > Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
> > disable/enable_nonboot_cpus API. The S4 part is based on Pavel's
> > original S4 SMP patch.
>
> I tested it on 2x PII(?) 550MHz system. Suspend went ok, resume loaded
> image from disk, but then I got
>
> Thawing cpus
> Booting processor 1/0 eip 3000
>
> ...and very funny effect on keyboard leds. They started to blink
> (panic-like), but with very wrong frequency. It looked like 2 cpus
> doing panic blinks at once...

Check whether the /sys/devices/system/cpu/cpu1/online attribute works.
If it does, then it's a different issue. I have only tested the patches
on two HT-based systems.

Thanks,
Shaohua
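
The check suggested above amounts to reading and writing one sysfs file. The following is a minimal user-space sketch, assuming the standard location /sys/devices/system/cpu/cpuN/online; the helper names demo_parse_online(), demo_read_online() and demo_write_online() are hypothetical, and writing the attribute normally requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the contents of an "online" attribute: 1, 0, or -1 if neither. */
int demo_parse_online(const char *text)
{
	assert(text != NULL);
	if (text[0] == '1')
		return 1;
	if (text[0] == '0')
		return 0;
	return -1;
}

/* Read an online attribute; returns 1, 0, or -1 if it cannot be read. */
int demo_read_online(const char *path)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return demo_parse_online(buf);
}

/* Write "0" or "1" to the attribute; returns 0 on success, -1 on error. */
int demo_write_online(const char *path, int online)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(online ? "1" : "0", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;		/* the kernel may reject the write at flush time */
	return ok ? 0 : -1;
}
```

A test along the lines Shaohua suggests would write 0 and then 1 to cpu1's attribute and read it back after each step. If that cycle works while resume still fails, the hotplug path itself is probably not at fault.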
[PATCH 3/6]init call cleanup
Trivial patch for CPU hotplug. In the CPU identify part, only did the cleanup
for Intel CPUs. The same needs doing for other CPUs if they support S3 SMP.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/arch/i386/kernel/apic.c                |   14 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/common.c          |   30 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/intel.c           |   12 +++---
 linux-2.6.11-root/arch/i386/kernel/cpu/intel_cacheinfo.c |    4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/mce.c      |    4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p4.c       |    4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p5.c       |    2 -
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p6.c       |    2 -
 linux-2.6.11-root/arch/i386/kernel/process.c             |    2 -
 linux-2.6.11-root/arch/i386/kernel/setup.c               |    2 -
 linux-2.6.11-root/arch/i386/kernel/smpboot.c             |   18 -
 linux-2.6.11-root/arch/i386/kernel/timers/timer_tsc.c    |    2 -
 12 files changed, 48 insertions(+), 48 deletions(-)

diff -puN arch/i386/kernel/apic.c~init_call_cleanup arch/i386/kernel/apic.c
--- linux-2.6.11/arch/i386/kernel/apic.c~init_call_cleanup	2005-04-12 10:37:07.216977888 +0800
+++ linux-2.6.11-root/arch/i386/kernel/apic.c	2005-04-12 10:37:07.243973784 +0800
@@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
 	apic_write_around(APIC_LVT1, value);
 }
 
-void __init setup_local_APIC (void)
+void __devinit setup_local_APIC (void)
 {
 	unsigned long oldvalue, value, ver, maxlvt;
 
@@ -676,7 +676,7 @@ static struct sys_device device_lapic =
 	.cls		= &lapic_sysclass,
 };
 
-static void __init apic_pm_activate(void)
+static void __devinit apic_pm_activate(void)
 {
 	apic_pm_state.active = 1;
 }
@@ -877,7 +877,7 @@ fake_ioapic_page:
  * but we do not accept timer interrupts yet. We only allow the BP
  * to calibrate.
  */
-static unsigned int __init get_8254_timer_count(void)
+static unsigned int __devinit get_8254_timer_count(void)
 {
 	extern spinlock_t i8253_lock;
 	unsigned long flags;
@@ -896,7 +896,7 @@ static unsigned int __init get_8254_time
 }
 
 /* next tick in 8254 can be caught by catching timer wraparound */
-static void __init wait_8254_wraparound(void)
+static void __devinit wait_8254_wraparound(void)
 {
 	unsigned int curr_count, prev_count;
 
@@ -916,7 +916,7 @@ static void __init wait_8254_wraparound(
  * Default initialization for 8254 timers. If we use other timers like HPET,
  * we override this later
  */
-void (*wait_timer_tick)(void) __initdata = wait_8254_wraparound;
+void (*wait_timer_tick)(void) __devinitdata = wait_8254_wraparound;
 
 /*
  * This function sets up the local APIC timer, with a timeout of
@@ -952,7 +952,7 @@ static void __setup_APIC_LVTT(unsigned i
 	apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
 }
 
-static void __init setup_APIC_timer(unsigned int clocks)
+static void __devinit setup_APIC_timer(unsigned int clocks)
 {
 	unsigned long flags;
 
@@ -1065,7 +1065,7 @@ void __init setup_boot_APIC_clock(void)
 	local_irq_enable();
 }
 
-void __init setup_secondary_APIC_clock(void)
+void __devinit setup_secondary_APIC_clock(void)
 {
 	setup_APIC_timer(calibration_result);
 }
diff -puN arch/i386/kernel/cpu/common.c~init_call_cleanup arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~init_call_cleanup	2005-04-12 10:37:07.218977584 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-04-12 10:37:07.244973632 +0800
@@ -24,9 +24,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
 DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
 EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);
 
-static int cachesize_override __initdata = -1;
-static int disable_x86_fxsr __initdata = 0;
-static int disable_x86_serial_nr __initdata = 1;
+static int cachesize_override __devinitdata = -1;
+static int disable_x86_fxsr __devinitdata = 0;
+static int disable_x86_serial_nr __devinitdata = 1;
 
 struct cpu_dev * cpu_devs[X86_VENDOR_NUM] = {};
 
@@ -59,7 +59,7 @@ static int __init cachesize_setup(char *
 }
 __setup("cachesize=", cachesize_setup);
 
-int __init get_model_name(struct cpuinfo_x86 *c)
+int __devinit get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
 	char *p, *q;
@@ -89,7 +89,7 @@ int __init get_model_name(struct cpuinfo
 }
 
 
-void __init display_cacheinfo(struct cpuinfo_x86 *c)
+void __devinit display_cacheinfo(struct cpuinfo_x86 *c)
 {
 	unsigned int n, dummy, ecx, edx, l2size;
 
@@ -130,7 +130,7 @@ void __init display_cacheinfo(struct cpu
 /* in particular, if CPUID levels 0x80000002..4 are supported, this isn't used */
 
 /* Look up CPU names by table lookup. */
-static char __init *table_lookup_model(struct cpuinfo_x86 *c)
+static char __devinit *table_lookup_model(struct cpuinfo_x86 *c)
 {
 	struct cpu_model_info *info;
 
@@ -151,7 +151,7 @@ static char __init
[PATCH 1/6]sep initializing rework
Hi,
These patches (together with 5 patches followed this one) are updated
suspend/resume SMP patches. The patches fixed some bugs and do clean up
as suggested. Now they work for both suspend-to-ram and suspend-to-disk.
Patches are against 2.6.12-rc2-mm3.

Thanks,
Shaohua

---

Make SEP init per-cpu, so it is hotplug safe.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c           |    6 ++
 linux-2.6.11-root/arch/i386/kernel/sysenter.c          |   12 +++-
 linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |    4
 linux-2.6.11-root/arch/i386/power/cpu.c                |    4 +---
 linux-2.6.11-root/include/asm-i386/smp.h               |    3 +++
 5 files changed, 21 insertions(+), 8 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~sep_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sep_init_cleanup	2005-04-12 10:36:00.164171464 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-12 10:36:00.174169944 +0800
@@ -443,6 +443,9 @@ static void __init start_secondary(void
 	 * the local TLBs too.
 	 */
 	local_flush_tlb();
+
+	/* Note: this must be done before __cpu_up finish */
+	enable_sep_cpu();
 	cpu_set(smp_processor_id(), cpu_online_map);
 
 	/* We can take interrupts now: we're officially "up". */
@@ -920,6 +923,9 @@ static void __init smp_boot_cpus(unsigne
 	cpus_clear(cpu_core_map[0]);
 	cpu_set(0, cpu_core_map[0]);
 
+	sysenter_setup();
+	enable_sep_cpu();
+
 	/*
 	 * If we couldn't find an SMP configuration at boot time,
 	 * get out of here now!
diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c
--- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup	2005-04-12 10:36:00.165171312 +0800
+++ linux-2.6.11-root/arch/i386/kernel/sysenter.c	2005-04-12 10:36:00.174169944 +0800
@@ -21,11 +21,16 @@
 
 extern asmlinkage void sysenter_entry(void);
 
-void enable_sep_cpu(void *info)
+void enable_sep_cpu(void)
 {
 	int cpu = get_cpu();
 	struct tss_struct *tss = &per_cpu(init_tss, cpu);
 
+	if (!boot_cpu_has(X86_FEATURE_SEP)) {
+		put_cpu();
+		return;
+	}
+
 	tss->ss1 = __KERNEL_CS;
 	tss->esp1 = sizeof(struct tss_struct) + (unsigned long) tss;
 	wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
@@ -41,7 +46,7 @@ void enable_sep_cpu(void *info)
 extern const char vsyscall_int80_start, vsyscall_int80_end;
 extern const char vsyscall_sysenter_start, vsyscall_sysenter_end;
 
-static int __init sysenter_setup(void)
+int __init sysenter_setup(void)
 {
 	void *page = (void *)get_zeroed_page(GFP_ATOMIC);
 
@@ -58,8 +63,5 @@ static int __init sysenter_setup(void)
 	       &vsyscall_sysenter_start,
 	       &vsyscall_sysenter_end - &vsyscall_sysenter_start);
 
-	on_each_cpu(enable_sep_cpu, NULL, 1, 1);
 	return 0;
 }
-
-__initcall(sysenter_setup);
diff -puN arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup arch/i386/mach-voyager/voyager_smp.c
--- linux-2.6.11/arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup	2005-04-12 10:36:00.167171008 +0800
+++ linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c	2005-04-12 10:36:00.175169792 +0800
@@ -499,6 +499,7 @@ start_secondary(void *unused)
 	while (!cpu_isset(cpuid, smp_commenced_mask))
 		rep_nop();
 	local_irq_enable();
+	enable_sep_cpu();
 
 	local_flush_tlb();
 
@@ -696,6 +697,9 @@ smp_boot_cpus(void)
 	printk("CPU%d: ", boot_cpu_id);
 	print_cpu_info(&cpu_data[boot_cpu_id]);
 
+	sysenter_setup();
+	enable_sep_cpu();
+
 	if(is_cpu_quad()) {
 		/* booting on a Quad CPU */
 		printk("VOYAGER SMP: Boot CPU is Quad\n");
diff -puN arch/i386/power/cpu.c~sep_init_cleanup arch/i386/power/cpu.c
--- linux-2.6.11/arch/i386/power/cpu.c~sep_init_cleanup	2005-04-12 10:36:00.168170856 +0800
+++ linux-2.6.11-root/arch/i386/power/cpu.c	2005-04-12 10:36:00.175169792 +0800
@@ -33,8 +33,6 @@ unsigned long saved_context_esp, saved_c
 unsigned long saved_context_esi, saved_context_edi;
 unsigned long saved_context_eflags;
 
-extern void enable_sep_cpu(void *);
-
 void __save_processor_state(struct saved_context *ctxt)
 {
 	kernel_fpu_begin();
@@ -136,7 +134,7 @@ void __restore_processor_state(struct sa
 	 * sysenter MSRs
 	 */
 	if (boot_cpu_has(X86_FEATURE_SEP))
-		enable_sep_cpu(NULL);
+		enable_sep_cpu();
 
 	fix_processor_context();
 	do_fpu_end();
diff -puN include/asm-i386/smp.h~sep_init_cleanup include/asm-i386/smp.h
--- linux-2.6.11/include/asm-i386/smp.h~sep_init_cleanup	2005-04-12 10:36:00.170170552 +0800
+++ linux-2.6.11-root/include/asm-i386/smp.h	2005-04-12 10:36:00.176169640 +0800
@@ -37,6 +37,9 @@ extern int smp_num_siblings;
 extern cpumask_t
[PATCH 2/6]sibling map initializing rework
Make sibling map init per-cpu. Hotplug CPU may change the map at runtime.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   86 ++-
 1 files changed, 45 insertions(+), 41 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~sibling_map_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sibling_map_init_cleanup	2005-04-12 10:36:34.283984464 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-12 10:36:34.287983856 +0800
@@ -63,11 +63,16 @@ static int __initdata smp_b_stepping;
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
-int phys_proc_id[NR_CPUS]; /* Package ID of each logical CPU */
+/* Package ID of each logical CPU */
+int phys_proc_id[NR_CPUS] = {[0 ... NR_CPUS-1] = BAD_APICID};
 EXPORT_SYMBOL(phys_proc_id);
-int cpu_core_id[NR_CPUS]; /* Core ID of each logical CPU */
+/* Core ID of each logical CPU */
+int cpu_core_id[NR_CPUS] = {[0 ... NR_CPUS-1] = BAD_APICID};
 EXPORT_SYMBOL(cpu_core_id);
 
+cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
+cpumask_t cpu_core_map[NR_CPUS] __cacheline_aligned;
+
 /* bitmap of online cpus */
 cpumask_t cpu_online_map __cacheline_aligned;
 
@@ -417,6 +422,38 @@ static void __init smp_callin(void)
 
 static int cpucount;
 
+static inline void
+set_cpu_sibling_map(int cpu)
+{
+	int i;
+
+	if (smp_num_siblings > 1) {
+		for (i = 0; i < NR_CPUS; i++) {
+			if (!cpu_isset(i, cpu_callout_map))
+				continue;
+			if (cpu_core_id[cpu] == cpu_core_id[i]) {
+				cpu_set(i, cpu_sibling_map[cpu]);
+				cpu_set(cpu, cpu_sibling_map[i]);
+			}
+		}
+	} else {
+		cpu_set(cpu, cpu_sibling_map[cpu]);
+	}
+
+	if (current_cpu_data.x86_num_cores > 1) {
+		for (i = 0; i < NR_CPUS; i++) {
+			if (!cpu_isset(i, cpu_callout_map))
+				continue;
+			if (phys_proc_id[cpu] == phys_proc_id[i]) {
+				cpu_set(i, cpu_core_map[cpu]);
+				cpu_set(cpu, cpu_core_map[i]);
+			}
+		}
+	} else {
+		cpu_core_map[cpu] = cpu_sibling_map[cpu];
+	}
+}
+
 /*
  * Activate a secondary processor.
  */
@@ -444,6 +481,10 @@ static void __init start_secondary(void
 	 */
 	local_flush_tlb();
 
+	/* This must be done before setting cpu_online_map */
+	set_cpu_sibling_map(_smp_processor_id());
+	wmb();
+
 	/* Note: this must be done before __cpu_up finish */
 	enable_sep_cpu();
 	cpu_set(smp_processor_id(), cpu_online_map);
@@ -896,8 +937,6 @@ static int boot_cpu_logical_apicid;
 /* Where the IO area was mapped on multiquad, always 0 otherwise */
 void *xquad_portio;
 
-cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
-cpumask_t cpu_core_map[NR_CPUS] __cacheline_aligned;
 
 static void __init smp_boot_cpus(unsigned int max_cpus)
 {
@@ -1064,43 +1103,8 @@ static void __init smp_boot_cpus(unsigne
 		cpus_clear(cpu_sibling_map[cpu]);
 		cpus_clear(cpu_core_map[cpu]);
 	}
-
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
-		struct cpuinfo_x86 *c = cpu_data + cpu;
-		int siblings = 0;
-		int i;
-		if (!cpu_isset(cpu, cpu_callout_map))
-			continue;
-
-		if (smp_num_siblings > 1) {
-			for (i = 0; i < NR_CPUS; i++) {
-				if (!cpu_isset(i, cpu_callout_map))
-					continue;
-				if (cpu_core_id[cpu] == cpu_core_id[i]) {
-					siblings++;
-					cpu_set(i, cpu_sibling_map[cpu]);
-				}
-			}
-		} else {
-			siblings++;
-			cpu_set(cpu, cpu_sibling_map[cpu]);
-		}
-
-		if (siblings != smp_num_siblings)
-			printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, cpu, smp_num_siblings);
-
-		if (c->x86_num_cores > 1) {
-			for (i = 0; i < NR_CPUS; i++) {
-				if (!cpu_isset(i, cpu_callout_map))
-					continue;
-				if (phys_proc_id[cpu] == phys_proc_id[i]) {
-					cpu_set(i, cpu_core_map[cpu]);
-				}
-			}
-		} else {
-			cpu_core_map[cpu] = cpu_sibling_map[cpu];
-		}
-	}
+	cpu_set(0, cpu_sibling_map[0]);
+	cpu_set
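
The key change in this patch is that each CPU fills in the maps as it comes up, and that it sets both directions of every relation, so CPUs that booted earlier learn about the newcomer. The following user-space model is a minimal sketch of that idea only: all demo_ names are hypothetical, masks are 32-bit words instead of cpumask_t, and the smp_num_siblings and x86_num_cores shortcuts of the real function are left out.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS	8	/* hypothetical small limit for the sketch */
#define DEMO_BAD_ID	(-1)	/* stands in for BAD_APICID */

int demo_core_id[DEMO_NR_CPUS];
int demo_pkg_id[DEMO_NR_CPUS];
uint32_t demo_callout_mask;			/* CPUs started so far */
uint32_t demo_sibling_map[DEMO_NR_CPUS];
uint32_t demo_core_map[DEMO_NR_CPUS];

/* Put every table back into the "nothing booted yet" state. */
void demo_reset(void)
{
	for (int i = 0; i < DEMO_NR_CPUS; i++) {
		demo_core_id[i] = demo_pkg_id[i] = DEMO_BAD_ID;
		demo_sibling_map[i] = demo_core_map[i] = 0;
	}
	demo_callout_mask = 0;
}

/* Link the new CPU with every started CPU, in both directions. */
void demo_set_cpu_sibling_map(int cpu)
{
	for (int i = 0; i < DEMO_NR_CPUS; i++) {
		if (!(demo_callout_mask & (UINT32_C(1) << i)))
			continue;
		if (demo_core_id[cpu] == demo_core_id[i]) {
			demo_sibling_map[cpu] |= UINT32_C(1) << i;
			demo_sibling_map[i] |= UINT32_C(1) << cpu;
		}
		if (demo_pkg_id[cpu] == demo_pkg_id[i]) {
			demo_core_map[cpu] |= UINT32_C(1) << i;
			demo_core_map[i] |= UINT32_C(1) << cpu;
		}
	}
}

/* Model of a CPU coming up: record its IDs, then update the maps. */
void demo_cpu_up(int cpu, int core, int pkg)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	demo_core_id[cpu] = core;
	demo_pkg_id[cpu] = pkg;
	demo_callout_mask |= UINT32_C(1) << cpu;
	demo_set_cpu_sibling_map(cpu);
}
```

In the removed boot-time loop only cpu_sibling_map[cpu] was set for each pair, which was enough when every CPU was visited in one pass. With CPUs arriving one at a time, the extra reverse assignment is what keeps the maps of already running CPUs correct.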
Re: [PATCH 6/6]suspend/resume SMP support
On Tue, 2005-04-12 at 18:51, Pavel Machek wrote:
> > Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
> > disable/enable_nonboot_cpus API. The S4 part is based on Pavel's
> > original S4 SMP patch.
>
> I tested it on 2x PII(?) 550MHz system. Suspend went ok, resume loaded
> image from disk, but then I got
>
> Thawing cpus
> Booting processor 1/0 eip 3000
>
> ...and very funny effect on keyboard leds. They started to blink
> (panic-like), but with very wrong frequency. It looked like 2 cpus
> doing panic blinks at once...

Check whether the /sys/devices/system/cpu/cpu1/online attribute works. If it does, then this is a different issue. I have only tested the patches on two HT-based systems.

Thanks,
Shaohua
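For anyone who wants to script that check, here is a minimal C sketch that reads a CPU's sysfs online attribute. The helper name read_cpu_online() is made up for this note; the only thing taken from the thread is the attribute itself, which reads 1 for an online CPU and 0 for an offline one.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a sysfs "online" attribute such as
 * /sys/devices/system/cpu/cpu1/online.
 * Returns 1 if the CPU is online, 0 if it is offline, and -1 if the
 * file cannot be opened or does not contain 0 or 1.
 * (read_cpu_online is a hypothetical helper, not a kernel or libc API.)
 */
int read_cpu_online(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);

	return (value == 0 || value == 1) ? value : -1;
}
```

A return of -1 for cpu1 would suggest the attribute is missing altogether, which points at the hotplug sysfs side rather than at the resume path.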
Re: [PATCH 5/6]physical CPU hot add
On Tue, 2005-04-12 at 20:17, Zwane Mwaikambo wrote:
> On Tue, 12 Apr 2005, Li Shaohua wrote:
>
> > #ifdef CONFIG_HOTPLUG_CPU
> > +int __attribute__ ((weak)) smp_prepare_cpu(int cpu)
> > +{
> > +	return 0;
> > +}
> > +
>
> Any way for you to avoid using weak attribute?

I just wanted to avoid more '#ifdef' or 'define an empty routine for other archs' stuff. Some people prefer the 'weak' attribute. Either way is OK with me, but if you think the former is better, I'll change it.

Thanks,
Shaohua
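To make the two options in this exchange concrete, here is a small userspace sketch, assuming GCC or Clang (the weak attribute is a compiler extension). smp_prepare_cpu() is the name from the patch; ARCH_HAS_PREPARE_CPU, arch_prepare_cpu() and bring_cpu_online() are hypothetical names invented for the illustration and are not taken from the kernel tree.

```c
#include <assert.h>

/*
 * Option 1: weak default. Generic code supplies a do-nothing version;
 * an architecture that needs real work provides a normal (strong)
 * definition of the same symbol in another object file, and the
 * linker picks the strong one.
 */
int __attribute__((weak)) smp_prepare_cpu(int cpu)
{
	(void)cpu;
	return 0;	/* nothing to prepare by default */
}

/*
 * Option 2: header-style fallback. An architecture header would define
 * ARCH_HAS_PREPARE_CPU (hypothetical) together with its own
 * arch_prepare_cpu(); every other architecture gets the empty macro.
 */
#ifndef ARCH_HAS_PREPARE_CPU
#define arch_prepare_cpu(cpu) (0)
#endif

/* Generic caller: reads the same whichever option is used. */
int bring_cpu_online(int cpu)
{
	int ret;

	ret = smp_prepare_cpu(cpu);
	if (ret)
		return ret;
	return arch_prepare_cpu(cpu);
}
```

The weak version keeps the choice out of the headers but hides it in the link step; the macro version is visible at compile time at the cost of one more #ifndef.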
RE: [PATCH 1/6]sep initializing rework
On Wed, 2005-04-13 at 01:57, Protasevich, Natalie wrote:
> Hello,
> This is a hotplug CPU patch for i386, done against 2.6.12-rc2-mm3.
> Somewhat alternative to the one posted by Li Shaohua, but not really (and
> I didn't mean that :). If you look closer, our patches are different and
> can complement each other I think. Li did great job on sep, after-offline
> cleanup, __devinit etc., and I have some radical changes in the AP
> bringup mechanism. I left alone __init to __devinit part (I was going
> through it lately, but I think even though I had few more than Li did,
> he covered it sufficiently perhaps). I started having doubts in
> free_initmem() vs __devinit because look how many of __init's left! just
> a few :).

Looks quite smart, but people will argue that this way keeps all the __init sections around. I'd prefer that we keep the default behavior of __init.

> I got rid of do_boot_cpu loop in smpboot.c because the loop
>
> static void __init smp_init(void)
> {
> 	unsigned int i;
>
> 	/* FIXME: This should be done in userspace --RR */
> 	for_each_present_cpu(i) {
> 		if (num_online_cpus() >= max_cpus)
> 			break;
> 		if (!cpu_online(i))
> 			cpu_up(i);
> 	}
> ...
>
> does it again so why leave it in smpboot.c to boot AP's twice.

This is what IA64 does. Done this way, you must also clean up the bogomips message and the TSC synchronization. cpu_up() can be called in user context, so fork_idle probably needs to run from a workqueue. Please also make sure it doesn't break other things such as check_nmi_watchdog. I just chose the easy way (adding smp_prepare_cpu), and it doesn't break anything.

> I also found that my system fails sooner or later when I try not to synch
> runtime booted processor with others, so I changed tsc synchronization
> to only sync between booting CPU and the one that boots it.

IA64 does the same: it synchronizes an AP's ITC against the BP's once. On IA32, however, the TSC's upper 32 bits can be written only on Prescott and later. On earlier CPUs, the upper 32 bits become 0 after any write.

> The patch works for me on Intel 8x generic box, and on ES7000.
> I was asked to separate my patch into smaller ones by the theme, but I'm
> posting the entire patch for now, because I think it is probably not the
> final one. I think (I hope) I will sync up with Li later on. My idea was
> that if we find a CPU core in ACPI (enabled or disabled), we encounter
> for it in sibling map and create a sysfs node accordingly, and
> cpu_possible_map will reflect that. We take processors up/down depending
> on physical presence using the existing node. That's the scenario
> implemented on ES7000 that reports all possible cores in ACPI marking
> absent processors as disabled. Runtime enablement/disablement depends on
> sysfs only and the driving agent can be anything (ACPI or user) that
> triggers sysfs node for this processor.

You could refer to IA64's implementation. The goal of my patches is to support suspend/resume, which doesn't really hot-remove a CPU, so I simply ignored the sysfs/ACPI issues.

Thanks,
Shaohua

> -----Original Message-----
> From: Zwane Mwaikambo [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, April 12, 2005 6:08 AM
> To: Li Shaohua
> Cc: lkml; ACPI-DEV; Len Brown; Pavel Machek; Andrew Morton; Protasevich, Natalie; Ryan Harper
> Subject: Re: [PATCH 1/6]sep initializing rework
>
> Hello Shaohua,
>
> On Tue, 12 Apr 2005, Li Shaohua wrote:
> > These patches (together with 5 patches followed this one) are updated
> > suspend/resume SMP patches. The patches fixed some bugs and do clean
> > up as suggested. Now they work for both suspend-to-ram and
> > suspend-to-disk. Patches are against 2.6.12-rc2-mm3.
>
> These patches look good and i think we should go ahead with them. I've
> also cross checked with physical hotplug cpu patches for ES7xxx from
> Natalie (added to Cc) and it does indeed look like a lot of the code will
> work for her too, but i'd appreciate it if she also does a double check.
> Obviously this won't work for other upcoming users of hotplug cpu like
> Xen (Ryan added to Cc) but i think we can abstract things later on to
> cover other special users.
>
> Thanks Shaohua,
> Zwane
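The TSC remark in this reply can be illustrated with a tiny model. This is only a sketch of the behaviour as described in the mail (a TSC write on pre-Prescott CPUs clears the upper 32 bits), not something checked against a datasheet, and all function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of an older CPU, per the mail: whatever value is written,
 * only the low 32 bits stick and the upper 32 become 0.
 */
uint64_t tsc_after_write_old_cpu(uint64_t wanted)
{
	return wanted & 0xffffffffULL;
}

/* Model of a newer CPU (Prescott and later, per the mail): all 64 bits stick. */
uint64_t tsc_after_write_new_cpu(uint64_t wanted)
{
	return wanted;
}

/*
 * Skew left between the BP and a late-booted AP after the AP tries to
 * copy the BP's current TSC value into its own counter.
 */
uint64_t resync_skew(uint64_t bp_tsc, int ap_can_write_upper)
{
	uint64_t ap_tsc = ap_can_write_upper
		? tsc_after_write_new_cpu(bp_tsc)
		: tsc_after_write_old_cpu(bp_tsc);

	return bp_tsc - ap_tsc;
}
```

Early at boot the BP's counter still fits in 32 bits, so the copy works on either kind of CPU. Once the counter has passed 2^32, an older AP ends up behind by the whole upper half, which would explain why the patch simply skips the sync (tsc_sync_disabled) for a CPU brought up at runtime.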
Re: [PATCH 3/6]init call cleanup
On Tue, 2005-04-12 at 17:32, Rolf Eike Beer wrote:
> Li Shaohua wrote:
> > Trivial patch for CPU hotplug. In CPU identify part, only did cleanup
> > for intel CPUs. Need do for other CPUs if they support S3 SMP.
>
> > @@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
> >  apic_write_around(APIC_LVT1, value);
> >  }
> >
> > -void __init setup_local_APIC (void)
> > +void __devinit setup_local_APIC (void)
>                                    ^
> >  {
> >  unsigned long oldvalue, value, ver, maxlvt;
>
> Please remove this space while you are at it.
>
> > @@ -556,7 +556,7 @@ void __init early_cpu_init(void)
> >  * and IDT. We reload them nevertheless, this function acts as a
> >  * 'CPU state barrier', nothing should get across.
> >  */
> > -void __init cpu_init (void)
> > +void __devinit cpu_init (void)
> >  {
> >  int cpu = smp_processor_id();
> >  struct tss_struct * t = per_cpu(init_tss, cpu);
>
> This one too.

Removed the space in both places, as suggested.

Thanks,
Shaohua

Trivial patch for CPU hotplug. In the CPU identify part, cleanup was only done for Intel CPUs. The same needs doing for other CPUs if they support S3 SMP. --- linux-2.6.11-root/arch/i386/kernel/apic.c| 14 +++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c | 30 +++ linux-2.6.11-root/arch/i386/kernel/cpu/intel.c | 12 +++--- linux-2.6.11-root/arch/i386/kernel/cpu/intel_cacheinfo.c |4 +- linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/mce.c |4 +- linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p4.c |4 +- linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p5.c |2 - linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p6.c |2 - linux-2.6.11-root/arch/i386/kernel/process.c |2 - linux-2.6.11-root/arch/i386/kernel/setup.c |2 - linux-2.6.11-root/arch/i386/kernel/smpboot.c | 18 - linux-2.6.11-root/arch/i386/kernel/timers/timer_tsc.c|2 - 12 files changed, 48 insertions(+), 48 deletions(-) diff -puN arch/i386/kernel/apic.c~init_call_cleanup arch/i386/kernel/apic.c --- linux-2.6.11/arch/i386/kernel/apic.c~init_call_cleanup 2005-04-12 10:37:07.0 +0800 +++ linux-2.6.11-root/arch/i386/kernel/apic.c 2005-04-13 10:57:55.817365288 +0800 @@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
apic_write_around(APIC_LVT1, value); } -void __init setup_local_APIC (void) +void __devinit setup_local_APIC(void) { unsigned long oldvalue, value, ver, maxlvt; @@ -676,7 +676,7 @@ static struct sys_device device_lapic = .cls= lapic_sysclass, }; -static void __init apic_pm_activate(void) +static void __devinit apic_pm_activate(void) { apic_pm_state.active = 1; } @@ -877,7 +877,7 @@ fake_ioapic_page: * but we do not accept timer interrupts yet. We only allow the BP * to calibrate. */ -static unsigned int __init get_8254_timer_count(void) +static unsigned int __devinit get_8254_timer_count(void) { extern spinlock_t i8253_lock; unsigned long flags; @@ -896,7 +896,7 @@ static unsigned int __init get_8254_time } /* next tick in 8254 can be caught by catching timer wraparound */ -static void __init wait_8254_wraparound(void) +static void __devinit wait_8254_wraparound(void) { unsigned int curr_count, prev_count; @@ -916,7 +916,7 @@ static void __init wait_8254_wraparound( * Default initialization for 8254 timers. 
If we use other timers like HPET, * we override this later */ -void (*wait_timer_tick)(void) __initdata = wait_8254_wraparound; +void (*wait_timer_tick)(void) __devinitdata = wait_8254_wraparound; /* * This function sets up the local APIC timer, with a timeout of @@ -952,7 +952,7 @@ static void __setup_APIC_LVTT(unsigned i apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR); } -static void __init setup_APIC_timer(unsigned int clocks) +static void __devinit setup_APIC_timer(unsigned int clocks) { unsigned long flags; @@ -1065,7 +1065,7 @@ void __init setup_boot_APIC_clock(void) local_irq_enable(); } -void __init setup_secondary_APIC_clock(void) +void __devinit setup_secondary_APIC_clock(void) { setup_APIC_timer(calibration_result); } diff -puN arch/i386/kernel/cpu/common.c~init_call_cleanup arch/i386/kernel/cpu/common.c --- linux-2.6.11/arch/i386/kernel/cpu/common.c~init_call_cleanup 2005-04-12 10:37:07.0 +0800 +++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c 2005-04-13 10:58:25.777810608 +0800 @@ -24,9 +24,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_gdt_table); DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]); EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack); -static int cachesize_override __initdata = -1; -static int disable_x86_fxsr __initdata = 0; -static int disable_x86_serial_nr __initdata = 1; +static int cachesize_override __devinitdata = -1; +static int disable_x86_fxsr __devinitdata = 0; +static int disable_x86_serial_nr __devinitdata = 1; struct cpu_dev
Re: [PATCH 5/6]physical CPU hot add
On Tue, 2005-04-12 at 20:17, Zwane Mwaikambo wrote:
> On Tue, 12 Apr 2005, Li Shaohua wrote:
>
> > #ifdef CONFIG_HOTPLUG_CPU
> > +int __attribute__ ((weak)) smp_prepare_cpu(int cpu)
> > +{
> > +	return 0;
> > +}
> > +
>
> Any way for you to avoid using weak attribute?

Replaced the weak attribute with the define method, as suggested.

Thanks,
Shaohua

--- linux-2.6.11-root/arch/i386/kernel/smpboot.c | 112 --- linux-2.6.11-root/drivers/base/cpu.c |7 + linux-2.6.11-root/include/asm-i386/smp.h |3 3 files changed, 93 insertions(+), 29 deletions(-) diff -puN arch/i386/kernel/smpboot.c~warm_boot_cpu arch/i386/kernel/smpboot.c --- linux-2.6.11/arch/i386/kernel/smpboot.c~warm_boot_cpu 2005-04-13 10:58:37.152081456 +0800 +++ linux-2.6.11-root/arch/i386/kernel/smpboot.c2005-04-13 10:58:37.159080392 +0800 @@ -80,6 +80,12 @@ cpumask_t cpu_callin_map; cpumask_t cpu_callout_map; static cpumask_t smp_commenced_mask; +/* TSC's upper 32 bits can't be written in eariler CPU (before prescott), there + * is no way to resync one AP against BP. TBD: for prescott and above, we + * should use IA64's algorithm + */ +static int __devinitdata tsc_sync_disabled; + /* Per CPU bogomips and other parameters */ struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned; @@ -416,7 +422,7 @@ static void __devinit smp_callin(void) /* * Synchronize the TSC with the BP */ - if (cpu_has_tsc && cpu_khz) + if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled) synchronize_tsc_ap(); } @@ -809,6 +815,31 @@ static inline int alloc_cpu_id(void) return cpu; } +#ifdef CONFIG_HOTPLUG_CPU +static struct task_struct * __devinitdata cpu_idle_tasks[NR_CPUS]; +static inline struct task_struct * alloc_idle_task(int cpu) +{ + struct task_struct *idle; + + if ((idle = cpu_idle_tasks[cpu]) != NULL) { + /* initialize thread_struct. we really want to avoid destroy +* idle tread +*/ + idle->thread.esp = (unsigned long)(((struct pt_regs *) + (THREAD_SIZE + (unsigned long) idle->thread_info)) - 1); + init_idle(idle, cpu); + return idle; + } + idle = fork_idle(cpu); + + if (!IS_ERR(idle)) + cpu_idle_tasks[cpu] = idle; + return idle; +} +#else +#define alloc_idle_task(cpu) fork_idle(cpu) +#endif + static int __devinit do_boot_cpu(int apicid, int cpu) /* * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad @@ -828,7 +859,7 @@ static int __devinit do_boot_cpu(int api * We can't use kernel_thread since we must avoid to * reschedule the child. */ - idle = fork_idle(cpu); + idle = alloc_idle_task(cpu); if (IS_ERR(idle)) panic("failed fork for CPU %d", cpu); idle->thread.eip = (unsigned long) start_secondary; @@ -931,6 +962,55 @@ void cpu_exit_clear(void) cpu_clear(cpu, smp_commenced_mask); unmap_cpu_to_logical_apicid(cpu); } + +struct warm_boot_cpu_info { + struct completion *complete; + int apicid; + int cpu; +}; + +static void __devinit do_warm_boot_cpu(void *p) +{ + struct warm_boot_cpu_info *info = p; + do_boot_cpu(info->apicid, info->cpu); + complete(info->complete); +} + +int __devinit smp_prepare_cpu(int cpu) +{ + DECLARE_COMPLETION(done); + struct warm_boot_cpu_info info; + struct work_struct task; + int apicid, ret; + + lock_cpu_hotplug(); + apicid = x86_cpu_to_apicid[cpu]; + if (apicid == BAD_APICID) { + ret = -ENODEV; + goto exit; + } + + info.complete = &done; + info.apicid = apicid; + info.cpu = cpu; + INIT_WORK(&task, do_warm_boot_cpu, &info); + + tsc_sync_disabled = 1; + + /* init low mem mapping */ + memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS, + sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS); + flush_tlb_all(); + schedule_work(&task); + wait_for_completion(&done); + + tsc_sync_disabled = 0; + zap_low_mappings(); + ret = 0; +exit: + unlock_cpu_hotplug(); + return ret; +} #endif static void smp_tune_scheduling (void) @@ -1169,24 +1249,6 @@ void __devinit
smp_prepare_boot_cpu(void } #ifdef CONFIG_HOTPLUG_CPU - -/* must be called with the cpucontrol mutex held */ -static int __devinit cpu_enable(unsigned int cpu) -{ - /* get the target out of its holding state */ - per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; - wmb(); - - /* wait for the processor to ack it. timeout? */ - while (!cpu_online(cpu)) - cpu_relax(); - - fixup_irqs(cpu_online_map); - /* counter the disable in fixup_irqs() */ - local_irq_enable(); - return 0; -} - static void remove_siblinginfo(int cpu) { @@ -1270,14 +1332,6
[PATCH 4/6]cpu state clean after hot remove
Clean CPU states in order to reuse smp boot code for CPU hotplug. Signed-off-by: Li Shaohua<[EMAIL PROTECTED]> --- linux-2.6.11-root/arch/i386/kernel/cpu/common.c | 12 linux-2.6.11-root/arch/i386/kernel/irq.c|5 + linux-2.6.11-root/arch/i386/kernel/process.c| 19 +++ linux-2.6.11-root/arch/i386/kernel/smpboot.c| 62 ++-- linux-2.6.11-root/include/asm-i386/irq.h|2 linux-2.6.11-root/include/asm-i386/smp.h|5 + 6 files changed, 89 insertions(+), 16 deletions(-) diff -puN arch/i386/kernel/cpu/common.c~cpu_state_clean arch/i386/kernel/cpu/common.c --- linux-2.6.11/arch/i386/kernel/cpu/common.c~cpu_state_clean 2005-04-12 10:37:50.642376224 +0800 +++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c 2005-04-12 10:37:50.654374400 +0800 @@ -644,3 +644,15 @@ void __devinit cpu_init (void) clear_used_math(); mxcsr_feature_mask_init(); } + +#ifdef CONFIG_HOTPLUG_CPU +void __devinit cpu_uninit(void) +{ + int cpu = _smp_processor_id(); + cpu_clear(cpu, cpu_initialized); + + /* lazy TLB state */ + per_cpu(cpu_tlbstate, cpu).state = 0; + per_cpu(cpu_tlbstate, cpu).active_mm = &init_mm; +} +#endif diff -puN arch/i386/kernel/irq.c~cpu_state_clean arch/i386/kernel/irq.c --- linux-2.6.11/arch/i386/kernel/irq.c~cpu_state_clean 2005-04-12 10:37:50.643376072 +0800 +++ linux-2.6.11-root/arch/i386/kernel/irq.c2005-04-12 10:37:50.654374400 +0800 @@ -158,6 +158,11 @@ void irq_ctx_init(int cpu) cpu,hardirq_ctx[cpu],softirq_ctx[cpu]); } +void irq_ctx_exit(int cpu) +{ + hardirq_ctx[cpu] = NULL; +} + extern asmlinkage void __do_softirq(void); asmlinkage void do_softirq(void) diff -puN arch/i386/kernel/process.c~cpu_state_clean arch/i386/kernel/process.c --- linux-2.6.11/arch/i386/kernel/process.c~cpu_state_clean 2005-04-12 10:37:50.645375768 +0800 +++ linux-2.6.11-root/arch/i386/kernel/process.c2005-04-12 10:37:50.655374248 +0800 @@ -148,21 +148,18 @@ static void poll_idle (void) /* We don't actually take CPU down, just spin without interrupts.
*/ static inline void play_dead(void) { + /* This must be done before dead CPU ack */ + cpu_exit_clear(); + mb(); /* Ack it */ __get_cpu_var(cpu_state) = CPU_DEAD; - /* We shouldn't have to disable interrupts while dead, but -* some interrupts just don't seem to go away, and this makes -* it "work" for testing purposes. */ - /* Death loop */ - while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE) - cpu_relax(); - + /* +* With physical CPU hotplug, we should halt the cpu +*/ local_irq_disable(); - __flush_tlb_all(); - cpu_set(smp_processor_id(), cpu_online_map); - enable_APIC_timer(); - local_irq_enable(); + while (1) + __asm__ __volatile__("hlt":::"memory"); } #else static inline void play_dead(void) diff -puN arch/i386/kernel/smpboot.c~cpu_state_clean arch/i386/kernel/smpboot.c --- linux-2.6.11/arch/i386/kernel/smpboot.c~cpu_state_clean 2005-04-12 10:37:50.646375616 +0800 +++ linux-2.6.11-root/arch/i386/kernel/smpboot.c2005-04-12 10:37:50.656374096 +0800 @@ -798,8 +798,18 @@ wakeup_secondary_cpu(int phys_apicid, un #endif /* WAKE_SECONDARY_VIA_INIT */ extern cpumask_t cpu_initialized; +static inline int alloc_cpu_id(void) +{ + cpumask_t tmp_map; + int cpu; + cpus_complement(tmp_map, cpu_present_map); + cpu = first_cpu(tmp_map); + if (cpu >= NR_CPUS) + return -ENODEV; + return cpu; +} -static int __devinit do_boot_cpu(int apicid) +static int __devinit do_boot_cpu(int apicid, int cpu) /* * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad * (ie clustered apic addressing mode), this is a LOGICAL apic ID. @@ -808,11 +818,12 @@ static int __devinit do_boot_cpu(int api { struct task_struct *idle; unsigned long boot_error; - int timeout, cpu; + int timeout; unsigned long start_eip; unsigned short nmi_high = 0, nmi_low = 0; - cpu = ++cpucount; + ++cpucount; + /* * We can't use kernel_thread since we must avoid to * reschedule the child. 
@@ -884,13 +895,16 @@ static int __devinit do_boot_cpu(int api inquire_remote_apic(apicid); } } - x86_cpu_to_apicid[cpu] = apicid; + if (boot_error) { /* Try to put things back the way they were before ... */ unmap_cpu_to_logical_apicid(cpu); cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */ cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */ cpucount--; + } else { + x86_cpu_to_apicid[cpu] = apicid; +
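The alloc_cpu_id() helper added in this patch picks the lowest CPU number that is not yet in cpu_present_map. A rough userspace model of that logic, using a plain unsigned int as the mask instead of cpumask_t, might look like the following; NR_CPUS is set to a small number just for the example and alloc_cpu_id_model() is an invented name.

```c
#include <assert.h>
#include <errno.h>

#define NR_CPUS 8	/* small value chosen for the example */

/*
 * Return the lowest CPU id whose bit is clear in present_map,
 * or -ENODEV when all NR_CPUS ids are already present.
 */
int alloc_cpu_id_model(unsigned int present_map)
{
	unsigned int free_map = ~present_map;	/* plays the role of cpus_complement() */
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)	/* plays the role of first_cpu() */
		if (free_map & (1u << cpu))
			return cpu;

	return -ENODEV;
}
```

With CPUs 0 to 2 present (mask 0x07) the model hands out id 3; with every bit set it reports -ENODEV, matching the cpu >= NR_CPUS check in the patch.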
[PATCH 6/6]suspend/resume SMP support
Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use disable/enable_nonboot_cpus API. The S4 part is based on Pavel's original S4 SMP patch. Signed-off-by: Li Shaohua<[EMAIL PROTECTED]> --- linux-2.6.11-root/drivers/acpi/Kconfig|2 linux-2.6.11-root/include/linux/suspend.h |2 linux-2.6.11-root/kernel/power/Kconfig|2 linux-2.6.11-root/kernel/power/disk.c | 36 ++- linux-2.6.11-root/kernel/power/main.c | 16 +++-- linux-2.6.11-root/kernel/power/smp.c | 91 +++--- linux-2.6.11-root/kernel/power/swsusp.c |2 7 files changed, 69 insertions(+), 82 deletions(-) diff -puN drivers/acpi/Kconfig~smp_sleep drivers/acpi/Kconfig --- linux-2.6.11/drivers/acpi/Kconfig~smp_sleep 2005-04-12 11:11:14.884685080 +0800 +++ linux-2.6.11-root/drivers/acpi/Kconfig 2005-04-12 11:11:14.898682952 +0800 @@ -57,7 +57,7 @@ if ACPI_INTERPRETER config ACPI_SLEEP bool "Sleep States (EXPERIMENTAL)" - depends on X86 + depends on X86 && (!SMP || HOTPLUG_CPU) depends on EXPERIMENTAL default y ---help--- diff -puN include/linux/suspend.h~smp_sleep include/linux/suspend.h --- linux-2.6.11/include/linux/suspend.h~smp_sleep 2005-04-12 11:11:14.885684928 +0800 +++ linux-2.6.11-root/include/linux/suspend.h 2005-04-12 11:11:14.898682952 +0800 @@ -58,7 +58,7 @@ static inline int software_suspend(void) } #endif -#ifdef CONFIG_SMP +#ifdef CONFIG_HOTPLUG_CPU extern void disable_nonboot_cpus(void); extern void enable_nonboot_cpus(void); #else diff -puN kernel/power/disk.c~smp_sleep kernel/power/disk.c --- linux-2.6.11/kernel/power/disk.c~smp_sleep 2005-04-12 11:11:14.887684624 +0800 +++ linux-2.6.11-root/kernel/power/disk.c 2005-04-12 11:11:14.899682800 +0800 @@ -117,8 +117,8 @@ static void finish(void) { device_resume(); platform_finish(); - enable_nonboot_cpus(); thaw_processes(); + enable_nonboot_cpus(); pm_restore_console(); } @@ -131,28 +131,36 @@ static int prepare_processes(void) sys_sync(); + disable_nonboot_cpus(); + if (freeze_processes()) { error = -EBUSY; - return error; + goto enable_cpu; } 
if (pm_disk_mode == PM_DISK_PLATFORM) { if (pm_ops && pm_ops->prepare) { if ((error = pm_ops->prepare(PM_SUSPEND_DISK))) - return error; + goto thaw; } } /* Free memory before shutting down devices. */ free_some_memory(); - return 0; +thaw: + thaw_processes(); +enable_cpu: + enable_nonboot_cpus(); + pm_restore_console(); + return error; } static void unprepare_processes(void) { - enable_nonboot_cpus(); + platform_finish(); thaw_processes(); + enable_nonboot_cpus(); pm_restore_console(); } @@ -160,15 +168,9 @@ static int prepare_devices(void) { int error; - disable_nonboot_cpus(); - if ((error = device_suspend(PMSG_FREEZE))) { + if ((error = device_suspend(PMSG_FREEZE))) printk("Some devices failed to suspend\n"); - platform_finish(); - enable_nonboot_cpus(); - return error; - } - - return 0; + return error; } /** @@ -185,9 +187,9 @@ int pm_suspend_disk(void) int error; error = prepare_processes(); - if (!error) { - error = prepare_devices(); - } + if (error) + return error; + error = prepare_devices(); if (error) { unprepare_processes(); @@ -250,7 +252,7 @@ static int software_resume(void) if ((error = prepare_processes())) { swsusp_close(); - goto Cleanup; + goto Done; } pr_debug("PM: Reading swsusp image.\n"); diff -puN kernel/power/Kconfig~smp_sleep kernel/power/Kconfig --- linux-2.6.11/kernel/power/Kconfig~smp_sleep 2005-04-12 11:11:14.888684472 +0800 +++ linux-2.6.11-root/kernel/power/Kconfig 2005-04-12 11:11:14.899682800 +0800 @@ -28,7 +28,7 @@ config PM_DEBUG config SOFTWARE_SUSPEND bool "Software Suspend (EXPERIMENTAL)" - depends on EXPERIMENTAL && PM && SWAP + depends on EXPERIMENTAL && PM && SWAP && (HOTPLUG_CPU || !SMP) ---help--- Enable the possibility of suspending the machine. It doesn't need APM. 
diff -puN kernel/power/main.c~smp_sleep kernel/power/main.c --- linux-2.6.11/kernel/power/main.c~smp_sleep 2005-04-12 11:11:14.890684168 +0800 +++ linux-2.6.11-root/kernel/power/main.c 2005-04-12 11:11:14.899682800 +0800 @@ -59,6 +59,13 @@ static int suspend_prepare(suspend_
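The disk.c part of this patch moves disable_nonboot_cpus() ahead of freeze_processes() and unwinds through goto labels on failure. The sketch below models only that control flow in userspace: the kernel function names are reused for stand-in stubs, while trace, fail_freeze, fail_platform, platform_prepare() and prepare_processes_sketch() are made up for the illustration. It is not the patch's code, just the shape of it as I read the diff.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Trace of which stand-in helpers ran, one letter per call. */
char trace[16];

/* Failure injection knobs; purely illustrative, not kernel variables. */
int fail_freeze;
int fail_platform;

static void mark(char c)
{
	size_t n = strlen(trace);

	if (n + 1 < sizeof(trace)) {
		trace[n] = c;
		trace[n + 1] = '\0';
	}
}

/* Stand-ins for the kernel functions of the same names. */
static void disable_nonboot_cpus(void) { mark('d'); }
static void enable_nonboot_cpus(void)  { mark('e'); }
static void thaw_processes(void)       { mark('t'); }
static int freeze_processes(void)      { mark('f'); return fail_freeze; }
static int platform_prepare(void)      { mark('p'); return fail_platform ? -EIO : 0; }

int prepare_processes_sketch(void)
{
	int error;

	trace[0] = '\0';
	disable_nonboot_cpus();

	if (freeze_processes()) {
		error = -EBUSY;
		goto enable_cpu;	/* freezing failed, skip the thaw step */
	}

	error = platform_prepare();
	if (error)
		goto thaw;

	return 0;	/* success: CPUs stay down, tasks stay frozen */

thaw:
	thaw_processes();
enable_cpu:
	enable_nonboot_cpus();
	return error;
}
```

Each label undoes exactly the steps that had succeeded before the failure, in reverse order, so the non-boot CPUs always come back after the tasks are thawed.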
[PATCH 5/6]physical CPU hot add
Boot a CPU at runtime. Signed-off-by: Li Shaohua<[EMAIL PROTECTED]> --- linux-2.6.11-root/arch/i386/kernel/smpboot.c | 112 --- linux-2.6.11-root/drivers/base/cpu.c |8 + linux-2.6.11-root/include/asm-i386/smp.h |2 3 files changed, 93 insertions(+), 29 deletions(-) diff -puN arch/i386/kernel/smpboot.c~warm_boot_cpu arch/i386/kernel/smpboot.c --- linux-2.6.11/arch/i386/kernel/smpboot.c~warm_boot_cpu 2005-04-12 10:38:16.720411760 +0800 +++ linux-2.6.11-root/arch/i386/kernel/smpboot.c2005-04-12 11:11:09.16040 +0800 @@ -80,6 +80,12 @@ cpumask_t cpu_callin_map; cpumask_t cpu_callout_map; static cpumask_t smp_commenced_mask; +/* TSC's upper 32 bits can't be written in eariler CPU (before prescott), there + * is no way to resync one AP against BP. TBD: for prescott and above, we + * should use IA64's algorithm + */ +static int __devinitdata tsc_sync_disabled; + /* Per CPU bogomips and other parameters */ struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned; @@ -416,7 +422,7 @@ static void __devinit smp_callin(void) /* * Synchronize the TSC with the BP */ - if (cpu_has_tsc && cpu_khz) + if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled) synchronize_tsc_ap(); } @@ -809,6 +815,31 @@ static inline int alloc_cpu_id(void) return cpu; } +#ifdef CONFIG_HOTPLUG_CPU +static struct task_struct * __devinitdata cpu_idle_tasks[NR_CPUS]; +static inline struct task_struct * alloc_idle_task(int cpu) +{ + struct task_struct *idle; + + if ((idle = cpu_idle_tasks[cpu]) != NULL) { + /* initialize thread_struct. 
we really want to avoid destroy +* idle tread +*/ + idle->thread.esp = (unsigned long)(((struct pt_regs *) + (THREAD_SIZE + (unsigned long) idle->thread_info)) - 1); + init_idle(idle, cpu); + return idle; + } + idle = fork_idle(cpu); + + if (!IS_ERR(idle)) + cpu_idle_tasks[cpu] = idle; + return idle; +} +#else +#define alloc_idle_task(cpu) fork_idle(cpu) +#endif + static int __devinit do_boot_cpu(int apicid, int cpu) /* * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad @@ -828,7 +859,7 @@ static int __devinit do_boot_cpu(int api * We can't use kernel_thread since we must avoid to * reschedule the child. */ - idle = fork_idle(cpu); + idle = alloc_idle_task(cpu); if (IS_ERR(idle)) panic("failed fork for CPU %d", cpu); idle->thread.eip = (unsigned long) start_secondary; @@ -931,6 +962,55 @@ void cpu_exit_clear(void) cpu_clear(cpu, smp_commenced_mask); unmap_cpu_to_logical_apicid(cpu); } + +struct warm_boot_cpu_info { + struct completion *complete; + int apicid; + int cpu; +}; + +static void __devinit do_warm_boot_cpu(void *p) +{ + struct warm_boot_cpu_info *info = p; + do_boot_cpu(info->apicid, info->cpu); + complete(info->complete); +} + +int __devinit smp_prepare_cpu(int cpu) +{ + DECLARE_COMPLETION(done); + struct warm_boot_cpu_info info; + struct work_struct task; + int apicid, ret; + + lock_cpu_hotplug(); + apicid = x86_cpu_to_apicid[cpu]; + if (apicid == BAD_APICID) { + ret = -ENODEV; + goto exit; + } + + info.complete = &done; + info.apicid = apicid; + info.cpu = cpu; + INIT_WORK(&task, do_warm_boot_cpu, &info); + + tsc_sync_disabled = 1; + + /* init low mem mapping */ + memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS, + sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS); + flush_tlb_all(); + schedule_work(&task); + wait_for_completion(&done); + + tsc_sync_disabled = 0; + zap_low_mappings(); + ret = 0; +exit: + unlock_cpu_hotplug(); + return ret; +} #endif static void smp_tune_scheduling (void) @@ -1169,24 +1249,6 @@ void __devinit smp_prepare_boot_cpu(void } #ifdef
CONFIG_HOTPLUG_CPU - -/* must be called with the cpucontrol mutex held */ -static int __devinit cpu_enable(unsigned int cpu) -{ - /* get the target out of its holding state */ - per_cpu(cpu_state, cpu) = CPU_UP_PREPARE; - wmb(); - - /* wait for the processor to ack it. timeout? */ - while (!cpu_online(cpu)) - cpu_relax(); - - fixup_irqs(cpu_online_map); - /* counter the disable in fixup_irqs() */ - local_irq_enable(); - return 0; -} - static void remove_siblinginfo(int cpu) { @@ -1270,14 +1332,6 @@ int __devinit __cpu_up(unsigned int cpu) return -EIO; } -#ifdef CONFIG_HOTPLUG_CPU - /* Already up, and in cpu_quiescent now? */ - if (cpu_isset(cpu, smp_commenced_mask)) {
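The alloc_idle_task() helper in this patch keeps one idle task per CPU and reuses it when the CPU is booted again, rather than forking a new one every time. A toy userspace model of that caching, with struct idle_task, fork_idle_sketch() and alloc_idle_task_sketch() as invented stand-ins for task_struct, fork_idle() and alloc_idle_task(), could be:

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4	/* small value picked for the example */

/* Toy stand-in for task_struct; only what the sketch needs. */
struct idle_task {
	int cpu;
	int resets;	/* how many times the cached task was re-initialised */
};

static struct idle_task *cpu_idle_tasks[NR_CPUS];

/* Stand-in for fork_idle(): allocate a fresh idle task, NULL on failure. */
static struct idle_task *fork_idle_sketch(int cpu)
{
	struct idle_task *idle = calloc(1, sizeof(*idle));

	if (idle)
		idle->cpu = cpu;
	return idle;
}

struct idle_task *alloc_idle_task_sketch(int cpu)
{
	struct idle_task *idle = cpu_idle_tasks[cpu];

	if (idle) {
		/* CPU was up before: reset the old task instead of forking */
		idle->resets++;
		return idle;
	}

	idle = fork_idle_sketch(cpu);
	if (idle)
		cpu_idle_tasks[cpu] = idle;	/* remember it for the next boot */
	return idle;
}
```

In the real patch the reset step rewrites thread.esp and calls init_idle(); the counter here only stands in for that so the reuse is observable.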
[PATCH 6/6]suspend/resume SMP support
Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use disable/enable_nonboot_cpus API. The S4 part is based on Pavel's original S4 SMP patch. Signed-off-by: Li Shaohua[EMAIL PROTECTED] --- linux-2.6.11-root/drivers/acpi/Kconfig|2 linux-2.6.11-root/include/linux/suspend.h |2 linux-2.6.11-root/kernel/power/Kconfig|2 linux-2.6.11-root/kernel/power/disk.c | 36 ++- linux-2.6.11-root/kernel/power/main.c | 16 +++-- linux-2.6.11-root/kernel/power/smp.c | 91 +++--- linux-2.6.11-root/kernel/power/swsusp.c |2 7 files changed, 69 insertions(+), 82 deletions(-) diff -puN drivers/acpi/Kconfig~smp_sleep drivers/acpi/Kconfig --- linux-2.6.11/drivers/acpi/Kconfig~smp_sleep 2005-04-12 11:11:14.884685080 +0800 +++ linux-2.6.11-root/drivers/acpi/Kconfig 2005-04-12 11:11:14.898682952 +0800 @@ -57,7 +57,7 @@ if ACPI_INTERPRETER config ACPI_SLEEP bool Sleep States (EXPERIMENTAL) - depends on X86 + depends on X86 (!SMP || HOTPLUG_CPU) depends on EXPERIMENTAL default y ---help--- diff -puN include/linux/suspend.h~smp_sleep include/linux/suspend.h --- linux-2.6.11/include/linux/suspend.h~smp_sleep 2005-04-12 11:11:14.885684928 +0800 +++ linux-2.6.11-root/include/linux/suspend.h 2005-04-12 11:11:14.898682952 +0800 @@ -58,7 +58,7 @@ static inline int software_suspend(void) } #endif -#ifdef CONFIG_SMP +#ifdef CONFIG_HOTPLUG_CPU extern void disable_nonboot_cpus(void); extern void enable_nonboot_cpus(void); #else diff -puN kernel/power/disk.c~smp_sleep kernel/power/disk.c --- linux-2.6.11/kernel/power/disk.c~smp_sleep 2005-04-12 11:11:14.887684624 +0800 +++ linux-2.6.11-root/kernel/power/disk.c 2005-04-12 11:11:14.899682800 +0800 @@ -117,8 +117,8 @@ static void finish(void) { device_resume(); platform_finish(); - enable_nonboot_cpus(); thaw_processes(); + enable_nonboot_cpus(); pm_restore_console(); } @@ -131,28 +131,36 @@ static int prepare_processes(void) sys_sync(); + disable_nonboot_cpus(); + if (freeze_processes()) { error = -EBUSY; - return error; + goto enable_cpu; } if 
(pm_disk_mode == PM_DISK_PLATFORM) { if (pm_ops pm_ops-prepare) { if ((error = pm_ops-prepare(PM_SUSPEND_DISK))) - return error; + goto thaw; } } /* Free memory before shutting down devices. */ free_some_memory(); - return 0; +thaw: + thaw_processes(); +enable_cpu: + enable_nonboot_cpus(); + pm_restore_console(); + return error; } static void unprepare_processes(void) { - enable_nonboot_cpus(); + platform_finish(); thaw_processes(); + enable_nonboot_cpus(); pm_restore_console(); } @@ -160,15 +168,9 @@ static int prepare_devices(void) { int error; - disable_nonboot_cpus(); - if ((error = device_suspend(PMSG_FREEZE))) { + if ((error = device_suspend(PMSG_FREEZE))) printk(Some devices failed to suspend\n); - platform_finish(); - enable_nonboot_cpus(); - return error; - } - - return 0; + return error; } /** @@ -185,9 +187,9 @@ int pm_suspend_disk(void) int error; error = prepare_processes(); - if (!error) { - error = prepare_devices(); - } + if (error) + return error; + error = prepare_devices(); if (error) { unprepare_processes(); @@ -250,7 +252,7 @@ static int software_resume(void) if ((error = prepare_processes())) { swsusp_close(); - goto Cleanup; + goto Done; } pr_debug(PM: Reading swsusp image.\n); diff -puN kernel/power/Kconfig~smp_sleep kernel/power/Kconfig --- linux-2.6.11/kernel/power/Kconfig~smp_sleep 2005-04-12 11:11:14.888684472 +0800 +++ linux-2.6.11-root/kernel/power/Kconfig 2005-04-12 11:11:14.899682800 +0800 @@ -28,7 +28,7 @@ config PM_DEBUG config SOFTWARE_SUSPEND bool Software Suspend (EXPERIMENTAL) - depends on EXPERIMENTAL PM SWAP + depends on EXPERIMENTAL PM SWAP (HOTPLUG_CPU || !SMP) ---help--- Enable the possibility of suspending the machine. It doesn't need APM. 
diff -puN kernel/power/main.c~smp_sleep kernel/power/main.c --- linux-2.6.11/kernel/power/main.c~smp_sleep 2005-04-12 11:11:14.890684168 +0800 +++ linux-2.6.11-root/kernel/power/main.c 2005-04-12 11:11:14.899682800 +0800 @@ -59,6 +59,13 @@ static int suspend_prepare(suspend_state pm_prepare_console(); + disable_nonboot_cpus(); + + if (num_online_cpus() != 1) { + error = -EPERM
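The reworked prepare_processes() above relies on goto-based unwinding: each failure jumps to a label that undoes only the steps already taken, in reverse order. A minimal userspace sketch of that idiom follows. The function name, the two flag variables, the -EIO value, and the assumption that the success path returns before the labels are all hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state flags standing in for the real kernel side effects. */
int cpus_disabled;
int procs_frozen;

/*
 * Sketch of reverse-order unwinding with goto labels.
 * freeze_fails / platform_fails simulate the two failure points.
 */
int prepare_sketch(bool freeze_fails, bool platform_fails)
{
	int error;

	cpus_disabled = 0;
	procs_frozen = 0;

	cpus_disabled = 1;               /* models disable_nonboot_cpus() */

	if (freeze_fails) {
		error = -EBUSY;
		goto enable_cpu;         /* nothing was frozen, skip the thaw */
	}
	procs_frozen = 1;                /* models a successful freeze */

	if (platform_fails) {
		error = -EIO;            /* assumed error code for the sketch */
		goto thaw;
	}

	return 0;                        /* success: both steps stay in effect */
thaw:
	procs_frozen = 0;                /* models thaw_processes() */
enable_cpu:
	cpus_disabled = 0;               /* models enable_nonboot_cpus() */
	return error;
}
```

The labels are ordered so that a later failure falls through every earlier undo step, which keeps each cleanup action written exactly once.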
[PATCH 4/6]cpu state clean after hot remove
Clean CPU states in order to reuse smp boot code for CPU hotplug.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/arch/i386/kernel/cpu/common.c |   12
 linux-2.6.11-root/arch/i386/kernel/irq.c        |    5 +
 linux-2.6.11-root/arch/i386/kernel/process.c    |   19 +++
 linux-2.6.11-root/arch/i386/kernel/smpboot.c    |   62 ++--
 linux-2.6.11-root/include/asm-i386/irq.h        |    2
 linux-2.6.11-root/include/asm-i386/smp.h        |    5 +
 6 files changed, 89 insertions(+), 16 deletions(-)

diff -puN arch/i386/kernel/cpu/common.c~cpu_state_clean arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~cpu_state_clean	2005-04-12 10:37:50.642376224 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-04-12 10:37:50.654374400 +0800
@@ -644,3 +644,15 @@ void __devinit cpu_init (void)
 	clear_used_math();
 	mxcsr_feature_mask_init();
 }
+
+#ifdef CONFIG_HOTPLUG_CPU
+void __devinit cpu_uninit(void)
+{
+	int cpu = _smp_processor_id();
+	cpu_clear(cpu, cpu_initialized);
+
+	/* lazy TLB state */
+	per_cpu(cpu_tlbstate, cpu).state = 0;
+	per_cpu(cpu_tlbstate, cpu).active_mm = &init_mm;
+}
+#endif
diff -puN arch/i386/kernel/irq.c~cpu_state_clean arch/i386/kernel/irq.c
--- linux-2.6.11/arch/i386/kernel/irq.c~cpu_state_clean	2005-04-12 10:37:50.643376072 +0800
+++ linux-2.6.11-root/arch/i386/kernel/irq.c	2005-04-12 10:37:50.654374400 +0800
@@ -158,6 +158,11 @@ void irq_ctx_init(int cpu)
 		cpu,hardirq_ctx[cpu],softirq_ctx[cpu]);
 }
 
+void irq_ctx_exit(int cpu)
+{
+	hardirq_ctx[cpu] = NULL;
+}
+
 extern asmlinkage void __do_softirq(void);
 
 asmlinkage void do_softirq(void)
diff -puN arch/i386/kernel/process.c~cpu_state_clean arch/i386/kernel/process.c
--- linux-2.6.11/arch/i386/kernel/process.c~cpu_state_clean	2005-04-12 10:37:50.645375768 +0800
+++ linux-2.6.11-root/arch/i386/kernel/process.c	2005-04-12 10:37:50.655374248 +0800
@@ -148,21 +148,18 @@ static void poll_idle (void)
 /* We don't actually take CPU down, just spin without interrupts. */
 static inline void play_dead(void)
 {
+	/* This must be done before dead CPU ack */
+	cpu_exit_clear();
+	mb();
 	/* Ack it */
 	__get_cpu_var(cpu_state) = CPU_DEAD;
 
-	/* We shouldn't have to disable interrupts while dead, but
-	 * some interrupts just don't seem to go away, and this makes
-	 * it work for testing purposes. */
-	/* Death loop */
-	while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE)
-		cpu_relax();
-
+	/*
+	 * With physical CPU hotplug, we should halt the cpu
+	 */
 	local_irq_disable();
-	__flush_tlb_all();
-	cpu_set(smp_processor_id(), cpu_online_map);
-	enable_APIC_timer();
-	local_irq_enable();
+	while (1)
+		__asm__ __volatile__("hlt":::"memory");
 }
 #else
 static inline void play_dead(void)
diff -puN arch/i386/kernel/smpboot.c~cpu_state_clean arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~cpu_state_clean	2005-04-12 10:37:50.646375616 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-12 10:37:50.656374096 +0800
@@ -798,8 +798,18 @@ wakeup_secondary_cpu(int phys_apicid, un
 #endif	/* WAKE_SECONDARY_VIA_INIT */
 
 extern cpumask_t cpu_initialized;
+static inline int alloc_cpu_id(void)
+{
+	cpumask_t tmp_map;
+	int cpu;
+	cpus_complement(tmp_map, cpu_present_map);
+	cpu = first_cpu(tmp_map);
+	if (cpu >= NR_CPUS)
+		return -ENODEV;
+	return cpu;
+}
 
-static int __devinit do_boot_cpu(int apicid)
+static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -808,11 +818,12 @@ static int __devinit do_boot_cpu(int api
 {
 	struct task_struct *idle;
 	unsigned long boot_error;
-	int timeout, cpu;
+	int timeout;
 	unsigned long start_eip;
 	unsigned short nmi_high = 0, nmi_low = 0;
 
-	cpu = ++cpucount;
+	++cpucount;
+
 	/*
 	 * We can't use kernel_thread since we must avoid to
 	 * reschedule the child.
@@ -884,13 +895,16 @@ static int __devinit do_boot_cpu(int api
 			inquire_remote_apic(apicid);
 		}
 	}
-	x86_cpu_to_apicid[cpu] = apicid;
+
 	if (boot_error) {
 		/* Try to put things back the way they were before ... */
 		unmap_cpu_to_logical_apicid(cpu);
 		cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
 		cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
 		cpucount--;
+	} else {
+		x86_cpu_to_apicid[cpu] = apicid;
+		cpu_set(cpu, cpu_present_map
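The alloc_cpu_id() helper in the patch above picks the first logical CPU number that is not yet present, by complementing the present map and taking its first set bit. A rough userspace sketch of the same idea, using a plain 32-bit mask instead of cpumask_t, might look like this. NR_CPUS is fixed at 8 here purely for illustration, and the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_CPUS 8   /* hypothetical limit, chosen only for this sketch */

/*
 * Return the lowest CPU id whose bit is clear in present_map,
 * or -ENODEV when all NR_CPUS ids are already present.
 */
int alloc_cpu_id_sketch(uint32_t present_map)
{
	uint32_t free_map = ~present_map;        /* plays the role of cpus_complement() */
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)      /* plays the role of first_cpu() */
		if (free_map & (UINT32_C(1) << cpu))
			return cpu;

	return -ENODEV;                          /* corresponds to cpu >= NR_CPUS */
}
```

For example, with CPUs 0 and 1 present (mask 0x3) the sketch returns 2, and with all eight bits set it returns -ENODEV.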
Re: [ACPI] Re: [RFC 5/6]clean cpu state after hotremove CPU
On Mon, 2005-04-04 at 23:33, Nathan Lynch wrote:
> > > I don't understand why this is needed at all.  It looks like a fair
> > > amount of code from do_exit is being duplicated here.
> > Yes, exactly. Someone who understand do_exit please help clean up the
> > code. I'd like to remove the idle thread, since the smpboot code will
> > create a new idle thread.
>
> I'd say fix the smpboot code so that it doesn't create new idle tasks
> except during boot.
I tried what you said, but I had to use an ugly method to adjust
idle->thread.esp (the stack pointer on IA32); otherwise the stack soon
overflows after several rounds of hotplug. I'll take a closer look at
whether other fields in thread_info cause problems. Did you reinitialize
the idle task's thread_info on ppc? I have no problem doing it on IA32,
but is this a good approach? Creating a new idle thread for the upcoming
CPU looks more graceful to me.

Thanks,
Shaohua
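The stack pointer adjustment discussed in this message amounts to placing the reused idle task's initial stack pointer one register frame below the top of its stack area, so that repeated hotplug rounds always start from the same address instead of drifting. A minimal sketch of that arithmetic is shown below. THREAD_SIZE of 8 KiB and the layout of the frame struct are assumptions made for illustration, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u            /* assumed: 8 KiB kernel stack area */

struct pt_regs_sketch {              /* stand-in for the real struct pt_regs */
	unsigned long regs[15];
};

/*
 * Initial stack pointer for a reused idle task: the top of the
 * THREAD_SIZE area that starts at thread_info_base, minus one
 * saved-register frame. Integer arithmetic is used so the sketch
 * never forms an out-of-object pointer.
 */
uintptr_t idle_initial_sp(uintptr_t thread_info_base)
{
	return thread_info_base + THREAD_SIZE - sizeof(struct pt_regs_sketch);
}
```

Because the result depends only on the base address, recomputing it on every CPU bring-up yields the same value each time.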
Re: [RFC 1/6]SEP initialization rework
On Tue, 2005-04-05 at 03:10, Zwane Mwaikambo wrote:
> On Mon, 4 Apr 2005, Li Shaohua wrote:
>
> >  linux-2.6.11-root/arch/i386/kernel/smpboot.c           |    6 ++
> >  linux-2.6.11-root/arch/i386/kernel/sysenter.c          |   10 ++
> >  linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |    6 ++
> >  3 files changed, 18 insertions(+), 4 deletions(-)
> >
> > diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c
> > --- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup	2005-03-28 09:32:30.936304248 +0800
> > +++ linux-2.6.11-root/arch/i386/kernel/sysenter.c	2005-03-28 09:58:20.703703792 +0800
> > @@ -26,6 +26,11 @@ void enable_sep_cpu(void *info)
> >  	int cpu = get_cpu();
> >  	struct tss_struct *tss = &per_cpu(init_tss, cpu);
> >
> > +	if (!boot_cpu_has(X86_FEATURE_SEP)) {
> > +		put_cpu();
> > +		return;
> > +	}
> > +
>
> Do you have systems like this? Is it really skipping SEP if the boot
> processor doesn't have SEP?
No, I don't have such a system. This is the logic of the original SEP
initialization: if the CPU doesn't have SEP, the original code doesn't
call 'on_each_cpu(enable_sep_cpu,...)'.

Thanks,
Shaohua
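One detail worth noting in the quoted hunk is that the early return calls put_cpu() to balance the get_cpu() at the top of enable_sep_cpu(). The following userspace sketch models that pairing with a simple counter. All names are hypothetical, and the final put on the normal path is an assumption, since the quoted hunk does not show the end of the function.

```c
#include <assert.h>
#include <stdbool.h>

int preempt_depth;                   /* models the preemption-disable depth */
bool sep_enabled;                    /* models "SEP MSRs were programmed" */

static int get_cpu_sketch(void)
{
	preempt_depth++;             /* get_cpu() disables preemption */
	return 0;                    /* pretend we are running on CPU 0 */
}

static void put_cpu_sketch(void)
{
	preempt_depth--;             /* put_cpu() re-enables preemption */
}

void enable_sep_sketch(bool boot_cpu_has_sep)
{
	int cpu = get_cpu_sketch();

	(void)cpu;
	if (!boot_cpu_has_sep) {
		put_cpu_sketch();    /* every exit path must drop the reference */
		return;
	}

	sep_enabled = true;          /* stands in for the wrmsr() calls */
	put_cpu_sketch();            /* assumed: the normal path also ends with a put */
}
```

After either call path the depth counter is back at zero, which is the invariant the added put_cpu() in the hunk preserves.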
Re: [ACPI] Re: [RFC 5/6]clean cpu state after hotremove CPU
Hi,
On Mon, 2005-04-04 at 23:33, Nathan Lynch wrote:
>
> I'd say fix the smpboot code so that it doesn't create new idle tasks
> except during boot.
I'd like the CPU hot-remove case to look just like the case where the CPU
has not been booted: a CPU that has not booted has no idle thread. But you
may think it's not worth doing. Anyway, I will keep the idle thread in an
updated patch, as you said.

> > > We've been
> > > doing cpu removal on ppc64 logical partitions for a while and never
> > > needed to do anything like this.
> > Did it remove idle thread? or dead cpu is in a busy loop of idle?
>
> Neither.  The cpu is definitely offline, but there is no reason to
> free the idle thread.
>
> > > Maybe idle_task_exit would suffice?
> > idle_task_exit seems just drop mm. We need destroy the idle task for
> > physical CPU hotplug, right?
>
> No.
>
> > > I don't understand the need for this, either.  The existing cpu
> > > hotplug notifier in the scheduler takes care of initializing the sched
> > > domains and groups appropriately for online/offline events; why do you
> > > need to touch the runqueue structures?
> > If a CPU is physically hotremoved from the system, shouldn't we clean
> > its runqueue?
>
> No.  It should make zero difference to the scheduler whether the "play
> dead" cpu hotplug or "physical" hotplug is being used.
Keeping some fields like 'cpu_load' seems meaningless to me for a
hot-added CPU. Just ignore them?

Thanks,
Shaohua
Re: [RFC 5/6]clean cpu state after hotremove CPU
On Tue, 2005-04-05 at 03:11, Zwane Mwaikambo wrote:
> On Mon, 4 Apr 2005, Li Shaohua wrote:
>
> > Clean up all CPU states including its runqueue and idle thread,
> > so we can use boot time code without any changes.
> > Note this makes /sys/devices/system/cpu/cpux/online unworkable.
> >
> >  #ifdef CONFIG_HOTPLUG_CPU
> >  #include <asm/nmi.h>
> > +
> > +#ifdef CONFIG_STR_SMP
> > +extern void cpu_exit_clear(int);
> > +#endif
>
> Perhaps change that ifdef to denote something which clearly shows that its
> physical hotplug as we'll need this for other users too.
Ok.

> > +#ifdef CONFIG_STR_SMP
> > +extern void do_exit_idle(void);
> > +extern void cpu_uninit(void);
> > +void cpu_exit_clear(int cpu)
> > +{
> > +	int sibling;
> > +	cpucount --;
>
> Is that protected by the cpu_control semaphore?
cpu_exit_clear is called before the dead CPU acks CPU_DEAD, so it has
finished before __cpu_die returns, and __cpu_die is protected by
cpu_control. Maybe I should add a comment for it.

Thanks,
Shaohua
Re: [ACPI] Re: [RFC 0/6] S3 SMP support with physcial CPU hotplug
On Mon, 2005-04-04 at 17:10, Pavel Machek wrote:
> Hi!
>
> > > I'm switching suspend2 to use hotplug too. Li, I'll try adding your
> > > patches as well as Zwane's if you like
> > Great!
> >
> > > (suspend2 can enter S3, S4 or S5
> > > after writing the image). I'd love to try it on my HT desktop, and
> > > hotplug will get more testing too :>
> > Unfortunately, my patches break Pavel's swsusp SMP, as my patches break
> > current 'cpu_up' mechanism. S4 doesn't require to boot AP CPUs from real
> > mode.
>
> Uh, I don't like that one. Is it possible to put secondary CPUs back
> to the real mode
That trouble is probably unnecessary. Sending a SIPI can also wake up a
CPU that is in protected mode.

> so that cpu_up mechanism can handle them?
If S4 also calls smp_prepare_cpu, then the patches don't break S4. If
people don't complain that warm-booting a CPU is slow, I'd like S4 to use
smp_prepare_cpu as well.

Thanks,
Shaohua
Re: [RFC 4/6]Add kconfig for S3 SMP
On Mon, 2005-04-04 at 16:59, Pavel Machek wrote:
> Hi!
>
> > Add kconfig for IA32 S3 SMP.
> >
> > Thanks,
> > Shaohua
> >
> > ---
> >
> >  linux-2.6.11-root/kernel/power/Kconfig |    7 +++
> >  1 files changed, 7 insertions(+)
> >
> > diff -puN kernel/power/Kconfig~smp_s3_kconfig kernel/power/Kconfig
> > --- linux-2.6.11/kernel/power/Kconfig~smp_s3_kconfig	2005-03-31 10:49:57.156487160 +0800
> > +++ linux-2.6.11-root/kernel/power/Kconfig	2005-03-31 10:49:57.158486856 +0800
> > @@ -72,3 +72,10 @@ config PM_STD_PARTITION
> >  	  suspended image to. It will simply pick the first available swap
> >  	  device.
> >
> > +config STR_SMP
> > +	bool "Suspend to RAM SMP support (EXPERIMENTAL)"
> > +	depends on EXPERIMENTAL && ACPI_SLEEP && !X86_64
> > +	depends on HOTPLUG_CPU
> > +	default y
> > +	---help---
> > +	  enable Suspend to RAM SMP support. Some HT systems require this.
>
> Should this be config option? If we have ACPI_SLEEP and SMP set, we
> should probably require this one (so that user does not have to
> care)
Sure, quite reasonable!

> Also name is "interesting", perhaps CONFIG_SMP_SLEEP or
> something?
The name is only because my patches currently break S4. After we figure
out how to make both S3 and S4 work, I'll change it as you said.

Thanks,
Shaohua
Re: [RFC 1/6]SEP initialization rework
Hi,
On Mon, 2005-04-04 at 16:46, Pavel Machek wrote:
> > ---
> >
> >  linux-2.6.11-root/arch/i386/kernel/smpboot.c           |    6 ++
> >  linux-2.6.11-root/arch/i386/kernel/sysenter.c          |   10 ++
> >  linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |    6 ++
> >  3 files changed, 18 insertions(+), 4 deletions(-)
> >
> > diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c
> > --- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup	2005-03-28 09:32:30.936304248 +0800
> > +++ linux-2.6.11-root/arch/i386/kernel/sysenter.c	2005-03-28 09:58:20.703703792 +0800
> > @@ -26,6 +26,11 @@ void enable_sep_cpu(void *info)
> >  	int cpu = get_cpu();
> >  	struct tss_struct *tss = &per_cpu(init_tss, cpu);
> >
> > +	if (!boot_cpu_has(X86_FEATURE_SEP)) {
> > +		put_cpu();
> > +		return;
> > +	}
> > +
> >  	tss->ss1 = __KERNEL_CS;
> >  	tss->esp1 = sizeof(struct tss_struct) + (unsigned long) tss;
> >  	wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
> > @@ -41,7 +46,7 @@ void enable_sep_cpu(void *info)
> >  extern const char vsyscall_int80_start, vsyscall_int80_end;
> >  extern const char vsyscall_sysenter_start, vsyscall_sysenter_end;
> >
> > -static int __init sysenter_setup(void)
> > +int __init sysenter_setup(void)
> >  {
> >  	void *page = (void *)get_zeroed_page(GFP_ATOMIC);
> >
>
> Can this still be __init? I think you are calling it from hotplug code
> now, right?
Only the BP executes it. The APs call enable_sep_cpu.

> > diff -puN arch/i386/kernel/smpboot.c~sep_init_cleanup arch/i386/kernel/smpboot.c
> > --- linux-2.6.11/arch/i386/kernel/smpboot.c~sep_init_cleanup	2005-03-28 09:33:49.972288952 +0800
> > +++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-03-28 09:46:01.814032096 +0800
> > @@ -415,6 +415,8 @@ static void __init smp_callin(void)
> >
> >  static int cpucount;
> >
> > +extern int sysenter_setup(void);
> > +extern void enable_sep_cpu(void *);
> >  /*
> >   * Activate a secondary processor.
> >   */
>
> Perhaps these should go to header file somewhere?
In asm-i386/smp.h?

Thanks,
Shaohua
Re: [ACPI] Re: [RFC 0/6] S3 SMP support with physcial CPU hotplug
On Mon, 2005-04-04 at 16:01, Nigel Cunningham wrote:
> Hi.
>
> I'm switching suspend2 to use hotplug too. Li, I'll try adding your
> patches as well as Zwane's if you like
Great!

> (suspend2 can enter S3, S4 or S5
> after writing the image). I'd love to try it on my HT desktop, and
> hotplug will get more testing too :>
Unfortunately, my patches break Pavel's swsusp SMP, because they break the
current 'cpu_up' mechanism. S4 doesn't require booting the AP CPUs from
real mode.

Thanks,
Shaohua
Re: [ACPI] Re: [RFC 5/6]clean cpu state after hotremove CPU
Hi,
On Mon, 2005-04-04 at 13:28, Nathan Lynch wrote:
> On Mon, Apr 04, 2005 at 10:07:02AM +0800, Li Shaohua wrote:
> > Clean up all CPU states including its runqueue and idle thread,
> > so we can use boot time code without any changes.
> > Note this makes /sys/devices/system/cpu/cpux/online unworkable.
>
> In what sense does it make the online attribute unworkable?
I removed the idle thread and other CPU states, and put the dead CPU into
a 'halt' busy loop.

> > diff -puN kernel/exit.c~cpu_state_clean kernel/exit.c
> > --- linux-2.6.11/kernel/exit.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
> > +++ linux-2.6.11-root/kernel/exit.c	2005-03-31 10:50:27.0 +0800
> > @@ -845,6 +845,65 @@ fastcall NORET_TYPE void do_exit(long co
> >  	for (;;) ;
> >  }
> >
> > +#ifdef CONFIG_STR_SMP
> > +void do_exit_idle(void)
> > +{
> > +	struct task_struct *tsk = current;
> > +	int group_dead;
> > +
> > +	BUG_ON(tsk->pid);
> > +	BUG_ON(tsk->mm);
> > +
> > +	if (tsk->io_context)
> > +		exit_io_context();
> > +	tsk->flags |= PF_EXITING;
> > +	tsk->it_virt_expires = cputime_zero;
> > +	tsk->it_prof_expires = cputime_zero;
> > +	tsk->it_sched_expires = 0;
> > +
> > +	acct_update_integrals(tsk);
> > +	update_mem_hiwater(tsk);
> > +	group_dead = atomic_dec_and_test(&tsk->signal->live);
> > +	if (group_dead) {
> > +		del_timer_sync(&tsk->signal->real_timer);
> > +		acct_process(-1);
> > +	}
> > +	exit_mm(tsk);
> > +
> > +	exit_sem(tsk);
> > +	__exit_files(tsk);
> > +	__exit_fs(tsk);
> > +	exit_namespace(tsk);
> > +	exit_thread();
> > +	exit_keys(tsk);
> > +
> > +	if (group_dead && tsk->signal->leader)
> > +		disassociate_ctty(1);
> > +
> > +	module_put(tsk->thread_info->exec_domain->module);
> > +	if (tsk->binfmt)
> > +		module_put(tsk->binfmt->module);
> > +
> > +	tsk->exit_code = -1;
> > +	tsk->exit_state = EXIT_DEAD;
> > +
> > +	/* in release_task */
> > +	atomic_dec(&tsk->user->processes);
> > +	write_lock_irq(&tasklist_lock);
> > +	__exit_signal(tsk);
> > +	__exit_sighand(tsk);
> > +	write_unlock_irq(&tasklist_lock);
> > +	release_thread(tsk);
> > +	put_task_struct(tsk);
> > +
> > +	tsk->flags |= PF_DEAD;
> > +#ifdef CONFIG_NUMA
> > +	mpol_free(tsk->mempolicy);
> > +	tsk->mempolicy = NULL;
> > +#endif
> > +}
> > +#endif
>
> I don't understand why this is needed at all.  It looks like a fair
> amount of code from do_exit is being duplicated here.
Yes, exactly. Could someone who understands do_exit please help clean up
the code? I'd like to remove the idle thread, since the smpboot code will
create a new idle thread.

> We've been
> doing cpu removal on ppc64 logical partitions for a while and never
> needed to do anything like this.
Did it remove the idle thread, or does the dead CPU sit in the idle busy
loop?

> Maybe idle_task_exit would suffice?
idle_task_exit seems to just drop the mm. We need to destroy the idle task
for physical CPU hotplug, right?

> > diff -puN kernel/sched.c~cpu_state_clean kernel/sched.c
> > --- linux-2.6.11/kernel/sched.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
> > +++ linux-2.6.11-root/kernel/sched.c	2005-04-04 09:06:40.362357104 +0800
> > @@ -4028,6 +4028,58 @@ void __devinit init_idle(task_t *idle, i
> >  }
> >
> >  /*
> > + * Initial dummy domain for early boot and for hotplug cpu. Being static,
> > + * it is initialized to zero, so all balancing flags are cleared which is
> > + * what we want.
> > + */
> > +static struct sched_domain sched_domain_dummy;
> > +
> > +#ifdef CONFIG_STR_SMP
> > +static void __devinit exit_idle(int cpu)
> > +{
> > +	runqueue_t *rq = cpu_rq(cpu);
> > +	struct task_struct *p = rq->idle;
> > +	int j, k;
> > +	prio_array_t *array;
> > +
> > +	/* init runqueue */
> > +	spin_lock_init(&rq->lock);
> > +	rq->active = rq->arrays;
> > +	rq->expired = rq->arrays + 1;
> > +	rq->best_expired_prio = MAX_PRIO;
> > +
> > +	rq->prev_mm = NULL;
> > +	rq->curr = rq->idle = NULL;
> > +	rq->expired_timestamp = 0;
> > +
> > +	rq->sd = &sched_domain_dummy;
> > +	rq->cpu_load = 0;
> > +	rq->active_balance = 0;
> > +	rq->push_cpu = 0;
>
Re: [RFC 0/6] S3 SMP support with physical CPU hotplug
On Mon, 2005-04-04 at 10:48, Andrew Morton wrote:
> Li Shaohua <[EMAIL PROTECTED]> wrote:
> >
> > On Mon, 2005-04-04 at 10:37, Andrew Morton wrote:
> > > Li Shaohua <[EMAIL PROTECTED]> wrote:
> > > >
> > > > The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in -mm
> > > > tree.
> > >
> > > Should I merge that thing into mainline?  It seems that a few people are
> > > needing it.
> > I'd like to listen to some comments first. There are still some things
> > I'm not sure, such as the do_exit_idle.
> >
>
> I was referring to Zwane's i386-cpu-hotplug-updated-for-mm.patch
Yep, great. Pavel's swsusp also needs it.

Thanks,
Shaohua
Re: [RFC 0/6] S3 SMP support with physical CPU hotplug
On Mon, 2005-04-04 at 10:37, Andrew Morton wrote:
> Li Shaohua <[EMAIL PROTECTED]> wrote:
> >
> > The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in -mm
> > tree.
>
> Should I merge that thing into mainline?  It seems that a few people are
> needing it.
I'd like to hear some comments first. There are still some things I'm
not sure about, such as do_exit_idle.

Thanks,
Shaohua
[RFC 6/6]Physical CPU hotadd and S3 SMP support
Boot a CPU at runtime and use it to support S3 SMP.

Thanks,
Shaohua
---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   79 +++
 linux-2.6.11-root/include/asm-i386/smp.h     |    4 +
 linux-2.6.11-root/kernel/power/main.c        |   30 ++
 3 files changed, 104 insertions(+), 9 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~warmboot_cpu arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~warmboot_cpu	2005-04-04 09:13:48.600255048 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-04 09:13:48.607253984 +0800
@@ -76,6 +76,12 @@ cpumask_t cpu_callin_map;
 cpumask_t cpu_callout_map;
 static cpumask_t smp_commenced_mask;
 
+/* This is ugly, but TSC's upper 32 bits can't be written in eariler CPU
+ * (before prescott), there is no way to resync one AP against BP
+ * TBD: for prescott and above, we should use IA64's algorithm
+ */
+static int __devinit tsc_sync_disabled;
+
 /* Per CPU bogomips and other parameters */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
 
@@ -412,7 +418,7 @@ static void __devinit smp_callin(void)
 	/*
 	 *      Synchronize the TSC with the BP
 	 */
-	if (cpu_has_tsc && cpu_khz)
+	if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
 		synchronize_tsc_ap();
 }
 
@@ -781,8 +787,19 @@ wakeup_secondary_cpu(int phys_apicid, un
 #endif	/* WAKE_SECONDARY_VIA_INIT */
 
 extern cpumask_t cpu_initialized;
+static inline int alloc_cpu_id(void)
+{
+	cpumask_t	tmp_map;
+	int cpu;
 
-static int __devinit do_boot_cpu(int apicid)
+	cpus_complement(tmp_map, cpu_present_map);
+	cpu = first_cpu(tmp_map);
+	if (cpu >= NR_CPUS)
+		return -ENODEV;
+	return cpu;
+}
+
+static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -791,15 +808,10 @@ static int __devinit do_boot_cpu(int api
 {
 	struct task_struct *idle;
 	unsigned long boot_error;
-	int timeout, cpu;
+	int timeout;
 	unsigned long start_eip;
 	unsigned short nmi_high = 0, nmi_low = 0;
-	cpumask_t	tmp_map;
 
-	cpus_complement(tmp_map, cpu_present_map);
-	cpu = first_cpu(tmp_map);
-	if (cpu >= NR_CPUS)
-		return -ENODEV;
 	++cpucount;
 	/*
 	 * We can't use kernel_thread since we must avoid to
@@ -920,6 +932,53 @@ void cpu_exit_clear(int cpu)
 
 	do_exit_idle();
 }
+
+struct warm_boot_cpu_info {
+	struct completion *complete;
+	int apicid;
+	int cpu;
+};
+
+static void __devinit do_warm_boot_cpu(void *p)
+{
+	struct warm_boot_cpu_info *info = p;
+	do_boot_cpu(info->apicid, info->cpu);
+	complete(info->complete);
+}
+
+int __devinit smp_prepare_cpu(int apicid)
+{
+	DECLARE_COMPLETION(done);
+	struct warm_boot_cpu_info info;
+	struct work_struct task;
+	int cpu;
+
+	lock_cpu_hotplug();
+	cpu = alloc_cpu_id();
+
+	if (cpu < 0)
+		goto exit;
+
+	info.complete = &done;
+	info.apicid = apicid;
+	info.cpu = cpu;
+	INIT_WORK(&task, do_warm_boot_cpu, &info);
+
+	tsc_sync_disabled = 1;
+
+	/* init low mem mapping */
+	memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+			sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+	flush_tlb_all();
+	schedule_work(&task);
+	wait_for_completion(&done);
+
+	tsc_sync_disabled = 0;
+	zap_low_mappings();
+exit:
+	unlock_cpu_hotplug();
+	return cpu;
+}
 #endif
 static void smp_tune_scheduling (void)
 {
@@ -1064,7 +1123,7 @@ static void __init smp_boot_cpus(unsigne
 		if (max_cpus <= cpucount+1)
 			continue;
 
-		if (do_boot_cpu(apicid))
+		if (((cpu = alloc_cpu_id()) > 0) && do_boot_cpu(apicid, cpu))
 			printk("CPU #%d not responding - cannot use it.\n",
 								apicid);
 		else
@@ -1253,10 +1312,12 @@ void __init smp_cpus_done(unsigned int m
 	setup_ioapic_dest();
 #endif
 	zap_low_mappings();
+#ifndef CONFIG_STR_SMP
 	/*
 	 * Disable executability of the SMP trampoline:
 	 */
 	set_kernel_exec((unsigned long)trampoline_base, trampoline_exec);
+#endif
 }
 
 void __init smp_intr_init(void)
diff -puN kernel/power/main.c~warmboot_cpu kernel/power/main.c
--- linux-2.6.11/kernel/power/main.c~warmboot_cpu	2005-04-04 09:13:48.601254896 +0800
+++ linux-2.6.11-root/kernel/power/main.c	2005-04-04 09:13:48.607253984 +0800
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include "power.h"
 
@@ -137,6 +138,24 @@ static char * pm_states[] = {
 static int enter_state(suspend_state_t state)
 {
 	int error;
+#ifdef CONFIG_STR_SMP
+
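As a rough illustration of what alloc_cpu_id() in the patch above does, here is a minimal user-space sketch, assuming a plain 32-bit integer stands in for cpumask_t. The names sketch_alloc_cpu_id and SKETCH_NR_CPUS are hypothetical and are not part of the patch; the kernel version uses cpus_complement() and first_cpu() on cpu_present_map instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8	/* hypothetical stand-in for NR_CPUS */

/*
 * Return the lowest logical CPU id whose bit is clear in present_map,
 * or -ENODEV when every slot is already taken. This is the same idea
 * as complementing the present map and taking its first set bit.
 */
static int sketch_alloc_cpu_id(uint32_t present_map)
{
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (!(present_map & (UINT32_C(1) << cpu)))
			return cpu;
	return -ENODEV;
}
```

With the boot CPU already marked present (bit 0 set), the first id handed out is 1, and a hot-removed CPU's slot is reused by the next warm boot because its bit was cleared again.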
[RFC 5/6]clean cpu state after hotremove CPU
Clean up all CPU states including its runqueue and idle thread,
so we can use boot time code without any changes.
Note this makes /sys/devices/system/cpu/cpux/online unworkable.

Thanks,
Shaohua
---

 linux-2.6.11-root/arch/i386/kernel/cpu/common.c |   12
 linux-2.6.11-root/arch/i386/kernel/irq.c        |    5 +
 linux-2.6.11-root/arch/i386/kernel/process.c    |   20 +++
 linux-2.6.11-root/arch/i386/kernel/smpboot.c    |   44 -
 linux-2.6.11-root/include/asm-i386/irq.h        |    2
 linux-2.6.11-root/kernel/exit.c                 |   59 +++
 linux-2.6.11-root/kernel/sched.c                |   61 +---
 7 files changed, 195 insertions(+), 8 deletions(-)

diff -puN arch/i386/kernel/process.c~cpu_state_clean arch/i386/kernel/process.c
--- linux-2.6.11/arch/i386/kernel/process.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/process.c	2005-04-04 09:07:29.172936768 +0800
@@ -144,12 +144,32 @@ static void poll_idle (void)
 
 #ifdef CONFIG_HOTPLUG_CPU
 #include <asm/nmi.h>
+
+#ifdef CONFIG_STR_SMP
+extern void cpu_exit_clear(int);
+#endif
+
 /* We don't actually take CPU down, just spin without interrupts. */
 static inline void play_dead(void)
 {
+#ifdef CONFIG_STR_SMP
+	cpu_exit_clear(_smp_processor_id());
+#endif
+
 	/* Ack it */
 	__get_cpu_var(cpu_state) = CPU_DEAD;
 
+#ifdef CONFIG_STR_SMP
+	/*
+	 * With physical CPU hotplug, we should halt the CPU
+	 * Note: release idle task struct requires the CPU doesn't
+	 * touch stack or anything else.
+	 */
+	local_irq_disable();
+	while (1)
+		__asm__ __volatile__ ("hlt": : :"memory");
+#endif
+
 	/* We shouldn't have to disable interrupts while dead, but
 	 * some interrupts just don't seem to go away, and this makes
 	 * it "work" for testing purposes. */
diff -puN arch/i386/kernel/smpboot.c~cpu_state_clean arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-04 09:05:41.699275248 +0800
@@ -794,8 +794,13 @@ static int __devinit do_boot_cpu(int api
 	int timeout, cpu;
 	unsigned long start_eip;
 	unsigned short nmi_high = 0, nmi_low = 0;
+	cpumask_t	tmp_map;
 
-	cpu = ++cpucount;
+	cpus_complement(tmp_map, cpu_present_map);
+	cpu = first_cpu(tmp_map);
+	if (cpu >= NR_CPUS)
+		return -ENODEV;
+	++cpucount;
 	/*
 	 * We can't use kernel_thread since we must avoid to
 	 * reschedule the child.
@@ -867,13 +872,16 @@ static int __devinit do_boot_cpu(int api
 			inquire_remote_apic(apicid);
 		}
 	}
-	x86_cpu_to_apicid[cpu] = apicid;
+
 	if (boot_error) {
 		/* Try to put things back the way they were before ... */
 		unmap_cpu_to_logical_apicid(cpu);
 		cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
 		cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
 		cpucount--;
+	} else {
+		x86_cpu_to_apicid[cpu] = apicid;
+		cpu_set(cpu, cpu_present_map);
 	}
 
 	/* mark "stuck" area as not stuck */
@@ -882,6 +890,37 @@ static int __devinit do_boot_cpu(int api
 	return boot_error;
 }
 
+#ifdef CONFIG_STR_SMP
+extern void do_exit_idle(void);
+extern void cpu_uninit(void);
+void cpu_exit_clear(int cpu)
+{
+	int sibling;
+	cpucount --;
+
+	cpu_uninit();
+
+	irq_ctx_exit(cpu);
+
+	cpu_clear(cpu, cpu_callout_map);
+	cpu_clear(cpu, cpu_callin_map);
+	cpu_clear(cpu, cpu_present_map);
+
+	x86_cpu_to_apicid[cpu] = BAD_APICID;
+
+	for_each_cpu_mask(sibling, cpu_sibling_map[cpu])
+		cpu_clear(cpu, cpu_sibling_map[sibling]);
+	cpus_clear(cpu_sibling_map[cpu]);
+
+	phys_proc_id[cpu] = BAD_APICID;
+
+	cpu_clear(cpu, smp_commenced_mask);
+
+	unmap_cpu_to_logical_apicid(cpu);
+
+	do_exit_idle();
+}
+#endif
 static void smp_tune_scheduling (void)
 {
 	unsigned long cachesize;       /* kB   */
@@ -1104,6 +1143,7 @@ void __devinit smp_prepare_boot_cpu(void
 {
 	cpu_set(smp_processor_id(), cpu_online_map);
 	cpu_set(smp_processor_id(), cpu_callout_map);
+	cpu_set(smp_processor_id(), cpu_present_map);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff -puN arch/i386/kernel/cpu/common.c~cpu_state_clean arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-03-31 10:50:27.0 +0800
@@ -621,3 +621,15 @@ void __devinit cpu_init (void)
 	clear_used_math();
 	mxcsr_feature_mask_init();
 }
+
+#ifdef CONFIG_STR_SMP
+void
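The sibling cleanup in cpu_exit_clear() above can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: each CPU's sibling mask is a plain uint32_t, and sketch_remove_siblings is a hypothetical name, not a function from the patch. The patch itself uses for_each_cpu_mask(), cpu_clear() and cpus_clear() on cpu_sibling_map.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Drop 'cpu' from the sibling mask of every CPU it is paired with,
 * then empty its own mask, so a later warm boot starts from a clean
 * state.
 */
static void sketch_remove_siblings(uint32_t *sibling_map, int nr_cpus, int cpu)
{
	int sibling;

	for (sibling = 0; sibling < nr_cpus; sibling++)
		if (sibling_map[cpu] & (UINT32_C(1) << sibling))
			sibling_map[sibling] &= ~(UINT32_C(1) << cpu);
	sibling_map[cpu] = 0;
}
```

CPUs in other packages are never touched, because only bits set in the departing CPU's own mask are visited.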
[RFC 3/6]init call cleanup
Trivial patch for CPU hotplug. In the CPU identification part, I only
did the cleanup for Intel CPUs. The same needs to be done for other CPUs
if they support S3 SMP.

Thanks,
Shaohua
---

 linux-2.6.11-root/arch/i386/kernel/apic.c                |   14 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/common.c          |   30 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/intel.c           |   10 ++---
 linux-2.6.11-root/arch/i386/kernel/cpu/intel_cacheinfo.c |    4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/mce.c      |    4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p4.c       |    4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p5.c       |    2 -
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p6.c       |    2 -
 linux-2.6.11-root/arch/i386/kernel/process.c             |    2 -
 linux-2.6.11-root/arch/i386/kernel/smpboot.c             |   18 -
 linux-2.6.11-root/arch/i386/kernel/timers/timer_tsc.c    |    2 -
 11 files changed, 46 insertions(+), 46 deletions(-)

diff -puN arch/i386/kernel/process.c~init_call_cleanup arch/i386/kernel/process.c
--- linux-2.6.11/arch/i386/kernel/process.c~init_call_cleanup	2005-03-31 10:48:40.721107104 +0800
+++ linux-2.6.11-root/arch/i386/kernel/process.c	2005-03-31 10:48:40.745103456 +0800
@@ -242,7 +242,7 @@ static void mwait_idle(void)
 	}
 }
 
-void __init select_idle_routine(const struct cpuinfo_x86 *c)
+void __devinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_MWAIT)) {
 		printk("monitor/mwait feature present.\n");
diff -puN arch/i386/kernel/smpboot.c~init_call_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~init_call_cleanup	2005-03-31 10:48:40.722106952 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-03-31 10:48:40.746103304 +0800
@@ -59,7 +59,7 @@
 #include
 
 /* Set if we find a B stepping CPU */
-static int __initdata smp_b_stepping;
+static int __devinitdata smp_b_stepping;
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
@@ -103,7 +103,7 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 };
  * has made sure it's suitably aligned.
  */
 
-static unsigned long __init setup_trampoline(void)
+static unsigned long __devinit setup_trampoline(void)
 {
 	memcpy(trampoline_base, trampoline_data, trampoline_end - trampoline_data);
 	return virt_to_phys(trampoline_base);
@@ -133,7 +133,7 @@ void __init smp_alloc_memory(void)
  * a given CPU
  */
 
-static void __init smp_store_cpu_info(int id)
+static void __devinit smp_store_cpu_info(int id)
 {
 	struct cpuinfo_x86 *c = cpu_data + id;
 
@@ -327,7 +327,7 @@ extern void calibrate_delay(void);
 
 static atomic_t init_deasserted;
 
-static void __init smp_callin(void)
+static void __devinit smp_callin(void)
 {
 	int cpuid, phys_id;
 	unsigned long timeout;
@@ -423,7 +423,7 @@ extern void enable_sep_cpu(void *);
 /*
  * Activate a secondary processor.
  */
-static void __init start_secondary(void *unused)
+static void __devinit start_secondary(void *unused)
 {
 	int siblings = 0;
 	int i;
@@ -486,7 +486,7 @@ static void __init start_secondary(void
  * from the task structure
  * This function must not return.
  */
-void __init initialize_secondary(void)
+void __devinit initialize_secondary(void)
 {
 	/*
 	 * We don't actually need to load the full TSS,
@@ -600,7 +600,7 @@ static inline void __inquire_remote_apic
  * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
  * won't ... remember to clear down the APIC, etc later.
  */
-static int __init
+static int __devinit
 wakeup_secondary_cpu(int logical_apicid, unsigned long start_eip)
 {
 	unsigned long send_status = 0, accept_status = 0;
@@ -646,7 +646,7 @@ wakeup_secondary_cpu(int logical_apicid,
 #endif	/* WAKE_SECONDARY_VIA_NMI */
 
 #ifdef WAKE_SECONDARY_VIA_INIT
-static int __init
+static int __devinit
 wakeup_secondary_cpu(int phys_apicid, unsigned long start_eip)
 {
 	unsigned long send_status = 0, accept_status = 0;
@@ -782,7 +782,7 @@ wakeup_secondary_cpu(int phys_apicid, un
 
 extern cpumask_t cpu_initialized;
 
-static int __init do_boot_cpu(int apicid)
+static int __devinit do_boot_cpu(int apicid)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
diff -puN arch/i386/kernel/cpu/common.c~init_call_cleanup arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~init_call_cleanup	2005-03-31 10:48:40.724106648 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-03-31 10:48:40.747103152 +0800
@@ -21,9 +21,9 @@
 DEFINE_PER_CPU(struct desc_struct, cpu_gdt_table[GDT_ENTRIES]);
 EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
 
-static int cachesize_override __initdata = -1;
-static int disable_x86_fxsr __initdata = 0;
-static int disable_x86_serial_nr __initdata = 1;
+static int cachesize_override __devinitdata =
[RFC 4/6]Add kconfig for S3 SMP
Add kconfig for IA32 S3 SMP.

Thanks,
Shaohua
---

 linux-2.6.11-root/kernel/power/Kconfig |    7 +++
 1 files changed, 7 insertions(+)

diff -puN kernel/power/Kconfig~smp_s3_kconfig kernel/power/Kconfig
--- linux-2.6.11/kernel/power/Kconfig~smp_s3_kconfig	2005-03-31 10:49:57.156487160 +0800
+++ linux-2.6.11-root/kernel/power/Kconfig	2005-03-31 10:49:57.158486856 +0800
@@ -72,3 +72,10 @@ config PM_STD_PARTITION
 	  suspended image to. It will simply pick the first available swap
 	  device.
 
+config STR_SMP
+	bool "Suspend to RAM SMP support (EXPERIMENTAL)"
+	depends on EXPERIMENTAL && ACPI_SLEEP && !X86_64
+	depends on HOTPLUG_CPU
+	default y
+	---help---
+	  enable Suspend to RAM SMP support. Some HT systems require this.
_
[RFC 2/6]cpu_sibling_map rework
Make sibling map init per-cpu. Hotplug CPU may change the map at runtime.
cpuhotplug semaphore should be used to protect the map.

Thanks,
Shaohua
---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   56 +--
 1 files changed, 29 insertions(+), 27 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~sibling_map_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sibling_map_init_cleanup	2005-03-28 16:29:55.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-03-31 10:46:51.572700184 +0800
@@ -63,9 +63,12 @@ static int __initdata smp_b_stepping;
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
-int phys_proc_id[NR_CPUS]; /* Package ID of each logical CPU */
+/* Package ID of each logical CPU */
+int phys_proc_id[NR_CPUS] = {[0 ... NR_CPUS-1] = BAD_APICID};
 EXPORT_SYMBOL(phys_proc_id);
 
+cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
+
 /* bitmap of online cpus */
 cpumask_t cpu_online_map;
 
@@ -422,6 +425,9 @@ extern void enable_sep_cpu(void *);
  */
 static void __init start_secondary(void *unused)
 {
+	int siblings = 0;
+	int i;
+	int self = smp_processor_id();
 	/*
 	 * Dont put anything before smp_callin(), SMP
 	 * booting is too fragile that we want to limit the
@@ -443,6 +449,27 @@ static void __init start_secondary(void
 	 * the local TLBs too.
 	 */
 	local_flush_tlb();
+
+	/* This must be doen before setting cpu_online_map */
+	if (smp_num_siblings > 1) {
+		for (i = 0; i < NR_CPUS; i++) {
+			if (!cpu_isset(i, cpu_callout_map))
+				continue;
+			if (phys_proc_id[self] == phys_proc_id[i]) {
+				siblings ++;
+				cpu_set(i, cpu_sibling_map[self]);
+				cpu_set(self, cpu_sibling_map[i]);
+			}
+		}
+	} else {
+		siblings ++;
+		cpu_set(self, cpu_sibling_map[self]);
+	}
+
+	if (siblings != smp_num_siblings)
+		printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, self, smp_num_siblings);
+	wmb();
+
 	cpu_set(smp_processor_id(), cpu_online_map);
 
 	/* We can take interrupts now: we're officially "up". */
@@ -893,8 +920,6 @@ static int boot_cpu_logical_apicid;
 /* Where the IO area was mapped on multiquad, always 0 otherwise */
 void *xquad_portio;
 
-cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
-
 static void __init smp_boot_cpus(unsigned int max_cpus)
 {
 	int apicid, cpu, bit, kicked;
@@ -1049,30 +1074,7 @@ static void __init smp_boot_cpus(unsigne
 	 */
 	for (cpu = 0; cpu < NR_CPUS; cpu++)
 		cpus_clear(cpu_sibling_map[cpu]);
-
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
-		int siblings = 0;
-		int i;
-		if (!cpu_isset(cpu, cpu_callout_map))
-			continue;
-
-		if (smp_num_siblings > 1) {
-			for (i = 0; i < NR_CPUS; i++) {
-				if (!cpu_isset(i, cpu_callout_map))
-					continue;
-				if (phys_proc_id[cpu] == phys_proc_id[i]) {
-					siblings++;
-					cpu_set(i, cpu_sibling_map[cpu]);
-				}
-			}
-		} else {
-			siblings++;
-			cpu_set(cpu, cpu_sibling_map[cpu]);
-		}
-
-		if (siblings != smp_num_siblings)
-			printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, cpu, smp_num_siblings);
-	}
+	cpu_set(0, cpu_sibling_map[0]);
 
 	if (nmi_watchdog == NMI_LOCAL_APIC)
 		check_nmi_watchdog();
_
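The per-CPU sibling bookkeeping that this patch moves into start_secondary() can be sketched in user-space C as below. This is a simplified model, not the kernel code: struct sketch_topology, sketch_add_siblings and SKETCH_NR_CPUS are hypothetical names, the masks are plain integers instead of cpumask_t, and the caller is assumed to have set the new CPU's bit in callout_map already (in the kernel the BP does that in do_boot_cpu()).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* hypothetical stand-in for NR_CPUS */

struct sketch_topology {
	int      phys_proc_id[SKETCH_NR_CPUS];	/* package id per CPU */
	uint32_t callout_map;			/* CPUs kicked so far */
	uint32_t sibling_map[SKETCH_NR_CPUS];	/* siblings per CPU */
};

/*
 * Called on behalf of the CPU that is coming up: link 'self' with
 * every already-called-out CPU in the same package, in both
 * directions, and return how many siblings (including itself) it
 * found.
 */
static int sketch_add_siblings(struct sketch_topology *t, int self)
{
	int i, siblings = 0;

	for (i = 0; i < SKETCH_NR_CPUS; i++) {
		if (!(t->callout_map & (UINT32_C(1) << i)))
			continue;
		if (t->phys_proc_id[self] == t->phys_proc_id[i]) {
			siblings++;
			t->sibling_map[self] |= UINT32_C(1) << i;
			t->sibling_map[i] |= UINT32_C(1) << self;
		}
	}
	return siblings;
}
```

Updating both directions is what lets each CPU fill in the map as it arrives, rather than relying on one pass over all CPUs at boot; that single pass is what the patch removes from smp_boot_cpus().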
[RFC 0/6] S3 SMP support with physical CPU hotplug
Hi,

The following 6 patches add suspend-to-RAM (S3) SMP support for IA32.
For now the goal is to support suspend/resume on HT-based systems, but
most of the code is also useful for physical CPU hotplug.

On an SMP system, after S3 resume the BP starts executing at the ACPI
wakeup address, just as in the UP case, while the APs are possibly
sitting in a busy loop in the BIOS. This looks just like the boot-time
case: we must use a SIPI cycle to wake up the APs. We use the CPU
hotplug infrastructure for this. In order to reuse the SMP boot code, we
clean up all CPU state after a CPU is dead, including its idle thread,
runqueue and other CPU state. Since the CPU is in the idle thread before
suspend, most of the CPU state does not need to be saved and restored
across resume.

The S3 sequence is now:
1. Hot-remove all APs, putting them into the idle thread.
2. Follow the UP S3 code path.
3. Warm boot all APs.
4. Bring all APs up.

The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in the
-mm tree. To test SMP S3, please don't enable the MTRR driver (its
suspend/resume support is broken on SMP). Please also kill syslogd:
there is a bug in the suspend/resume refrigerator mechanism, which can
be fixed by the swsusp2 refrigerator.

I'm looking forward to your comments. Thanks in advance!

Thanks,
Shaohua
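The four-step sequence described in this cover letter can be modelled as a small state machine. The following is a minimal sketch under stated assumptions, not code from the series: struct sketch_system and every sketch_* function are hypothetical, the BP is assumed to be logical CPU 0, and CPU sets are plain 32-bit masks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the machine: bit n set means logical CPU n. */
struct sketch_system {
	uint32_t present_map;	/* CPUs that have been booted */
	uint32_t online_map;	/* CPUs currently running tasks */
	int s3_entries;		/* how many times S3 was entered */
};

#define SKETCH_BP_MASK UINT32_C(1)	/* assume the BP is logical CPU 0 */

/* Step 1: take every AP offline and forget its state. */
static uint32_t sketch_hotremove_aps(struct sketch_system *sys)
{
	uint32_t aps = sys->online_map & ~SKETCH_BP_MASK;

	sys->online_map &= ~aps;
	sys->present_map &= ~aps;
	return aps;
}

/* Step 2: the UP suspend path; only valid once the BP is alone. */
static int sketch_enter_s3_up(struct sketch_system *sys)
{
	if (sys->online_map != SKETCH_BP_MASK)
		return -1;
	sys->s3_entries++;
	return 0;
}

/* Steps 3 and 4: warm boot the saved APs, then mark them online. */
static void sketch_restore_aps(struct sketch_system *sys, uint32_t aps)
{
	sys->present_map |= aps;	/* step 3: warm boot */
	sys->online_map |= aps;		/* step 4: bring up */
}

/* The whole sequence; the APs come back whether or not S3 succeeded. */
static int sketch_suspend_to_ram(struct sketch_system *sys)
{
	uint32_t aps = sketch_hotremove_aps(sys);
	int error = sketch_enter_s3_up(sys);

	sketch_restore_aps(sys, aps);
	return error;
}
```

The point of the model is the ordering: the UP suspend path only runs once the BP is the sole online CPU, and the set of removed APs is remembered so the same CPUs are warm booted afterwards.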
[RFC 1/6]SEP initialization rework
Make SEP init per-cpu, so it is hotplug safe.

Thanks,
Shaohua
---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c           |    6 ++
 linux-2.6.11-root/arch/i386/kernel/sysenter.c          |   10 ++
 linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |    6 ++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c
--- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup	2005-03-28 09:32:30.936304248 +0800
+++ linux-2.6.11-root/arch/i386/kernel/sysenter.c	2005-03-28 09:58:20.703703792 +0800
@@ -26,6 +26,11 @@ void enable_sep_cpu(void *info)
 	int cpu = get_cpu();
 	struct tss_struct *tss = &per_cpu(init_tss, cpu);
 
+	if (!boot_cpu_has(X86_FEATURE_SEP)) {
+		put_cpu();
+		return;
+	}
+
 	tss->ss1 = __KERNEL_CS;
 	tss->esp1 = sizeof(struct tss_struct) + (unsigned long) tss;
 	wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
@@ -41,7 +46,7 @@ void enable_sep_cpu(void *info)
 extern const char vsyscall_int80_start, vsyscall_int80_end;
 extern const char vsyscall_sysenter_start, vsyscall_sysenter_end;
 
-static int __init sysenter_setup(void)
+int __init sysenter_setup(void)
 {
 	void *page = (void *)get_zeroed_page(GFP_ATOMIC);
 
@@ -58,8 +63,5 @@ static int __init sysenter_setup(void)
 	       &vsyscall_sysenter_start,
 	       &vsyscall_sysenter_end - &vsyscall_sysenter_start);
 
-	on_each_cpu(enable_sep_cpu, NULL, 1, 1);
 	return 0;
 }
-
-__initcall(sysenter_setup);
diff -puN arch/i386/kernel/smpboot.c~sep_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sep_init_cleanup	2005-03-28 09:33:49.972288952 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-03-28 09:46:01.814032096 +0800
@@ -415,6 +415,8 @@ static void __init smp_callin(void)
 
 static int cpucount;
 
+extern int sysenter_setup(void);
+extern void enable_sep_cpu(void *);
 /*
  * Activate a secondary processor.
  */
@@ -445,6 +447,7 @@ static void __init start_secondary(void
 
 	/* We can take interrupts now: we're officially "up". */
 	local_irq_enable();
+	enable_sep_cpu(NULL);
 
 	wmb();
 	cpu_idle();
@@ -913,6 +916,9 @@ static void __init smp_boot_cpus(unsigne
 	cpus_clear(cpu_sibling_map[0]);
 	cpu_set(0, cpu_sibling_map[0]);
 
+	sysenter_setup();
+	enable_sep_cpu(NULL);
+
 	/*
 	 * If we couldn't find an SMP configuration at boot time,
 	 * get out of here now!
diff -puN arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup arch/i386/mach-voyager/voyager_smp.c
--- linux-2.6.11/arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup	2005-03-28 09:48:27.909822160 +0800
+++ linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c	2005-03-28 09:51:37.896939728 +0800
@@ -441,6 +441,8 @@ setup_trampoline(void)
 	return virt_to_phys((__u8 *)trampoline_base);
 }
 
+extern void enable_sep_cpu(void *);
+extern int sysenter_setup(void);
 /* Routine initially called when a non-boot CPU is brought online */
 static void __init
 start_secondary(void *unused)
@@ -499,6 +501,7 @@ start_secondary(void *unused)
 	while (!cpu_isset(cpuid, smp_commenced_mask))
 		rep_nop();
 	local_irq_enable();
+	enable_sep_cpu(NULL);
 
 	local_flush_tlb();
 
@@ -696,6 +699,9 @@ smp_boot_cpus(void)
 	printk("CPU%d: ", boot_cpu_id);
 	print_cpu_info(&cpu_data[boot_cpu_id]);
 
+	sysenter_setup();
+	enable_sep_cpu(NULL);
+
 	if(is_cpu_quad()) {
 		/* booting on a Quad CPU */
 		printk("VOYAGER SMP: Boot CPU is Quad\n");
_
[RFC 1/6]SEP initialization rework
Make SEP init per-cpu, so is hotplug safe. Thanks, Shaohua --- linux-2.6.11-root/arch/i386/kernel/smpboot.c |6 ++ linux-2.6.11-root/arch/i386/kernel/sysenter.c | 10 ++ linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |6 ++ 3 files changed, 18 insertions(+), 4 deletions(-) diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c --- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup 2005-03-28 09:32:30.936304248 +0800 +++ linux-2.6.11-root/arch/i386/kernel/sysenter.c 2005-03-28 09:58:20.703703792 +0800 @@ -26,6 +26,11 @@ void enable_sep_cpu(void *info) int cpu = get_cpu(); struct tss_struct *tss = per_cpu(init_tss, cpu); + if (!boot_cpu_has(X86_FEATURE_SEP)) { + put_cpu(); + return; + } + tss-ss1 = __KERNEL_CS; tss-esp1 = sizeof(struct tss_struct) + (unsigned long) tss; wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0); @@ -41,7 +46,7 @@ void enable_sep_cpu(void *info) extern const char vsyscall_int80_start, vsyscall_int80_end; extern const char vsyscall_sysenter_start, vsyscall_sysenter_end; -static int __init sysenter_setup(void) +int __init sysenter_setup(void) { void *page = (void *)get_zeroed_page(GFP_ATOMIC); @@ -58,8 +63,5 @@ static int __init sysenter_setup(void) vsyscall_sysenter_start, vsyscall_sysenter_end - vsyscall_sysenter_start); - on_each_cpu(enable_sep_cpu, NULL, 1, 1); return 0; } - -__initcall(sysenter_setup); diff -puN arch/i386/kernel/smpboot.c~sep_init_cleanup arch/i386/kernel/smpboot.c --- linux-2.6.11/arch/i386/kernel/smpboot.c~sep_init_cleanup2005-03-28 09:33:49.972288952 +0800 +++ linux-2.6.11-root/arch/i386/kernel/smpboot.c2005-03-28 09:46:01.814032096 +0800 @@ -415,6 +415,8 @@ static void __init smp_callin(void) static int cpucount; +extern int sysenter_setup(void); +extern void enable_sep_cpu(void *); /* * Activate a secondary processor. */ @@ -445,6 +447,7 @@ static void __init start_secondary(void /* We can take interrupts now: we're officially up. 
*/ local_irq_enable(); + enable_sep_cpu(NULL); wmb(); cpu_idle(); @@ -913,6 +916,9 @@ static void __init smp_boot_cpus(unsigne cpus_clear(cpu_sibling_map[0]); cpu_set(0, cpu_sibling_map[0]); + sysenter_setup(); + enable_sep_cpu(NULL); + /* * If we couldn't find an SMP configuration at boot time, * get out of here now! diff -puN arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup arch/i386/mach-voyager/voyager_smp.c --- linux-2.6.11/arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup 2005-03-28 09:48:27.909822160 +0800 +++ linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c 2005-03-28 09:51:37.896939728 +0800 @@ -441,6 +441,8 @@ setup_trampoline(void) return virt_to_phys((__u8 *)trampoline_base); } +extern void enable_sep_cpu(void *); +extern int sysenter_setup(void); /* Routine initially called when a non-boot CPU is brought online */ static void __init start_secondary(void *unused) @@ -499,6 +501,7 @@ start_secondary(void *unused) while (!cpu_isset(cpuid, smp_commenced_mask)) rep_nop(); local_irq_enable(); + enable_sep_cpu(NULL); local_flush_tlb(); @@ -696,6 +699,9 @@ smp_boot_cpus(void) printk(CPU%d: , boot_cpu_id); print_cpu_info(cpu_data[boot_cpu_id]); + sysenter_setup(); + enable_sep_cpu(NULL); + if(is_cpu_quad()) { /* booting on a Quad CPU */ printk(VOYAGER SMP: Boot CPU is Quad\n); _ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC 0/6] S3 SMP support with physcial CPU hotplug
Hi, The following 6 patches try to add suspend-to-ram (or S3) SMP support for IA32. It's for support HT based system suspend/resume currently and most of the code are also useful for physical CPU hotplug. In a SMP system, after S3 resume, the BP is starting to execute the ACPI wakeup address just like the UP case. And the APs possibly are in a BIOS's busy loop. This just looks like the boot time case, we must use a SIPI circle to wakeup the APs. We uses the CPU hotplug infrastructure. In order to reuse the SMP boot code, we clean up all CPU states after the CPU is dead, including its idle thread, runqueue and other CPU states. Since the CPU is in idle thread before suspend, we don't require to save and restore after resume most of the CPU states. Now the sequences of S3 are: 1. hotremove all APs, put them into idle thread. 2. follow UP S3 code path. 3. warm boot all APs. 4. UP all APs. The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in -mm tree. To test the SMP S3, please don't enable MTRR driver (it's SMP broken for Suspend/resume). And please kill syslogd, there is a bug in the sususpend/resume refrigerator mechanism, which can be fixed by swsusp2 refrigerator. I'm looking forward to your comments. Thanks in advance! Thanks, Shaohua - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC 2/6]cpu_sibling_map rework
Make sibling map init per-cpu. Hotplug CPU may change the map at runtime.
cpuhotplug semaphore should be used to protect the map.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   56 +--
 1 files changed, 29 insertions(+), 27 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~sibling_map_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sibling_map_init_cleanup	2005-03-28 16:29:55.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-03-31 10:46:51.572700184 +0800
@@ -63,9 +63,12 @@ static int __initdata smp_b_stepping;
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
-int phys_proc_id[NR_CPUS]; /* Package ID of each logical CPU */
+/* Package ID of each logical CPU */
+int phys_proc_id[NR_CPUS] = {[0 ... NR_CPUS-1] = BAD_APICID};
 EXPORT_SYMBOL(phys_proc_id);
 
+cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
+
 /* bitmap of online cpus */
 cpumask_t cpu_online_map;
 
@@ -422,6 +425,9 @@ extern void enable_sep_cpu(void *);
  */
 static void __init start_secondary(void *unused)
 {
+	int siblings = 0;
+	int i;
+	int self = smp_processor_id();
 	/*
 	 * Dont put anything before smp_callin(), SMP
 	 * booting is too fragile that we want to limit the
@@ -443,6 +449,27 @@ static void __init start_secondary(void
 	 * the local TLBs too.
 	 */
 	local_flush_tlb();
+
+	/* This must be doen before setting cpu_online_map */
+	if (smp_num_siblings > 1) {
+		for (i = 0; i < NR_CPUS; i++) {
+			if (!cpu_isset(i, cpu_callout_map))
+				continue;
+			if (phys_proc_id[self] == phys_proc_id[i]) {
+				siblings ++;
+				cpu_set(i, cpu_sibling_map[self]);
+				cpu_set(self, cpu_sibling_map[i]);
+			}
+		}
+	} else {
+		siblings ++;
+		cpu_set(self, cpu_sibling_map[self]);
+	}
+
+	if (siblings != smp_num_siblings)
+		printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, self, smp_num_siblings);
+	wmb();
+
 	cpu_set(smp_processor_id(), cpu_online_map);
 
 	/* We can take interrupts now: we're officially up. */
@@ -893,8 +920,6 @@ static int boot_cpu_logical_apicid;
 /* Where the IO area was mapped on multiquad, always 0 otherwise */
 void *xquad_portio;
 
-cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
-
 static void __init smp_boot_cpus(unsigned int max_cpus)
 {
 	int apicid, cpu, bit, kicked;
@@ -1049,30 +1074,7 @@ static void __init smp_boot_cpus(unsigne
 	 */
 	for (cpu = 0; cpu < NR_CPUS; cpu++)
 		cpus_clear(cpu_sibling_map[cpu]);
-
-	for (cpu = 0; cpu < NR_CPUS; cpu++) {
-		int siblings = 0;
-		int i;
-		if (!cpu_isset(cpu, cpu_callout_map))
-			continue;
-
-		if (smp_num_siblings > 1) {
-			for (i = 0; i < NR_CPUS; i++) {
-				if (!cpu_isset(i, cpu_callout_map))
-					continue;
-				if (phys_proc_id[cpu] == phys_proc_id[i]) {
-					siblings++;
-					cpu_set(i, cpu_sibling_map[cpu]);
-				}
-			}
-		} else {
-			siblings++;
-			cpu_set(cpu, cpu_sibling_map[cpu]);
-		}
-
-		if (siblings != smp_num_siblings)
-			printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, cpu, smp_num_siblings);
-	}
+	cpu_set(0, cpu_sibling_map[0]);
 
 	if (nmi_watchdog == NMI_LOCAL_APIC)
 		check_nmi_watchdog();
_
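For readers who want to see the per-CPU update in isolation, here is a minimal user-space sketch of the idea in the patch above: each CPU, as it comes up, links itself with every already called-out CPU in the same package, in both directions. This is not the kernel code. The 32-bit masks, `NR_CPUS` of 8, `sibling_init()` and `sibling_map_add()` are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS    8     /* assumption: small fixed CPU count for the sketch */
#define BAD_APICID 0xFF  /* assumption: sentinel meaning "no package assigned" */

static int phys_proc_id[NR_CPUS];          /* package ID of each logical CPU */
static uint32_t cpu_callout_map;           /* bit i set: CPU i has been kicked */
static uint32_t cpu_sibling_map[NR_CPUS];  /* bit j in entry i: j shares i's package */

/* Reset all state once, as the boot path would before any CPU comes up. */
static void sibling_init(void)
{
	cpu_callout_map = 0;
	for (int i = 0; i < NR_CPUS; i++) {
		phys_proc_id[i] = BAD_APICID;
		cpu_sibling_map[i] = 0;
	}
}

/*
 * Hypothetical helper: run by a CPU while it is coming up. Links `self`
 * with every called-out CPU in the same package, updating both maps.
 * Returns the number of siblings found, including `self`.
 */
static int sibling_map_add(int self, int package)
{
	int siblings = 0;

	assert(self >= 0 && self < NR_CPUS);
	phys_proc_id[self] = package;
	cpu_callout_map |= 1u << self;

	for (int i = 0; i < NR_CPUS; i++) {
		if (!(cpu_callout_map & (1u << i)))
			continue;
		if (phys_proc_id[i] == phys_proc_id[self]) {
			siblings++;
			cpu_sibling_map[self] |= 1u << i;
			cpu_sibling_map[i] |= 1u << self;
		}
	}
	return siblings;
}
```

Because the update is symmetric, a CPU added later at runtime only has to run this once for itself; nobody needs to rescan all CPUs.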
[RFC 5/6]clean cpu state after hotremove CPU
Clean up all CPU states including its runqueue and idle thread, so we can
use boot time code without any changes.
Note this makes /sys/devices/system/cpu/cpux/online unworkable.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/cpu/common.c |   12
 linux-2.6.11-root/arch/i386/kernel/irq.c        |    5 +
 linux-2.6.11-root/arch/i386/kernel/process.c    |   20 +++
 linux-2.6.11-root/arch/i386/kernel/smpboot.c    |   44 -
 linux-2.6.11-root/include/asm-i386/irq.h        |    2
 linux-2.6.11-root/kernel/exit.c                 |   59 +++
 linux-2.6.11-root/kernel/sched.c                |   61 +---
 7 files changed, 195 insertions(+), 8 deletions(-)

diff -puN arch/i386/kernel/process.c~cpu_state_clean arch/i386/kernel/process.c
--- linux-2.6.11/arch/i386/kernel/process.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/process.c	2005-04-04 09:07:29.172936768 +0800
@@ -144,12 +144,32 @@ static void poll_idle (void)
 
 #ifdef CONFIG_HOTPLUG_CPU
 #include <asm/nmi.h>
+
+#ifdef CONFIG_STR_SMP
+extern void cpu_exit_clear(int);
+#endif
+
 /* We don't actually take CPU down, just spin without interrupts. */
 static inline void play_dead(void)
 {
+#ifdef CONFIG_STR_SMP
+	cpu_exit_clear(_smp_processor_id());
+#endif
+
 	/* Ack it */
 	__get_cpu_var(cpu_state) = CPU_DEAD;
 
+#ifdef CONFIG_STR_SMP
+	/*
+	 * With physical CPU hotplug, we should halt the CPU
+	 * Note: release idle task struct requires the CPU doesn't
+	 * touch stack or anything else.
+	 */
+	local_irq_disable();
+	while (1)
+		__asm__ __volatile__ ("hlt": : :"memory");
+#endif
+
 	/* We shouldn't have to disable interrupts while dead, but
 	 * some interrupts just don't seem to go away, and this makes
 	 * it work for testing purposes. */
diff -puN arch/i386/kernel/smpboot.c~cpu_state_clean arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-04 09:05:41.699275248 +0800
@@ -794,8 +794,13 @@ static int __devinit do_boot_cpu(int api
 	int timeout, cpu;
 	unsigned long start_eip;
 	unsigned short nmi_high = 0, nmi_low = 0;
+	cpumask_t tmp_map;
 
-	cpu = ++cpucount;
+	cpus_complement(tmp_map, cpu_present_map);
+	cpu = first_cpu(tmp_map);
+	if (cpu >= NR_CPUS)
+		return -ENODEV;
+	++cpucount;
 	/*
 	 * We can't use kernel_thread since we must avoid to
 	 * reschedule the child.
@@ -867,13 +872,16 @@ static int __devinit do_boot_cpu(int api
 			inquire_remote_apic(apicid);
 		}
 	}
-	x86_cpu_to_apicid[cpu] = apicid;
+
 	if (boot_error) {
 		/* Try to put things back the way they were before ... */
 		unmap_cpu_to_logical_apicid(cpu);
 		cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
 		cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
 		cpucount--;
+	} else {
+		x86_cpu_to_apicid[cpu] = apicid;
+		cpu_set(cpu, cpu_present_map);
 	}
 
 	/* mark stuck area as not stuck */
@@ -882,6 +890,37 @@ static int __devinit do_boot_cpu(int api
 	return boot_error;
 }
 
+#ifdef CONFIG_STR_SMP
+extern void do_exit_idle(void);
+extern void cpu_uninit(void);
+void cpu_exit_clear(int cpu)
+{
+	int sibling;
+	cpucount --;
+
+	cpu_uninit();
+
+	irq_ctx_exit(cpu);
+
+	cpu_clear(cpu, cpu_callout_map);
+	cpu_clear(cpu, cpu_callin_map);
+	cpu_clear(cpu, cpu_present_map);
+
+	x86_cpu_to_apicid[cpu] = BAD_APICID;
+
+	for_each_cpu_mask(sibling, cpu_sibling_map[cpu])
+		cpu_clear(cpu, cpu_sibling_map[sibling]);
+	cpus_clear(cpu_sibling_map[cpu]);
+
+	phys_proc_id[cpu] = BAD_APICID;
+
+	cpu_clear(cpu, smp_commenced_mask);
+
+	unmap_cpu_to_logical_apicid(cpu);
+
+	do_exit_idle();
+}
+#endif
 static void smp_tune_scheduling (void)
 {
 	unsigned long cachesize;       /* kB   */
@@ -1104,6 +1143,7 @@ void __devinit smp_prepare_boot_cpu(void
 {
 	cpu_set(smp_processor_id(), cpu_online_map);
 	cpu_set(smp_processor_id(), cpu_callout_map);
+	cpu_set(smp_processor_id(), cpu_present_map);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff -puN arch/i386/kernel/cpu/common.c~cpu_state_clean arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-03-31 10:50:27.0 +0800
@@ -621,3 +621,15 @@ void __devinit cpu_init (void)
 	clear_used_math();
 	mxcsr_feature_mask_init();
 }
+
+#ifdef CONFIG_STR_SMP
+void
[RFC 6/6]Physcial CPU hotadd and S3 SMP support
Boot a CPU at runtime and use it to support S3 SMP.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   79 +++
 linux-2.6.11-root/include/asm-i386/smp.h     |    4 +
 linux-2.6.11-root/kernel/power/main.c        |   30 ++
 3 files changed, 104 insertions(+), 9 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~warmboot_cpu arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~warmboot_cpu	2005-04-04 09:13:48.600255048 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-04 09:13:48.607253984 +0800
@@ -76,6 +76,12 @@ cpumask_t cpu_callin_map;
 cpumask_t cpu_callout_map;
 static cpumask_t smp_commenced_mask;
 
+/* This is ugly, but TSC's upper 32 bits can't be written in eariler CPU
+ * (before prescott), there is no way to resync one AP against BP
+ * TBD: for prescott and above, we should use IA64's algorithm
+ */
+static int __devinit tsc_sync_disabled;
+
 /* Per CPU bogomips and other parameters */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
 
@@ -412,7 +418,7 @@ static void __devinit smp_callin(void)
 	/*
 	 *      Synchronize the TSC with the BP
 	 */
-	if (cpu_has_tsc && cpu_khz)
+	if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
 		synchronize_tsc_ap();
 }
 
@@ -781,8 +787,19 @@ wakeup_secondary_cpu(int phys_apicid, un
 #endif	/* WAKE_SECONDARY_VIA_INIT */
 
 extern cpumask_t cpu_initialized;
+static inline int alloc_cpu_id(void)
+{
+	cpumask_t tmp_map;
+	int cpu;
 
-static int __devinit do_boot_cpu(int apicid)
+	cpus_complement(tmp_map, cpu_present_map);
+	cpu = first_cpu(tmp_map);
+	if (cpu >= NR_CPUS)
+		return -ENODEV;
+	return cpu;
+}
+
+static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -791,15 +808,10 @@ static int __devinit do_boot_cpu(int api
 {
 	struct task_struct *idle;
 	unsigned long boot_error;
-	int timeout, cpu;
+	int timeout;
 	unsigned long start_eip;
 	unsigned short nmi_high = 0, nmi_low = 0;
-	cpumask_t tmp_map;
 
-	cpus_complement(tmp_map, cpu_present_map);
-	cpu = first_cpu(tmp_map);
-	if (cpu >= NR_CPUS)
-		return -ENODEV;
 	++cpucount;
 	/*
 	 * We can't use kernel_thread since we must avoid to
@@ -920,6 +932,53 @@ void cpu_exit_clear(int cpu)
 
 	do_exit_idle();
 }
+
+struct warm_boot_cpu_info {
+	struct completion *complete;
+	int apicid;
+	int cpu;
+};
+
+static void __devinit do_warm_boot_cpu(void *p)
+{
+	struct warm_boot_cpu_info *info = p;
+	do_boot_cpu(info->apicid, info->cpu);
+	complete(info->complete);
+}
+
+int __devinit smp_prepare_cpu(int apicid)
+{
+	DECLARE_COMPLETION(done);
+	struct warm_boot_cpu_info info;
+	struct work_struct task;
+	int cpu;
+
+	lock_cpu_hotplug();
+	cpu = alloc_cpu_id();
+
+	if (cpu < 0)
+		goto exit;
+
+	info.complete = &done;
+	info.apicid = apicid;
+	info.cpu = cpu;
+	INIT_WORK(&task, do_warm_boot_cpu, &info);
+
+	tsc_sync_disabled = 1;
+
+	/* init low mem mapping */
+	memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+			sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+	flush_tlb_all();
+	schedule_work(&task);
+	wait_for_completion(&done);
+
+	tsc_sync_disabled = 0;
+	zap_low_mappings();
+exit:
+	unlock_cpu_hotplug();
+	return cpu;
+}
 #endif
 static void smp_tune_scheduling (void)
 {
@@ -1064,7 +1123,7 @@ static void __init smp_boot_cpus(unsigne
 		if (max_cpus <= cpucount+1)
 			continue;
 
-		if (do_boot_cpu(apicid))
+		if (((cpu = alloc_cpu_id()) > 0) && do_boot_cpu(apicid, cpu))
 			printk("CPU #%d not responding - cannot use it.\n",
 								apicid);
 		else
@@ -1253,10 +1312,12 @@ void __init smp_cpus_done(unsigned int m
 	setup_ioapic_dest();
 #endif
 	zap_low_mappings();
+#ifndef CONFIG_STR_SMP
 	/*
 	 * Disable executability of the SMP trampoline:
 	 */
 	set_kernel_exec((unsigned long)trampoline_base, trampoline_exec);
+#endif
 }
 
 void __init smp_intr_init(void)
diff -puN kernel/power/main.c~warmboot_cpu kernel/power/main.c
--- linux-2.6.11/kernel/power/main.c~warmboot_cpu	2005-04-04 09:13:48.601254896 +0800
+++ linux-2.6.11-root/kernel/power/main.c	2005-04-04 09:13:48.607253984 +0800
@@ -15,6 +15,7 @@
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/pm.h>
+#include <linux/cpu.h>
 
 #include "power.h"
 
@@ -137,6 +138,24 @@ static char * pm_states[] = {
 
 static int enter_state(suspend_state_t state)
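The CPU number allocation in alloc_cpu_id() can be tried out in user space. The sketch below is only an illustration under stated assumptions: a 32-bit mask stands in for cpumask_t, `NR_CPUS` is 8, and `hot_add_cpu()` / `hot_remove_cpu()` are hypothetical helpers that are not part of the patch. It shows that taking the first clear bit of the present map lets a removed CPU's number be reused.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_CPUS 8  /* assumption: small fixed CPU count for the sketch */

static uint32_t cpu_present_map;  /* bit i set: logical CPU i is present */

/* First CPU number not in the present map, or -ENODEV if all are taken. */
static int alloc_cpu_id(void)
{
	uint32_t free_map = ~cpu_present_map;     /* like cpus_complement() */

	for (int cpu = 0; cpu < NR_CPUS; cpu++)   /* like first_cpu() */
		if (free_map & (1u << cpu))
			return cpu;
	return -ENODEV;
}

/* Hypothetical helper: allocate an ID and mark it present. */
static int hot_add_cpu(void)
{
	int cpu = alloc_cpu_id();

	if (cpu < 0)
		return cpu;
	cpu_present_map |= 1u << cpu;
	return cpu;
}

/* Hypothetical helper: drop a CPU so its number can be handed out again. */
static void hot_remove_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_present_map &= ~(1u << cpu);
}
```

Compared with the old `cpu = ++cpucount`, the numbers no longer grow monotonically, which matters once CPUs can go away and come back.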
Re: [RFC 0/6] S3 SMP support with physcial CPU hotplug
On Mon, 2005-04-04 at 10:37, Andrew Morton wrote:
> Li Shaohua [EMAIL PROTECTED] wrote:
> >
> > The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in -mm
> > tree.
>
> Should I merge that thing into mainline? It seems that a few people are
> needing it.
I'd like to listen to some comments first. There are still some things
I'm not sure, such as the do_exit_idle.

Thanks,
Shaohua
Re: [RFC 0/6] S3 SMP support with physcial CPU hotplug
On Mon, 2005-04-04 at 10:48, Andrew Morton wrote:
> Li Shaohua [EMAIL PROTECTED] wrote:
> >
> > On Mon, 2005-04-04 at 10:37, Andrew Morton wrote:
> > > Li Shaohua [EMAIL PROTECTED] wrote:
> > > >
> > > > The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in -mm
> > > > tree.
> > >
> > > Should I merge that thing into mainline? It seems that a few people are
> > > needing it.
> > I'd like to listen to some comments first. There are still some things
> > I'm not sure, such as the do_exit_idle.
>
> I was referring to Zwane's i386-cpu-hotplug-updated-for-mm.patch
Yep, great. Pavel's swsusp also needs it.

Thanks,
Shaohua
Re: [ACPI] Re: [RFC 5/6]clean cpu state after hotremove CPU
Hi,
On Mon, 2005-04-04 at 13:28, Nathan Lynch wrote:
> On Mon, Apr 04, 2005 at 10:07:02AM +0800, Li Shaohua wrote:
> > Clean up all CPU states including its runqueue and idle thread,
> > so we can use boot time code without any changes.
> > Note this makes /sys/devices/system/cpu/cpux/online unworkable.
>
> In what sense does it make the online attribute unworkable?
I removed the idle thread and other CPU state, and put the dead CPU
into a 'halt' busy loop.

> > diff -puN kernel/exit.c~cpu_state_clean kernel/exit.c
> > --- linux-2.6.11/kernel/exit.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
> > +++ linux-2.6.11-root/kernel/exit.c	2005-03-31 10:50:27.0 +0800
> > @@ -845,6 +845,65 @@ fastcall NORET_TYPE void do_exit(long co
> >  	for (;;) ;
> >  }
> >
> > +#ifdef CONFIG_STR_SMP
> > +void do_exit_idle(void)
> > +{
> > +	struct task_struct *tsk = current;
> > +	int group_dead;
> > +
> > +	BUG_ON(tsk->pid);
> > +	BUG_ON(tsk->mm);
> > +
> > +	if (tsk->io_context)
> > +		exit_io_context();
> > +	tsk->flags |= PF_EXITING;
> > +	tsk->it_virt_expires = cputime_zero;
> > +	tsk->it_prof_expires = cputime_zero;
> > +	tsk->it_sched_expires = 0;
> > +
> > +	acct_update_integrals(tsk);
> > +	update_mem_hiwater(tsk);
> > +	group_dead = atomic_dec_and_test(&tsk->signal->live);
> > +	if (group_dead) {
> > +		del_timer_sync(&tsk->signal->real_timer);
> > +		acct_process(-1);
> > +	}
> > +	exit_mm(tsk);
> > +
> > +	exit_sem(tsk);
> > +	__exit_files(tsk);
> > +	__exit_fs(tsk);
> > +	exit_namespace(tsk);
> > +	exit_thread();
> > +	exit_keys(tsk);
> > +
> > +	if (group_dead && tsk->signal->leader)
> > +		disassociate_ctty(1);
> > +
> > +	module_put(tsk->thread_info->exec_domain->module);
> > +	if (tsk->binfmt)
> > +		module_put(tsk->binfmt->module);
> > +
> > +	tsk->exit_code = -1;
> > +	tsk->exit_state = EXIT_DEAD;
> > +
> > +	/* in release_task */
> > +	atomic_dec(&tsk->user->processes);
> > +	write_lock_irq(&tasklist_lock);
> > +	__exit_signal(tsk);
> > +	__exit_sighand(tsk);
> > +	write_unlock_irq(&tasklist_lock);
> > +	release_thread(tsk);
> > +	put_task_struct(tsk);
> > +
> > +	tsk->flags |= PF_DEAD;
> > +#ifdef CONFIG_NUMA
> > +	mpol_free(tsk->mempolicy);
> > +	tsk->mempolicy = NULL;
> > +#endif
> > +}
> > +#endif
>
> I don't understand why this is needed at all.  It looks like a fair
> amount of code from do_exit is being duplicated here.
Yes, exactly. Could someone who understands do_exit please help clean up
the code? I'd like to remove the idle thread, since the smpboot code
will create a new idle thread.

> We've been doing cpu removal on ppc64 logical partitions for a while
> and never needed to do anything like this.
Did it remove the idle thread? Or does the dead CPU sit in the idle busy
loop?

> Maybe idle_task_exit would suffice?
idle_task_exit seems to just drop the mm. We need to destroy the idle
task for physical CPU hotplug, right?

> > diff -puN kernel/sched.c~cpu_state_clean kernel/sched.c
> > --- linux-2.6.11/kernel/sched.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
> > +++ linux-2.6.11-root/kernel/sched.c	2005-04-04 09:06:40.362357104 +0800
> > @@ -4028,6 +4028,58 @@ void __devinit init_idle(task_t *idle, i
> >  }
> >
> >  /*
> > + * Initial dummy domain for early boot and for hotplug cpu. Being static,
> > + * it is initialized to zero, so all balancing flags are cleared which is
> > + * what we want.
> > + */
> > +static struct sched_domain sched_domain_dummy;
> > +
> > +#ifdef CONFIG_STR_SMP
> > +static void __devinit exit_idle(int cpu)
> > +{
> > +	runqueue_t *rq = cpu_rq(cpu);
> > +	struct task_struct *p = rq->idle;
> > +	int j, k;
> > +	prio_array_t *array;
> > +
> > +	/* init runqueue */
> > +	spin_lock_init(&rq->lock);
> > +	rq->active = rq->arrays;
> > +	rq->expired = rq->arrays + 1;
> > +	rq->best_expired_prio = MAX_PRIO;
> > +
> > +	rq->prev_mm = NULL;
> > +	rq->curr = rq->idle = NULL;
> > +	rq->expired_timestamp = 0;
> > +
> > +	rq->sd = &sched_domain_dummy;
> > +	rq->cpu_load = 0;
> > +	rq->active_balance = 0;
> > +	rq->push_cpu = 0;
> > +	rq->migration_thread = NULL;
> > +	INIT_LIST_HEAD(&rq->migration_queue);
> > +	atomic_set(&rq->nr_iowait, 0);
> > +
> > +	for (j = 0; j < 2; j++) {
> > +		array = rq->arrays + j;
> > +		for (k = 0; k < MAX_PRIO; k++) {
> > +			INIT_LIST_HEAD(array->queue + k);
> > +			__clear_bit(k, array->bitmap);
> > +		}
> > +		// delimiter for bitsearch
> > +		__set_bit(MAX_PRIO, array->bitmap);
> > +	}
> > +	/* Destroy IDLE thread.
> > +	 * it's safe now, the CPU is in busy loop
> > +	 */
> > +	if (p->active_mm)
> > +		mmdrop(p->active_mm);
> > +	p->active_mm = NULL;
> > +	put_task_struct(p);
> > +}
> > +#endif
> > +
> > +/*
> >   * In a system that switches off the HZ timer nohz_cpu_mask
> >   * indicates which cpus entered this state. This is used
> >   * in the rcu update to wait only for active cpus. For system
> > @@ -4432,6 +4484,9 @@ static int migration_call(struct notifie
> >  			complete(&req->done);
> >  		}
> >  		spin_unlock_irq(&rq->lock);
> > +#ifdef
Re: 2.6.12-rc1-mm3: box hangs solid on resume from disk while resuming device drivers
On Sun, 2005-03-27 at 02:23, Rafael J. Wysocki wrote:
> Hi,
>
> On Friday, 25 of March 2005 15:19, Rafael J. Wysocki wrote:
> > On Friday, 25 of March 2005 13:54, you wrote:
> > ]--snip--[
> > > >My box is still hanged solid on resume (swsusp) by the drivers:
> > > >
> > > >ohci_hcd
> > > >ehci_hcd
> > > >yenta_socket
> > > >
> > > >possibly others, too.  To avoid this, I had to revert the following
> > > >patch from the Len's tree:
> > > >
> > > >diff -Naru a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
> > > >--- a/drivers/acpi/pci_link.c	2005-03-24 04:57:27 -08:00
> > > >+++ b/drivers/acpi/pci_link.c	2005-03-24 04:57:27 -08:00
> > > >@@ -72,10 +72,12 @@
> > > > 	u8	active;			/* Current IRQ */
> > > > 	u8	edge_level;		/* All IRQs */
> > > > 	u8	active_high_low;	/* All IRQs */
> > > >-	u8	initialized;
> > > > 	u8	resource_type;
> > > > 	u8	possible_count;
> > > > 	u8	possible[ACPI_PCI_LINK_MAX_POSSIBLE];
> > > >+	u8	initialized:1;
> > > >+	u8	suspend_resume:1;
> > > >+	u8	reserved:6;
> > > > };
> > > >
> > > > struct acpi_pci_link {
> > > >@@ -530,6 +532,10 @@
> > > >
> > > > 	ACPI_FUNCTION_TRACE("acpi_pci_link_allocate");
> > > >
> > > >+	if (link->irq.suspend_resume) {
> > > >+		acpi_pci_link_set(link, link->irq.active);
> > > >+		link->irq.suspend_resume = 0;
> > > >+	}
> > > > 	if (link->irq.initialized)
> > > > 		return_VALUE(0);
> > >
> > > How about just remove below line:
> > > >+		acpi_pci_link_set(link, link->irq.active);
> >
> > You mean apply the patch again and remove just the single
> > line?  No effect (ie hangs).
>
> It looks like removing this line couldn't help.
>
> Apparently, acpi_pci_link_set(link, link->irq.active) must be called
> _before_ the call to pci_write_config_word() in
> drivers/pci/pci.c:pci_set_power_state(), because the box hangs
> otherwise.  However, with the patch applied,
> acpi_pci_link_set(link, link->irq.active) is only called through
> pcibios_enable_irq() in pcibios_enable_device(), which is _after_
> the call to pci_set_power_state() in pci_enable_device_bars(),
> so it's too late.
>
> Hence, it seems, if you really want to get rid of the
> irqrouter_resume(), whatever the reason, the simplest fix
> seems to be to change the order of calls to pci_set_power_state()
> and pcibios_enable_device() in pci_enable_device_bars():
>
> --- old/drivers/pci/pci.c	2005-03-26 19:10:09.0 +0100
> +++ linux-2.6.12-rc1-mm2/drivers/pci/pci.c	2005-03-26 19:10:54.0 +0100
> @@ -442,9 +442,9 @@ pci_enable_device_bars(struct pci_dev *d
>  {
>  	int err;
>
> -	pci_set_power_state(dev, PCI_D0);
>  	if ((err = pcibios_enable_device(dev, bars)) < 0)
>  		return err;
> +	pci_set_power_state(dev, PCI_D0);
>  	return 0;
>  }
>
> though I'm not sure if that's legal.
Hmm, no, pci_set_power_state should be called before
pcibios_enable_device, otherwise enable_device may fail. This is very
strange. At boot time there are also uninitialized link devices, so I
wonder why the call to pci_enable_device_bars doesn't fail at boot
time. Did you see the bug only on one specific system? Could you please
file a bug in bugzilla? I don't want to lose the context of the thread.
And please attach your acpidmp output to the bug.

Thanks,
Shaohua
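To make the ordering argument concrete, here is a small user-space mock. It is only a sketch: `struct mock_dev`, the one-character call trace, and the rule that the mock enable step fails outside D0 are all assumptions chosen to mirror the statement that enable_device may fail if the device has not been powered up first. None of this is the real PCI core.

```c
#include <assert.h>
#include <string.h>

enum power_state { D0, D3 };  /* simplified stand-in for pci_power_t */

struct mock_dev {
	enum power_state state;
	int enabled;
	char trace[16];  /* call order: 'P' = set power state, 'E' = enable */
};

/* Append one character to the device's call trace. */
static void trace_call(struct mock_dev *dev, char c)
{
	size_t n = strlen(dev->trace);

	if (n + 1 < sizeof(dev->trace)) {
		dev->trace[n] = c;
		dev->trace[n + 1] = '\0';
	}
}

static int mock_set_power_state(struct mock_dev *dev, enum power_state s)
{
	trace_call(dev, 'P');
	dev->state = s;
	return 0;
}

/* Assumption: the device cannot be enabled unless it is already in D0. */
static int mock_enable_device(struct mock_dev *dev)
{
	trace_call(dev, 'E');
	if (dev->state != D0)
		return -1;
	dev->enabled = 1;
	return 0;
}

/* Same call order as the existing pci_enable_device_bars(): power first. */
static int enable_device_bars(struct mock_dev *dev)
{
	int err;

	mock_set_power_state(dev, D0);
	if ((err = mock_enable_device(dev)) < 0)
		return err;
	return 0;
}
```

Under that assumption, swapping the two calls would make the enable step run against a device still in D3, which is the failure the reply is worried about.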
RE: 2.6.12-rc1-mm3: box hangs solid on resume from disk while resuming device drivers
Hi,
>On Friday, 25 of March 2005 09:21, Andrew Morton wrote:
>>
>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc1/2.6.12-rc1-mm3/
>>
>> - Mainly a bunch of fixes relative to 2.6.12-rc1-mm2.
>
>First, rmmod works again (thanks ;-)).
>
>> - Again, we'd like people who have had recent DRM and USB resume problems to
>>   test and report, please.
>
>My box is still hanged solid on resume (swsusp) by the drivers:
>
>ohci_hcd
>ehci_hcd
>yenta_socket
>
>possibly others, too.  To avoid this, I had to revert the following patch from
>the Len's tree:
>
>diff -Naru a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
>--- a/drivers/acpi/pci_link.c	2005-03-24 04:57:27 -08:00
>+++ b/drivers/acpi/pci_link.c	2005-03-24 04:57:27 -08:00
>@@ -72,10 +72,12 @@
> 	u8	active;			/* Current IRQ */
> 	u8	edge_level;		/* All IRQs */
> 	u8	active_high_low;	/* All IRQs */
>-	u8	initialized;
> 	u8	resource_type;
> 	u8	possible_count;
> 	u8	possible[ACPI_PCI_LINK_MAX_POSSIBLE];
>+	u8	initialized:1;
>+	u8	suspend_resume:1;
>+	u8	reserved:6;
> };
>
> struct acpi_pci_link {
>@@ -530,6 +532,10 @@
>
> 	ACPI_FUNCTION_TRACE("acpi_pci_link_allocate");
>
>+	if (link->irq.suspend_resume) {
>+		acpi_pci_link_set(link, link->irq.active);
>+		link->irq.suspend_resume = 0;
>+	}
> 	if (link->irq.initialized)
> 		return_VALUE(0);
How about just remove below line:
>+		acpi_pci_link_set(link, link->irq.active);

Thanks,
Shaohua
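For anyone following the quoted pci_link.c change, the sketch below models the one-shot suspend_resume flag in user space. It is an illustration only: `struct link_irq`, `mock_link_set()`, `link_allocate()` and the IRQ number 10 are assumptions, and the bit-field on an 8-bit type relies on a common compiler extension (accepted by gcc and clang), as the quoted struct does.

```c
#include <assert.h>

typedef unsigned char u8;

/* Mirrors the tail of the quoted struct: three flags packed into one byte. */
struct link_irq {
	u8 active;            /* currently routed IRQ */
	u8 initialized:1;     /* link has been allocated once */
	u8 suspend_resume:1;  /* set on suspend, consumed on next allocate */
	u8 reserved:6;
};

static int link_set_calls;  /* counts how often the mock programs the link */

/* Hypothetical stand-in for acpi_pci_link_set(). */
static void mock_link_set(struct link_irq *irq, int value)
{
	(void)irq;
	(void)value;
	link_set_calls++;
}

/* Models the control flow added to acpi_pci_link_allocate(). */
static int link_allocate(struct link_irq *irq)
{
	if (irq->suspend_resume) {
		/* Reprogram the saved IRQ exactly once after a resume. */
		mock_link_set(irq, irq->active);
		irq->suspend_resume = 0;
	}
	if (irq->initialized)
		return 0;

	irq->active = 10;  /* assumption: some IRQ picked at first allocation */
	mock_link_set(irq, irq->active);
	irq->initialized = 1;
	return 0;
}
```

Removing the acpi_pci_link_set() call from the suspend_resume branch, as suggested above, would leave only the flag clearing, so the link would never be reprogrammed after resume.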
Re: 2.6.12-rc1-mm1: resume regression [update] (was: Re:2.6.12-rc1-mm1: Kernel BUG at pci:389)
On Thu, 2005-03-24 at 21:42, Rafael J. Wysocki wrote:
> Hi,
>
> On Thursday, 24 of March 2005 02:27, Li Shaohua wrote:
> > On Thu, 2005-03-24 at 09:03, Len Brown wrote:
> > > On Wed, 2005-03-23 at 18:49, Rafael J. Wysocki wrote:
> > > > Hi,
> > > >
> > > > On Wednesday, 23 of March 2005 23:39, Pavel Machek wrote:
> > > > > Hi!
> > > > >
> > > > > > > > > Will this do it for the moment?
> > > > > > > >
> > > > > > > > Its certainly better.
> > > > > > >
> > > > > > > With the Len's patch applied I have to unload the modules:
> > > > > > >
> > > > > > > ohci_hcd
> > > > > > > ehci_hcd
> > > > > > > yenta_socket
> > > > > > >
> > > > > > > before suspend as each of them hangs the box solid during either
> > > > > > > suspend or resume.  Moreover, when I tried to load the ehci_hcd
> > > > > > > module back after resume, it hanged the box solid too.
> > >
> > > Is this failure with suspend to RAM or to disk?
> > >
> > > How about if you try this patch?
> > >
> > > http://linux-acpi.bkbits.net:8080/to-akpm/[EMAIL PROTECTED]
> > >
> > > patch -Rp1 from 2.6.12-rc1-mm1 and see if it stops being broken
> > > or patch -Np1 to 2.6.12-rc and see if it starts being broken.
> > >
> > > This one removes an earlier attempt at resuming PCI links -- now
> > > putting the onus on the drivers to be properly written
> > > to release and acquire their interrupt for a successful
> > > suspend/resume.
> > >
> > >
> > > In theory, this is taken care of something like this:
> > > driver.resume
> > >   pci_enable_device
> > >     pci_enable_device_bars
> > >       pcibios_enable_device
> > >         pcibios_enable_irq
> > >           acpi_pci_irq_enable
> > >
> > > but if the patch above makes a difference, then theory != practice:-)
>
> It looks like that. ;-)
>
> > > I'd believe that ohci_hcd and ehci_hcd are fragile since glancing
> > > at their lengthy .resume routines it isn't immediately obvious
> > > that they do this.  But yenta_dev_resume has a pci_enable_device(),
> > > so that failure may be less straightforward.
> > >
> > > cheers,
> > > -Len
> > >
> > > ps. if point me to a full dmesg -s64000 from 2.6.12-rc1 acpi-enabled
> > > boot, that would help -- for it will show if we're even using pci
> > > interrupt links (and programming them) for these devices on this box.
> > Yes, we changed the behavior of device suspend/resume. Every PCI device
> > should call 'pci_disable_device' at suspend and call 'pci_enable_device'
> > at resume. It fixes a bug and more important thing is it's safer (Eg. it
> > disable interrupts, bus master and etc).
> > I actually added such calls in uhci, ehci and yenta. It's ok for S3 (and
> > definitely required for S3). Unclear if it's ok for S4, so please try
> > revert the patch.
>
> 2.6.11-rc1-mm1 with the patch reverted works fine. :-)
So just removing the pci_enable/disable_device calls in the drivers
makes the system work? Strange. I tried them on two laptops (one HP
nx5000 and one Toshiba M2N), and both work for S3/S4 (no hang, and the
USB mouse works after S3/S4; I didn't try yenta, since I have no PC
card). Is it possible that it's another bug, or just because of a
different BIOS?

Thanks,
Shaohua
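The suspend/resume pairing described in this message can be sketched with a mock device. This is a user-space illustration under assumptions, not driver code: `struct mock_pci_dev` and its fields are invented here, and the mock disable step is assumed to drop bus mastering and the IRQ routing, following the description that disabling the device "disables interrupts, bus master and etc".

```c
#include <assert.h>

/* Hypothetical, simplified view of the state a PCI driver cares about. */
struct mock_pci_dev {
	int enabled;
	int bus_master;
	int irq_routed;  /* interrupt link programmed for this device */
};

/* Assumption: disabling drops bus mastering and the IRQ routing. */
static void mock_disable_device(struct mock_pci_dev *dev)
{
	dev->bus_master = 0;
	dev->irq_routed = 0;
	dev->enabled = 0;
}

/* Assumption: enabling reprograms the IRQ routing as a side effect. */
static int mock_enable_device(struct mock_pci_dev *dev)
{
	dev->irq_routed = 1;
	dev->enabled = 1;
	return 0;
}

/* Driver suspend hook: quiesce the device before the system sleeps. */
static int drv_suspend(struct mock_pci_dev *dev)
{
	mock_disable_device(dev);
	return 0;
}

/* Driver resume hook: re-enable first, then restore bus mastering. */
static int drv_resume(struct mock_pci_dev *dev)
{
	int err = mock_enable_device(dev);

	if (err)
		return err;
	dev->bus_master = 1;
	return 0;
}
```

The point of the pairing is that the IRQ routing is re-established from the driver's own resume path rather than by a separate link-level resume hook.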
Re: 2.6.12-rc1-mm1: Kernel BUG at pci:389
On Tue, 2005-03-22 at 20:20, Pavel Machek wrote:
> Hi!
>
> > >> > Yes, but it is needed. There are many drivers, and they look at
> > >> > numerical value of PMSG_*. I'm proceeding in steps. I hopefully killed
> > >> > all direct accesses to the constants, and will switch constants to
> > >> > something else... But that is going to be tomorrow (need some sleep).
> > >> The patches are going to acquire correct PCI device sleep state for
> > >> suspend/resume. We discussed the issue several months ago. My plan is we
> > >> first introduce 'platform_pci_set_power_state', then merge the
> > >> 'platform_pci_choose_state' patch after Pavel's pm_message_t conversion
> > >> finished. Maybe Len mislead my comments.
> > >>
> > >> Anyway for the callback, my intend is platform_pci_choose_state accept
> > >> the pm_message_t parameter, and it return an 'int', since platform
> > >> method possibly failed and then pci_choose_state translate the return
> > >> value to pci_power_t.
> > >
> > >You can't just retype around like that. You may want it take
> > >pci_power_t * as an argument, and then return 0/-ENODEV or something
> > >like that. But you can't retype between int and pm_message_t...
> > No, taking pci_power_t as an argument is meaningless. For ACPI, we
> > should know the exact sleep state, pm_message_t will tell us. But I'm ok
> > to let it return a pci_power_t, and the failure case returns
> > -ENODEV.
>
> You can't put -ENODEV into pci_power_t ... but maybe we should create
> PCI_ERROR and pass it in cases like this one?

That makes sense, please do it.

> > >> > Could you just revert those two patches? First one is very
> > >> > wrong. Second one might be fixed, but... See comments below.
> > >> I think the platform_pci_set_power_state should be ok, did you see it
> > >> causes oops?
> > >
> > >No its just ugly and uses __force in "creative" way. That one can be
> > >recovered.
> > Do you mean this?
> >
> > > + static int state_conv[] = {
> > > +         [0] = 0,
> > > +         [1] = 1,
> > > +         [2] = 2,
> > > +         [3] = 3,
> > > +         [4] = 3
> > > + };
> > > + int acpi_state = state_conv[(int __force) state];
> >
> > The table should be
> > [PCI_D0] = 0,
> >
> > I'm not sure, but then could we use state_conv[state] directly? It seems
>
> I think so. Of course it is wrong, but it is less wrong than forcing
> it to integer than index, without using macros at all.
>
> Or perhaps you should do
>
> switch (state) {
> case PCI_D0: ...
> }
>
> ...and handle default case somehow.

That's ok for me. I'll change it later.

Thanks,
Shaohua
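The switch-based conversion suggested at the end of this message could look roughly like the sketch below. It is not the code that was merged: the enum is a user-space stand-in for the kernel's pci_power_t (which is a sparse-annotated integer type in the real tree), and pci_state_to_acpi is a hypothetical name chosen for illustration. The D3cold mapping simply mirrors the `[4] = 3` entry of the quoted table.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's pci_power_t values. */
typedef enum {
	PCI_D0 = 0,
	PCI_D1 = 1,
	PCI_D2 = 2,
	PCI_D3hot = 3,
	PCI_D3cold = 4
} pci_power_t;

/*
 * Map a PCI power state to an ACPI D-state number (0..3).
 * Unknown states fall into the default case and yield -EINVAL,
 * instead of indexing past the end of a lookup table.
 */
static int pci_state_to_acpi(pci_power_t state)
{
	switch (state) {
	case PCI_D0:
		return 0;
	case PCI_D1:
		return 1;
	case PCI_D2:
		return 2;
	case PCI_D3hot:
		return 3;
	case PCI_D3cold:
		return 3;	/* same value as entry [4] in the quoted table */
	default:
		return -EINVAL;
	}
}
```

Compared with the table, the switch never relies on the numeric value of the state as an array index, and an out-of-range value is reported instead of read from memory.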
Re: 2.6.12-rc1-mm1: resume regression [update] (was: Re:2.6.12-rc1-mm1: Kernel BUG at pci:389)
On Thu, 2005-03-24 at 09:03, Len Brown wrote:
> On Wed, 2005-03-23 at 18:49, Rafael J. Wysocki wrote:
> > Hi,
> >
> > On Wednesday, 23 of March 2005 23:39, Pavel Machek wrote:
> > > Hi!
> > >
> > > > > > > Will this do it for the moment?
> > > > > >
> > > > > > Its certainly better.
> > > > >
> > > > > With the Len's patch applied I have to unload the modules:
> > > > >
> > > > > ohci_hcd
> > > > > ehci_hcd
> > > > > yenta_socket
> > > > >
> > > > > before suspend as each of them hangs the box solid during either
> > > > > suspend or resume. Moreover, when I tried to load the ehci_hcd
> > > > > module back after resume, it hanged the box solid too.
>
> Is this failure with suspend to RAM or to disk?
>
> How about if you try this patch?
>
> http://linux-acpi.bkbits.net:8080/to-akpm/[EMAIL PROTECTED]
>
> patch -Rp1 from 2.6.12-rc1-mm and see if it stops being broken
> or patch -Np1 to 2.6.12-rc and see if it starts being broken.
>
> This one removes an earlier attempt at resuming PCI links -- now
> putting the onus on the drivers to be properly written
> to release and acquire their interrupt for a successful
> suspend/resume.
>
> In theory, this is taken care of something like this:
>     driver.resume
>     pci_enable_device
>     pci_enable_device_bars
>     pcibios_enable_device
>     pcibios_enable_irq
>     acpi_pci_irq_enable
>
> but if the patch above makes a difference, then theory != practice:-)
>
> I'd believe that ohci_hcd and ehci_hcd are fragile since glancing
> at their lengthy .resume routines it isn't immediately obvious
> that they do this. But yenta_dev_resume has a pci_enable_device(),
> so that failure may be less straightforward.
>
> cheers,
> -Len
>
> ps. if point me to a full dmesg -s64000 from 2.6.12-rc1 acpi-enabled
> boot, that would help -- for it will show if we're even using pci
> interrupt links (and programming them) for these devices on this box.

Yes, we changed the behavior of device suspend/resume. Every PCI device should call 'pci_disable_device' at suspend and 'pci_enable_device' at resume. It fixes a bug and, more importantly, it is safer (e.g. it disables interrupts, bus mastering, etc.). I actually added such calls in uhci, ehci and yenta. It's ok for S3 (and definitely required for S3). It is unclear whether it's ok for S4, so please try reverting the patch.

Thanks,
Shaohua
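The suspend/resume rule stated in this message can be illustrated with a small user-space mock. This is only a sketch under loud assumptions: struct mock_pci_dev and the mock_* helpers are hypothetical stand-ins for struct pci_dev, pci_disable_device(), pci_enable_device() and pci_set_master(), and they model two flags rather than real hardware state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev; only two flags are modelled. */
struct mock_pci_dev {
	bool enabled;		/* device has been enabled */
	bool bus_master;	/* device may initiate DMA */
};

/* Mock of pci_enable_device(): returns 0 on success. */
static int mock_pci_enable_device(struct mock_pci_dev *dev)
{
	dev->enabled = true;
	return 0;
}

/* Mock of pci_set_master(). */
static void mock_pci_set_master(struct mock_pci_dev *dev)
{
	dev->bus_master = true;
}

/* Mock of pci_disable_device(): also stops bus mastering. */
static void mock_pci_disable_device(struct mock_pci_dev *dev)
{
	dev->bus_master = false;
	dev->enabled = false;
}

/* Suspend path: quiesce the device, then disable it. */
static int mock_driver_suspend(struct mock_pci_dev *dev)
{
	/* a real driver would stop its hardware and release its IRQ here */
	mock_pci_disable_device(dev);
	return 0;
}

/* Resume path: re-enable first, then restore bus mastering. */
static int mock_driver_resume(struct mock_pci_dev *dev)
{
	int err = mock_pci_enable_device(dev);

	if (err)
		return err;
	mock_pci_set_master(dev);
	/* a real driver would re-acquire its IRQ and reprogram the hardware here */
	return 0;
}
```

The point of the ordering is that nothing touches the device between the disable in suspend and the enable in resume, which is where the interrupt link gets reprogrammed in the call chain Len lists.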
Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB
On Wed, 2005-03-23 at 04:57, Bjorn Helgaas wrote:
> > Your patch applied with some problems:
> >
> > patching file arch/i386/pci/irq.c
> > Hunk #2 succeeded at 1081 with fuzz 2 (offset 1 line).
> > patching file drivers/acpi/pci_irq.c
> > patching file drivers/pci/quirks.c
> > Hunk #1 succeeded at 678 (offset -5 lines).
>
> These indicate minor differences in these files between upstream BK
> (which is what my patch was against) and the kernel you're building.
> You can ignore them.
>
> > Then I tested it and it works (at least my speedtouch still works).
>
> Great. Shaohua, where should we go from here? Do you have more
> concerns with the current patch, or should we ask Andrew to put it
> in -mm? If you do have concerns, would you like to propose an
> alternate patch that fixes the problem for Grzegorz?

No, the patch looks great to me.

Thanks,
Shaohua
RE: 2.6.12-rc1-mm1: Kernel BUG at pci:389
>> > Yes, but it is needed. There are many drivers, and they look at
>> > numerical value of PMSG_*. I'm proceeding in steps. I hopefully killed
>> > all direct accesses to the constants, and will switch constants to
>> > something else... But that is going to be tommorow (need some sleep).
>> The patches are going to acquire correct PCI device sleep state for
>> suspend/resume. We discussed the issue several months ago. My plan is we
>> first introduce 'platform_pci_set_power_state', then merge the
>> 'platform_pci_choose_state' patch after Pavel's pm_message_t conversion
>> finished. Maybe Len mislead my comments.
>>
>> Anyway for the callback, my intend is platform_pci_choose_state accept
>> the pm_message_t parameter, and it return an 'int', since platform
>> method possibly failed and then pci_choose_state translate the return
>> value to pci_power_t.
>
>You can't just retype around like that. You may want it take
>pci_power_t * as an argument, and then return 0/-ENODEV or something
>like that. But you can't retype between int and pm_message_t...

No, taking pci_power_t as an argument is meaningless. For ACPI, we should know the exact sleep state, pm_message_t will tell us. But I'm ok to let it return a pci_power_t, and the failure case returns -ENODEV.

>
>Plus that function should have a documentation somewhere!

I will add it.

>
>> > Could you just revert those two patches? First one is very
>> > wrong. Second one might be fixed, but... See comments below.
>> I think the platform_pci_set_power_state should be ok, did you see it
>> causes oops?
>
>No its just ugly and uses __force in "creative" way. That one can be
>recovered.

Do you mean this?

> + static int state_conv[] = {
> +         [0] = 0,
> +         [1] = 1,
> +         [2] = 2,
> +         [3] = 3,
> +         [4] = 3
> + };
> + int acpi_state = state_conv[(int __force) state];

The table should be
[PCI_D0] = 0,

I'm not sure, but then could we use state_conv[state] directly? It seems wrong to me (the array accepts a pci_power_t as index?)

Thanks,
Shaohua
Re: 2.6.12-rc1-mm1: Kernel BUG at pci:389
On Tue, 2005-03-22 at 09:35, Pavel Machek wrote:
> Hi!
>
> > > and that says:
> > >
> > > #define PMSG_FREEZE ((__force pm_message_t) 3)
> > >
> > > ... I certainly have _FREEZE defined as 1 in my local tree, but I do
> > > not see that change in -mm yet.
> >
> > Both 2.6.12-rc1-mm1 and 2.6.12-rc1 have:
> >
> > #define PMSG_FREEZE ((__force pm_message_t) 3)
> > #define PMSG_SUSPEND ((__force pm_message_t) 3)
> > #define PMSG_ON ((__force pm_message_t) 0)
> >
> > which looks odd.
>
> Yes, but it is needed. There are many drivers, and they look at
> numerical value of PMSG_*. I'm proceeding in steps. I hopefully killed
> all direct accesses to the constants, and will switch constants to
> something else... But that is going to be tomorrow (need some sleep).

The patches are meant to acquire the correct PCI device sleep state for suspend/resume. We discussed the issue several months ago. My plan was to first introduce 'platform_pci_set_power_state', then merge the 'platform_pci_choose_state' patch after Pavel's pm_message_t conversion is finished. Maybe Len misunderstood my comments.

Anyway, for the callback, my intent is that platform_pci_choose_state accepts the pm_message_t parameter and returns an 'int', since the platform method may fail; pci_choose_state then translates the return value to pci_power_t.

> > > I reproduced it here.. I do not know who introduced
> > > platform_pci_choose_state, but it is *very* wrong. It returns
> > > int. Should it return pci_power_t? It probably should to match
> > > pci_choose_state, but that int is retyped to pm_message_t. Oops.
> >
> > That change came from Len. I've appended the two relevant patches below.
> >
> > So hm. We have incompatible changes in flight. That doesn't happen very
> > often.
> >
> > Could I suggest that you prepare a fixup against 2.6.12-rc1-mm1 and send
> > that to Len and myself? If that fixup is not suitable for a 2.6.12-rc1
> > based tree then I can look after it until things get flushed out.
>
> Could you just revert those two patches? First one is very
> wrong. Second one might be fixed, but... See comments below.

I think the platform_pci_set_power_state should be ok, did you see it cause an oops?

> And they are both "dangerous" -- they introduce new and untested
> functionality while I'm trying to transition from int to
> pm_message_t. They also affect all the drivers.
>
> Len, please Cc me on patches that affect suspend.
>
> > @@ -17,6 +17,7 @@
> >  #include <acpi/acpi_bus.h>
> >
> >  #include <linux/pci-acpi.h>
> > +#include "pci.h"
>
> Should be <linux/pci.h>?

I suppose it's not exported outside of PCI, so I used 'pci.h'.

> > +static int acpi_pci_choose_state(struct pci_dev *pdev, pm_message_t state)
> > +{
>
> Should return pci_power_t, probably.

Should return int as I said above.

> > +	char dstate_str[] = "_S0D";
> > +	acpi_status status;
> > +	unsigned long val;
> > +	struct device *dev = &pdev->dev;
> > +
> > +	/* Fixme: the check is wrong after pm_message_t is a struct */
>
> Exactly.
>
> > +	if ((state >= PM_SUSPEND_MAX) || !DEVICE_ACPI_HANDLE(dev))
>
> PM_SUSPEND_MAX and friends is going to disappear.

Yep, this should be fixed.

> > +		return -EINVAL;
> > +	dstate_str[2] += state; /* _S1D, _S2D, _S3D, _S4D */
>
> Ugh, assumes numerical values of states actually meaning anything. It
> definitely should not. Should be switch(state.event), but that code
> is not merged, yet => I'll send code that switches pm_message_t to
> struct, tomorrow. But it may compile-time break some obscure
> drivers...
>
> > diff -Nru a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > --- a/drivers/pci/pci-acpi.c 2005-03-21 17:02:38 -08:00
> > +++ b/drivers/pci/pci-acpi.c 2005-03-21 17:02:38 -08:00
> > @@ -253,6 +253,24 @@
> >  	return -ENODEV;
> >  }
> >
> > +static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > +{
> > +	acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->dev);
> > +	static int state_conv[] = {
> > +		[0] = 0,
> > +		[1] = 1,
> > +		[2] = 2,
> > +		[3] = 3,
> > +		[4] = 3
> > +	};
> > +	int acpi_state = state_conv[(int __force) state];
>
> The table should be
> [PCI_D0] = 0,
> ...

Ok, please revert the 'platform_pci_choose_state' patch; I will add it back after Pavel's conversion is finished. Or, once Pavel's work is done, I can send a quick fix.

Thanks,
Shaohua
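Pavel's objection to `dstate_str[2] += state` is that it depends on the numeric value of the state. A switch that selects the ACPI method name explicitly avoids that. The sketch below is a user-space illustration only: enum sleep_state and sxd_method_name are hypothetical names, and the real code would switch on whatever pm_message_t ends up exposing (Pavel mentions state.event, which was not merged at the time).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sleep-state identifiers; their values are deliberately
 * never used for arithmetic below. */
enum sleep_state {
	SLEEP_S1 = 1,
	SLEEP_S2,
	SLEEP_S3,
	SLEEP_S4
};

/*
 * Copy the ACPI method name ("_S1D" .. "_S4D") for a sleep state into buf.
 * Returns 0 on success, -1 if the state is unknown or buf is too small.
 */
static int sxd_method_name(enum sleep_state s, char *buf, size_t len)
{
	const char *name;

	switch (s) {
	case SLEEP_S1:
		name = "_S1D";
		break;
	case SLEEP_S2:
		name = "_S2D";
		break;
	case SLEEP_S3:
		name = "_S3D";
		break;
	case SLEEP_S4:
		name = "_S4D";
		break;
	default:
		return -1;	/* unknown state: no method to evaluate */
	}

	if (len < strlen(name) + 1)
		return -1;
	strcpy(buf, name);
	return 0;
}
```

Because each case names its method literally, renumbering the states later cannot silently produce a wrong method name.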
Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB
On Fri, 2005-03-18 at 02:08, Bjorn Helgaas wrote:
> On Thu, 2005-03-17 at 09:33 +0800, Li Shaohua wrote:
> > The comments in previous quirk said it's required only in PIC mode.
> ...
> > I feel we concerned too much. Changing the interrupt line isn't harmful,
> > right? Linux actually ignored interrupt line. Maybe just a
> > PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_ANY_ID, quirk_via_irq) is
> > sufficient.
>
> I think it's good to limit the scope of the quirk as much as
> possible because that makes it easier to do future restructuring,
> such as device-specific interrupt routers.
>
> The comment (before quirk_via_acpi(), nowhere near quirk_via_irqpic())
> says *on-chip devices* have this unusual behavior when the interrupt
> line is written. That makes sense to me.
>
> Writing the interrupt line on random plug-in Via PCI devices does
> not make sense to me, because for that to have any effect, an
> upstream bridge would have to be snooping the traffic going through
> it. That doesn't sound plausible to me.
>
> What about this:

Hmm, this looks like the previous solution. We removed the device-specific VIA quirk because we don't know how many devices have this issue. Every time we encounter an IRQ issue on a VIA PCI device, we will suspect it requires the quirk and keep trying. This is a big overhead.

Thanks,
Shaohua
[PATCH]fix oops when inserting ipmi_si module
Hi,

On one of the machines in our lab, spmi->addr.register_bit_width is 0 (so the returned address is invalid). Without this check, inserting the module causes an oops.

Thanks,
Shaohua

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>

--- a/drivers/char/ipmi/ipmi_si_intf.c 2005-03-03 10:56:51.0 +0800
+++ b/drivers/char/ipmi/ipmi_si_intf.c 2005-03-17 16:34:32.478606080 +0800
@@ -1466,6 +1466,11 @@ static int try_init_acpi(int intf_num, s
 	if (!is_new_interface(-1, addr_space, spmi->addr.address))
 		return -ENODEV;

+	if (!spmi->addr.register_bit_width) {
+		acpi_failure = 1;
+		return -ENODEV;
+	}
+
 	/* Figure out the interface type. */
 	switch (spmi->InterfaceType) {
Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB
Hi,
On Thu, 2005-03-17 at 00:10, Bjorn Helgaas wrote:
> On Tue, 2005-03-15 at 16:02 -0700, Zwane Mwaikambo wrote:
> > On Tue, 15 Mar 2005, Bjorn Helgaas wrote:
> > > That seems awfully suspicious to me.  So the following is
> > > probably safe as far as it goes, but not sufficient for all
> > > cases.
> >
> > VIA bridges allow for IRQ routing updates by programming
> > PCI_INTERRUPT_LINE, so it is supposed to work even if we do it for
> > all the devices, so it appears to be a board/bios specific problem.
>
> This just feels like a sledgehammer approach, i.e., we're
> programming PCI_INTERRUPT_LINE in more cases than we actually
> need to.  I especially don't like that any Via device with
> devfn==0 triggers the quirk.  That doesn't seem like the
> right test if we're really looking for a Via bridge.
>
> > > -static void __devinit quirk_via_bridge(struct pci_dev *pdev)
> > > +static void __devinit quirk_via_irqpic(struct pci_dev *dev)
> > >  {
> > > -	if(pdev->devfn == 0) {
> > > -		printk(KERN_INFO "PCI: Via IRQ fixup\n");
> > > -		via_interrupt_line_quirk = 1;
> > > +	u8 irq, new_irq = dev->irq & 0xf;
> > > +
> > > +	pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
> > > +	if (new_irq != irq) {
> > > +		printk(KERN_INFO "PCI: Via IRQ fixup for %s, from %d to %d\n",
> > > +			pci_name(dev), irq, new_irq);
> > > +		udelay(15);
> > > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, new_irq);
> > >  	}
> > >  }
> > > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_VIA, PCI_ANY_ID, quirk_via_bridge );
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2, quirk_via_irqpic);
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_5, quirk_via_irqpic);
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_6, quirk_via_irqpic);
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8233_5, quirk_via_irqpic);
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8233_7, quirk_via_irqpic);
> >
> > This looks like it'll only affect the PCI device associated with the
> > listed south bridges, which might break systems which relied on the
> > per device setting. Your 'debug' patch actually made sense to me,
> > that is, moving the PCI_INTERRUPT_LINE fixup at gsi register.
>
> Yes, that's what I meant by the above probably not being sufficient.
>
> The main thing the debug patch did was to move the write to after
> the IOAPIC programming.  (And I think it added back the mysterious
> udelay().)  My point is that the write could just as easily be done
> in a pci_enable fixup, because that also happens after the IOAPIC
> update.

The comments in the previous quirk said it is required only in PIC mode.

> The quirk would have to be something like this:
>
>     static void __devinit quirk_via_irq(struct pci_dev *dev)
>     {
>         if (!via_interrupt_line_quirk)
>             return;
>
>         /* update PCI_INTERRUPT_LINE */
>         ...
>     }
>     DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, quirk_via_irq);
>
> with a PCI_FIXUP_HEADER quirk that sets via_interrupt_line_quirk when
> we find a Via bridge.
>
> But I'm uneasy even about this -- what if there are multiple bridges,
> with only one of them being a Via?  Why would we want to apply this
> quirk to the devices under the non-Via bridges?  Wouldn't it be better
> to search up the hierarchy of each device, looking for a Via bridge,
> and apply the quirk only if we find one?

I feel we are worrying too much. Changing the interrupt line isn't harmful, right? Linux actually ignores the interrupt line register. Maybe just a
PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_ANY_ID, quirk_via_irq)
is sufficient, with
quirk_via_irq(..)
{
	update_interrupt_line
}

Thanks,
Shaohua
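The core of the fixup being debated can be sketched outside the kernel. This is a minimal user-space model, not the kernel code: `struct fake_pci_dev` and `via_irq_fixup` are hypothetical names, the config space is a plain byte array, and the `udelay(15)` from the patch is left out. The only real constant is the Interrupt Line register offset (0x3c). It shows the one thing every variant in the thread agrees on: write `irq & 0xf` into PCI_INTERRUPT_LINE when the register holds something else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the Interrupt Line register in PCI configuration space. */
#define PCI_INTERRUPT_LINE 0x3c

/* Hypothetical stand-in for struct pci_dev: config space plus assigned IRQ. */
struct fake_pci_dev {
	uint8_t config[256];
	unsigned int irq;
};

/*
 * Model of the quirk body: compare the register with the low four bits
 * of the assigned IRQ and rewrite it only on a mismatch.
 * Returns 1 if the register was rewritten, 0 if it was already correct.
 */
int via_irq_fixup(struct fake_pci_dev *dev)
{
	uint8_t old_irq, new_irq;

	assert(dev != NULL);
	new_irq = dev->irq & 0xf;
	old_irq = dev->config[PCI_INTERRUPT_LINE];
	if (new_irq == old_irq)
		return 0;
	dev->config[PCI_INTERRUPT_LINE] = new_irq;
	return 1;
}
```

Which devices the fixup should run for (all VIA devices, only devices below a VIA bridge, or only the listed south bridge functions) is exactly the open question in the thread and is deliberately not modelled here.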
Re: [ACPI] Re: Call for help: list of machines with working S3
Hi,
On Mon, 2005-03-14 at 16:00, Pavel Machek wrote:
> Hi!
>
> > * MySQL (hinders the actual suspension process and kicks the pc
> > back to where it was)
>
> Try this patch...
> 								Pavel
>
> --- clean/kernel/signal.c 2005-02-03 22:27:26.0 +0100
> +++ linux/kernel/signal.c 2005-02-03 22:28:19.0 +0100
> @@ -,6 +,7 @@
>  		ret = -EINTR;
>  	}
>
> +	try_to_freeze(1);
>  	return ret;
>  }

I also ran into a similar issue: syslogd can't be stopped. It is waiting for kjournald to flush some work, but kjournald is stopped first. It looks like kernel threads should be stopped later than user threads, just as Nigel's suspend2 patch does.

Thanks,
Shaohua
RE: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB
Hi,
This issue is quite interesting. We recently removed all the device-specific VIA quirks and applied a generic VIA quirk instead. But in this case the MCH at 00:0.0 is from AMD, while the ISA bridge and built-in devices are from VIA, so the VIA quirk is useless: it takes action only when the MCH is from VIA. We should possibly enable the VIA quirk when a VIA ISA bridge is found rather than a VIA MCH, but Bjorn's method seems OK. If you want to put the patch into the kernel, please also change the 'pirq_enable_irq' case.

Thanks,
Shaohua

>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:acpi-devel-
>[EMAIL PROTECTED] On Behalf Of Grzegorz Kulewski
>Sent: Sunday, March 13, 2005 11:15 PM
>To: Bjorn Helgaas
>Cc: Andrew Morton; ACPI List; lkml
>Subject: Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB
>
>On Fri, 11 Mar 2005, Bjorn Helgaas wrote:
>
>> Can you do an "lspci -vvn"?  I'm looking at quirk_via_irqpic() in
>> 2.6.9, which is what printed this:
>>
>>     PCI: Via IRQ fixup for :00:07.2, from 9 to 10
>>     PCI: Via IRQ fixup for :00:07.3, from 9 to 10
>>
>> but it looks like it should only run for PCI_DEVICE_ID_VIA_82C586_2,
>> PCI_DEVICE_ID_VIA_82C686_5, and PCI_DEVICE_ID_VIA_82C686_6.
>>
>> You have:
>>
>>     :00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
>>     :00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235 PIPC Bus Master IDE (rev 06)
>>     :00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 1a) (prog-if 00 [UHCI])
>>     :00:07.3 USB Controller: VIA Technologies, Inc. USB (rev 1a) (prog-if 00 [UHCI])
>>
>> and we apparently ran the quirk for 07.2 and 07.3.  I wouldn't
>> have thought those would have one of the above device IDs.  The
>> "lspci -vvn" should tell us for sure.
>>
>> 2.6.11 removed that quirk and runs quirk_via_bridge() for
>> all VIA devices, but only sets via_interrupt_line_quirk if
>> (pdev->devfn == 0), which you don't have.  So that's why
>> my patch didn't do anything.
>>
>>> Also two more questions:
>>>
>>> 1. What is VIA fixup? Is it some hardware bug? Or BIOS problem? Why is it
>>> needed? On what hardware / software is it needed?
>>
>> I really don't know much about the VIA fixup.  I just noticed
>> that we seem to be doing it slightly differently in 2.6.11 than
>> we did in 2.6.9, and thought maybe it was related to your problem.
>> Here's a changeset that has a couple pointers:
>>
>>    http://linux.bkbits.net:8080/linux-2.5/cset%4041cb9d48DRV4TYe77gvstTawuZFYyQ
>>
>>> 2. Why did this patch shrink bzImage that much:
>>>
>>> -rw-r--r--  1 root root 1828186 mar 11 23:33 vmlinuz-2.6.11-cko1
>>> -rw-r--r--  1 root root 1828355 mar  2 20:48 vmlinuz-2.6.11-cko1.old
>>
>> I have no idea about this.  But it's only a couple hundred bytes.
>>
>> So here's another patch to try (revert the first one, then apply this).
>>
>> ===== drivers/acpi/pci_irq.c 1.37 vs edited =====
>> --- 1.37/drivers/acpi/pci_irq.c 2005-03-01 09:57:29 -07:00
>> +++ edited/drivers/acpi/pci_irq.c 2005-03-11 15:13:49 -07:00
>> @@ -30,6 +30,7 @@
>>  #include <linux/module.h>
>>  #include <linux/init.h>
>>  #include <linux/types.h>
>> +#include <linux/delay.h>
>>  #include <linux/proc_fs.h>
>>  #include <linux/spinlock.h>
>>  #include <linux/pm.h>
>> @@ -438,10 +439,17 @@
>>  		}
>>  	}
>>
>> -	if (via_interrupt_line_quirk)
>> -		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq & 15);
>> -
>>  	dev->irq = acpi_register_gsi(irq, edge_level, active_high_low);
>> +
>> +	if (dev->vendor == PCI_VENDOR_ID_VIA) {
>> +		u8 old_irq, new_irq = dev->irq & 0xf;
>> +
>> +		pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &old_irq);
>> +		printk(KERN_INFO PREFIX "Via IRQ fixup for %s, from %d "
>> +			"to %d\n", pci_name(dev), old_irq, new_irq);
>> +		udelay(15);
>> +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, new_irq);
>> +	}
>>
>>  	printk(KERN_INFO PREFIX "PCI interrupt %s[%c] -> GSI %u "
>>  		"(%s, %s) -> IRQ %d\n",
>>
>
>Ok, this patch works. Here is the log:
>
>Mar 13 17:16:17 kangur Linux version 2.6.11-cko1 ([EMAIL PROTECTED]) (gcc version 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)) #3 Sun Mar 13 17:10:10 CET 2005
>Mar 13 17:16:17 kangur BIOS-provided physical RAM map:
>Mar 13 17:16:17 kangur BIOS-e820: - 0009fc00 (usable)
>Mar 13 17:16:17 kangur BIOS-e820: 0009fc00 - 000a (reserved)
>Mar 13 17:16:17 kangur BIOS-e820: 000f - 0010 (reserved)
>Mar 13 17:16:17 kangur BIOS-e820: 0010 - 1fff (usable)
>Mar 13 17:16:17 kangur BIOS-e820: 1fff - 1fff3000 (ACPI NVS)
>Mar 13 17:16:17 kangur BIOS-e820: 1fff3000 - 2000 (ACPI data)
>Mar 13 17:16:17 kangur BIOS-e820: - 0001 (reserved)
>Mar 13 17:16:17 kangur 511MB LOWMEM available.
>Mar 13 17:16:17 kangur On node 0 totalpages: 131056
>Mar 13 17:16:17 kangur DMA zone: 4096 pages, LIFO batch:1
>Mar 13 17:16:17 kangur Normal zone: 126960 pages, LIFO batch:16
>Mar 13 17:16:17 kangur HighMem zone: 0 pages, LIFO
RE: [ACPI] s4bios: does anyone use it?
Hi,
>> >
>> > Is there single user of s4bios? It used to work for me 4 notebooks
>> > ago, but I never really used it.
>>
>> I don't have anymore my toshiba laptop where S4 bios was first
>> implemented.
>>
>> > I think I'm the only person that ever
>> > seen it working, but I could be wrong.
>>
>> You are indeed wrong.
>
>Okay, so we had 2 users in past but have 0 users now? :-).

I wonder how anyone could use S4BIOS in 2.6.11. S4 and S4bios both go through 'enter_state', and in acpi_sleep_init:

	if (i == ACPI_STATE_S4) {
		if (acpi_gbl_FACS->S4bios_f) {
			sleep_states[i] = 1;
			printk(" S4bios");
			acpi_pm_ops.pm_disk_mode = PM_DISK_FIRMWARE;
		}
		if (sleep_states[i])
			acpi_pm_ops.pm_disk_mode = PM_DISK_PLATFORM;
	}

That means we can never actually end up with PM_DISK_FIRMWARE: whenever the first branch sets it, sleep_states[i] is also set to 1, so the second branch overwrites it with PM_DISK_PLATFORM. Is this intended? If not, .pm_disk_mode should be a mask.

Thanks,
Shaohua
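The overwrite described above is easy to check in isolation. The sketch below is a user-space model with assumed names and values: `disk_mode_as_quoted` mirrors the two consecutive `if` statements from the quoted snippet, `disk_mode_as_mask` is one possible reading of the "should be a mask" suggestion, and the numeric values of `PM_DISK_FIRMWARE` and `PM_DISK_PLATFORM` are chosen here as distinct bits purely for illustration, not taken from the kernel headers.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's own definitions may differ. */
enum { PM_DISK_FIRMWARE = 1, PM_DISK_PLATFORM = 2 };

/* Mirrors the quoted logic: the second if always wins once the first ran. */
int disk_mode_as_quoted(int s4bios_f, int s4_supported)
{
	int sleep_state = s4_supported;
	int mode = 0;

	if (s4bios_f) {
		sleep_state = 1;
		mode = PM_DISK_FIRMWARE;
	}
	if (sleep_state)
		mode = PM_DISK_PLATFORM;
	return mode;
}

/* Assumed alternative: keep both capabilities as independent mask bits. */
int disk_mode_as_mask(int s4bios_f, int s4_supported)
{
	int mode = 0;

	if (s4bios_f)
		mode |= PM_DISK_FIRMWARE;
	if (s4_supported)
		mode |= PM_DISK_PLATFORM;
	return mode;
}

/* Self-check: the quoted logic never reports PM_DISK_FIRMWARE. */
void check_disk_modes(void)
{
	assert(disk_mode_as_quoted(1, 0) == PM_DISK_PLATFORM);
	assert(disk_mode_as_quoted(1, 1) == PM_DISK_PLATFORM);
	assert(disk_mode_as_mask(1, 0) == PM_DISK_FIRMWARE);
}
```

With a mask, the caller would still need some policy for choosing between firmware and platform mode when both bits are set; the message does not say what that policy should be.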
Re: [PATCH] fixup debug warnings during ACPI S3 resume from ram
On Sat, 2005-01-15 at 08:24, Christian Borntraeger wrote:
> During the wakeup from suspend-to-ram I get several warnings (see below).
> This patch fixes the warnings for me, but I am not an expert in ACPI. Please
> read the patch and consider to apply it.

Thanks for looking at this issue. We (the Intel ACPI team) have had many discussions about it, and the problem isn't so easy. The warning appears because the PCI link device is resumed with interrupts disabled. A more important issue is that suspend/resume runs with all processes frozen, which causes many problems with things such as semaphores, memory mapping and kmalloc. The real solution is in progress. I'll let you know when it's ready.

Thanks,
Shaohua
Re: [PATCH 0/4]Bind physical devices with ACPI devices - take 2
On Mon, 2005-01-17 at 19:28, Pavel Machek wrote:
> Hi!
>
> > > The series of patches implement binding physical devices with ACPI
> > > devices. With it, device drivers can utilize methods provided by
> > > firmware (ACPI). These patches are against 2.6.10, please give your
> > > comments.
>
> > This is updated patches according to latest discussion.
> > Changes from last one:
> > 1. introduce new field 'firmware_data' in 'struct device', since people
> > complain rename 'platform_data. Greg, could you please check if the
> > comments I added in 'struct device' are correct?
> > 2. align to Pavel's latest PCI state convention work.
> > 3. Some cleanups and add more comments.
> > One issue is 'platform_pci_choose_state' doesn't get called, it should
> > be after Pavel updates the parameter of 'pci_choose_state'
>
> diff -puN drivers/pci/pci.c~acpi-pci-get-suspend-state-callback drivers/pci/pci.c
> --- 2.5/drivers/pci/pci.c~acpi-pci-get-suspend-state-callback 2005-01-17 12:54:05.357547072 +0800
> +++ 2.5-root/drivers/pci/pci.c 2005-01-17 13:08:50.835933896 +0800
> @@ -317,6 +317,7 @@ pci_set_power_state(struct pci_dev *dev,
>   * Returns PCI power state suitable for given device and given system
>   * message.
>   */
> +int (*platform_pci_choose_state)(struct pci_dev *, pm_message_t) = 0;
>
>  pci_power_t pci_choose_state(struct pci_dev *dev, u32 state)
>  {
>
> Perhaps you want this to be "= NULL"?

I must have been asleep :). I will fix it soon.

>
> > @@ -208,6 +209,25 @@ acpi_status pci_osc_control_set(u32 flag
> >  }
> >  EXPORT_SYMBOL(pci_osc_control_set);
> >
> > +static int acpi_pci_choose_state(struct pci_dev *pdev,
> > +	pm_message_t state)
> > +{
> > +	char dstate_str[] = "_S0D";
> > +	acpi_status status;
> > +	unsigned long val;
> > +	struct device *dev = &pdev->dev;
> > +
> > +	/* state is PM_SUSPEND_* */
> > +	if ((state >= PM_SUSPEND_MAX) || !DEVICE_ACPI_HANDLE(dev))
> > +		return -EINVAL;
> > +	dstate_str[2] += (int __force)state;
>
> When I'm done, you will not be able to just retype state to
> integer... Perhaps you want to do pci_choose_state first; that gets
> you pci_power_t and that one *is* okay to retype to int?

The firmware may be unable to return a useful suspend state (either the firmware doesn't define such a device or the evaluation failed); that's why I return an int. I suppose pci_choose_state will do something like:

	ret = firmware_pci_choose_state(dev, state);
	if (ret >= 0)
		pci_state = ret;
	switch (pci_state) {
	case 0:
		return PCI_D0;
	...
	}

Thanks,
Shaohua
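Two small mechanics from this exchange can be sketched in plain C: how `dstate_str[2] += state` turns "_S0D" into the ACPI method name for a given sleep state, and how a negative firmware answer falls back to a default D-state. Everything here is a user-space illustration with hypothetical names (`sxd_method_name`, `choose_state`, the `pci_state` enum values); in particular the default branch mapping anything above 2 to D3hot is an assumption, since the pseudocode in the message elides the remaining cases.

```c
#include <assert.h>
#include <string.h>

/* Illustrative D-state values; not copied from the kernel headers. */
enum pci_state { PCI_D0 = 0, PCI_D1 = 1, PCI_D2 = 2, PCI_D3hot = 3 };

/* Build the "_S<n>D" method name for sleep state n into out (5 bytes). */
void sxd_method_name(int sleep_state, char out[5])
{
	assert(sleep_state >= 0 && sleep_state <= 5);
	memcpy(out, "_S0D", 5);          /* includes the terminating NUL */
	out[2] += sleep_state;           /* '0' + n gives the state digit */
}

/*
 * firmware_hint < 0 means the firmware gave no usable answer, so the
 * caller's default is kept; otherwise the firmware value wins.
 */
int choose_state(int firmware_hint, int default_state)
{
	int state = default_state;

	if (firmware_hint >= 0)
		state = firmware_hint;

	switch (state) {
	case 0:
		return PCI_D0;
	case 1:
		return PCI_D1;
	case 2:
		return PCI_D2;
	default:
		return PCI_D3hot;    /* assumed mapping for the elided cases */
	}
}
```

Returning a plain int from the firmware hook, as argued in the reply, is what lets an error code such as -EINVAL double as "no opinion".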
Re: [PATCH 0/4]Bind physical devices with ACPI devices - take 2
On Wed, 2005-01-05 at 10:50, Li Shaohua wrote:
> Hi,
> The series of patches implement binding physical devices with ACPI
> devices. With it, device drivers can utilize methods provided by
> firmware (ACPI). These patches are against 2.6.10, please give your
> comments.

Hi,
This is the updated patch series following the latest discussion.
Changes from the last one:
1. Introduce a new field 'firmware_data' in 'struct device', since people complained about renaming 'platform_data'. Greg, could you please check if the comments I added in 'struct device' are correct?
2. Align to Pavel's latest PCI state convention work.
3. Some cleanups and more comments.
One issue is that 'platform_pci_choose_state' doesn't get called yet; it should be once Pavel updates the parameter of 'pci_choose_state'.

Thanks,
Shaohua

This patch implements the framework for binding physical devices with ACPI devices. A physical bus such as the PCI bus should create an 'acpi_bus_type'.
The methods in 'acpi_bus_type':
.find_device:
	For devices which have a parent, such as normal PCI devices.
.find_bridge:
	For special devices, such as the PCI root bridge and IDE controller. Such devices generally have no parent or ->bus. We use this special method to get an ACPI handle.

---

 2.5-root/drivers/acpi/Makefile   |    2
 2.5-root/drivers/acpi/glue.c     |  360 +++
 2.5-root/drivers/acpi/ibm_acpi.c |    4
 2.5-root/include/acpi/acpi_bus.h |   21 ++
 2.5-root/include/linux/device.h  |    6
 5 files changed, 388 insertions(+), 5 deletions(-)

diff -puN /dev/null drivers/acpi/glue.c
--- /dev/null 2004-02-24 05:02:56.0 +0800
+++ 2.5-root/drivers/acpi/glue.c 2005-01-17 12:52:16.825046520 +0800
@@ -0,0 +1,360 @@
+/*
+ * Link physical devices with ACPI devices support
+ */
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/device.h>
+#include <linux/rwsem.h>
+#include <linux/acpi.h>
+
+#define ACPI_GLUE_DEBUG	0
+#if ACPI_GLUE_DEBUG
+#define DBG(x...) printk(PREFIX x)
+#else
+#define DBG(x...)
+#endif
+static LIST_HEAD(bus_type_list);
+static DECLARE_RWSEM(bus_type_sem);
+
+int register_acpi_bus_type(struct acpi_bus_type *type)
+{
+	if (acpi_disabled)
+		return -ENODEV;
+	if (type && type->bus && type->find_device) {
+		down_write(&bus_type_sem);
+		list_add_tail(&type->list, &bus_type_list);
+		up_write(&bus_type_sem);
+		DBG("ACPI bus type %s registered\n", type->bus->name);
+		return 0;
+	}
+	return -ENODEV;
+}
+EXPORT_SYMBOL(register_acpi_bus_type);
+
+int unregister_acpi_bus_type(struct acpi_bus_type *type)
+{
+	if (acpi_disabled)
+		return 0;
+	if (type) {
+		down_write(&bus_type_sem);
+		list_del_init(&type->list);
+		up_write(&bus_type_sem);
+		DBG("ACPI bus type %s unregistered\n", type->bus->name);
+		return 0;
+	}
+	return -ENODEV;
+}
+EXPORT_SYMBOL(unregister_acpi_bus_type);
+
+static struct acpi_bus_type *
+acpi_get_bus_type(struct bus_type *type)
+{
+	struct acpi_bus_type *tmp, *ret = NULL;
+
+	down_read(&bus_type_sem);
+	list_for_each_entry(tmp, &bus_type_list, list) {
+		if (tmp->bus == type) {
+			ret = tmp;
+			break;
+		}
+	}
+	up_read(&bus_type_sem);
+	return ret;
+}
+
+static int
+acpi_find_bridge_device(struct device *dev, acpi_handle *handle)
+{
+	struct acpi_bus_type *tmp;
+	int ret = -ENODEV;
+
+	down_read(&bus_type_sem);
+	list_for_each_entry(tmp, &bus_type_list, list) {
+		if (tmp->find_bridge && !tmp->find_bridge(dev, handle)) {
+			ret = 0;
+			break;
+		}
+	}
+	up_read(&bus_type_sem);
+	return ret;
+}
+
+/* Get PCI root bridge's handle from its segment and bus number */
+struct acpi_find_pci_root {
+	unsigned int seg;
+	unsigned int bus;
+	acpi_handle handle;
+};
+
+static acpi_status
+do_root_bridge_busnr_callback (struct acpi_resource *resource, void *data)
+{
+	int *busnr = (int *)data;
+	struct acpi_resource_address64 address;
+
+	if (resource->id != ACPI_RSTYPE_ADDRESS16 &&
+	    resource->id != ACPI_RSTYPE_ADDRESS32 &&
+	    resource->id != ACPI_RSTYPE_ADDRESS64)
+		return AE_OK;
+
+	acpi_resource_to_address64(resource, &address);
+	if ((address.address_length > 0) &&
+	    (address.resource_type == ACPI_BUS_NUMBER_RANGE))
+		*busnr = address.min_address_range;
+
+	return AE_OK;
+}
+
+static int
+get_root_bridge_busnr(acpi_handle handle)
+{
+	acpi_status status;
+	int bus, bbn;
+	struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER, NULL};
+
+	acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);
+
+	status = acpi_evaluate_integer(handle, METHOD_NAME__BBN, NULL,
+					(unsigned long *)&bbn);
+	if (status == AE_NOT_FOUND) {
+		/* Assume bus = 0 */
+		printk(KERN_INFO PREFIX
+			"Assume root bridge [%s] bus is 0\n",
+			(char *)buffer.pointer);
+		status = AE_OK;
+		bbn = 0;
+	}
+	if (ACPI_FAILURE(status)) {
+		bbn = -ENODEV;
+		goto exit;
+	}
+	if (bbn > 0)
+		goto exit;
+
+	/* _BBN in some systems return 0 for all root bridges */
+	bus = -1;
+	status = acpi_walk_resources(handle, METHOD_NAME__CRS,
+				do_root_bridge_busnr_callback, &bus);
+	/* If _CRS failed, we just use _BBN */
+	if (ACPI_FAILURE(status) || (bus == -1))
+		goto exit;
+	/* We select _CRS */
+	if (bbn