RE: [PATCH]PCIE ASPM support - takes 3

2008-01-25 Thread Li, Shaohua
>
>
>Hi!
>
>> v3->v2, fixed the issues Matthew Wilcox raised.
>>
>> PCI Express ASPM defines a protocol for PCI Express components in the D0
>> state to reduce Link power by placing their Links into a low power state
>> and instructing the other end of the Link to do likewise. This
>> capability allows hardware-autonomous, dynamic Link power reduction
>> beyond what is achievable by software-only controlled power management.
>> However, the device should be configured by software appropriately.
>> Enabling ASPM will save power, but will introduce device latency.
>
>How big is the latency? 1msec? 10msec? 100usec?
I don't have an accurate number, but one device declares an L0s latency of
< 128ns and an L1 latency of < 64us.
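Devices do not report exact ASPM latencies. Each device advertises its exit latencies in the PCIe Link Capabilities register as 3-bit encoded ranges. The sketch below decodes those two fields into their upper bounds. It assumes the encodings from the PCIe Base Specification, and the helper names are hypothetical. Under that assumption, an L0s field of 1 and an L1 field of 6 decode to the < 128ns and < 64us figures quoted above.

```c
#include <stdint.h>

/* Exit-latency fields of the PCIe Link Capabilities register; the masks
 * match PCI_EXP_LNKCAP_L0SEL and PCI_EXP_LNKCAP_L1EL in linux/pci_regs.h. */
#define LNKCAP_L0S_EXIT_MASK  0x00007000u   /* bits 14:12 */
#define LNKCAP_L1_EXIT_MASK   0x00038000u   /* bits 17:15 */

/* Upper bound, in ns, of the advertised L0s exit latency. Encodings 0..6
 * bound it at 64 ns, 128 ns, 256 ns, 512 ns, 1 us, 2 us and 4 us;
 * encoding 7 means "more than 4 us", which has no bound, so -1 is returned. */
static long l0s_exit_latency_max_ns(uint32_t lnkcap)
{
    static const long bound_ns[7] = { 64, 128, 256, 512, 1000, 2000, 4000 };
    unsigned int enc = (lnkcap & LNKCAP_L0S_EXIT_MASK) >> 12;

    return enc == 7 ? -1 : bound_ns[enc];
}

/* Upper bound, in ns, of the advertised L1 exit latency. Encodings 0..6
 * bound it at 1, 2, 4, 8, 16, 32 and 64 us; 7 means "more than 64 us". */
static long l1_exit_latency_max_ns(uint32_t lnkcap)
{
    unsigned int enc = (lnkcap & LNKCAP_L1_EXIT_MASK) >> 15;

    return enc == 7 ? -1 : 1000L << enc;
}
```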

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64

2007-06-28 Thread Li, Shaohua


>-Original Message-
>From: Robert Hancock [mailto:[EMAIL PROTECTED]
>Sent: Friday, June 29, 2007 8:59 AM
>To: Zan Lynx
>Cc: Andrew Morton; linux-kernel@vger.kernel.org; Raj, Ashok; Li, Shaohua;
>Keshavamurthy, Anil S
>Subject: Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64
>
>Zan Lynx wrote:
>> On Thu, 2007-06-28 at 03:43 -0700, Andrew Morton wrote:
>>>
>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/
>>
>>> +intel-iommu-dmar-detection-and-parsing-logic.patch
>>> +intel-iommu-pci-generic-helper-function.patch
>>> +intel-iommu-pci-generic-helper-function-fix.patch
>>> +intel-iommu-clflush_cache_range-now-takes-size-param.patch
>>> +intel-iommu-iova-allocation-and-management-routines.patch
>>> +intel-iommu-iova-allocation-and-management-routines-fix.patch
>>> +intel-iommu-iova-allocation-and-management-routines-fix-2.patch
>>> +intel-iommu-intel-iommu-driver.patch
>>> +intel-iommu-intel-iommu-driver-fix.patch
>>> +intel-iommu-intel-iommu-driver-fix-2.patch
>>> +intel-iommu-avoid-memory-allocation-failures-in-dma-map-api-calls.patch
>>> +intel-iommu-intel-iommu-cmdline-option-forcedac.patch
>>> +intel-iommu-dmar-fault-handling-support.patch
>>> +intel-iommu-iommu-gfx-workaround.patch
>>> +intel-iommu-iommu-floppy-workaround.patch
>>> +intel-iommu-iommu-floppy-workaround-fix.patch
>>> +intel-iommu-iommu-floppy-workaround-fix-fix.patch
>>>
>>>  Intel IOMMU support
>>
>> I believe the above patch set is causing the problem.  On my first try
>> with rc6-mm1 I said Yes to the CONFIG_DMAR options. (I'm nearly as good
>> as random option selection :-)
>>
>> The system panicked during boot, I believe it was trying to detect an
>> Intel IOMMU.  Later when I have a camera, I will try to post a
>> screenshot of the backtrace. (I can't seem to get netconsole to work on
>> boot, only in a module).
>>
>> When I recompiled without DMAR set, things seem to be working great. I
>> seem to be getting better disk read throughput than rc3-mm1, by the way.
>>
>> This laptop is an AMD Athlon64 on a NForce3 running a 64-bit Gentoo
>> build.
>>
>> I'll provide more details on request, and when I get the chance. This
>> is a heads-up on the BUG in case someone has an "ah ha!" moment.
>
>I took a picture of it, looks like the backtrace is:
>
>NULL pointer dereference at 024
>EIP:dmar_table_init+0x11
>intel_iommu_init+0x30
>pci_iommu_init+0xe
>kernel_init+0x16e
>
>Presumably something is NULL in dmar_table_init that wasn't expected to
>be.. I would guess it likely crashes on any system without an Intel
>IOMMU in it.
How about something like the change below?


int __init dmar_table_init(void)
{
+	if (!dmar_tbl)
+		return -ENODEV;
	parse_dmar_table();
	if (list_empty(&dmar_drhd_units)) {
		printk(KERN_ERR PREFIX "No DMAR devices found\n");
		return -ENODEV;
	}
	return 0;
}
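The check above makes a missing table an ordinary "no device" result. Without it, the parser dereferences a NULL pointer, which is the oops reported on AMD boxes. Below is a userspace model of that control flow. struct dmar_table, its num_drhd field and the counter are hypothetical stand-ins for the ACPI table and the DRHD list. This is not the driver code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the ACPI DMAR table. */
struct dmar_table {
    int num_drhd;            /* DMA-remapping hardware unit entries */
};

static struct dmar_table *dmar_tbl;   /* NULL when firmware has no DMAR */
static int dmar_drhd_count;

static void parse_dmar_table(void)
{
    /* Dereferences dmar_tbl: this is what faults when it is NULL. */
    dmar_drhd_count = dmar_tbl->num_drhd;
}

static int dmar_table_init(void)
{
    if (!dmar_tbl)               /* e.g. non-Intel systems: no table */
        return -ENODEV;
    parse_dmar_table();
    if (dmar_drhd_count == 0) {
        fprintf(stderr, "DMAR: No DMAR devices found\n");
        return -ENODEV;
    }
    return 0;
}
```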


RE: [ACPI] S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))

2005-08-01 Thread Li, Shaohua
Hi,
>> > If you think it is a linux bug, can you produce small test case doing
>> > just the sigwait, and post it on l-k with big title "sigwait() breaks
>> > when straced, and on suspend"?
>> >
>> > That way it is going to get some attention, and you'll get either
>> > documentation or kernel fixed.
>> Looks like a linux bug to me. The refrigerator fake signal waked the
>> task up and without restart for the sigwait case. How about below
>> patch:
>
>Is there chance to fix strace case, too? sigwait() is broken in more
>than one way it seems...
I'm not sure about that. strace shows sigwait() being implemented with
sigtimedwait(), and sigtimedwait() is not documented as never returning
an error.
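Userspace can also guard against a wait that ends early with EINTR. The caller restarts the wait with whatever time is still left. That is the userspace counterpart of the restart the kernel patch below is after. The sketch assumes POSIX sigtimedwait() and CLOCK_MONOTONIC. sigtimedwait_restart() is a hypothetical helper, not a libc function.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Like sigtimedwait(), but after EINTR wait again for the remaining time
 * instead of returning early. Returns the signal number, or -1 with errno
 * set (EAGAIN once the overall timeout has expired). */
static int sigtimedwait_restart(const sigset_t *set, siginfo_t *info,
                                const struct timespec *timeout)
{
    struct timespec now, deadline, left = *timeout;

    clock_gettime(CLOCK_MONOTONIC, &now);
    deadline.tv_sec = now.tv_sec + timeout->tv_sec;
    deadline.tv_nsec = now.tv_nsec + timeout->tv_nsec;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    for (;;) {
        int sig = sigtimedwait(set, info, &left);

        if (sig >= 0 || errno != EINTR)
            return sig;
        /* Interrupted: recompute what is left of the timeout and retry. */
        clock_gettime(CLOCK_MONOTONIC, &now);
        left.tv_sec = deadline.tv_sec - now.tv_sec;
        left.tv_nsec = deadline.tv_nsec - now.tv_nsec;
        if (left.tv_nsec < 0) {
            left.tv_sec--;
            left.tv_nsec += 1000000000L;
        }
        if (left.tv_sec < 0) {
            errno = EAGAIN;
            return -1;
        }
    }
}
```

The signals waited for must be blocked first; otherwise they are delivered normally instead of being returned.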

>>  linux-2.6.13-rc4-root/kernel/signal.c |   11 ++-
>>  1 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
>> --- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume  2005-08-01 14:00:39.089460688 +0800
>> +++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
>> @@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
>>  struct timespec ts;
>>  siginfo_t info;
>>  long timeout = 0;
>> +int recover = 0;
>>
>>  /* XXX: Don't preclude handling different sized sigset_t's.  */
>>  if (sigsetsize != sizeof(sigset_t))
>> @@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
>>   * be awakened when they arrive.  */
>>  current->real_blocked = current->blocked;
>>  sigandsets(&current->blocked, &current->blocked, &these);
>> +do_recover:
>>  recalc_sigpending();
>>  spin_unlock_irq(&current->sighand->siglock);
>>
>>  current->state = TASK_INTERRUPTIBLE;
>>  timeout = schedule_timeout(timeout);
>>
>> -try_to_freeze();
>> +if (try_to_freeze())
>> +recover = 1;
>
>Can't you just goto do_recover here?
Again, I'm not sure.

Thanks,
Shaohua


Re: [PATCH 4/6]cpu state clean after hot remove

2005-04-20 Thread Li Shaohua
On Tue, 2005-04-12 at 13:31, Li Shaohua wrote:
> @@ -1052,7 +1086,7 @@ static void __init smp_boot_cpus(unsigne
>   if (max_cpus <= cpucount+1)
>   continue;
>  
> - if (do_boot_cpu(apicid))
> + if ((cpu = alloc_cpu_id() > 0) && do_boot_cpu(apicid, cpu))
>   printk("CPU #%d not responding - cannot use it.\n",
>   apicid);
>   else
Oops, there is a typo in the patch. Andrew, please apply the patch below
on top of the one above. Sorry for the inconvenience.

Thanks,
Shaohua
---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN arch/i386/kernel/smpboot.c~smpboot arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~smpboot 2005-04-21 11:27:53.913041424 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-21 11:28:44.103411328 +0800
@@ -1166,7 +1166,7 @@ static void __init smp_boot_cpus(unsigne
if (max_cpus <= cpucount+1)
continue;
 
-   if ((cpu = alloc_cpu_id() > 0) && do_boot_cpu(apicid, cpu))
+   if (((cpu = alloc_cpu_id()) <= 0) || do_boot_cpu(apicid, cpu))
printk("CPU #%d not responding - cannot use it.\n",
apicid);
else
_
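The typo is a C precedence bug. `>` binds tighter than `=`, so `cpu = alloc_cpu_id() > 0` stores the result of the comparison in cpu, not the allocated ID. A standalone demonstration follows. stub_alloc_cpu_id() is a hypothetical stand-in that always returns CPU 3.

```c
/* Hypothetical stub for alloc_cpu_id(): pretend logical CPU 3 was free. */
static int stub_alloc_cpu_id(void)
{
    return 3;
}

/* As in the original hunk: cpu is assigned "stub_alloc_cpu_id() > 0",
 * i.e. 1, instead of the CPU number. */
static int cpu_from_buggy_expr(void)
{
    int cpu;

    if ((cpu = stub_alloc_cpu_id() > 0))
        return cpu;
    return -1;
}

/* As in the fix: parenthesise the assignment, then compare the result. */
static int cpu_from_fixed_expr(void)
{
    int cpu;

    if ((cpu = stub_alloc_cpu_id()) > 0)
        return cpu;
    return -1;
}
```

The replacement line also changes `&&` to `||` together with `<= 0`. As a result, a failed alloc_cpu_id() is reported as "not responding" instead of falling through to the success branch.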


Re: [PATCH 6/6]suspend/resume SMP support

2005-04-14 Thread Li Shaohua
On Thu, 2005-04-14 at 16:27, Li Shaohua wrote:
> On Wed, 2005-04-13 at 16:32, Pavel Machek wrote:
> > [EMAIL PROTECTED]:/sys/devices/system/cpu/cpu1# dmesg | tail -25
> >  [<c011d001>] activate_task+0x1/0xa0
> >  [<c011d128>] resched_task+0x68/0x90
> >  [<c011d8ba>] try_to_wake_up+0x2aa/0x2f0
> >  [<c025ed7a>] fbcon_cursor+0x19a/0x270
> >  [<c02c4958>] hide_cursor+0x18/0x30
> >  [<c02c758f>] vt_console_print+0x24f/0x260
> >  [<c02c7340>] vt_console_print+0x0/0x260
> >  [<c01247e7>] __call_console_drivers+0x57/0x60
> >  [<c01248e0>] call_console_drivers+0x80/0x110
> >  [<c0124d8e>] release_console_sem+0x4e/0xc0
> >  [<c0124c12>] vprintk+0x192/0x240
> >  [<c0528891>] preempt_schedule_irq+0x51/0x80
> >  [<c02adeca>] acpi_processor_idle+0x0/0x265
> >  [<c010325e>] need_resched+0x1f/0x21
> >  [<c02adeca>] acpi_processor_idle+0x0/0x265
> >  [<c0124a77>] printk+0x17/0x20
> >  [<c010b583>] cpu_init+0x73/0x360
> >  [<c0117bd6>] start_secondary+0x6/0x170
> > Code: d2 74 bd fc 8b 44 24 28 b9 0e 00 00 00 8b 74 24 14 01 c6 b8 0e
> > 00 00 00 89 74 24 1c 8b 74 24 30 89 44 24 10 8b 7c 24 1c 83 c6 10 f3
> > a5 8b 74 24 24 8b 44 24 1c 89 4c 24 10 01 ee f7 d5 21 ee 89
> >  <0>Kernel panic - not syncing: Attempted to kill the idle task!
> >  Stuck ??
> > Inquiring remote APIC #0...
> > ... APIC #0 ID: 
> > ... APIC #0 VERSION: 00040011
> > ... APIC #0 SPIV: 00ff
> > [EMAIL PROTECTED]:/sys/devices/system/cpu/cpu1#
> Andrew,
> Below patch fixed Pavel's oops. But strange is the 'system_state' check
> is added for CPU hotplug by Rusty. This really makes me confused. Could
> you please look at it.
> This can be reproduced 100% with radeonfb driver load. Attached is the
> dmesg of an oops. It seems the 'objp' parameter for
> 'cache_alloc_debugcheck_after' is invalid.
It looks like the per-CPU array_cache isn't initialized yet: it is set up
in a CPU hotplug callback, so any kmalloc() on that CPU will fail until
cpu_up() has been called for it. Is that right?

Thanks,
Shaohua



Re: [PATCH 6/6]suspend/resume SMP support

2005-04-14 Thread Li Shaohua
On Wed, 2005-04-13 at 16:32, Pavel Machek wrote:
> [EMAIL PROTECTED]:/sys/devices/system/cpu/cpu1# dmesg | tail -25
>  [<c011d001>] activate_task+0x1/0xa0
>  [<c011d128>] resched_task+0x68/0x90
>  [<c011d8ba>] try_to_wake_up+0x2aa/0x2f0
>  [<c025ed7a>] fbcon_cursor+0x19a/0x270
>  [<c02c4958>] hide_cursor+0x18/0x30
>  [<c02c758f>] vt_console_print+0x24f/0x260
>  [<c02c7340>] vt_console_print+0x0/0x260
>  [<c01247e7>] __call_console_drivers+0x57/0x60
>  [<c01248e0>] call_console_drivers+0x80/0x110
>  [<c0124d8e>] release_console_sem+0x4e/0xc0
>  [<c0124c12>] vprintk+0x192/0x240
>  [<c0528891>] preempt_schedule_irq+0x51/0x80
>  [<c02adeca>] acpi_processor_idle+0x0/0x265
>  [<c010325e>] need_resched+0x1f/0x21
>  [<c02adeca>] acpi_processor_idle+0x0/0x265
>  [<c0124a77>] printk+0x17/0x20
>  [<c010b583>] cpu_init+0x73/0x360
>  [<c0117bd6>] start_secondary+0x6/0x170
> Code: d2 74 bd fc 8b 44 24 28 b9 0e 00 00 00 8b 74 24 14 01 c6 b8 0e
> 00 00 00 89 74 24 1c 8b 74 24 30 89 44 24 10 8b 7c 24 1c 83 c6 10 f3
> a5 8b 74 24 24 8b 44 24 1c 89 4c 24 10 01 ee f7 d5 21 ee 89
>  <0>Kernel panic - not syncing: Attempted to kill the idle task!
>  Stuck ??
> Inquiring remote APIC #0...
> ... APIC #0 ID: 
> ... APIC #0 VERSION: 00040011
> ... APIC #0 SPIV: 00ff
> [EMAIL PROTECTED]:/sys/devices/system/cpu/cpu1#
Andrew,
The patch below fixes Pavel's oops. What is strange is that the
'system_state' check was added by Rusty for CPU hotplug, which really
confuses me. Could you please take a look at it?
This can be reproduced 100% of the time with the radeonfb driver loaded.
Attached is the dmesg of an oops. It seems the 'objp' parameter passed to
'cache_alloc_debugcheck_after' is invalid.

Thanks,
Shaohua

--- a/kernel/printk.c   2005-04-12 10:12:19.0 +0800
+++ b/kernel/printk.c   2005-04-13 17:22:40.912897328 +0800
@@ -624,8 +624,7 @@ asmlinkage int vprintk(const char *fmt, 
log_level_unknown = 1;
}
 
-   if (!cpu_online(smp_processor_id()) &&
-   system_state != SYSTEM_RUNNING) {
+   if (!cpu_online(smp_processor_id())) {
/*
 * Some console drivers may assume that per-cpu resources have
 * been allocated.  So don't allow them to be called by this

CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching NULL sched-domain.
Booting processor 1/1 eip 3000
Initializing CPU#1
masked ExtINT on CPU#1
Unable to handle kernel paging request at virtual address f000acb2
 printing eip:
c014e4cc
*pde = 
Oops:  [#1]
PREEMPT SMP 
Modules linked in:
CPU:1
EIP:0060:[<c014e4cc>] Not tainted VLI
EFLAGS: 00010097   (2.6.12-rc2-mm3) 
EIP is at check_poison_obj+0x4c/0x1e0
eax: 006b   ebx: 005a   ecx: dff6e080   edx: dff6e480
esi:    edi: f000acb2   ebp: 0080   esp: c14fdcd4
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c14fc000 task=dff42560)
Stack: dff6e480 5a5a5a5a 5a5a5a5a 007f 5a5a5a5a  005a f000acae 
   dff6e480 c021192e c0150031 dff6e480 f000acae 5a5a5a5a 5a5a5a5a dff6e480 
   0046 0020 0010 c015044b dff6e480 0020 f000acae c021192e 
Call Trace:
 [<c021192e>] soft_cursor+0x5e/0x260
 [<c0150031>] cache_alloc_debugcheck_after+0x181/0x1a0
 [<c015044b>] __kmalloc+0x9b/0xd0
 [<c021192e>] soft_cursor+0x5e/0x260
 [<c021192e>] soft_cursor+0x5e/0x260
 [<c020a459>] bit_cursor+0x339/0x540
 [<c0118998>] recalc_task_prio+0x88/0x150
 [<c0205c32>] fbcon_cursor+0x1a2/0x270
 [<c0265475>] hide_cursor+0x25/0x40
 [<c026843a>] vt_console_print+0x2aa/0x2b0
 [<c0120d32>] __call_console_drivers+0x62/0x70
 [<c0120e66>] call_console_drivers+0x96/0x130
 [<c0121361>] release_console_sem+0x51/0xc0
 [<c01211df>] vprintk+0x19f/0x250
 [<c01264b6>] __do_softirq+0xd6/0xf0
 [<c0438f5b>] preempt_schedule_irq+0x4b/0x80
 [<c0121037>] printk+0x17/0x20
 [<c0114132>] setup_local_APIC+0xe2/0x1d0
 [<c01130ba>] smp_callin+0x7a/0x120
 [<c011316e>] start_secondary+0xe/0x190
Code: 24 30 89 14 24 01 c7 e8 13 f8 ff ff 39 44 24 14 89 c5 0f 8d b7 00 00 00 
8d 40 ff 89 44 24 0c 3b 74 24 0c b0 6b 0f 84 8c 01 00 00 <38> 04 3e 74 46 8b 44 
24 14 85 c0 0f 84 48 01 00 00 89 3c 24 83 
 <0>Kernel panic - not syncing: Attempted to kill the idle task!
 Stuck ??
Inquiring remote APIC #1...
... APIC #1 ID: failed
... APIC #1 VERSION: failed
... APIC #1 SPIV: failed
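The only behavioural change in the printk.c hunk above is the dropped system_state test. A truth table shows why that matters on resume. The model below is hypothetical and is not kernel code. As the traces suggest, a secondary CPU coming back up is still offline while system_state is already SYSTEM_RUNNING. Under the old condition, that CPU could therefore call into the console drivers, and fbcon kmalloc()s from soft_cursor().

```c
#include <stdbool.h>

/* Hypothetical model of the vprintk() guard: true if the console drivers
 * would be called from the current CPU. */
static bool console_call_old(bool cpu_online, bool system_running)
{
    /* Before: skip the console only if the CPU is offline AND the system
     * is still booting (system_state != SYSTEM_RUNNING). */
    return !(!cpu_online && !system_running);
}

static bool console_call_new(bool cpu_online)
{
    /* After: skip the console whenever this CPU is not online yet. */
    return cpu_online;
}
```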


Re: [PATCH 5/6]physical CPU hot add

2005-04-12 Thread Li Shaohua
On Tue, 2005-04-12 at 20:17, Zwane Mwaikambo wrote:
> On Tue, 12 Apr 2005, Li Shaohua wrote:
> 
> >  #ifdef CONFIG_HOTPLUG_CPU
> > +int __attribute__ ((weak)) smp_prepare_cpu(int cpu)
> > +{
> > +   return 0;
> > +}
> > +
> 
> Any way for you to avoid using weak attribute?
I replaced the weak attribute with the define method, as suggested.
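This is the general shape of the alternative to a weak symbol. The architecture header announces that it provides the hook, and generic code supplies a fallback only when that announcement is missing. The choice is then made at compile time rather than by the linker. ARCH_HAS_SMP_PREPARE_CPU and bring_cpu_online() are hypothetical names, not necessarily what the revised patch uses.

```c
/* ---- Stand-in for an architecture header (hypothetical) ---- */
#define ARCH_HAS_SMP_PREPARE_CPU
static int last_prepared_cpu = -1;

static int smp_prepare_cpu(int cpu)
{
    last_prepared_cpu = cpu;    /* arch-specific warm-boot work goes here */
    return 0;
}

/* ---- Generic code ---- */
#ifndef ARCH_HAS_SMP_PREPARE_CPU
/* No architecture hook: there is nothing to prepare. */
#define smp_prepare_cpu(cpu) (0)
#endif

/* Hypothetical caller: prepare the CPU, then bring it up. */
static int bring_cpu_online(int cpu)
{
    int ret = smp_prepare_cpu(cpu);

    if (ret)
        return ret;
    /* cpu_up(cpu) would follow here. */
    return 0;
}
```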

Thanks,
Shaohua


---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |  112 ---
 linux-2.6.11-root/drivers/base/cpu.c |7 +
 linux-2.6.11-root/include/asm-i386/smp.h |3 
 3 files changed, 93 insertions(+), 29 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~warm_boot_cpu arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~warm_boot_cpu   2005-04-13 10:58:37.152081456 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-13 10:58:37.159080392 +0800
@@ -80,6 +80,12 @@ cpumask_t cpu_callin_map;
 cpumask_t cpu_callout_map;
 static cpumask_t smp_commenced_mask;
 
+/* TSC's upper 32 bits can't be written in earlier CPUs (before prescott), there
+ * is no way to resync one AP against BP. TBD: for prescott and above, we
+ * should use IA64's algorithm
+ */
+static int __devinitdata tsc_sync_disabled;
+
 /* Per CPU bogomips and other parameters */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
 
@@ -416,7 +422,7 @@ static void __devinit smp_callin(void)
/*
 *  Synchronize the TSC with the BP
 */
-   if (cpu_has_tsc && cpu_khz)
+   if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
synchronize_tsc_ap();
 }
 
@@ -809,6 +815,31 @@ static inline int alloc_cpu_id(void)
return cpu;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static struct task_struct * __devinitdata cpu_idle_tasks[NR_CPUS];
+static inline struct task_struct * alloc_idle_task(int cpu)
+{
+   struct task_struct *idle;
+
+   if ((idle = cpu_idle_tasks[cpu]) != NULL) {
+   /* initialize thread_struct.  we really want to avoid destroy
+* idle thread
+*/
+   idle->thread.esp = (unsigned long)(((struct pt_regs *)
+   (THREAD_SIZE + (unsigned long) idle->thread_info)) - 1);
+   init_idle(idle, cpu);
+   return idle;
+   }
+   idle = fork_idle(cpu);
+
+   if (!IS_ERR(idle))
+   cpu_idle_tasks[cpu] = idle;
+   return idle;
+}
+#else
+#define alloc_idle_task(cpu) fork_idle(cpu)
+#endif
+
 static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
@@ -828,7 +859,7 @@ static int __devinit do_boot_cpu(int api
 * We can't use kernel_thread since we must avoid to
 * reschedule the child.
 */
-   idle = fork_idle(cpu);
+   idle = alloc_idle_task(cpu);
if (IS_ERR(idle))
panic("failed fork for CPU %d", cpu);
idle->thread.eip = (unsigned long) start_secondary;
@@ -931,6 +962,55 @@ void cpu_exit_clear(void)
cpu_clear(cpu, smp_commenced_mask);
unmap_cpu_to_logical_apicid(cpu);
 }
+
+struct warm_boot_cpu_info {
+   struct completion *complete;
+   int apicid;
+   int cpu;
+};
+
+static void __devinit do_warm_boot_cpu(void *p)
+{
+   struct warm_boot_cpu_info *info = p;
+   do_boot_cpu(info->apicid, info->cpu);
+   complete(info->complete);
+}
+
+int __devinit smp_prepare_cpu(int cpu)
+{
+   DECLARE_COMPLETION(done);
+   struct warm_boot_cpu_info info;
+   struct work_struct task;
+   int apicid, ret;
+
+   lock_cpu_hotplug();
+   apicid = x86_cpu_to_apicid[cpu];
+   if (apicid == BAD_APICID) {
+   ret = -ENODEV;
+   goto exit;
+   }
+
+   info.complete = &done;
+   info.apicid = apicid;
+   info.cpu = cpu;
+   INIT_WORK(&task, do_warm_boot_cpu, &info);
+
+   tsc_sync_disabled = 1;
+
+   /* init low mem mapping */
+   memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+   sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+   flush_tlb_all();
+   schedule_work(&task);
+   wait_for_completion(&done);
+
+   tsc_sync_disabled = 0;
+   zap_low_mappings();
+   ret = 0;
+exit:
+   unlock_cpu_hotplug();
+   return ret;
+}
 #endif
 
 static void smp_tune_scheduling (void)
@@ -1169,24 +1249,6 @@ void __devinit smp_prepare_boot_cpu(void
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-
-/* must be called with the cpucontrol mutex held */
-static int __devinit cpu_enable(unsigned int cpu)
-{
-   /* get the target out of its holding state */
-   per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
-   wmb();
-
-   /* wait for the processor to ack it. timeout? */
-   while (!cpu_online(cpu))
-   cpu_relax();
-
-   fixup_irqs(cpu_online_map);
-   /* counter the disable in fixup_irqs() */
-  

Re: [PATCH 3/6]init call cleanup

2005-04-12 Thread Li Shaohua
On Tue, 2005-04-12 at 17:32, Rolf Eike Beer wrote:
> Li Shaohua wrote:
> > Trival patch for CPU hotplug. In CPU identify  part, only did cleaup for
> > intel CPUs. Need do for other CPUs if they support S3 SMP.
> >
> > @@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
> >   apic_write_around(APIC_LVT1, value);
> >  }
> >
> > -void __init setup_local_APIC (void)
> > +void __devinit setup_local_APIC (void)
>   ^
> 
> >  {
> >   unsigned long oldvalue, value, ver, maxlvt;
> >
> 
> Please remove this space while you are at it.
> 
> > @@ -556,7 +556,7 @@ void __init early_cpu_init(void)
> >   * and IDT. We reload them nevertheless, this function acts as a
> >   * 'CPU state barrier', nothing should get across.
> >   */
> > -void __init cpu_init (void)
> > +void __devinit cpu_init (void)
> >  {
> >   int cpu = smp_processor_id();
> >   struct tss_struct * t = &per_cpu(init_tss, cpu);
> 
> This one too.
Removed the space at two places as suggested.

Thanks,
Shaohua

Trivial patch for CPU hotplug. In the CPU identify part, only did the cleanup for
Intel CPUs. Need to do it for other CPUs if they support S3 SMP.

---

 linux-2.6.11-root/arch/i386/kernel/apic.c|   14 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/common.c  |   30 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/intel.c   |   12 +++---
 linux-2.6.11-root/arch/i386/kernel/cpu/intel_cacheinfo.c |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/mce.c  |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p4.c   |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p5.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p6.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/process.c |2 -
 linux-2.6.11-root/arch/i386/kernel/setup.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   18 -
 linux-2.6.11-root/arch/i386/kernel/timers/timer_tsc.c|2 -
 12 files changed, 48 insertions(+), 48 deletions(-)

diff -puN arch/i386/kernel/apic.c~init_call_cleanup arch/i386/kernel/apic.c
--- linux-2.6.11/arch/i386/kernel/apic.c~init_call_cleanup  2005-04-12 10:37:07.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/apic.c   2005-04-13 10:57:55.817365288 +0800
@@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
apic_write_around(APIC_LVT1, value);
 }
 
-void __init setup_local_APIC (void)
+void __devinit setup_local_APIC(void)
 {
unsigned long oldvalue, value, ver, maxlvt;
 
@@ -676,7 +676,7 @@ static struct sys_device device_lapic = 
.cls = &lapic_sysclass,
 };
 
-static void __init apic_pm_activate(void)
+static void __devinit apic_pm_activate(void)
 {
apic_pm_state.active = 1;
 }
@@ -877,7 +877,7 @@ fake_ioapic_page:
  * but we do not accept timer interrupts yet. We only allow the BP
  * to calibrate.
  */
-static unsigned int __init get_8254_timer_count(void)
+static unsigned int __devinit get_8254_timer_count(void)
 {
extern spinlock_t i8253_lock;
unsigned long flags;
@@ -896,7 +896,7 @@ static unsigned int __init get_8254_time
 }
 
 /* next tick in 8254 can be caught by catching timer wraparound */
-static void __init wait_8254_wraparound(void)
+static void __devinit wait_8254_wraparound(void)
 {
unsigned int curr_count, prev_count;
 
@@ -916,7 +916,7 @@ static void __init wait_8254_wraparound(
  * Default initialization for 8254 timers. If we use other timers like HPET,
  * we override this later
  */
-void (*wait_timer_tick)(void) __initdata = wait_8254_wraparound;
+void (*wait_timer_tick)(void) __devinitdata = wait_8254_wraparound;
 
 /*
  * This function sets up the local APIC timer, with a timeout of
@@ -952,7 +952,7 @@ static void __setup_APIC_LVTT(unsigned i
apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
 }
 
-static void __init setup_APIC_timer(unsigned int clocks)
+static void __devinit setup_APIC_timer(unsigned int clocks)
 {
unsigned long flags;
 
@@ -1065,7 +1065,7 @@ void __init setup_boot_APIC_clock(void)
local_irq_enable();
 }
 
-void __init setup_secondary_APIC_clock(void)
+void __devinit setup_secondary_APIC_clock(void)
 {
setup_APIC_timer(calibration_result);
 }
diff -puN arch/i386/kernel/cpu/common.c~init_call_cleanup arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~init_call_cleanup 2005-04-12 10:37:07.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c 2005-04-13 10:58:25.777810608 +0800
@@ -24,9 +24,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
 DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
 EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);
 
-static int cachesize_override __initdata = -1;
-static int disable_x86_fxsr __initdata = 0;
-static

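Background for the __init to __devinit changes: code marked __init sits in an init section that free_initmem() releases after boot, so a function that a CPU may still need when it is hot-added or resumed later must not be __init. In the 2.6-era `include/linux/init.h`, `__devinit` expands to nothing when CONFIG_HOTPLUG is set and to `__init` otherwise. A user-space sketch of that switch, assuming GCC or Clang on an ELF target; the `MY_` macros, the section name and the stub functions are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_HOTPLUG. */
#define MY_CONFIG_HOTPLUG 1

/* Analogue of __init: put the function in its own section, which a
 * kernel would free after boot (user space never frees it). */
#define MY_INIT __attribute__((section(".text.my_init")))

/* Analogue of __devinit: stays resident when hotplug is possible,
 * otherwise it is just another init function. */
#if MY_CONFIG_HOTPLUG
#define MY_DEVINIT
#else
#define MY_DEVINIT MY_INIT
#endif

/* Like cachesize_setup(), which the patch leaves as __init: it only
 * runs while the boot command line is parsed. */
static int MY_INIT cachesize_setup_stub(const char *arg)
{
	return arg != NULL && arg[0] != '\0';
}

/* Like setup_local_APIC(): it also runs whenever a CPU comes up after
 * boot (hot-add, resume), so it must not live in freed memory. */
static int MY_DEVINIT setup_local_apic_stub(int cpu)
{
	return cpu >= 0 ? 0 : -1;
}
```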
RE: [PATCH 1/6]sep initializing rework

2005-04-12 Thread Li Shaohua
On Wed, 2005-04-13 at 01:57, Protasevich, Natalie wrote:
> Hello,
> This is a hotplug CPU patch for i386, done against 2.6.12-rc2-mm3.
> Somewhat alternative to the one posted by Li Shaohua, but not really
> (and I didn't mean that :). If you look closer, our patches are
> different and can complement each other I think. Li did great job on
> sep, after-offline cleanup, __devinit etc., and I have some radical
> changes in the AP bringup mechanism. I left alone __init to __devinit
> part (I was going through it lately, but I think even though I had few
> more than Li did, he covered it sufficiently perhaps). I started having
> doubts in free_initmem() vs __devinit because look how many of __init's
> left! just a few :).
Looks quite smart, but people will object that this way all __init
sections are kept in memory. I'd like to keep the default behavior of __init.

> I got rid of do_boot_cpu loop in smpboot.c because the loop
> static void __init smp_init(void)
> {
> unsigned int i;
> 
> /* FIXME: This should be done in userspace --RR */
> for_each_present_cpu(i) {
> if (num_online_cpus() >= max_cpus)
> break;
> if (!cpu_online(i))
> cpu_up(i);
> }
> ...
> does it again so why leave it in smpboot.c to boot AP's twice. 
This is what IA64 does. If you do it this way, you must also clean up the
bogomips message and TSC synchronization. Also, CPU_UP could be called in
user context, so fork_idle should probably be done from a workqueue. And
please make sure it doesn't break other things such as check_nmi_watchdog.
I chose an easier way (adding smp_prepare_cpu), and it doesn't break anything.

> I also found that my system fails sooner or later when I try not to synch
> runtime booted processor with others, so I changed tsc synchronization
> to only sync between booting CPU and the one that boots it. 
IA64 also does it this way: it synchronizes each AP's ITC against the BP's
once. But on IA32, the TSC's upper 32 bits can be written only on Prescott
and later CPUs. On earlier CPUs, the upper 32 bits become 0 after any write.

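To make the constraint concrete: if a write can only set the low 32 bits, an AP cannot be given the BP's current TSC value once that value exceeds 2^32 - 1. The model below is hypothetical (plain integers, no MSR access); `model_tsc_write()` and its `full_64bit_write` flag simply encode the behavior described above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Value the TSC holds right after software writes `val`.
 * full_64bit_write = 1: CPUs that accept all 64 bits (Prescott and later,
 * per the discussion). full_64bit_write = 0: only the low 32 bits are
 * taken and the upper half reads back as 0.
 */
static uint64_t model_tsc_write(uint64_t val, int full_64bit_write)
{
	if (full_64bit_write)
		return val;
	return val & 0xffffffffULL;
}

/* Can an AP be set to exactly the BP's counter value by one write? */
static int can_sync_by_write(uint64_t bp_tsc, int full_64bit_write)
{
	return model_tsc_write(bp_tsc, full_64bit_write) == bp_tsc;
}
```

At a clock of 3 GHz the counter passes 2^32 after about 1.4 seconds, so on such CPUs a write-based resync of a running system would in practice always lose the upper half.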
> The patch works for me on Intel 8x generic box, and on ES7000. I was asked
> to separate my patch into smaller ones by the theme, but I'm posting the
> entire patch for now, because I think it is probably not the final one.
> I think (I hope) I will sync up with Li later on.
> My idea was that if we find a CPU core in ACPI (enabled or disabled), we
> encounter for it in sibling map and create a sysfs node accordingly, and
> cpu_possible_map will reflect that. We take processors up/down depending
> on physical presence using the existing node. That's the scenario
> implemented on ES7000 that reports all possible cores in ACPI marking
> absent processors as disabled. Runtime enablement/disablement depends on
> sysfs only and the driving agent can be anything (ACPI or user) that
> triggers sysfs node for this processor.
You could refer to IA64's implementation. The goal of my patches is to
support suspend/resume, which doesn't actually hot-remove a CPU, so I
simply ignored the sysfs/ACPI issues.

Thanks,
Shaohua

> 
> -----Original Message-----
> From: Zwane Mwaikambo [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, April 12, 2005 6:08 AM
> To: Li Shaohua
> Cc: lkml; ACPI-DEV; Len Brown; Pavel Machek; Andrew Morton; Protasevich, Natalie; Ryan Harper
> Subject: Re: [PATCH 1/6]sep initializing rework
> 
> Hello Shaohua,
> 
> On Tue, 12 Apr 2005, Li Shaohua wrote:
> 
> > These patches (together with the 5 patches following this one) are
> > updated suspend/resume SMP patches. They fix some bugs and do the
> > cleanup as suggested. Now they work for both suspend-to-RAM and
> > suspend-to-disk. Patches are against 2.6.12-rc2-mm3.
> 
> These patches look good and i think we should go ahead with them. I've
> also cross checked with physical hotplug cpu patches for ES7xxx from
> Natalie (added to Cc) and it does indeed look like a lot of the code
> will work for her too, but i'd appreciate it if she also does a double
> check. 
> Obviously this won't work for other upcoming users of hotplug cpu like
> Xen (Ryan added to Cc) but i think we can abstract things later on to
> cover other special users.
> 
> Thanks Shaohua,
> Zwane
> 


Re: [PATCH 5/6]physical CPU hot add

2005-04-12 Thread Li Shaohua
On Tue, 2005-04-12 at 20:17, Zwane Mwaikambo wrote:
> On Tue, 12 Apr 2005, Li Shaohua wrote:
> 
> >  #ifdef CONFIG_HOTPLUG_CPU
> > +int __attribute__ ((weak)) smp_prepare_cpu(int cpu)
> > +{
> > +   return 0;
> > +}
> > +
> 
> Any way for you to avoid using weak attribute?
Just wanted to avoid more #ifdefs or empty stub routines for other archs.
Some people prefer the 'weak' attribute. Either way is OK with me, but if
you think the former is better, I'll change it.

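For comparison, the two approaches under discussion. With a weak default in generic code, any architecture that links in a non-weak `smp_prepare_cpu()` overrides it; the #ifdef alternative selects an inline stub at compile time. A single-file sketch of the weak variant, with the #ifdef form in a comment (`ARCH_HAS_SMP_PREPARE_CPU` and `bring_cpu_up()` are hypothetical names, and a real override would sit in a separate object file).

```c
#include <assert.h>

/*
 * Generic fallback, as in the patch: used only if no other object file
 * provides a non-weak smp_prepare_cpu(). GCC/Clang attribute.
 */
int __attribute__((weak)) smp_prepare_cpu(int cpu)
{
	(void)cpu;
	return 0;
}

/*
 * The #ifdef alternative would live in a header instead, e.g.:
 *
 *   #ifdef ARCH_HAS_SMP_PREPARE_CPU      (hypothetical symbol)
 *   int smp_prepare_cpu(int cpu);
 *   #else
 *   static inline int smp_prepare_cpu(int cpu) { return 0; }
 *   #endif
 */

/* Generic caller: identical whichever definition gets linked in. */
static int bring_cpu_up(int cpu)
{
	int ret = smp_prepare_cpu(cpu);

	if (ret)
		return ret;
	/* ... the generic cpu_up() path would continue here ... */
	return 0;
}
```

One trade-off: with the weak version, an arch override whose name is misspelled silently falls back to the stub, whereas the header stub makes the choice explicit at compile time.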
Thanks,
Shaohua


Re: [PATCH 6/6]suspend/resume SMP support

2005-04-12 Thread Li Shaohua
On Tue, 2005-04-12 at 18:51, Pavel Machek wrote:
> > Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
> > disable/enable_nonboot_cpus API. The S4 part is based on Pavel's
> > original S4 SMP patch.
> 
> I tested it on 2x PII(?) 550MHz system. Suspend went ok, resume loaded
> image from disk, but then I got
> 
> Thawing cpus 
> Booting processor 1/0 eip 3000
> 
> ...and very funny effect on keyboard leds. They started to blink
> (panic-like), but with very wrong frequency. It looked like 2 cpus
> doing panic blinks at once...
Check whether the /sys/devices/system/cpu/cpu1/online attribute works. If it
does, then it's a different issue. I only tested the patches on two HT-based
systems.

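As root, the attribute can be exercised from a shell by writing 0 or 1 to it. The equivalent in C is a small helper like the one below; `set_cpu_online_file()` is a hypothetical name, and the path is a parameter so the helper can also be tried on an ordinary file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Writes "0" (offline) or "1" (online) to a CPU's sysfs "online"
 * attribute, e.g. /sys/devices/system/cpu/cpu1/online (root needed
 * there). Returns 0 on success, -1 on any I/O error.
 */
static int set_cpu_online_file(const char *path, int online)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(online ? "1\n" : "0\n", f) >= 0;
	/* stdio buffers the data, so a failing sysfs store shows up at
	 * flush time: check fclose() as well */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, calling it with "/sys/devices/system/cpu/cpu1/online" and 0, then again with 1, takes CPU1 down and brings it back, which tests CPU hotplug on its own, separately from suspend/resume.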
Thanks,
Shaohua


[PATCH 3/6]init call cleanup

2005-04-12 Thread Li Shaohua

Trivial patch for CPU hotplug. In the CPU identify part, only did the cleanup for
Intel CPUs. Need to do it for other CPUs if they support S3 SMP.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/arch/i386/kernel/apic.c|   14 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/common.c  |   30 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/intel.c   |   12 +++---
 linux-2.6.11-root/arch/i386/kernel/cpu/intel_cacheinfo.c |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/mce.c  |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p4.c   |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p5.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p6.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/process.c |2 -
 linux-2.6.11-root/arch/i386/kernel/setup.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   18 -
 linux-2.6.11-root/arch/i386/kernel/timers/timer_tsc.c|2 -
 12 files changed, 48 insertions(+), 48 deletions(-)

diff -puN arch/i386/kernel/apic.c~init_call_cleanup arch/i386/kernel/apic.c
--- linux-2.6.11/arch/i386/kernel/apic.c~init_call_cleanup  2005-04-12 10:37:07.216977888 +0800
+++ linux-2.6.11-root/arch/i386/kernel/apic.c   2005-04-12 10:37:07.243973784 +0800
@@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
apic_write_around(APIC_LVT1, value);
 }
 
-void __init setup_local_APIC (void)
+void __devinit setup_local_APIC (void)
 {
unsigned long oldvalue, value, ver, maxlvt;
 
@@ -676,7 +676,7 @@ static struct sys_device device_lapic = 
.cls = &lapic_sysclass,
 };
 
-static void __init apic_pm_activate(void)
+static void __devinit apic_pm_activate(void)
 {
apic_pm_state.active = 1;
 }
@@ -877,7 +877,7 @@ fake_ioapic_page:
  * but we do not accept timer interrupts yet. We only allow the BP
  * to calibrate.
  */
-static unsigned int __init get_8254_timer_count(void)
+static unsigned int __devinit get_8254_timer_count(void)
 {
extern spinlock_t i8253_lock;
unsigned long flags;
@@ -896,7 +896,7 @@ static unsigned int __init get_8254_time
 }
 
 /* next tick in 8254 can be caught by catching timer wraparound */
-static void __init wait_8254_wraparound(void)
+static void __devinit wait_8254_wraparound(void)
 {
unsigned int curr_count, prev_count;
 
@@ -916,7 +916,7 @@ static void __init wait_8254_wraparound(
  * Default initialization for 8254 timers. If we use other timers like HPET,
  * we override this later
  */
-void (*wait_timer_tick)(void) __initdata = wait_8254_wraparound;
+void (*wait_timer_tick)(void) __devinitdata = wait_8254_wraparound;
 
 /*
  * This function sets up the local APIC timer, with a timeout of
@@ -952,7 +952,7 @@ static void __setup_APIC_LVTT(unsigned i
apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
 }
 
-static void __init setup_APIC_timer(unsigned int clocks)
+static void __devinit setup_APIC_timer(unsigned int clocks)
 {
unsigned long flags;
 
@@ -1065,7 +1065,7 @@ void __init setup_boot_APIC_clock(void)
local_irq_enable();
 }
 
-void __init setup_secondary_APIC_clock(void)
+void __devinit setup_secondary_APIC_clock(void)
 {
setup_APIC_timer(calibration_result);
 }
diff -puN arch/i386/kernel/cpu/common.c~init_call_cleanup arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~init_call_cleanup 2005-04-12 10:37:07.218977584 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c 2005-04-12 10:37:07.244973632 +0800
@@ -24,9 +24,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
 DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
 EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);
 
-static int cachesize_override __initdata = -1;
-static int disable_x86_fxsr __initdata = 0;
-static int disable_x86_serial_nr __initdata = 1;
+static int cachesize_override __devinitdata = -1;
+static int disable_x86_fxsr __devinitdata = 0;
+static int disable_x86_serial_nr __devinitdata = 1;
 
 struct cpu_dev * cpu_devs[X86_VENDOR_NUM] = {};
 
@@ -59,7 +59,7 @@ static int __init cachesize_setup(char *
 }
 __setup("cachesize=", cachesize_setup);
 
-int __init get_model_name(struct cpuinfo_x86 *c)
+int __devinit get_model_name(struct cpuinfo_x86 *c)
 {
unsigned int *v;
char *p, *q;
@@ -89,7 +89,7 @@ int __init get_model_name(struct cpuinfo
 }
 
 
-void __init display_cacheinfo(struct cpuinfo_x86 *c)
+void __devinit display_cacheinfo(struct cpuinfo_x86 *c)
 {
unsigned int n, dummy, ecx, edx, l2size;
 
@@ -130,7 +130,7 @@ void __init display_cacheinfo(struct cpu
 /* in particular, if CPUID levels 0x80000002..4 are supported, this isn't used */
 
 /* Look up CPU names by table lookup. */
-static char __init *table_lookup_model(struct cpuinfo_x86 *c)
+static char __devinit *table_lookup_model(struct cpuinfo_x86 *c)
 {
struct cpu_model_info *info;
 
@@ -151,7 +151,7 @@ st

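The comment on wait_8254_wraparound() relies on the 8254 PIT counting down: once a reading is larger than the one before it, the counter has reloaded, which marks a tick boundary. A simplified sketch of that test over an array of samples standing in for successive get_8254_timer_count() readings; `find_wraparound()` is hypothetical, and any chipset workarounds in the kernel function are omitted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * The 8254 counts down toward 0 and then reloads. Returns the index of
 * the first sample that is larger than the previous one (the first
 * reading after a reload), or -1 if no reload is seen.
 */
static int find_wraparound(const unsigned int *samples, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (samples[i] > samples[i - 1])
			return (int)i;
	}
	return -1;
}
```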
[PATCH 1/6]sep initializing rework

2005-04-12 Thread Li Shaohua
Hi,
These patches (together with the 5 patches following this one) are updated
suspend/resume SMP patches. They fix some bugs and do the cleanup
as suggested. Now they work for both suspend-to-RAM and suspend-to-disk.
Patches are against 2.6.12-rc2-mm3.

Thanks,
Shaohua

---
Make SEP init per-CPU, so it is hotplug-safe.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c   |6 ++
 linux-2.6.11-root/arch/i386/kernel/sysenter.c  |   12 +++-
 linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |4 
 linux-2.6.11-root/arch/i386/power/cpu.c|4 +---
 linux-2.6.11-root/include/asm-i386/smp.h   |3 +++
 5 files changed, 21 insertions(+), 8 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~sep_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sep_init_cleanup 2005-04-12 10:36:00.164171464 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c 2005-04-12 10:36:00.174169944 +0800
@@ -443,6 +443,9 @@ static void __init start_secondary(void 
 * the local TLBs too.
 */
local_flush_tlb();
+
+   /* Note: this must be done before __cpu_up finish */
+   enable_sep_cpu();
cpu_set(smp_processor_id(), cpu_online_map);
 
/* We can take interrupts now: we're officially "up". */
@@ -920,6 +923,9 @@ static void __init smp_boot_cpus(unsigne
cpus_clear(cpu_core_map[0]);
cpu_set(0, cpu_core_map[0]);
 
+   sysenter_setup();
+   enable_sep_cpu();
+
/*
 * If we couldn't find an SMP configuration at boot time,
 * get out of here now!
diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c
--- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup   2005-04-12 10:36:00.165171312 +0800
+++ linux-2.6.11-root/arch/i386/kernel/sysenter.c   2005-04-12 10:36:00.174169944 +0800
@@ -21,11 +21,16 @@
 
 extern asmlinkage void sysenter_entry(void);
 
-void enable_sep_cpu(void *info)
+void enable_sep_cpu(void)
 {
int cpu = get_cpu();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
 
+   if (!boot_cpu_has(X86_FEATURE_SEP)) {
+   put_cpu();
+   return;
+   }
+
tss->ss1 = __KERNEL_CS;
tss->esp1 = sizeof(struct tss_struct) + (unsigned long) tss;
wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
@@ -41,7 +46,7 @@ void enable_sep_cpu(void *info)
 extern const char vsyscall_int80_start, vsyscall_int80_end;
 extern const char vsyscall_sysenter_start, vsyscall_sysenter_end;
 
-static int __init sysenter_setup(void)
+int __init sysenter_setup(void)
 {
void *page = (void *)get_zeroed_page(GFP_ATOMIC);
 
@@ -58,8 +63,5 @@ static int __init sysenter_setup(void)
  &vsyscall_sysenter_start,
  &vsyscall_sysenter_end - &vsyscall_sysenter_start);
 
-   on_each_cpu(enable_sep_cpu, NULL, 1, 1);
return 0;
 }
-
-__initcall(sysenter_setup);
diff -puN arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup arch/i386/mach-voyager/voyager_smp.c
--- linux-2.6.11/arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup 2005-04-12 10:36:00.167171008 +0800
+++ linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c  2005-04-12 10:36:00.175169792 +0800
@@ -499,6 +499,7 @@ start_secondary(void *unused)
while (!cpu_isset(cpuid, smp_commenced_mask))
rep_nop();
local_irq_enable();
+   enable_sep_cpu();
 
local_flush_tlb();
 
@@ -696,6 +697,9 @@ smp_boot_cpus(void)
printk("CPU%d: ", boot_cpu_id);
print_cpu_info(&cpu_data[boot_cpu_id]);
 
+   sysenter_setup();
+   enable_sep_cpu();
+
if(is_cpu_quad()) {
/* booting on a Quad CPU */
printk("VOYAGER SMP: Boot CPU is Quad\n");
diff -puN arch/i386/power/cpu.c~sep_init_cleanup arch/i386/power/cpu.c
--- linux-2.6.11/arch/i386/power/cpu.c~sep_init_cleanup 2005-04-12 10:36:00.168170856 +0800
+++ linux-2.6.11-root/arch/i386/power/cpu.c 2005-04-12 10:36:00.175169792 +0800
@@ -33,8 +33,6 @@ unsigned long saved_context_esp, saved_c
 unsigned long saved_context_esi, saved_context_edi;
 unsigned long saved_context_eflags;
 
-extern void enable_sep_cpu(void *);
-
 void __save_processor_state(struct saved_context *ctxt)
 {
kernel_fpu_begin();
@@ -136,7 +134,7 @@ void __restore_processor_state(struct sa
 * sysenter MSRs
 */
if (boot_cpu_has(X86_FEATURE_SEP))
-   enable_sep_cpu(NULL);
+   enable_sep_cpu();
 
fix_processor_context();
do_fpu_end();
diff -puN include/asm-i386/smp.h~sep_init_cleanup include/asm-i386/smp.h
--- linux-2.6.11/include/asm-i386/smp.h~sep_init_cleanup 2005-04-12 10:36:00.170170552 +0800
+++ linux-2.6.11-root/include/asm-i386/smp.h 2005-04-12 10:36:00.176169640 +0800
@@ -37,6 +37,9 @@ extern int smp_num_siblings;
 extern cpumask_t

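The SEP change follows a common split: the one-time part (sysenter_setup(), which prepares the vsyscall page) runs once on the boot CPU, while the per-CPU part (enable_sep_cpu(), which programs that CPU's SYSENTER MSRs) runs on each CPU as it comes up, including CPUs brought back by hotplug or resume. A user-space model of that split; all `model_` names are hypothetical, and an array stands in for per-CPU MSR state.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

static int vsyscall_page_ready;			/* global, set up once */
static int sep_msrs_set[MODEL_NR_CPUS];		/* per-CPU state */
static int cpu_has_sep = 1;			/* stands in for X86_FEATURE_SEP */

/* One-time setup on the boot CPU (cf. sysenter_setup()). */
static void model_sysenter_setup(void)
{
	vsyscall_page_ready = 1;
}

/* Per-CPU step (cf. enable_sep_cpu()): safe to repeat on every
 * online, hotplug, or resume event. */
static void model_enable_sep_cpu(int cpu)
{
	if (!cpu_has_sep)
		return;
	sep_msrs_set[cpu] = 1;
}

/* Secondary CPU bring-up: per-CPU init before it is marked online. */
static void model_start_secondary(int cpu)
{
	model_enable_sep_cpu(cpu);
	/* cpu_set(cpu, cpu_online_map) would follow here */
}

/* Simulates a CPU losing its MSR contents (offline or suspend). */
static void model_cpu_lose_state(int cpu)
{
	sep_msrs_set[cpu] = 0;
}
```

Before the patch, the MSR writes were done once via on_each_cpu() from an __initcall, which only reaches CPUs that are online at that moment, so a CPU onlined later would have been left without them.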
[PATCH 2/6]sibling map initializing rework

2005-04-12 Thread Li Shaohua
Make sibling map init per-cpu. Hotplug CPU may change the map at
runtime.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   86 ++-
 1 files changed, 45 insertions(+), 41 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~sibling_map_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sibling_map_init_cleanup 2005-04-12 10:36:34.283984464 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c 2005-04-12 10:36:34.287983856 +0800
@@ -63,11 +63,16 @@ static int __initdata smp_b_stepping;
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
-int phys_proc_id[NR_CPUS]; /* Package ID of each logical CPU */
+/* Package ID of each logical CPU */
+int phys_proc_id[NR_CPUS] = {[0 ... NR_CPUS-1] = BAD_APICID};
 EXPORT_SYMBOL(phys_proc_id);
-int cpu_core_id[NR_CPUS]; /* Core ID of each logical CPU */
+/* Core ID of each logical CPU */
+int cpu_core_id[NR_CPUS] = {[0 ... NR_CPUS-1] = BAD_APICID};
 EXPORT_SYMBOL(cpu_core_id);
 
+cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
+cpumask_t cpu_core_map[NR_CPUS] __cacheline_aligned;
+
 /* bitmap of online cpus */
 cpumask_t cpu_online_map __cacheline_aligned;
 
@@ -417,6 +422,38 @@ static void __init smp_callin(void)
 
 static int cpucount;
 
+static inline void
+set_cpu_sibling_map(int cpu)
+{
+   int i;
+
+   if (smp_num_siblings > 1) {
+   for (i = 0; i < NR_CPUS; i++) {
+   if (!cpu_isset(i, cpu_callout_map))
+   continue;
+   if (cpu_core_id[cpu] == cpu_core_id[i]) {
+   cpu_set(i, cpu_sibling_map[cpu]);
+   cpu_set(cpu, cpu_sibling_map[i]);
+   }
+   }
+   } else {
+   cpu_set(cpu, cpu_sibling_map[cpu]);
+   }
+
+   if (current_cpu_data.x86_num_cores > 1) {
+   for (i = 0; i < NR_CPUS; i++) {
+   if (!cpu_isset(i, cpu_callout_map))
+   continue;
+   if (phys_proc_id[cpu] == phys_proc_id[i]) {
+   cpu_set(i, cpu_core_map[cpu]);
+   cpu_set(cpu, cpu_core_map[i]);
+   }
+   }
+   } else {
+   cpu_core_map[cpu] = cpu_sibling_map[cpu];
+   }
+}
+
 /*
  * Activate a secondary processor.
  */
@@ -444,6 +481,10 @@ static void __init start_secondary(void 
 */
local_flush_tlb();
 
+   /* This must be done before setting cpu_online_map */
+   set_cpu_sibling_map(_smp_processor_id());
+   wmb();
+
/* Note: this must be done before __cpu_up finish */
enable_sep_cpu();
cpu_set(smp_processor_id(), cpu_online_map);
@@ -896,8 +937,6 @@ static int boot_cpu_logical_apicid;
 /* Where the IO area was mapped on multiquad, always 0 otherwise */
 void *xquad_portio;
 
-cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
-cpumask_t cpu_core_map[NR_CPUS] __cacheline_aligned;
 
 static void __init smp_boot_cpus(unsigned int max_cpus)
 {
@@ -1064,43 +1103,8 @@ static void __init smp_boot_cpus(unsigne
cpus_clear(cpu_sibling_map[cpu]);
cpus_clear(cpu_core_map[cpu]);
}
-
-   for (cpu = 0; cpu < NR_CPUS; cpu++) {
-   struct cpuinfo_x86 *c = cpu_data + cpu;
-   int siblings = 0;
-   int i;
-   if (!cpu_isset(cpu, cpu_callout_map))
-   continue;
-
-   if (smp_num_siblings > 1) {
-   for (i = 0; i < NR_CPUS; i++) {
-   if (!cpu_isset(i, cpu_callout_map))
-   continue;
-   if (cpu_core_id[cpu] == cpu_core_id[i]) {
-   siblings++;
-   cpu_set(i, cpu_sibling_map[cpu]);
-   }
-   }
-   } else {
-   siblings++;
-   cpu_set(cpu, cpu_sibling_map[cpu]);
-   }
-
-   if (siblings != smp_num_siblings)
-   printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, cpu, smp_num_siblings);
-
-   if (c->x86_num_cores > 1) {
-   for (i = 0; i < NR_CPUS; i++) {
-   if (!cpu_isset(i, cpu_callout_map))
-   continue;
-   if (phys_proc_id[cpu] == phys_proc_id[i]) {
-   cpu_set(i, cpu_core_map[cpu]);
-   }
-   }
-   } else {
-   cpu_core_map[cpu] = cpu_sibling_map[cpu];
-   }
-   }
+   cpu_set(0, cpu_sibling_map[0]);
+   cpu_set

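set_cpu_sibling_map() builds the maps incrementally as each CPU starts, and it sets both directions (the new CPU in each matching CPU's map and vice versa), so CPUs that booted earlier learn about the newcomer. A simplified model with 64-bit masks in place of cpumask_t; the `model_` names are hypothetical, and only the sibling-map half is shown because the core-map half is the same loop keyed on phys_proc_id.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8
#define BIT(n) (UINT64_C(1) << (n))

static int core_id[MODEL_NR_CPUS];
static uint64_t callout_map;			/* CPUs that have been started */
static uint64_t sibling_map[MODEL_NR_CPUS];

/* Incremental, symmetric update, as in set_cpu_sibling_map(). */
static void model_set_cpu_sibling_map(int cpu, int num_siblings)
{
	int i;

	if (num_siblings > 1) {
		for (i = 0; i < MODEL_NR_CPUS; i++) {
			if (!(callout_map & BIT(i)))
				continue;
			if (core_id[cpu] == core_id[i]) {
				sibling_map[cpu] |= BIT(i);
				sibling_map[i] |= BIT(cpu);
			}
		}
	} else {
		sibling_map[cpu] |= BIT(cpu);
	}
}
```

The removed loop in smp_boot_cpus() filled only row `cpu` of each map and ran once after all CPUs were up, which is why it could not cope with a CPU added later.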
[PATCH 3/6]init call cleanup

2005-04-12 Thread Li Shaohua

Trivial patch for CPU hotplug. In the CPU identification part, I only did the
cleanup for Intel CPUs; other CPUs need the same cleanup if they support S3 SMP.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/arch/i386/kernel/apic.c|   14 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/common.c  |   30 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/intel.c   |   12 +++---
 linux-2.6.11-root/arch/i386/kernel/cpu/intel_cacheinfo.c |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/mce.c  |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p4.c   |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p5.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p6.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/process.c |2 -
 linux-2.6.11-root/arch/i386/kernel/setup.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   18 -
 linux-2.6.11-root/arch/i386/kernel/timers/timer_tsc.c|2 -
 12 files changed, 48 insertions(+), 48 deletions(-)

diff -puN arch/i386/kernel/apic.c~init_call_cleanup arch/i386/kernel/apic.c
--- linux-2.6.11/arch/i386/kernel/apic.c~init_call_cleanup	2005-04-12 10:37:07.216977888 +0800
+++ linux-2.6.11-root/arch/i386/kernel/apic.c	2005-04-12 10:37:07.243973784 +0800
@@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
apic_write_around(APIC_LVT1, value);
 }
 
-void __init setup_local_APIC (void)
+void __devinit setup_local_APIC (void)
 {
unsigned long oldvalue, value, ver, maxlvt;
 
@@ -676,7 +676,7 @@ static struct sys_device device_lapic = 
.cls = &lapic_sysclass,
 };
 
-static void __init apic_pm_activate(void)
+static void __devinit apic_pm_activate(void)
 {
apic_pm_state.active = 1;
 }
@@ -877,7 +877,7 @@ fake_ioapic_page:
  * but we do not accept timer interrupts yet. We only allow the BP
  * to calibrate.
  */
-static unsigned int __init get_8254_timer_count(void)
+static unsigned int __devinit get_8254_timer_count(void)
 {
extern spinlock_t i8253_lock;
unsigned long flags;
@@ -896,7 +896,7 @@ static unsigned int __init get_8254_time
 }
 
 /* next tick in 8254 can be caught by catching timer wraparound */
-static void __init wait_8254_wraparound(void)
+static void __devinit wait_8254_wraparound(void)
 {
unsigned int curr_count, prev_count;
 
@@ -916,7 +916,7 @@ static void __init wait_8254_wraparound(
  * Default initialization for 8254 timers. If we use other timers like HPET,
  * we override this later
  */
-void (*wait_timer_tick)(void) __initdata = wait_8254_wraparound;
+void (*wait_timer_tick)(void) __devinitdata = wait_8254_wraparound;
 
 /*
  * This function sets up the local APIC timer, with a timeout of
@@ -952,7 +952,7 @@ static void __setup_APIC_LVTT(unsigned i
apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
 }
 
-static void __init setup_APIC_timer(unsigned int clocks)
+static void __devinit setup_APIC_timer(unsigned int clocks)
 {
unsigned long flags;
 
@@ -1065,7 +1065,7 @@ void __init setup_boot_APIC_clock(void)
local_irq_enable();
 }
 
-void __init setup_secondary_APIC_clock(void)
+void __devinit setup_secondary_APIC_clock(void)
 {
setup_APIC_timer(calibration_result);
 }
diff -puN arch/i386/kernel/cpu/common.c~init_call_cleanup arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~init_call_cleanup	2005-04-12 10:37:07.218977584 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-04-12 10:37:07.244973632 +0800
@@ -24,9 +24,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
 DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
 EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);
 
-static int cachesize_override __initdata = -1;
-static int disable_x86_fxsr __initdata = 0;
-static int disable_x86_serial_nr __initdata = 1;
+static int cachesize_override __devinitdata = -1;
+static int disable_x86_fxsr __devinitdata = 0;
+static int disable_x86_serial_nr __devinitdata = 1;
 
 struct cpu_dev * cpu_devs[X86_VENDOR_NUM] = {};
 
@@ -59,7 +59,7 @@ static int __init cachesize_setup(char *
 }
__setup("cachesize=", cachesize_setup);
 
-int __init get_model_name(struct cpuinfo_x86 *c)
+int __devinit get_model_name(struct cpuinfo_x86 *c)
 {
unsigned int *v;
char *p, *q;
@@ -89,7 +89,7 @@ int __init get_model_name(struct cpuinfo
 }
 
 
-void __init display_cacheinfo(struct cpuinfo_x86 *c)
+void __devinit display_cacheinfo(struct cpuinfo_x86 *c)
 {
unsigned int n, dummy, ecx, edx, l2size;
 
@@ -130,7 +130,7 @@ void __init display_cacheinfo(struct cpu
 /* in particular, if CPUID levels 0x80000002..4 are supported, this isn't used */
 
 /* Look up CPU names by table lookup. */
-static char __init *table_lookup_model(struct cpuinfo_x86 *c)
+static char __devinit *table_lookup_model(struct cpuinfo_x86 *c)
 {
struct cpu_model_info *info;
 
@@ -151,7 +151,7 @@ static char __init

Re: [PATCH 6/6]suspend/resume SMP support

2005-04-12 Thread Li Shaohua
On Tue, 2005-04-12 at 18:51, Pavel Machek wrote:
  Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
  disable/enable_nonboot_cpus API. The S4 part is based on Pavel's
  original S4 SMP patch.
 
 I tested it on 2x PII(?) 550MHz system. Suspend went ok, resume loaded
 image from disk, but then I got
 
 Thawing cpus 
 Booting processor 1/0 eip 3000
 
 ...and very funny effect on keyboard leds. They started to blink
 (panic-like), but with very wrong frequency. It looked like 2 cpus
 doing panic blinks at once...
Check whether the /sys/devices/system/cpu/cpu1/online attribute works. If it
does, then it's a different issue. I only tested the patches on two HT-based
systems.
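A minimal user-space sketch of that check, assuming a kernel built with CONFIG_HOTPLUG_CPU and root privileges. The helper names cpu_online_path(), set_cpu_online() and get_cpu_online() are hypothetical; only the sysfs "online" attribute itself comes from the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the path of CPU n's "online" attribute in sysfs.
 * Returns 0 on success, -1 if buf is too small. */
int cpu_online_path(int cpu, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" (online) or "0" (offline). The kernel only sees the write
 * when the stdio buffer is flushed, so fclose() reports the error. */
int set_cpu_online(int cpu, int online)
{
	char path[64];
	FILE *f;
	int ok;

	if (cpu_online_path(cpu, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(online ? "1" : "0", f) != EOF;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the attribute back: 1 = online, 0 = offline,
 * -1 = error (for example, no such attribute). */
int get_cpu_online(int cpu)
{
	char path[64];
	FILE *f;
	int c;

	if (cpu_online_path(cpu, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);
	if (c == '1')
		return 1;
	if (c == '0')
		return 0;
	return -1;
}
```

For the suggested test, set_cpu_online(1, 0) followed by set_cpu_online(1, 1) should leave get_cpu_online(1) returning 1. If that round trip already misbehaves, the problem is probably in the hotplug path rather than in the S4 code.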

Thanks,
Shaohua



Re: [PATCH 5/6]physical CPU hot add

2005-04-12 Thread Li Shaohua
On Tue, 2005-04-12 at 20:17, Zwane Mwaikambo wrote:
 On Tue, 12 Apr 2005, Li Shaohua wrote:
 
   #ifdef CONFIG_HOTPLUG_CPU
  +int __attribute__ ((weak)) smp_prepare_cpu(int cpu)
  +{
  +   return 0;
  +}
  +
 
 Any way for you to avoid using weak attribute?
I just want to avoid more #ifdefs, or empty stub routines defined for the other
architectures. Some people prefer the weak attribute. Either way is OK with me,
but if you think the former is better, I'll change it.
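A small sketch of how the weak-symbol approach in the quoted hunk behaves, assuming GCC or Clang on an ELF target. The generic default returns 0; an architecture that links in a strong smp_prepare_cpu() would replace it at link time. fake_cpu_up() and bring_cpu_online() are hypothetical stand-ins for the generic caller.

```c
#include <assert.h>

/*
 * Generic default, as in the quoted hunk: nothing to prepare.
 * Because the symbol is weak, an architecture that links in its own
 * (strong) smp_prepare_cpu() overrides this one at link time, so the
 * generic caller needs no #ifdef and no per-arch empty stub.
 */
int __attribute__((weak)) smp_prepare_cpu(int cpu)
{
	(void)cpu;
	return 0;
}

/* Hypothetical stand-in for cpu_up(): succeeds for valid CPU numbers. */
static int fake_cpu_up(int cpu)
{
	return cpu >= 0 ? 0 : -1;
}

/* Hypothetical generic caller: prepare the CPU, then bring it up. */
int bring_cpu_online(int cpu)
{
	int ret = smp_prepare_cpu(cpu);

	return ret ? ret : fake_cpu_up(cpu);
}
```

One common objection to this style is that it is silent: if an architecture's override is not linked in, the no-op default is used without any build error.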

Thanks,
Shaohua



RE: [PATCH 1/6]sep initializing rework

2005-04-12 Thread Li Shaohua
On Wed, 2005-04-13 at 01:57, Protasevich, Natalie wrote:
 Hello,
 This is a hotplug CPU patch for i386, done against 2.6.12-rc2-mm3.
 Somewhat alternative to the one posted by Li Shaohua, but not really
 (and I didn't mean that :). If you look closer, our patches are
 different and can complement each other I think. Li did great job on
 sep, after-offline cleanup, __devinit etc., and I have some radical
 changes in the AP bringup mechanism. I left alone __init to __devinit
 part (I was going through it lately, but I think even though I had few
 more than Li did, he covered it sufficiently perhaps). I started having
 doubts in free_initmem() vs __devinit because look how many of __init's
 left! just a few :). 
Looks quite smart, but people will object that it keeps all __init sections
in memory. I'd like to keep the default behavior of __init.

 I got rid of do_boot_cpu loop in smpboot.c because the loop
 static void __init smp_init(void)
 {
 	unsigned int i;
 
 	/* FIXME: This should be done in userspace --RR */
 	for_each_present_cpu(i) {
 		if (num_online_cpus() >= max_cpus)
 			break;
 		if (!cpu_online(i))
 			cpu_up(i);
 	}
 ...
 does it again so why leave it in smpboot.c to boot AP's twice. 
This is what IA64 does. If you do it that way, you must also clean up the
bogomips message and the TSC synchronization. Also, CPU_UP can be called from
user context, so fork_idle() should probably run in a workqueue. And please
make sure it doesn't break other things such as check_nmi_watchdog. I just
chose an easy way (adding smp_prepare_cpu), and it doesn't break anything.

 I also found that my system fails sooner or later when I try not to
 synch runtime booted processor with others, so I changed tsc
 synchronization to only sync between booting CPU and the one that
 boots it. 
IA64 does this too: it synchronizes an AP's ITC against the BP's once. But on
IA32, the TSC's upper 32 bits can be written only on Prescott and later CPUs;
on earlier CPUs, the upper 32 bits become 0 after any write.
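A small model of the write behavior described above, written as a sketch rather than real MSR access: on a pre-Prescott CPU only the low 32 bits of a TSC write survive. Since 2^32 is about 4.29e9, a 3 GHz TSC passes that value after roughly 1.4 seconds, so an AP booted later than that cannot be written to match the BP. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a TSC write as described in the reply: on CPUs before
 * Prescott only the low 32 bits of the written value take effect and
 * the high 32 bits become 0; on Prescott and later all 64 bits can be
 * set.
 */
uint64_t tsc_after_write(uint64_t value, int cpu_is_prescott_or_later)
{
	if (cpu_is_prescott_or_later)
		return value;
	return value & 0xffffffffULL;	/* upper half cleared */
}

/* Can a hot-added AP be set to the BP's current TSC value? */
int can_resync_ap(uint64_t bp_tsc, int cpu_is_prescott_or_later)
{
	return tsc_after_write(bp_tsc, cpu_is_prescott_or_later) == bp_tsc;
}
```

This is the situation the tsc_sync_disabled flag in the later "physical CPU hot add" patch works around by skipping synchronize_tsc_ap() during a warm boot.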

 The patch works for me on Intel 8x generic box, and on ES7000. I was
 asked to separate my patch into smaller ones by the theme, but I'm
 posting the entire patch for now, because I think it is probably not
 the final one.
 I think (I hope) I will sync up with Li later on.
 My idea was that if we find a CPU core in ACPI (enabled or disabled),
 we account for it in sibling map and create a sysfs node accordingly,
 and cpu_possible_map will reflect that. We take processors up/down
 depending on physical presence using the existing node. That's the
 scenario implemented on ES7000 that reports all possible cores in ACPI
 marking absent processors as disabled. Runtime enablement/disablement
 depends on sysfs only and the driving agent can be anything (ACPI or
 user) that triggers sysfs node for this processor.
You could refer to IA64's implementation. The goal of my patches is to support
suspend/resume, which doesn't actually hot-remove a CPU, so I just ignored the
sysfs/ACPI issues.

Thanks,
Shaohua

 
 -----Original Message-----
 From: Zwane Mwaikambo [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, April 12, 2005 6:08 AM
 To: Li Shaohua
 Cc: lkml; ACPI-DEV; Len Brown; Pavel Machek; Andrew Morton;
 Protasevich, Natalie; Ryan Harper
 Subject: Re: [PATCH 1/6]sep initializing rework
 
 Hello Shaohua,
 
 On Tue, 12 Apr 2005, Li Shaohua wrote:
 
  These patches (together with 5 patches followed this one) are updated
  suspend/resume SMP patches. The patches fixed some bugs and do clean
  up as suggested. Now they work for both suspend-to-ram and
  suspend-to-disk.
  Patches are against 2.6.12-rc2-mm3.
 
 These patches look good and i think we should go ahead with them. I've
 also cross checked with physical hotplug cpu patches for ES7xxx from
 Natalie (added to Cc) and it does indeed look like a lot of the code
 will work for her too, but i'd appreciate it if she also does a double
 check. 
 Obviously this won't work for other upcoming users of hotplug cpu like
 Xen (Ryan added to Cc) but i think we can abstract things later on to
 cover other special users.
 
 Thanks Shaohua,
 Zwane
 



Re: [PATCH 3/6]init call cleanup

2005-04-12 Thread Li Shaohua
On Tue, 2005-04-12 at 17:32, Rolf Eike Beer wrote:
 Li Shaohua wrote:
  Trival patch for CPU hotplug. In CPU identify  part, only did cleaup for
  intel CPUs. Need do for other CPUs if they support S3 SMP.
 
  @@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
apic_write_around(APIC_LVT1, value);
   }
 
  -void __init setup_local_APIC (void)
  +void __devinit setup_local_APIC (void)
   ^
 
   {
unsigned long oldvalue, value, ver, maxlvt;
 
 
 Please remove this space while you are at it.
 
  @@ -556,7 +556,7 @@ void __init early_cpu_init(void)
* and IDT. We reload them nevertheless, this function acts as a
* 'CPU state barrier', nothing should get across.
*/
  -void __init cpu_init (void)
  +void __devinit cpu_init (void)
   {
int cpu = smp_processor_id();
struct tss_struct * t = per_cpu(init_tss, cpu);
 
 This one too.
Removed the space in both places, as suggested.

Thanks,
Shaohua

Trivial patch for CPU hotplug. In the CPU identification part, I only did the
cleanup for Intel CPUs; other CPUs need the same cleanup if they support S3 SMP.

---

 linux-2.6.11-root/arch/i386/kernel/apic.c|   14 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/common.c  |   30 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/intel.c   |   12 +++---
 linux-2.6.11-root/arch/i386/kernel/cpu/intel_cacheinfo.c |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/mce.c  |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p4.c   |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p5.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p6.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/process.c |2 -
 linux-2.6.11-root/arch/i386/kernel/setup.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   18 -
 linux-2.6.11-root/arch/i386/kernel/timers/timer_tsc.c|2 -
 12 files changed, 48 insertions(+), 48 deletions(-)

diff -puN arch/i386/kernel/apic.c~init_call_cleanup arch/i386/kernel/apic.c
--- linux-2.6.11/arch/i386/kernel/apic.c~init_call_cleanup	2005-04-12 10:37:07.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/apic.c	2005-04-13 10:57:55.817365288 +0800
@@ -405,7 +405,7 @@ void __init init_bsp_APIC(void)
apic_write_around(APIC_LVT1, value);
 }
 
-void __init setup_local_APIC (void)
+void __devinit setup_local_APIC(void)
 {
unsigned long oldvalue, value, ver, maxlvt;
 
@@ -676,7 +676,7 @@ static struct sys_device device_lapic = 
.cls = &lapic_sysclass,
 };
 
-static void __init apic_pm_activate(void)
+static void __devinit apic_pm_activate(void)
 {
apic_pm_state.active = 1;
 }
@@ -877,7 +877,7 @@ fake_ioapic_page:
  * but we do not accept timer interrupts yet. We only allow the BP
  * to calibrate.
  */
-static unsigned int __init get_8254_timer_count(void)
+static unsigned int __devinit get_8254_timer_count(void)
 {
extern spinlock_t i8253_lock;
unsigned long flags;
@@ -896,7 +896,7 @@ static unsigned int __init get_8254_time
 }
 
 /* next tick in 8254 can be caught by catching timer wraparound */
-static void __init wait_8254_wraparound(void)
+static void __devinit wait_8254_wraparound(void)
 {
unsigned int curr_count, prev_count;
 
@@ -916,7 +916,7 @@ static void __init wait_8254_wraparound(
  * Default initialization for 8254 timers. If we use other timers like HPET,
  * we override this later
  */
-void (*wait_timer_tick)(void) __initdata = wait_8254_wraparound;
+void (*wait_timer_tick)(void) __devinitdata = wait_8254_wraparound;
 
 /*
  * This function sets up the local APIC timer, with a timeout of
@@ -952,7 +952,7 @@ static void __setup_APIC_LVTT(unsigned i
apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
 }
 
-static void __init setup_APIC_timer(unsigned int clocks)
+static void __devinit setup_APIC_timer(unsigned int clocks)
 {
unsigned long flags;
 
@@ -1065,7 +1065,7 @@ void __init setup_boot_APIC_clock(void)
local_irq_enable();
 }
 
-void __init setup_secondary_APIC_clock(void)
+void __devinit setup_secondary_APIC_clock(void)
 {
setup_APIC_timer(calibration_result);
 }
diff -puN arch/i386/kernel/cpu/common.c~init_call_cleanup arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~init_call_cleanup	2005-04-12 10:37:07.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-04-13 10:58:25.777810608 +0800
@@ -24,9 +24,9 @@ EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
 DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
 EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);
 
-static int cachesize_override __initdata = -1;
-static int disable_x86_fxsr __initdata = 0;
-static int disable_x86_serial_nr __initdata = 1;
+static int cachesize_override __devinitdata = -1;
+static int disable_x86_fxsr __devinitdata = 0;
+static int disable_x86_serial_nr __devinitdata = 1;
 
 struct cpu_dev

Re: [PATCH 5/6]physical CPU hot add

2005-04-12 Thread Li Shaohua
On Tue, 2005-04-12 at 20:17, Zwane Mwaikambo wrote:
 On Tue, 12 Apr 2005, Li Shaohua wrote:
 
   #ifdef CONFIG_HOTPLUG_CPU
  +int __attribute__ ((weak)) smp_prepare_cpu(int cpu)
  +{
  +   return 0;
  +}
  +
 
 Any way for you to avoid using weak attribute?
Replaced the weak attribute with the #define method, as suggested.
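The smp.h hunk that carries the macro is cut off in this excerpt, so the exact form used is not shown. One common #define-based fallback looks like the sketch below: the architecture defines a macro with the function's own name as a marker, and generic code supplies a stub that evaluates to 0 only when no architecture did. DEMO_ARCH_PREPARE_CPU and prepare_then_up() are made-up names for illustration.

```c
#include <assert.h>

/*
 * Arch side (sketch): an architecture that needs real per-CPU
 * preparation declares the function in its asm/smp.h and defines a
 * macro of the same name as a marker. DEMO_ARCH_PREPARE_CPU is a made-up
 * switch so both paths fit in one file; build with
 * -DDEMO_ARCH_PREPARE_CPU to take this branch.
 */
#ifdef DEMO_ARCH_PREPARE_CPU
static unsigned int prepared_mask;	/* bit n set once CPU n is prepared */

static int smp_prepare_cpu(int cpu)
{
	prepared_mask |= 1u << cpu;
	return 0;
}
#define smp_prepare_cpu smp_prepare_cpu	/* marker: "this arch provides it" */
#endif

/*
 * Generic side: if no architecture claimed the name, fall back to a
 * stub that evaluates to 0 ("nothing to do"), with no weak symbol.
 */
#ifndef smp_prepare_cpu
#define smp_prepare_cpu(cpu) ((void)(cpu), 0)
#endif

/* Hypothetical generic caller; real code would call cpu_up() next. */
int prepare_then_up(int cpu)
{
	return smp_prepare_cpu(cpu);
}
```

Unlike the weak symbol, the choice here is made at compile time and is visible in the headers.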

Thanks,
Shaohua


---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |  112 ---
 linux-2.6.11-root/drivers/base/cpu.c |7 +
 linux-2.6.11-root/include/asm-i386/smp.h |3 
 3 files changed, 93 insertions(+), 29 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~warm_boot_cpu arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~warm_boot_cpu	2005-04-13 10:58:37.152081456 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-13 10:58:37.159080392 +0800
@@ -80,6 +80,12 @@ cpumask_t cpu_callin_map;
 cpumask_t cpu_callout_map;
 static cpumask_t smp_commenced_mask;
 
+/* TSC's upper 32 bits can't be written in earlier CPU (before prescott), there
+ * is no way to resync one AP against BP. TBD: for prescott and above, we
+ * should use IA64's algorithm
+ */
+static int __devinitdata tsc_sync_disabled;
+
 /* Per CPU bogomips and other parameters */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
 
@@ -416,7 +422,7 @@ static void __devinit smp_callin(void)
/*
 *  Synchronize the TSC with the BP
 */
-   if (cpu_has_tsc && cpu_khz)
+   if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
synchronize_tsc_ap();
 }
 
@@ -809,6 +815,31 @@ static inline int alloc_cpu_id(void)
return cpu;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static struct task_struct * __devinitdata cpu_idle_tasks[NR_CPUS];
+static inline struct task_struct * alloc_idle_task(int cpu)
+{
+   struct task_struct *idle;
+
+   if ((idle = cpu_idle_tasks[cpu]) != NULL) {
+   /* initialize thread_struct.  we really want to avoid destroy
+* idle thread
+*/
+   idle->thread.esp = (unsigned long)(((struct pt_regs *)
+   (THREAD_SIZE + (unsigned long) idle->thread_info)) - 1);
+   init_idle(idle, cpu);
+   return idle;
+   }
+   idle = fork_idle(cpu);
+
+   if (!IS_ERR(idle))
+   cpu_idle_tasks[cpu] = idle;
+   return idle;
+}
+#else
+#define alloc_idle_task(cpu) fork_idle(cpu)
+#endif
+
 static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
@@ -828,7 +859,7 @@ static int __devinit do_boot_cpu(int api
 * We can't use kernel_thread since we must avoid to
 * reschedule the child.
 */
-   idle = fork_idle(cpu);
+   idle = alloc_idle_task(cpu);
if (IS_ERR(idle))
panic("failed fork for CPU %d", cpu);
idle->thread.eip = (unsigned long) start_secondary;
@@ -931,6 +962,55 @@ void cpu_exit_clear(void)
cpu_clear(cpu, smp_commenced_mask);
unmap_cpu_to_logical_apicid(cpu);
 }
+
+struct warm_boot_cpu_info {
+   struct completion *complete;
+   int apicid;
+   int cpu;
+};
+
+static void __devinit do_warm_boot_cpu(void *p)
+{
+   struct warm_boot_cpu_info *info = p;
+   do_boot_cpu(info->apicid, info->cpu);
+   complete(info->complete);
+}
+
+int __devinit smp_prepare_cpu(int cpu)
+{
+   DECLARE_COMPLETION(done);
+   struct warm_boot_cpu_info info;
+   struct work_struct task;
+   int apicid, ret;
+
+   lock_cpu_hotplug();
+   apicid = x86_cpu_to_apicid[cpu];
+   if (apicid == BAD_APICID) {
+   ret = -ENODEV;
+   goto exit;
+   }
+
+   info.complete = &done;
+   info.apicid = apicid;
+   info.cpu = cpu;
+   INIT_WORK(&task, do_warm_boot_cpu, &info);
+
+   tsc_sync_disabled = 1;
+
+   /* init low mem mapping */
+   memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+   sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+   flush_tlb_all();
+   schedule_work(&task);
+   wait_for_completion(&done);
+
+   tsc_sync_disabled = 0;
+   zap_low_mappings();
+   ret = 0;
+exit:
+   unlock_cpu_hotplug();
+   return ret;
+}
 #endif
 
 static void smp_tune_scheduling (void)
@@ -1169,24 +1249,6 @@ void __devinit smp_prepare_boot_cpu(void
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-
-/* must be called with the cpucontrol mutex held */
-static int __devinit cpu_enable(unsigned int cpu)
-{
-   /* get the target out of its holding state */
-   per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
-   wmb();
-
-   /* wait for the processor to ack it. timeout? */
-   while (!cpu_online(cpu))
-   cpu_relax();
-
-   fixup_irqs(cpu_online_map);
-   /* counter the disable in fixup_irqs() */
-   local_irq_enable();
-   return 0;
-}
-
 static void
 remove_siblinginfo(int cpu)
 {
@@ -1270,14 +1332,6
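alloc_idle_task() in the patch above caches each CPU's idle task, so an offline/online cycle re-initializes the existing task instead of forking a new one every time. A user-space sketch of the same caching pattern; struct idle_task and fork_idle_stub() are hypothetical stand-ins for task_struct and fork_idle(), and NULL replaces the kernel's IS_ERR() convention.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 8

struct idle_task {		/* stand-in for struct task_struct */
	int cpu;
	int resets;		/* times the cached task was re-initialized */
};

static struct idle_task *cpu_idle_tasks[NR_CPUS];
static int forks;		/* counts real allocations, for illustration */

/* Stand-in for fork_idle(): allocate a fresh idle task. */
static struct idle_task *fork_idle_stub(int cpu)
{
	struct idle_task *t = calloc(1, sizeof(*t));

	if (t) {
		t->cpu = cpu;
		forks++;
	}
	return t;
}

/* Same shape as alloc_idle_task(): reuse the task left behind by an
 * earlier offline of this CPU, resetting its state, and fork a new
 * one only the first time the CPU is brought up. */
static struct idle_task *alloc_idle_task(int cpu)
{
	struct idle_task *idle = cpu_idle_tasks[cpu];

	if (idle) {
		idle->resets++;	/* real code resets thread.esp, calls init_idle() */
		return idle;
	}
	idle = fork_idle_stub(cpu);
	if (idle)
		cpu_idle_tasks[cpu] = idle;
	return idle;
}
```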

[PATCH 4/6]cpu state clean after hot remove

2005-04-11 Thread Li Shaohua
Clean CPU states in order to reuse smp boot code for CPU hotplug.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/arch/i386/kernel/cpu/common.c |   12 
 linux-2.6.11-root/arch/i386/kernel/irq.c|5 +
 linux-2.6.11-root/arch/i386/kernel/process.c|   19 +++
 linux-2.6.11-root/arch/i386/kernel/smpboot.c|   62 ++--
 linux-2.6.11-root/include/asm-i386/irq.h|2 
 linux-2.6.11-root/include/asm-i386/smp.h|5 +
 6 files changed, 89 insertions(+), 16 deletions(-)

diff -puN arch/i386/kernel/cpu/common.c~cpu_state_clean arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~cpu_state_clean	2005-04-12 10:37:50.642376224 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-04-12 10:37:50.654374400 +0800
@@ -644,3 +644,15 @@ void __devinit cpu_init (void)
clear_used_math();
mxcsr_feature_mask_init();
 }
+
+#ifdef CONFIG_HOTPLUG_CPU
+void __devinit cpu_uninit(void)
+{
+   int cpu = _smp_processor_id();
+   cpu_clear(cpu, cpu_initialized);
+
+   /* lazy TLB state */
+   per_cpu(cpu_tlbstate, cpu).state = 0;
+   per_cpu(cpu_tlbstate, cpu).active_mm = &init_mm;
+}
+#endif
diff -puN arch/i386/kernel/irq.c~cpu_state_clean arch/i386/kernel/irq.c
--- linux-2.6.11/arch/i386/kernel/irq.c~cpu_state_clean	2005-04-12 10:37:50.643376072 +0800
+++ linux-2.6.11-root/arch/i386/kernel/irq.c	2005-04-12 10:37:50.654374400 +0800
@@ -158,6 +158,11 @@ void irq_ctx_init(int cpu)
cpu,hardirq_ctx[cpu],softirq_ctx[cpu]);
 }
 
+void irq_ctx_exit(int cpu)
+{
+   hardirq_ctx[cpu] = NULL;
+}
+
 extern asmlinkage void __do_softirq(void);
 
 asmlinkage void do_softirq(void)
diff -puN arch/i386/kernel/process.c~cpu_state_clean arch/i386/kernel/process.c
--- linux-2.6.11/arch/i386/kernel/process.c~cpu_state_clean	2005-04-12 10:37:50.645375768 +0800
+++ linux-2.6.11-root/arch/i386/kernel/process.c	2005-04-12 10:37:50.655374248 +0800
@@ -148,21 +148,18 @@ static void poll_idle (void)
 /* We don't actually take CPU down, just spin without interrupts. */
 static inline void play_dead(void)
 {
+   /* This must be done before dead CPU ack */
+   cpu_exit_clear();
+   mb();
/* Ack it */
__get_cpu_var(cpu_state) = CPU_DEAD;
 
-   /* We shouldn't have to disable interrupts while dead, but
-* some interrupts just don't seem to go away, and this makes
-* it "work" for testing purposes. */
-   /* Death loop */
-   while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE)
-   cpu_relax();
-
+   /*
+* With physical CPU hotplug, we should halt the cpu
+*/
local_irq_disable();
-   __flush_tlb_all();
-   cpu_set(smp_processor_id(), cpu_online_map);
-   enable_APIC_timer();
-   local_irq_enable();
+   while (1)
+   __asm__ __volatile__("hlt":::"memory");
 }
 #else
 static inline void play_dead(void)
diff -puN arch/i386/kernel/smpboot.c~cpu_state_clean arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~cpu_state_clean	2005-04-12 10:37:50.646375616 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-12 10:37:50.656374096 +0800
@@ -798,8 +798,18 @@ wakeup_secondary_cpu(int phys_apicid, un
 #endif /* WAKE_SECONDARY_VIA_INIT */
 
 extern cpumask_t cpu_initialized;
+static inline int alloc_cpu_id(void)
+{
+   cpumask_t   tmp_map;
+   int cpu;
+   cpus_complement(tmp_map, cpu_present_map);
+   cpu = first_cpu(tmp_map);
+   if (cpu >= NR_CPUS)
+   return -ENODEV;
+   return cpu;
+}
 
-static int __devinit do_boot_cpu(int apicid)
+static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -808,11 +818,12 @@ static int __devinit do_boot_cpu(int api
 {
struct task_struct *idle;
unsigned long boot_error;
-   int timeout, cpu;
+   int timeout;
unsigned long start_eip;
unsigned short nmi_high = 0, nmi_low = 0;
 
-   cpu = ++cpucount;
+   ++cpucount;
+
/*
 * We can't use kernel_thread since we must avoid to
 * reschedule the child.
@@ -884,13 +895,16 @@ static int __devinit do_boot_cpu(int api
inquire_remote_apic(apicid);
}
}
-   x86_cpu_to_apicid[cpu] = apicid;
+
if (boot_error) {
/* Try to put things back the way they were before ... */
unmap_cpu_to_logical_apicid(cpu);
cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpucount--;
+   } else {
+   x86_cpu_to_apicid[cpu] = apicid;
+
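alloc_cpu_id() in the patch above picks the lowest CPU number that is not yet in cpu_present_map and returns -ENODEV when none is left. A user-space sketch, assuming a plain 32-bit word in place of cpumask_t for brevity; the loop plays the role of cpus_complement() plus first_cpu().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_CPUS 32

/* Lowest CPU number whose bit is clear in present_map,
 * or -ENODEV if every one of the NR_CPUS slots is taken. */
int alloc_cpu_id(uint32_t present_map)
{
	uint32_t free_map = ~present_map;	/* cpus_complement() */
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)	/* first_cpu() */
		if (free_map & (UINT32_C(1) << cpu))
			return cpu;
	return -ENODEV;
}
```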

[PATCH 6/6]suspend/resume SMP support

2005-04-11 Thread Li Shaohua
Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
disable/enable_nonboot_cpus API. The S4 part is based on Pavel's
original S4 SMP patch.
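The kernel/power/disk.c changes below take the non-boot CPUs down before freezing processes, bring them back only after thawing, and make prepare_processes() unwind its error paths in reverse order. A user-space sketch of that goto-unwind shape; every function here is a hypothetical stub that only logs its name, platform_prepare() stands in for pm_ops->prepare(), and pm_restore_console() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Call log, so the order of operations can be checked. */
static char trace[128];

static void step(const char *name)
{
	strcat(trace, name);
	strcat(trace, " ");
}

/* Hypothetical stand-ins for the kernel calls named in the patch. */
static int freeze_fails, prepare_fails;
static void disable_nonboot_cpus(void) { step("cpus_off"); }
static void enable_nonboot_cpus(void)  { step("cpus_on"); }
static int freeze_processes(void)      { step("freeze"); return freeze_fails; }
static void thaw_processes(void)       { step("thaw"); }
static int platform_prepare(void)      { step("prepare"); return prepare_fails; }

/* Same unwinding shape as the new prepare_processes(): each failure
 * undoes, in reverse order, exactly the steps that already succeeded. */
int prepare_processes(void)
{
	int error;

	disable_nonboot_cpus();
	if (freeze_processes()) {
		error = -EBUSY;
		goto enable_cpu;
	}
	if ((error = platform_prepare()))
		goto thaw;
	return 0;
thaw:
	thaw_processes();
enable_cpu:
	enable_nonboot_cpus();
	return error;
}
```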

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/drivers/acpi/Kconfig|2 
 linux-2.6.11-root/include/linux/suspend.h |2 
 linux-2.6.11-root/kernel/power/Kconfig|2 
 linux-2.6.11-root/kernel/power/disk.c |   36 ++-
 linux-2.6.11-root/kernel/power/main.c |   16 +++--
 linux-2.6.11-root/kernel/power/smp.c  |   91 +++---
 linux-2.6.11-root/kernel/power/swsusp.c   |2 
 7 files changed, 69 insertions(+), 82 deletions(-)

diff -puN drivers/acpi/Kconfig~smp_sleep drivers/acpi/Kconfig
--- linux-2.6.11/drivers/acpi/Kconfig~smp_sleep	2005-04-12 11:11:14.884685080 +0800
+++ linux-2.6.11-root/drivers/acpi/Kconfig	2005-04-12 11:11:14.898682952 +0800
@@ -57,7 +57,7 @@ if ACPI_INTERPRETER
 
 config ACPI_SLEEP
bool "Sleep States (EXPERIMENTAL)"
-   depends on X86
+   depends on X86 && (!SMP || HOTPLUG_CPU)
depends on EXPERIMENTAL
default y
---help---
diff -puN include/linux/suspend.h~smp_sleep include/linux/suspend.h
--- linux-2.6.11/include/linux/suspend.h~smp_sleep	2005-04-12 11:11:14.885684928 +0800
+++ linux-2.6.11-root/include/linux/suspend.h	2005-04-12 11:11:14.898682952 +0800
@@ -58,7 +58,7 @@ static inline int software_suspend(void)
 }
 #endif
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_HOTPLUG_CPU
 extern void disable_nonboot_cpus(void);
 extern void enable_nonboot_cpus(void);
 #else
diff -puN kernel/power/disk.c~smp_sleep kernel/power/disk.c
--- linux-2.6.11/kernel/power/disk.c~smp_sleep	2005-04-12 11:11:14.887684624 +0800
+++ linux-2.6.11-root/kernel/power/disk.c	2005-04-12 11:11:14.899682800 +0800
@@ -117,8 +117,8 @@ static void finish(void)
 {
device_resume();
platform_finish();
-   enable_nonboot_cpus();
thaw_processes();
+   enable_nonboot_cpus();
pm_restore_console();
 }
 
@@ -131,28 +131,36 @@ static int prepare_processes(void)
 
sys_sync();
 
+   disable_nonboot_cpus();
+
if (freeze_processes()) {
error = -EBUSY;
-   return error;
+   goto enable_cpu;
}
 
if (pm_disk_mode == PM_DISK_PLATFORM) {
if (pm_ops && pm_ops->prepare) {
if ((error = pm_ops->prepare(PM_SUSPEND_DISK)))
-   return error;
+   goto thaw;
}
}
 
/* Free memory before shutting down devices. */
free_some_memory();
-
return 0;
+thaw:
+   thaw_processes();
+enable_cpu:
+   enable_nonboot_cpus();
+   pm_restore_console();
+   return error;
 }
 
 static void unprepare_processes(void)
 {
-   enable_nonboot_cpus();
+   platform_finish();
thaw_processes();
+   enable_nonboot_cpus();
pm_restore_console();
 }
 
@@ -160,15 +168,9 @@ static int prepare_devices(void)
 {
int error;
 
-   disable_nonboot_cpus();
-   if ((error = device_suspend(PMSG_FREEZE))) {
+   if ((error = device_suspend(PMSG_FREEZE)))
printk("Some devices failed to suspend\n");
-   platform_finish();
-   enable_nonboot_cpus();
-   return error;
-   }
-
-   return 0;
+   return error;
 }
 
 /**
@@ -185,9 +187,9 @@ int pm_suspend_disk(void)
int error;
 
error = prepare_processes();
-   if (!error) {
-   error = prepare_devices();
-   }
+   if (error)
+   return error;
+   error = prepare_devices();
 
if (error) {
unprepare_processes();
@@ -250,7 +252,7 @@ static int software_resume(void)
 
if ((error = prepare_processes())) {
swsusp_close();
-   goto Cleanup;
+   goto Done;
}
 
pr_debug("PM: Reading swsusp image.\n");
diff -puN kernel/power/Kconfig~smp_sleep kernel/power/Kconfig
--- linux-2.6.11/kernel/power/Kconfig~smp_sleep	2005-04-12 11:11:14.888684472 +0800
+++ linux-2.6.11-root/kernel/power/Kconfig	2005-04-12 11:11:14.899682800 +0800
@@ -28,7 +28,7 @@ config PM_DEBUG
 
 config SOFTWARE_SUSPEND
bool "Software Suspend (EXPERIMENTAL)"
-   depends on EXPERIMENTAL && PM && SWAP
+   depends on EXPERIMENTAL && PM && SWAP && (HOTPLUG_CPU || !SMP)
---help---
  Enable the possibility of suspending the machine.
  It doesn't need APM.
diff -puN kernel/power/main.c~smp_sleep kernel/power/main.c
--- linux-2.6.11/kernel/power/main.c~smp_sleep	2005-04-12 11:11:14.890684168 +0800
+++ linux-2.6.11-root/kernel/power/main.c	2005-04-12 11:11:14.899682800 +0800
@@ -59,6 +59,13 @@ static int suspend_prepare(suspend_

[PATCH 5/6]physical CPU hot add

2005-04-11 Thread Li Shaohua
Boot a CPU at runtime.
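smp_prepare_cpu() in the patch below hands do_boot_cpu() to a workqueue with schedule_work() and blocks on a completion until it finishes; Shaohua's earlier reply notes that CPU_UP can be called from user context, which is why fork_idle() is better run from a workqueue. A user-space analogue with POSIX threads (link with -pthread on older toolchains): the minimal struct completion here is a hypothetical re-implementation with a mutex and condition variable, not the kernel's.

```c
#include <assert.h>
#include <pthread.h>

/* Minimal completion: a flag guarded by a mutex and condition variable. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void destroy_completion(struct completion *c)
{
	pthread_cond_destroy(&c->cond);
	pthread_mutex_destroy(&c->lock);
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)		/* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Mirrors struct warm_boot_cpu_info, plus a hypothetical result flag. */
struct warm_boot_cpu_info {
	struct completion *complete;
	int apicid;
	int cpu;
	int booted;
};

/* Worker body, like do_warm_boot_cpu(): do the work, then signal. */
static void *do_warm_boot_cpu(void *p)
{
	struct warm_boot_cpu_info *info = p;

	info->booted = 1;		/* do_boot_cpu(apicid, cpu) would go here */
	complete(info->complete);
	return NULL;
}

/* Caller, like smp_prepare_cpu(): run the work on another thread
 * (schedule_work() in the kernel) and block until it completes. */
int warm_boot(int apicid, int cpu)
{
	struct completion done;
	struct warm_boot_cpu_info info = { &done, apicid, cpu, 0 };
	pthread_t worker;

	init_completion(&done);
	if (pthread_create(&worker, NULL, do_warm_boot_cpu, &info) != 0) {
		destroy_completion(&done);
		return -1;
	}
	wait_for_completion(&done);
	pthread_join(worker, NULL);
	destroy_completion(&done);
	return info.booted ? 0 : -1;
}
```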

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |  112 ---
 linux-2.6.11-root/drivers/base/cpu.c |8 +
 linux-2.6.11-root/include/asm-i386/smp.h |2 
 3 files changed, 93 insertions(+), 29 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~warm_boot_cpu arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~warm_boot_cpu	2005-04-12 10:38:16.720411760 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-12 11:11:09.16040 +0800
@@ -80,6 +80,12 @@ cpumask_t cpu_callin_map;
 cpumask_t cpu_callout_map;
 static cpumask_t smp_commenced_mask;
 
+/* TSC's upper 32 bits can't be written in earlier CPU (before prescott), there
+ * is no way to resync one AP against BP. TBD: for prescott and above, we
+ * should use IA64's algorithm
+ */
+static int __devinitdata tsc_sync_disabled;
+
 /* Per CPU bogomips and other parameters */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
 
@@ -416,7 +422,7 @@ static void __devinit smp_callin(void)
/*
 *  Synchronize the TSC with the BP
 */
-   if (cpu_has_tsc && cpu_khz)
+   if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
synchronize_tsc_ap();
 }
 
@@ -809,6 +815,31 @@ static inline int alloc_cpu_id(void)
return cpu;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static struct task_struct * __devinitdata cpu_idle_tasks[NR_CPUS];
+static inline struct task_struct * alloc_idle_task(int cpu)
+{
+   struct task_struct *idle;
+
+   if ((idle = cpu_idle_tasks[cpu]) != NULL) {
+   /* initialize thread_struct.  we really want to avoid destroy
+* idle thread
+*/
+   idle->thread.esp = (unsigned long)(((struct pt_regs *)
+   (THREAD_SIZE + (unsigned long) idle->thread_info)) - 1);
+   init_idle(idle, cpu);
+   return idle;
+   }
+   idle = fork_idle(cpu);
+
+   if (!IS_ERR(idle))
+   cpu_idle_tasks[cpu] = idle;
+   return idle;
+}
+#else
+#define alloc_idle_task(cpu) fork_idle(cpu)
+#endif
+
 static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
@@ -828,7 +859,7 @@ static int __devinit do_boot_cpu(int api
 * We can't use kernel_thread since we must avoid to
 * reschedule the child.
 */
-   idle = fork_idle(cpu);
+   idle = alloc_idle_task(cpu);
if (IS_ERR(idle))
panic("failed fork for CPU %d", cpu);
idle->thread.eip = (unsigned long) start_secondary;
@@ -931,6 +962,55 @@ void cpu_exit_clear(void)
cpu_clear(cpu, smp_commenced_mask);
unmap_cpu_to_logical_apicid(cpu);
 }
+
+struct warm_boot_cpu_info {
+   struct completion *complete;
+   int apicid;
+   int cpu;
+};
+
+static void __devinit do_warm_boot_cpu(void *p)
+{
+   struct warm_boot_cpu_info *info = p;
+   do_boot_cpu(info->apicid, info->cpu);
+   complete(info->complete);
+}
+
+int __devinit smp_prepare_cpu(int cpu)
+{
+   DECLARE_COMPLETION(done);
+   struct warm_boot_cpu_info info;
+   struct work_struct task;
+   int apicid, ret;
+
+   lock_cpu_hotplug();
+   apicid = x86_cpu_to_apicid[cpu];
+   if (apicid == BAD_APICID) {
+   ret = -ENODEV;
+   goto exit;
+   }
+
+   info.complete = &done;
+   info.apicid = apicid;
+   info.cpu = cpu;
+   INIT_WORK(&task, do_warm_boot_cpu, &info);
+
+   tsc_sync_disabled = 1;
+
+   /* init low mem mapping */
+   memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+   sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+   flush_tlb_all();
+   schedule_work(&task);
+   wait_for_completion(&done);
+
+   tsc_sync_disabled = 0;
+   zap_low_mappings();
+   ret = 0;
+exit:
+   unlock_cpu_hotplug();
+   return ret;
+}
 #endif
 
 static void smp_tune_scheduling (void)
@@ -1169,24 +1249,6 @@ void __devinit smp_prepare_boot_cpu(void
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-
-/* must be called with the cpucontrol mutex held */
-static int __devinit cpu_enable(unsigned int cpu)
-{
-   /* get the target out of its holding state */
-   per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
-   wmb();
-
-   /* wait for the processor to ack it. timeout? */
-   while (!cpu_online(cpu))
-   cpu_relax();
-
-   fixup_irqs(cpu_online_map);
-   /* counter the disable in fixup_irqs() */
-   local_irq_enable();
-   return 0;
-}
-
 static void
 remove_siblinginfo(int cpu)
 {
@@ -1270,14 +1332,6 @@ int __devinit __cpu_up(unsigned int cpu)
return -EIO;
}
 
-#ifdef CONFIG_HOTPLUG_CPU
-   /* Already up, and in cpu_quiescent now? */
-   if (cpu_isset(cpu, smp_commenced_mask)) {
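smp_prepare_cpu() above hands do_boot_cpu() to the shared workqueue and then blocks in wait_for_completion() until the work item has run, with tsc_sync_disabled set only for that window. The sketch below mimics that hand-off in user space, assuming POSIX threads. struct completion_sketch, completion_signal(), completion_wait(), warm_boot_worker() and prepare_cpu() are hypothetical stand-ins, not the kernel API, and the real do_boot_cpu() call is left as a comment.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for the kernel's struct completion. */
struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int done;
};

static void completion_init(struct completion_sketch *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

static void completion_signal(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void completion_wait(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Same role as struct warm_boot_cpu_info in the patch. */
struct warm_boot_info {
    struct completion_sketch *complete;
    int apicid;
    int cpu;
    int ran_with_sync_off;           /* hypothetical: what the worker observed */
};

static int tsc_sync_disabled;

static void *warm_boot_worker(void *p)
{
    struct warm_boot_info *info = p;

    /* do_boot_cpu(info->apicid, info->cpu) would run here */
    info->ran_with_sync_off = tsc_sync_disabled;
    completion_signal(info->complete);
    return NULL;
}

/* Shape of smp_prepare_cpu(): 'done' and 'info' live on the caller's
 * stack, which is safe only because the caller blocks until the worker
 * has signalled (and, here, exited). */
static int prepare_cpu(int cpu, int apicid)
{
    struct completion_sketch done;
    struct warm_boot_info info = { &done, apicid, cpu, 0 };
    pthread_t worker;
    int ret = -1;

    assert(cpu >= 0);
    completion_init(&done);
    tsc_sync_disabled = 1;
    if (pthread_create(&worker, NULL, warm_boot_worker, &info) == 0) {
        completion_wait(&done);
        pthread_join(worker, NULL);
        ret = info.ran_with_sync_off ? 0 : -1;
    }
    tsc_sync_disabled = 0;
    pthread_mutex_destroy(&done.lock);
    pthread_cond_destroy(&done.cond);
    return ret;
}
```

Keeping the completion and the argument block on the caller's stack works only because the caller cannot return before the worker has signalled; the sketch additionally joins the thread before destroying the synchronization objects.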


[PATCH 6/6]suspend/resume SMP support

2005-04-11 Thread Li Shaohua
Use CPU hotplug to support SMP suspend/resume. Both S3 and S4 use the
disable_nonboot_cpus()/enable_nonboot_cpus() API. The S4 part is based on
Pavel's original S4 SMP patch.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/drivers/acpi/Kconfig|2 
 linux-2.6.11-root/include/linux/suspend.h |2 
 linux-2.6.11-root/kernel/power/Kconfig|2 
 linux-2.6.11-root/kernel/power/disk.c |   36 ++-
 linux-2.6.11-root/kernel/power/main.c |   16 +++--
 linux-2.6.11-root/kernel/power/smp.c  |   91 +++---
 linux-2.6.11-root/kernel/power/swsusp.c   |2 
 7 files changed, 69 insertions(+), 82 deletions(-)

diff -puN drivers/acpi/Kconfig~smp_sleep drivers/acpi/Kconfig
--- linux-2.6.11/drivers/acpi/Kconfig~smp_sleep 2005-04-12 11:11:14.884685080 +0800
+++ linux-2.6.11-root/drivers/acpi/Kconfig  2005-04-12 11:11:14.898682952 +0800
@@ -57,7 +57,7 @@ if ACPI_INTERPRETER
 
 config ACPI_SLEEP
bool "Sleep States (EXPERIMENTAL)"
-   depends on X86
+   depends on X86 && (!SMP || HOTPLUG_CPU)
depends on EXPERIMENTAL
default y
---help---
diff -puN include/linux/suspend.h~smp_sleep include/linux/suspend.h
--- linux-2.6.11/include/linux/suspend.h~smp_sleep  2005-04-12 11:11:14.885684928 +0800
+++ linux-2.6.11-root/include/linux/suspend.h   2005-04-12 11:11:14.898682952 +0800
@@ -58,7 +58,7 @@ static inline int software_suspend(void)
 }
 #endif
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_HOTPLUG_CPU
 extern void disable_nonboot_cpus(void);
 extern void enable_nonboot_cpus(void);
 #else
diff -puN kernel/power/disk.c~smp_sleep kernel/power/disk.c
--- linux-2.6.11/kernel/power/disk.c~smp_sleep  2005-04-12 11:11:14.887684624 +0800
+++ linux-2.6.11-root/kernel/power/disk.c   2005-04-12 11:11:14.899682800 +0800
@@ -117,8 +117,8 @@ static void finish(void)
 {
device_resume();
platform_finish();
-   enable_nonboot_cpus();
thaw_processes();
+   enable_nonboot_cpus();
pm_restore_console();
 }
 
@@ -131,28 +131,36 @@ static int prepare_processes(void)
 
sys_sync();
 
+   disable_nonboot_cpus();
+
if (freeze_processes()) {
error = -EBUSY;
-   return error;
+   goto enable_cpu;
}
 
if (pm_disk_mode == PM_DISK_PLATFORM) {
if (pm_ops && pm_ops->prepare) {
if ((error = pm_ops->prepare(PM_SUSPEND_DISK)))
-   return error;
+   goto thaw;
}
}
 
/* Free memory before shutting down devices. */
free_some_memory();
-
return 0;
+thaw:
+   thaw_processes();
+enable_cpu:
+   enable_nonboot_cpus();
+   pm_restore_console();
+   return error;
 }
 
 static void unprepare_processes(void)
 {
-   enable_nonboot_cpus();
+   platform_finish();
thaw_processes();
+   enable_nonboot_cpus();
pm_restore_console();
 }
 
@@ -160,15 +168,9 @@ static int prepare_devices(void)
 {
int error;
 
-   disable_nonboot_cpus();
-   if ((error = device_suspend(PMSG_FREEZE))) {
+   if ((error = device_suspend(PMSG_FREEZE)))
printk("Some devices failed to suspend\n");
-   platform_finish();
-   enable_nonboot_cpus();
-   return error;
-   }
-
-   return 0;
+   return error;
 }
 
 /**
@@ -185,9 +187,9 @@ int pm_suspend_disk(void)
int error;
 
error = prepare_processes();
-   if (!error) {
-   error = prepare_devices();
-   }
+   if (error)
+   return error;
+   error = prepare_devices();
 
if (error) {
unprepare_processes();
@@ -250,7 +252,7 @@ static int software_resume(void)
 
if ((error = prepare_processes())) {
swsusp_close();
-   goto Cleanup;
+   goto Done;
}
 
pr_debug("PM: Reading swsusp image.\n");
diff -puN kernel/power/Kconfig~smp_sleep kernel/power/Kconfig
--- linux-2.6.11/kernel/power/Kconfig~smp_sleep 2005-04-12 11:11:14.888684472 +0800
+++ linux-2.6.11-root/kernel/power/Kconfig  2005-04-12 11:11:14.899682800 +0800
@@ -28,7 +28,7 @@ config PM_DEBUG
 
 config SOFTWARE_SUSPEND
bool "Software Suspend (EXPERIMENTAL)"
-   depends on EXPERIMENTAL && PM && SWAP
+   depends on EXPERIMENTAL && PM && SWAP && (HOTPLUG_CPU || !SMP)
---help---
  Enable the possibility of suspending the machine.
  It doesn't need APM.
diff -puN kernel/power/main.c~smp_sleep kernel/power/main.c
--- linux-2.6.11/kernel/power/main.c~smp_sleep  2005-04-12 11:11:14.890684168 +0800
+++ linux-2.6.11-root/kernel/power/main.c   2005-04-12 11:11:14.899682800 +0800
@@ -59,6 +59,13 @@ static int suspend_prepare(suspend_state
 
pm_prepare_console();
 
+   disable_nonboot_cpus();
+
+   if (num_online_cpus() != 1) {
+   error = -EPERM
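The reworked prepare_processes() in the disk.c hunk above follows the usual goto-unwind pattern: disable the non-boot CPUs first, and on failure jump to a label that undoes only what already succeeded, in reverse order. A minimal user-space sketch of that control flow, in which every step is a hypothetical stub that only records its name (free_some_memory() is left out):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stubs: each step appends its name to trace[]. */
static char trace[128];
static int fail_freeze;    /* set to force the freeze step to fail */
static int fail_prepare;   /* set to force the platform prepare step to fail */

static void step(const char *name)
{
    strcat(trace, name);
    strcat(trace, ";");
}

static void disable_cpus(void)    { step("disable_cpus"); }
static void enable_cpus(void)     { step("enable_cpus"); }
static void restore_console(void) { step("restore_console"); }
static void thaw_all(void)        { step("thaw"); }
static int freeze_all(void)       { step("freeze"); return fail_freeze; }
static int platform_prepare(void) { step("prepare"); return fail_prepare ? -EIO : 0; }

/* Each failure jumps to the label that undoes only the steps that
 * already succeeded, in reverse order of setup. */
static int prepare_processes_sketch(void)
{
    int error;

    disable_cpus();
    if (freeze_all()) {
        error = -EBUSY;
        goto enable_cpu;
    }
    if ((error = platform_prepare()))
        goto thaw;
    return 0;
thaw:
    thaw_all();
enable_cpu:
    enable_cpus();
    restore_console();
    return error;
}
```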

[PATCH 4/6]cpu state clean after hot remove

2005-04-11 Thread Li Shaohua
Clean CPU states in order to reuse smp boot code for CPU hotplug.

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>
---

 linux-2.6.11-root/arch/i386/kernel/cpu/common.c |   12 
 linux-2.6.11-root/arch/i386/kernel/irq.c|5 +
 linux-2.6.11-root/arch/i386/kernel/process.c|   19 +++
 linux-2.6.11-root/arch/i386/kernel/smpboot.c|   62 ++--
 linux-2.6.11-root/include/asm-i386/irq.h|2 
 linux-2.6.11-root/include/asm-i386/smp.h|5 +
 6 files changed, 89 insertions(+), 16 deletions(-)

diff -puN arch/i386/kernel/cpu/common.c~cpu_state_clean arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~cpu_state_clean  2005-04-12 10:37:50.642376224 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c 2005-04-12 10:37:50.654374400 +0800
@@ -644,3 +644,15 @@ void __devinit cpu_init (void)
clear_used_math();
mxcsr_feature_mask_init();
 }
+
+#ifdef CONFIG_HOTPLUG_CPU
+void __devinit cpu_uninit(void)
+{
+   int cpu = _smp_processor_id();
+   cpu_clear(cpu, cpu_initialized);
+
+   /* lazy TLB state */
+   per_cpu(cpu_tlbstate, cpu).state = 0;
+   per_cpu(cpu_tlbstate, cpu).active_mm = &init_mm;
+}
+#endif
diff -puN arch/i386/kernel/irq.c~cpu_state_clean arch/i386/kernel/irq.c
--- linux-2.6.11/arch/i386/kernel/irq.c~cpu_state_clean 2005-04-12 10:37:50.643376072 +0800
+++ linux-2.6.11-root/arch/i386/kernel/irq.c    2005-04-12 10:37:50.654374400 +0800
@@ -158,6 +158,11 @@ void irq_ctx_init(int cpu)
cpu,hardirq_ctx[cpu],softirq_ctx[cpu]);
 }
 
+void irq_ctx_exit(int cpu)
+{
+   hardirq_ctx[cpu] = NULL;
+}
+
 extern asmlinkage void __do_softirq(void);
 
 asmlinkage void do_softirq(void)
diff -puN arch/i386/kernel/process.c~cpu_state_clean arch/i386/kernel/process.c
--- linux-2.6.11/arch/i386/kernel/process.c~cpu_state_clean 2005-04-12 10:37:50.645375768 +0800
+++ linux-2.6.11-root/arch/i386/kernel/process.c    2005-04-12 10:37:50.655374248 +0800
@@ -148,21 +148,18 @@ static void poll_idle (void)
 /* We don't actually take CPU down, just spin without interrupts. */
 static inline void play_dead(void)
 {
+   /* This must be done before dead CPU ack */
+   cpu_exit_clear();
+   mb();
/* Ack it */
__get_cpu_var(cpu_state) = CPU_DEAD;
 
-   /* We shouldn't have to disable interrupts while dead, but
-* some interrupts just don't seem to go away, and this makes
-* it work for testing purposes. */
-   /* Death loop */
-   while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE)
-   cpu_relax();
-
+   /*
+* With physical CPU hotplug, we should halt the cpu
+*/
local_irq_disable();
-   __flush_tlb_all();
-   cpu_set(smp_processor_id(), cpu_online_map);
-   enable_APIC_timer();
-   local_irq_enable();
+   while (1)
+   __asm__ __volatile__("hlt":::"memory");
 }
 #else
 static inline void play_dead(void)
diff -puN arch/i386/kernel/smpboot.c~cpu_state_clean arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~cpu_state_clean 2005-04-12 10:37:50.646375616 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c    2005-04-12 10:37:50.656374096 +0800
@@ -798,8 +798,18 @@ wakeup_secondary_cpu(int phys_apicid, un
 #endif /* WAKE_SECONDARY_VIA_INIT */
 
 extern cpumask_t cpu_initialized;
+static inline int alloc_cpu_id(void)
+{
+   cpumask_t   tmp_map;
+   int cpu;
+   cpus_complement(tmp_map, cpu_present_map);
+   cpu = first_cpu(tmp_map);
+   if (cpu >= NR_CPUS)
+   return -ENODEV;
+   return cpu;
+}
 
-static int __devinit do_boot_cpu(int apicid)
+static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -808,11 +818,12 @@ static int __devinit do_boot_cpu(int api
 {
struct task_struct *idle;
unsigned long boot_error;
-   int timeout, cpu;
+   int timeout;
unsigned long start_eip;
unsigned short nmi_high = 0, nmi_low = 0;
 
-   cpu = ++cpucount;
+   ++cpucount;
+
/*
 * We can't use kernel_thread since we must avoid to
 * reschedule the child.
@@ -884,13 +895,16 @@ static int __devinit do_boot_cpu(int api
inquire_remote_apic(apicid);
}
}
-   x86_cpu_to_apicid[cpu] = apicid;
+
if (boot_error) {
/* Try to put things back the way they were before ... */
unmap_cpu_to_logical_apicid(cpu);
cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpucount--;
+   } else {
+   x86_cpu_to_apicid[cpu] = apicid;
+   cpu_set(cpu, cpu_present_map
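alloc_cpu_id() in the smpboot.c hunk above picks the lowest CPU number that is not yet in cpu_present_map. Below is a sketch of the same logic with a plain unsigned bitmask in place of cpumask_t; NR_CPUS = 8 and alloc_cpu_id_sketch() are assumptions for illustration. The bounds check has to be cpu >= NR_CPUS, because first_cpu() signals "no free bit" by returning NR_CPUS.

```c
#include <assert.h>
#include <errno.h>

#define NR_CPUS 8   /* assumption: small fixed CPU count for the sketch */

/* Return the lowest CPU number whose bit is clear in present_map,
 * or -ENODEV when every slot below NR_CPUS is taken. */
static int alloc_cpu_id_sketch(unsigned int present_map)
{
    unsigned int free_map = ~present_map;   /* like cpus_complement() */
    int cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++)     /* like first_cpu() */
        if (free_map & (1u << cpu))
            break;
    if (cpu >= NR_CPUS)                     /* no free slot found */
        return -ENODEV;
    return cpu;
}
```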

Re: [ACPI] Re: [RFC 5/6]clean cpu state after hotremove CPU

2005-04-05 Thread Li Shaohua
On Mon, 2005-04-04 at 23:33, Nathan Lynch wrote:
> > > 
> > > I don't understand why this is needed at all.  It looks like a fair
> > > amount of code from do_exit is being duplicated here.  
> > Yes, exactly. Someone who understand do_exit please help clean up the
> > code. I'd like to remove the idle thread, since the smpboot code will
> > create a new idle thread.
> 
> I'd say fix the smpboot code so that it doesn't create new idle tasks
> except during boot.
I tried what you suggested, but I had to use an ugly method to adjust
idle->thread.esp (the stack pointer on IA32); otherwise, the stack soon
overflows after several rounds of hotplug. I'll take a closer look at
whether other fields in thread_info cause problems.
Did you reinitialize the idle task's thread_info on ppc? I have no problem
doing it on IA32, but is this a good approach? Creating a new idle thread
for the CPU being brought up looks more graceful to me.
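For reference, the esp adjustment in question (it appears in alloc_idle_task() in the [PATCH 5/6] smpboot.c hunk) puts the saved stack pointer exactly one struct pt_regs below the top of the THREAD_SIZE area that starts at thread_info. The arithmetic can be checked in user space; THREAD_SIZE = 8192 and struct fake_pt_regs are assumptions standing in for the i386 definitions, and only the pointer arithmetic is meant to match.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192   /* assumption: 8 KiB i386 kernel stacks */

/* Hypothetical stand-in for i386 struct pt_regs; only its size matters. */
struct fake_pt_regs {
    long regs[15];
};

/* Start at the top of the stack area beginning at thread_info, then
 * step down by exactly one pt_regs frame. Recomputing this on every
 * reuse gives the same value each time, independent of where the task
 * was when its CPU went down. */
static uintptr_t initial_esp(void *thread_info)
{
    return (uintptr_t)(((struct fake_pt_regs *)
            (THREAD_SIZE + (uintptr_t)thread_info)) - 1);
}
```

Resetting from the top of the stack on each reuse, rather than keeping whatever esp the idle task had when it died, is presumably what keeps the stack from creeping toward overflow over repeated hotplug rounds.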

Thanks,
Shaohua



Re: [RFC 1/6]SEP initialization rework

2005-04-04 Thread Li Shaohua
On Tue, 2005-04-05 at 03:10, Zwane Mwaikambo wrote:
> On Mon, 4 Apr 2005, Li Shaohua wrote:
> 
> >  linux-2.6.11-root/arch/i386/kernel/smpboot.c   |6 ++
> >  linux-2.6.11-root/arch/i386/kernel/sysenter.c  |   10 ++
> >  linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |6 ++
> >  3 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c
> > --- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup   2005-03-28 09:32:30.936304248 +0800
> > +++ linux-2.6.11-root/arch/i386/kernel/sysenter.c   2005-03-28 09:58:20.703703792 +0800
> > @@ -26,6 +26,11 @@ void enable_sep_cpu(void *info)
> > int cpu = get_cpu();
> > struct tss_struct *tss = &per_cpu(init_tss, cpu);
> >  
> > +   if (!boot_cpu_has(X86_FEATURE_SEP)) {
> > +   put_cpu();
> > +   return;
> > +   }
> > +
> 
> Do you have systems like this? Is it really skipping SEP if the boot 
> processor doesn't have SEP?
No, I don't have such a system. This just preserves the logic of the
original SEP initialization: if the CPU doesn't have SEP, the original
code doesn't call 'on_each_cpu(enable_sep_cpu,...)' at all.
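With the check moved into enable_sep_cpu() itself, the boot-time on_each_cpu() path and a later hot-added CPU make the same decision. One detail worth noting is that the early return must still undo get_cpu(). A user-space sketch in which get_cpu_sketch()/put_cpu_sketch() (a plain counter) and the boot_cpu_has_sep flag are hypothetical stand-ins:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for preemption state and the feature flag. */
static int preempt_count;
static bool boot_cpu_has_sep;
static int sep_enabled_cpus;

static int get_cpu_sketch(void) { preempt_count++; return 0; }
static void put_cpu_sketch(void) { preempt_count--; }

/* Per-CPU callback in the shape of enable_sep_cpu(): every exit path,
 * including the early "no SEP" return, drops the get_cpu() reference. */
static void enable_sep_cpu_sketch(void)
{
    int cpu = get_cpu_sketch();

    (void)cpu;
    if (!boot_cpu_has_sep) {
        put_cpu_sketch();
        return;
    }
    sep_enabled_cpus++;   /* the SYSENTER MSR writes would go here */
    put_cpu_sketch();
}
```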

Thanks,
Shaohua



Re: [ACPI] Re: [RFC 5/6]clean cpu state after hotremove CPU

2005-04-04 Thread Li Shaohua
Hi,
On Mon, 2005-04-04 at 23:33, Nathan Lynch wrote:
> 
> I'd say fix the smpboot code so that it doesn't create new idle tasks
> except during boot.
I'd like the CPU hot-remove case to look just like the case where the CPU
was never booted: a CPU that hasn't been booted has no idle thread. But you
may think it isn't worth doing. Anyway, I will keep the idle thread in an
updated patch, as you suggested.

> > > We've been
> > > doing cpu removal on ppc64 logical partitions for a while and never
> > > needed to do anything like this. 
> > Did it remove idle thread? or dead cpu is in a busy loop of idle?
> 
> Neither.  The cpu is definitely offline, but there is no reason to
> free the idle thread.
> 
> > 
> > >  Maybe idle_task_exit would suffice?
> > idle_task_exit seems just drop mm. We need destroy the idle task for
> > physical CPU hotplug, right?
> 
> No.
> 
> > > 
> > > I don't understand the need for this, either.  The existing cpu
> > > hotplug notifier in the scheduler takes care of initializing the sched
> > > domains and groups appropriately for online/offline events; why do you
> > > need to touch the runqueue structures?
> > If a CPU is physically hotremoved from the system, shouldn't we clean
> > its runqueue?
> 
> No.  It should make zero difference to the scheduler whether the "play
> dead" cpu hotplug or "physical" hotplug is being used.  
Keeping fields like 'cpu_load' seems meaningless to me for a hot-added
CPU. Should we just ignore them?

Thanks,
Shaohua



Re: [RFC 5/6]clean cpu state after hotremove CPU

2005-04-04 Thread Li Shaohua
On Tue, 2005-04-05 at 03:11, Zwane Mwaikambo wrote:
> On Mon, 4 Apr 2005, Li Shaohua wrote:
> 
> > Clean up all CPU states including its runqueue and idle thread, 
> > so we can use boot time code without any changes.
> > Note this makes /sys/devices/system/cpu/cpux/online unworkable.
> >  
> >  #ifdef CONFIG_HOTPLUG_CPU
> >  #include 
> > +
> > +#ifdef CONFIG_STR_SMP
> > +extern void cpu_exit_clear(int);
> > +#endif
> 
> Perhaps change that ifdef to denote something which clearly shows that its 
> physical hotplug as we'll need this for other users too.
Ok.

> > +#ifdef CONFIG_STR_SMP
> > +extern void do_exit_idle(void);
> > +extern void cpu_uninit(void);
> > +void cpu_exit_clear(int cpu)
> > +{
> > +   int sibling;
> > +   cpucount --;
> 
> Is that protected by the cpu_control semaphore?
cpu_exit_clear is called before the dead CPU acknowledges CPU_DEAD, so it
finishes before __cpu_die returns, and __cpu_die runs under cpu_control.
Maybe I should add a comment about this.
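The ordering argument can be illustrated with C11 atomics: the dying CPU finishes its cleanup (here, the cpucount decrement) before it publishes CPU_DEAD, and the waiter only reads shared state after it has seen that ack. This is a user-space sketch under stated assumptions: the kernel relies on a per-CPU state variable, mb() in play_dead() and the cpu_control lock around __cpu_die, whereas the sketch uses a release store paired with an acquire load; the state names and functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum { CPU_ONLINE_STATE, CPU_DEAD_STATE };   /* hypothetical state values */

static int cpucount = 2;                      /* plain int, as in smpboot.c */
static atomic_int cpu_state = CPU_ONLINE_STATE;

/* Dying "CPU": finish all cleanup, then publish CPU_DEAD with release
 * ordering (the role mb() plays before the ack in play_dead()). */
static void *dying_cpu(void *arg)
{
    (void)arg;
    cpucount--;                               /* cpu_exit_clear() work */
    atomic_store_explicit(&cpu_state, CPU_DEAD_STATE, memory_order_release);
    return NULL;
}

/* Controlling side, like __cpu_die(): spin until the ack is visible.
 * The acquire load pairs with the release store, so the cpucount update
 * is guaranteed to be visible once CPU_DEAD has been observed. */
static int wait_for_death(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, dying_cpu, NULL) != 0)
        return -1;
    while (atomic_load_explicit(&cpu_state, memory_order_acquire) != CPU_DEAD_STATE)
        ;                                     /* cpu_relax() in the kernel */
    int seen = cpucount;
    pthread_join(t, NULL);
    return seen;
}
```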

Thanks,
Shaohua



Re: [ACPI] Re: [RFC 0/6] S3 SMP support with physcial CPU hotplug

2005-04-04 Thread Li Shaohua
On Mon, 2005-04-04 at 17:10, Pavel Machek wrote:
> Hi!
> 
> > > I'm switching suspend2 to use hotplug too. Li, I'll try adding your
> > > patches as well as Zwane's if you like 
> > Great!
> > 
> > > (suspend2 can enter S3, S4 or S5
> > > after writing the image). I'd love to try it on my HT desktop, and
> > > hotplug will get more testing too :>
> > Unfortunately, my patches break Pavel's swsusp SMP, as my patches break
> > current 'cpu_up' mechanism. S4 doesn't require to boot AP CPUs from real
> > mode.
> 
> Uh, I don't like that one. Is it possible to put secondary CPUs back
> to the real mode 
That's probably not worth the trouble: sending a SIPI can also wake up a
CPU that is in protected mode.

> so that cpu_up mechanism can handle them?
If S4 also calls smp_prepare_cpu, then the patches don't break S4. If
nobody complains that warm-booting a CPU is slow, I'd like S4 to use
smp_prepare_cpu as well.

Thanks,
Shaohua



Re: [RFC 4/6]Add kconfig for S3 SMP

2005-04-04 Thread Li Shaohua
On Mon, 2005-04-04 at 16:59, Pavel Machek wrote:
> Hi!
> 
> > Add kconfig for IA32 S3 SMP.
> > 
> > Thanks,
> > Shaohua
> > 
> > ---
> > 
> >  linux-2.6.11-root/kernel/power/Kconfig |7 +++
> >  1 files changed, 7 insertions(+)
> > 
> > diff -puN kernel/power/Kconfig~smp_s3_kconfig kernel/power/Kconfig
> > --- linux-2.6.11/kernel/power/Kconfig~smp_s3_kconfig    2005-03-31 10:49:57.156487160 +0800
> > +++ linux-2.6.11-root/kernel/power/Kconfig  2005-03-31 10:49:57.158486856 +0800
> > @@ -72,3 +72,10 @@ config PM_STD_PARTITION
> >   suspended image to. It will simply pick the first available swap 
> >   device.
> >  
> > +config STR_SMP
> > +   bool "Suspend to RAM SMP support (EXPERIMENTAL)"
> > +   depends on EXPERIMENTAL && ACPI_SLEEP && !X86_64
> > +   depends on HOTPLUG_CPU
> > +   default y
> > +   ---help---
> > +enable Suspend to RAM SMP support. Some HT systems require this.
> 
> Should this be config option? If we have ACPI_SLEEP and SMP set, we
> should probably require this one (so that user does not have to
> care)
Sure, quite reasonable!

>  Also name is "interesting", perhaps CONFIG_SMP_SLEEP or
> something?
That name is only because my patches currently break S4. Once we figure
out how to make both S3 and S4 work, I'll rename it as you suggest.

Thanks,
Shaohua



Re: [RFC 1/6]SEP initialization rework

2005-04-04 Thread Li Shaohua
Hi,

On Mon, 2005-04-04 at 16:46, Pavel Machek wrote:
> > ---
> > 
> >  linux-2.6.11-root/arch/i386/kernel/smpboot.c   |6 ++
> >  linux-2.6.11-root/arch/i386/kernel/sysenter.c  |   10 ++
> >  linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |6 ++
> >  3 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c
> > --- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup   2005-03-28 09:32:30.936304248 +0800
> > +++ linux-2.6.11-root/arch/i386/kernel/sysenter.c   2005-03-28 09:58:20.703703792 +0800
> > @@ -26,6 +26,11 @@ void enable_sep_cpu(void *info)
> > int cpu = get_cpu();
> > struct tss_struct *tss = &per_cpu(init_tss, cpu);
> >  
> > +   if (!boot_cpu_has(X86_FEATURE_SEP)) {
> > +   put_cpu();
> > +   return;
> > +   }
> > +
> > tss->ss1 = __KERNEL_CS;
> > tss->esp1 = sizeof(struct tss_struct) + (unsigned long) tss;
> > wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
> > @@ -41,7 +46,7 @@ void enable_sep_cpu(void *info)
> >  extern const char vsyscall_int80_start, vsyscall_int80_end;
> >  extern const char vsyscall_sysenter_start, vsyscall_sysenter_end;
> >  
> > -static int __init sysenter_setup(void)
> > +int __init sysenter_setup(void)
> >  {
> > void *page = (void *)get_zeroed_page(GFP_ATOMIC);
> >  
> 
> Can this still be __init? I think you are calling it from hotplug code
> now, right?
Only the BP executes it; the APs call enable_sep_cpu.

> 
> > diff -puN arch/i386/kernel/smpboot.c~sep_init_cleanup arch/i386/kernel/smpboot.c
> > --- linux-2.6.11/arch/i386/kernel/smpboot.c~sep_init_cleanup    2005-03-28 09:33:49.972288952 +0800
> > +++ linux-2.6.11-root/arch/i386/kernel/smpboot.c    2005-03-28 09:46:01.814032096 +0800
> > @@ -415,6 +415,8 @@ static void __init smp_callin(void)
> >  
> >  static int cpucount;
> >  
> > +extern int sysenter_setup(void);
> > +extern void enable_sep_cpu(void *);
> >  /*
> >   * Activate a secondary processor.
> >   */
> 
> Perhaps these should go to header file somewhere?
in asm-i386/smp.h?

Thanks,
Shaohua



Re: [ACPI] Re: [RFC 0/6] S3 SMP support with physcial CPU hotplug

2005-04-04 Thread Li Shaohua
On Mon, 2005-04-04 at 16:01, Nigel Cunningham wrote:
> Hi.
> 
> I'm switching suspend2 to use hotplug too. Li, I'll try adding your
> patches as well as Zwane's if you like 
Great!

> (suspend2 can enter S3, S4 or S5
> after writing the image). I'd love to try it on my HT desktop, and
> hotplug will get more testing too :>
Unfortunately, my patches break Pavel's swsusp SMP support, because they
break the current 'cpu_up' mechanism: S4 doesn't need to boot the AP CPUs
from real mode.

Thanks,
Shaohua



Re: [RFC 4/6]Add kconfig for S3 SMP

2005-04-04 Thread Li Shaohua
On Mon, 2005-04-04 at 16:59, Pavel Machek wrote:
> Hi!
> 
> > Add kconfig for IA32 S3 SMP.
> > 
> > Thanks,
> > Shaohua
> > 
> > ---
> > 
> >  linux-2.6.11-root/kernel/power/Kconfig |7 +++
> >  1 files changed, 7 insertions(+)
> > 
> > diff -puN kernel/power/Kconfig~smp_s3_kconfig kernel/power/Kconfig
> > --- linux-2.6.11/kernel/power/Kconfig~smp_s3_kconfig	2005-03-31 10:49:57.156487160 +0800
> > +++ linux-2.6.11-root/kernel/power/Kconfig	2005-03-31 10:49:57.158486856 +0800
> > @@ -72,3 +72,10 @@ config PM_STD_PARTITION
> >   suspended image to. It will simply pick the first available swap 
> >   device.
> >  
> > +config STR_SMP
> > +   bool "Suspend to RAM SMP support (EXPERIMENTAL)"
> > +   depends on EXPERIMENTAL && ACPI_SLEEP && !X86_64
> > +   depends on HOTPLUG_CPU
> > +   default y
> > +   ---help---
> > +enable Suspend to RAM SMP support. Some HT systems require this.
> 
> Should this be config option? If we have ACPI_SLEEP and SMP set, we
> should probably require this one (so that user does not have to
> care)
Sure, quite reasonable!

> Also name is interesting, perhaps CONFIG_SMP_SLEEP or
> something?
That is only because my patches currently break S4. After we figure out how to
make both S3 and S4 work, I'll change it as you suggested.

Thanks,
Shaohua



Re: [ACPI] Re: [RFC 0/6] S3 SMP support with physical CPU hotplug

2005-04-04 Thread Li Shaohua
On Mon, 2005-04-04 at 17:10, Pavel Machek wrote:
> Hi!
> 
> > > I'm switching suspend2 to use hotplug too. Li, I'll try adding your
> > > patches as well as Zwane's if you like 
> > Great!
> > 
> > > (suspend2 can enter S3, S4 or S5
> > > after writing the image). I'd love to try it on my HT desktop, and
> > > hotplug will get more testing too :
> > Unfortunately, my patches break Pavel's swsusp SMP, as my patches break
> > the current 'cpu_up' mechanism. S4 doesn't require booting AP CPUs from
> > real mode.
> 
> Uh, I don't like that one. Is it possible to put secondary CPUs back
> to the real mode 
That is probably not necessary. Sending a SIPI can also wake up a CPU
in protected mode.

> so that cpu_up mechanism can handle them?
If S4 also calls smp_prepare_cpu, then the patches don't break S4. If
people don't complain that warm-booting a CPU is slow, I'd like S4 to use
smp_prepare_cpu as well.

Thanks,
Shaohua



Re: [RFC 5/6]clean cpu state after hotremove CPU

2005-04-04 Thread Li Shaohua
On Tue, 2005-04-05 at 03:11, Zwane Mwaikambo wrote:
> On Mon, 4 Apr 2005, Li Shaohua wrote:
> 
> > Clean up all CPU states including its runqueue and idle thread, 
> > so we can use boot time code without any changes.
> > Note this makes /sys/devices/system/cpu/cpux/online unworkable.
> >  
> >  #ifdef CONFIG_HOTPLUG_CPU
> >  #include <asm/nmi.h>
> > +
> > +#ifdef CONFIG_STR_SMP
> > +extern void cpu_exit_clear(int);
> > +#endif
> 
> Perhaps change that ifdef to denote something which clearly shows that its 
> physical hotplug as we'll need this for other users too.
Ok.

> > +#ifdef CONFIG_STR_SMP
> > +extern void do_exit_idle(void);
> > +extern void cpu_uninit(void);
> > +void cpu_exit_clear(int cpu)
> > +{
> > +   int sibling;
> > +   cpucount --;
> 
> Is that protected by the cpu_control semaphore?
cpu_exit_clear is called before the dead CPU acks CPU_DEAD, so it has
finished before __cpu_die returns, and __cpu_die is protected by
cpu_control. Maybe I should add comments for it.

Thanks,
Shaohua
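
The ordering argument in the reply (the cleanup finishes before the dead CPU acks CPU_DEAD, and __cpu_die() waits for that ack while cpu_control is held) can be illustrated with a small user-space model. This is a hypothetical sketch with POSIX threads and C11 atomics, not kernel code: the sketch_* names are invented, a mutex stands in for the cpu_control semaphore, and a release/acquire flag stands in for the per-CPU cpu_state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: a mutex for cpu_control, a flag for cpu_state. */
static pthread_mutex_t sketch_cpu_control = PTHREAD_MUTEX_INITIALIZER;
static int sketch_cpucount;          /* no lock of its own */
static atomic_int sketch_cpu_dead;   /* 0 = still running, 1 = CPU_DEAD acked */

/* The dying CPU: do the cleanup first, then publish CPU_DEAD. */
static void *sketch_play_dead(void *unused)
{
	(void)unused;
	sketch_cpucount--;                                  /* cpu_exit_clear() */
	atomic_store_explicit(&sketch_cpu_dead, 1, memory_order_release);
	return NULL;                                        /* a real CPU would hlt */
}

/*
 * The controlling side: holds "cpu_control" for the whole operation and,
 * like __cpu_die(), does not go on until it has seen CPU_DEAD, so the
 * decrement is complete and visible before the lock is released.
 * Returns the new count, or -1 if the helper thread could not start.
 */
int sketch_cpu_down(void)
{
	pthread_t dying;
	int count;

	pthread_mutex_lock(&sketch_cpu_control);
	atomic_store_explicit(&sketch_cpu_dead, 0, memory_order_relaxed);
	if (pthread_create(&dying, NULL, sketch_play_dead, NULL) != 0) {
		pthread_mutex_unlock(&sketch_cpu_control);
		return -1;
	}
	while (!atomic_load_explicit(&sketch_cpu_dead, memory_order_acquire))
		;                          /* __cpu_die() polls cpu_state */
	count = sketch_cpucount;       /* ordered after the decrement */
	pthread_mutex_unlock(&sketch_cpu_control);
	pthread_join(dying, NULL);
	assert(count >= 0);            /* the count never goes negative here */
	return count;
}
```

In this model the decrement needs no lock of its own only because every path that touches the counter runs under the same outer lock; a reader that skipped cpu_control would still race with it.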



Re: [ACPI] Re: [RFC 5/6]clean cpu state after hotremove CPU

2005-04-04 Thread Li Shaohua
Hi,
On Mon, 2005-04-04 at 23:33, Nathan Lynch wrote:
> 
> I'd say fix the smpboot code so that it doesn't create new idle tasks
> except during boot.
I'd like the CPU hot-remove case to be just like the case where the CPU
hasn't booted: a CPU that hasn't booted has no idle thread. But you may
think it's not worth doing. Anyway, I will keep the idle thread in an
updated patch, as you suggest.

> > > We've been
> > > doing cpu removal on ppc64 logical partitions for a while and never
> > > needed to do anything like this. 
> > Did it remove the idle thread, or is the dead CPU in an idle busy loop?
> 
> Neither.  The cpu is definitely offline, but there is no reason to
> free the idle thread.
> 
> > 
> > > Maybe idle_task_exit would suffice?
> > idle_task_exit seems to just drop the mm. We need to destroy the idle
> > task for physical CPU hotplug, right?
> 
> No.
> 
> > > 
> > > I don't understand the need for this, either.  The existing cpu
> > > hotplug notifier in the scheduler takes care of initializing the sched
> > > domains and groups appropriately for online/offline events; why do you
> > > need to touch the runqueue structures?
> > If a CPU is physically hot-removed from the system, shouldn't we clean
> > its runqueue?
> 
> No.  It should make zero difference to the scheduler whether the play
> dead cpu hotplug or physical hotplug is being used.  
To me, keeping fields like 'cpu_load' seems meaningless for a hot-added
CPU. Should we just ignore them?

Thanks,
Shaohua



Re: [RFC 1/6]SEP initialization rework

2005-04-04 Thread Li Shaohua
On Tue, 2005-04-05 at 03:10, Zwane Mwaikambo wrote:
> On Mon, 4 Apr 2005, Li Shaohua wrote:
> 
> >  linux-2.6.11-root/arch/i386/kernel/smpboot.c   |6 ++
> >  linux-2.6.11-root/arch/i386/kernel/sysenter.c  |   10 ++
> >  linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |6 ++
> >  3 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c
> > --- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup	2005-03-28 09:32:30.936304248 +0800
> > +++ linux-2.6.11-root/arch/i386/kernel/sysenter.c	2005-03-28 09:58:20.703703792 +0800
> > @@ -26,6 +26,11 @@ void enable_sep_cpu(void *info)
> >  int cpu = get_cpu();
> >  struct tss_struct *tss = &per_cpu(init_tss, cpu);
> >  
> > +   if (!boot_cpu_has(X86_FEATURE_SEP)) {
> > +   put_cpu();
> > +   return;
> > +   }
> > +
> 
> Do you have systems like this? Is it really skipping SEP if the boot 
> processor doesn't have SEP?
No, I don't have such a system. This is the logic of the original SEP
initialization: if the CPU doesn't have SEP, the original code doesn't call
'on_each_cpu(enable_sep_cpu,...)'.

Thanks,
Shaohua
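
Why the SEP setup moves from a one-time boot pass to a per-CPU hook, and why the boot_cpu_has() guard preserves the old behaviour, can be modeled in a few lines. This is a hypothetical user-space sketch: the sketch_* names and the sep_ready[] flags (standing in for the SYSENTER MSRs) are invented. It contrasts one boot-time pass over the CPUs online at that moment, as on_each_cpu() from sysenter_setup() did, with a hook each CPU runs for itself on the way up, as the patch does by calling enable_sep_cpu() from start_secondary().

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical per-CPU model; sep_ready[] stands in for the SYSENTER MSRs. */
struct sketch_cpus {
	bool online[SKETCH_NR_CPUS];
	bool sep_ready[SKETCH_NR_CPUS];
	bool boot_cpu_has_sep;
};

/* Per-CPU hook in the style of enable_sep_cpu(). */
void sketch_enable_sep_cpu(struct sketch_cpus *s, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (!s->boot_cpu_has_sep)
		return;                 /* same effect as never calling it */
	s->sep_ready[cpu] = true;
}

/* Old scheme: one pass at boot over whichever CPUs are online then. */
void sketch_init_once(struct sketch_cpus *s)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (s->online[cpu])
			sketch_enable_sep_cpu(s, cpu);
}

/* A CPU coming up later; with per_cpu_hook it initializes itself. */
void sketch_cpu_up(struct sketch_cpus *s, int cpu, bool per_cpu_hook)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	s->online[cpu] = true;
	if (per_cpu_hook)
		sketch_enable_sep_cpu(s, cpu);
}
```

In this model a CPU that appears after the one-time pass (for example an AP warm-booted on resume) is left unconfigured unless it runs the hook itself, while a boot CPU without SEP leaves every CPU untouched either way.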



Re: [ACPI] Re: [RFC 5/6]clean cpu state after hotremove CPU

2005-04-03 Thread Li Shaohua
Hi,
On Mon, 2005-04-04 at 13:28, Nathan Lynch wrote:
> On Mon, Apr 04, 2005 at 10:07:02AM +0800, Li Shaohua wrote:
> > Clean up all CPU states including its runqueue and idle thread, 
> > so we can use boot time code without any changes.
> > Note this makes /sys/devices/system/cpu/cpux/online unworkable.
> 
> In what sense does it make the online attribute unworkable?
I removed the idle thread and other CPU states, and put the dead CPU
into a 'halt' busy loop.

> 
> > diff -puN kernel/exit.c~cpu_state_clean kernel/exit.c
> > --- linux-2.6.11/kernel/exit.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
> > +++ linux-2.6.11-root/kernel/exit.c 2005-03-31 10:50:27.0 +0800
> > @@ -845,6 +845,65 @@ fastcall NORET_TYPE void do_exit(long co
> > for (;;) ;
> >  }
> >  
> > +#ifdef CONFIG_STR_SMP
> > +void do_exit_idle(void)
> > +{
> > +   struct task_struct *tsk = current;
> > +   int group_dead;
> > +
> > +   BUG_ON(tsk->pid);
> > +   BUG_ON(tsk->mm);
> > +
> > +   if (tsk->io_context)
> > +   exit_io_context();
> > +   tsk->flags |= PF_EXITING;
> > +   tsk->it_virt_expires = cputime_zero;
> > +   tsk->it_prof_expires = cputime_zero;
> > +   tsk->it_sched_expires = 0;
> > +
> > +   acct_update_integrals(tsk);
> > +   update_mem_hiwater(tsk);
> > +   group_dead = atomic_dec_and_test(&tsk->signal->live);
> > +   if (group_dead) {
> > +   del_timer_sync(&tsk->signal->real_timer);
> > +   acct_process(-1);
> > +   }
> > +   exit_mm(tsk);
> > +
> > +   exit_sem(tsk);
> > +   __exit_files(tsk);
> > +   __exit_fs(tsk);
> > +   exit_namespace(tsk);
> > +   exit_thread();
> > +   exit_keys(tsk);
> > +
> > +   if (group_dead && tsk->signal->leader)
> > +   disassociate_ctty(1);
> > +
> > +   module_put(tsk->thread_info->exec_domain->module);
> > +   if (tsk->binfmt)
> > +   module_put(tsk->binfmt->module);
> > +
> > +   tsk->exit_code = -1;
> > +   tsk->exit_state = EXIT_DEAD;
> > +
> > +   /* in release_task */
> > +   atomic_dec(&tsk->user->processes);
> > +   write_lock_irq(&tasklist_lock);
> > +   __exit_signal(tsk);
> > +   __exit_sighand(tsk);
> > +   write_unlock_irq(&tasklist_lock);
> > +   release_thread(tsk);
> > +   put_task_struct(tsk);
> > +
> > +   tsk->flags |= PF_DEAD;
> > +#ifdef CONFIG_NUMA
> > +   mpol_free(tsk->mempolicy);
> > +   tsk->mempolicy = NULL;
> > +#endif
> > +}
> > +#endif
> 
> I don't understand why this is needed at all.  It looks like a fair
> amount of code from do_exit is being duplicated here.  
Yes, exactly. Could someone who understands do_exit please help clean up the
code? I'd like to remove the idle thread, since the smpboot code will
create a new idle thread.

> We've been
> doing cpu removal on ppc64 logical partitions for a while and never
> needed to do anything like this. 
Did it remove the idle thread, or is the dead CPU in an idle busy loop?

>  Maybe idle_task_exit would suffice?
idle_task_exit seems to just drop the mm. We need to destroy the idle
task for physical CPU hotplug, right?

> 
> 
> > diff -puN kernel/sched.c~cpu_state_clean kernel/sched.c
> > --- linux-2.6.11/kernel/sched.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
> > +++ linux-2.6.11-root/kernel/sched.c	2005-04-04 09:06:40.362357104 +0800
> > @@ -4028,6 +4028,58 @@ void __devinit init_idle(task_t *idle, i
> >  }
> >  
> >  /*
> > + * Initial dummy domain for early boot and for hotplug cpu. Being static,
> > + * it is initialized to zero, so all balancing flags are cleared which is
> > + * what we want.
> > + */
> > +static struct sched_domain sched_domain_dummy;
> > +
> > +#ifdef CONFIG_STR_SMP
> > +static void __devinit exit_idle(int cpu)
> > +{
> > +   runqueue_t *rq = cpu_rq(cpu);
> > +   struct task_struct *p = rq->idle;
> > +   int j, k;
> > +   prio_array_t *array;
> > +
> > +   /* init runqueue */
> > +   spin_lock_init(&rq->lock);
> > +   rq->active = rq->arrays;
> > +   rq->expired = rq->arrays + 1;
> > +   rq->best_expired_prio = MAX_PRIO;
> > +
> > +   rq->prev_mm = NULL;
> > +   rq->curr = rq->idle = NULL;
> > +   rq->expired_timestamp = 0;
> > +
> > +   rq->sd = &sched_domain_dummy;
> > +   rq->cpu_load = 0;
> > +   rq->active_balance = 0;
> > +   rq->push_cpu = 0;
>

Re: [RFC 0/6] S3 SMP support with physical CPU hotplug

2005-04-03 Thread Li Shaohua
On Mon, 2005-04-04 at 10:48, Andrew Morton wrote:
> Li Shaohua <[EMAIL PROTECTED]> wrote:
> >
> > On Mon, 2005-04-04 at 10:37, Andrew Morton wrote:
> > > Li Shaohua <[EMAIL PROTECTED]> wrote:
> > > >
> > > > The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in -mm
> > > >  tree.
> > > 
> > > Should I merge that thing into mainline?  It seems that a few people are
> > > needing it.
> > I'd like to hear some comments first. There are still some things
> > I'm not sure about, such as do_exit_idle.
> > 
> 
> I was referring to Zwane's i386-cpu-hotplug-updated-for-mm.patch
Yep, great. Pavel's swsusp also needs it.

Thanks,
Shaohua



Re: [RFC 0/6] S3 SMP support with physical CPU hotplug

2005-04-03 Thread Li Shaohua
On Mon, 2005-04-04 at 10:37, Andrew Morton wrote:
> Li Shaohua <[EMAIL PROTECTED]> wrote:
> >
> > The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in -mm
> >  tree.
> 
> Should I merge that thing into mainline?  It seems that a few people are
> needing it.
I'd like to hear some comments first. There are still some things
I'm not sure about, such as do_exit_idle.

Thanks,
Shaohua



[RFC 6/6]Physical CPU hotadd and S3 SMP support

2005-04-03 Thread Li Shaohua
Boot a CPU at runtime and use it to support S3 SMP.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   79 +++
 linux-2.6.11-root/include/asm-i386/smp.h |4 +
 linux-2.6.11-root/kernel/power/main.c|   30 ++
 3 files changed, 104 insertions(+), 9 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~warmboot_cpu arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~warmboot_cpu	2005-04-04 09:13:48.600255048 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-04 09:13:48.607253984 +0800
@@ -76,6 +76,12 @@ cpumask_t cpu_callin_map;
 cpumask_t cpu_callout_map;
 static cpumask_t smp_commenced_mask;
 
+/* This is ugly, but TSC's upper 32 bits can't be written in earlier CPUs
+ * (before prescott), so there is no way to resync one AP against BP
+ * TBD: for prescott and above, we should use IA64's algorithm
+ */
+static int __devinit tsc_sync_disabled;
+
 /* Per CPU bogomips and other parameters */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
 
@@ -412,7 +418,7 @@ static void __devinit smp_callin(void)
/*
 *  Synchronize the TSC with the BP
 */
-   if (cpu_has_tsc && cpu_khz)
+   if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
synchronize_tsc_ap();
 }
 
@@ -781,8 +787,19 @@ wakeup_secondary_cpu(int phys_apicid, un
 #endif /* WAKE_SECONDARY_VIA_INIT */
 
 extern cpumask_t cpu_initialized;
+static inline int alloc_cpu_id(void)
+{
+   cpumask_t   tmp_map;
+   int cpu;
 
-static int __devinit do_boot_cpu(int apicid)
+   cpus_complement(tmp_map, cpu_present_map);
+   cpu = first_cpu(tmp_map);
+   if (cpu >= NR_CPUS)
+   return -ENODEV;
+   return cpu;
+}
+
+static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -791,15 +808,10 @@ static int __devinit do_boot_cpu(int api
 {
struct task_struct *idle;
unsigned long boot_error;
-   int timeout, cpu;
+   int timeout;
unsigned long start_eip;
unsigned short nmi_high = 0, nmi_low = 0;
-   cpumask_t   tmp_map;
 
-   cpus_complement(tmp_map, cpu_present_map);
-   cpu = first_cpu(tmp_map);
-   if (cpu >= NR_CPUS)
-   return -ENODEV;
++cpucount;
/*
 * We can't use kernel_thread since we must avoid to
@@ -920,6 +932,53 @@ void cpu_exit_clear(int cpu)
 
do_exit_idle();
 }
+
+struct warm_boot_cpu_info {
+   struct completion *complete;
+   int apicid;
+   int cpu;
+};
+
+static void __devinit do_warm_boot_cpu(void *p)
+{
+   struct warm_boot_cpu_info *info = p;
+   do_boot_cpu(info->apicid, info->cpu);
+   complete(info->complete);
+}
+
+int __devinit smp_prepare_cpu(int apicid)
+{
+   DECLARE_COMPLETION(done);
+   struct warm_boot_cpu_info info;
+   struct work_struct task;
+   int cpu;
+
+   lock_cpu_hotplug();
+   cpu = alloc_cpu_id();
+
+   if (cpu < 0)
+   goto exit;
+
+   info.complete = &done;
+   info.apicid = apicid;
+   info.cpu = cpu;
+   INIT_WORK(&task, do_warm_boot_cpu, &info);
+
+   tsc_sync_disabled = 1;
+
+   /* init low mem mapping */
+   memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+   sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+   flush_tlb_all();
+   schedule_work(&task);
+   wait_for_completion(&done);
+
+   tsc_sync_disabled = 0;
+   zap_low_mappings();
+exit:
+   unlock_cpu_hotplug();
+   return cpu;
+}
 #endif
 static void smp_tune_scheduling (void)
 {
@@ -1064,7 +1123,7 @@ static void __init smp_boot_cpus(unsigne
if (max_cpus <= cpucount+1)
continue;
 
-   if (do_boot_cpu(apicid))
+   if (((cpu = alloc_cpu_id()) > 0) && do_boot_cpu(apicid, cpu))
printk("CPU #%d not responding - cannot use it.\n",
apicid);
else
@@ -1253,10 +1312,12 @@ void __init smp_cpus_done(unsigned int m
setup_ioapic_dest();
 #endif
zap_low_mappings();
+#ifndef CONFIG_STR_SMP
/*
 * Disable executability of the SMP trampoline:
 */
set_kernel_exec((unsigned long)trampoline_base, trampoline_exec);
+#endif
 }
 
 void __init smp_intr_init(void)
diff -puN kernel/power/main.c~warmboot_cpu kernel/power/main.c
--- linux-2.6.11/kernel/power/main.c~warmboot_cpu	2005-04-04 09:13:48.601254896 +0800
+++ linux-2.6.11-root/kernel/power/main.c	2005-04-04 09:13:48.607253984 +0800
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 
 #include "power.h"
@@ -137,6 +138,24 @@ static char * pm_states[] = {
 static int enter_state(suspend_state_t state)
 {
int error;
+#ifdef CONFIG_STR_SMP
+  
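
smp_prepare_cpu() in the patch above does not call do_boot_cpu() directly: it packs the arguments into struct warm_boot_cpu_info, queues do_warm_boot_cpu() with schedule_work(), and sleeps in wait_for_completion() until the worker calls complete(). The same hand-off can be sketched in user space; this is a hypothetical analogue in which a pthread stands in for the workqueue and a mutex/condition-variable pair stands in for struct completion. All sketch_* names are invented, and the "boot" only records a CPU number.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal user-space analogue of struct completion (hypothetical). */
struct sketch_completion {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int             done;
};

static void sketch_init_completion(struct sketch_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void sketch_complete(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void sketch_wait_for_completion(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)               /* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Mirrors struct warm_boot_cpu_info: arguments plus the completion. */
struct sketch_boot_info {
	struct sketch_completion *complete;
	int apicid;
	int cpu;
	int result;
};

/* Stand-in for do_warm_boot_cpu(); pretends the AP came up as info->cpu. */
static void *sketch_do_warm_boot_cpu(void *p)
{
	struct sketch_boot_info *info = p;

	info->result = info->cpu;
	sketch_complete(info->complete);
	return NULL;
}

/* Hand the boot to another thread and block until it signals. */
int sketch_prepare_cpu(int apicid, int cpu)
{
	struct sketch_completion done;
	struct sketch_boot_info info = { &done, apicid, cpu, -1 };
	pthread_t worker;
	int ok;

	sketch_init_completion(&done);
	ok = pthread_create(&worker, NULL, sketch_do_warm_boot_cpu, &info) == 0;
	if (ok) {
		sketch_wait_for_completion(&done);
		assert(info.result != -1);  /* completion implies the worker ran */
		pthread_join(worker, NULL);
	}
	pthread_mutex_destroy(&done.lock);
	pthread_cond_destroy(&done.cond);
	return ok ? info.result : -1;
}
```

Because the caller sleeps until complete() is called, the on-stack info and done structures stay valid for the whole lifetime of the worker, which is the same reason the patch can keep them on smp_prepare_cpu()'s stack.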

[RFC 5/6]clean cpu state after hotremove CPU

2005-04-03 Thread Li Shaohua
Clean up all CPU states including its runqueue and idle thread, 
so we can use boot time code without any changes.
Note this makes /sys/devices/system/cpu/cpux/online unworkable.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/cpu/common.c |   12 
 linux-2.6.11-root/arch/i386/kernel/irq.c|5 +
 linux-2.6.11-root/arch/i386/kernel/process.c|   20 +++
 linux-2.6.11-root/arch/i386/kernel/smpboot.c|   44 -
 linux-2.6.11-root/include/asm-i386/irq.h|2 
 linux-2.6.11-root/kernel/exit.c |   59 +++
 linux-2.6.11-root/kernel/sched.c|   61 +---
 7 files changed, 195 insertions(+), 8 deletions(-)

diff -puN arch/i386/kernel/process.c~cpu_state_clean arch/i386/kernel/process.c
--- linux-2.6.11/arch/i386/kernel/process.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/process.c	2005-04-04 09:07:29.172936768 +0800
@@ -144,12 +144,32 @@ static void poll_idle (void)
 
 #ifdef CONFIG_HOTPLUG_CPU
 #include <asm/nmi.h>
+
+#ifdef CONFIG_STR_SMP
+extern void cpu_exit_clear(int);
+#endif
+
 /* We don't actually take CPU down, just spin without interrupts. */
 static inline void play_dead(void)
 {
+#ifdef CONFIG_STR_SMP
+   cpu_exit_clear(_smp_processor_id());
+#endif
+
/* Ack it */
__get_cpu_var(cpu_state) = CPU_DEAD;
 
+#ifdef CONFIG_STR_SMP
+   /*
+* With physical CPU hotplug, we should halt the CPU
+* Note: release idle task struct requires the CPU doesn't
+* touch stack or anything else.
+*/
+   local_irq_disable();
+   while (1)
+   __asm__ __volatile__ ("hlt": : :"memory");
+#endif
+
/* We shouldn't have to disable interrupts while dead, but
 * some interrupts just don't seem to go away, and this makes
 * it "work" for testing purposes. */
diff -puN arch/i386/kernel/smpboot.c~cpu_state_clean arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-04-04 09:05:41.699275248 +0800
@@ -794,8 +794,13 @@ static int __devinit do_boot_cpu(int api
int timeout, cpu;
unsigned long start_eip;
unsigned short nmi_high = 0, nmi_low = 0;
+   cpumask_t   tmp_map;
 
-   cpu = ++cpucount;
+   cpus_complement(tmp_map, cpu_present_map);
+   cpu = first_cpu(tmp_map);
+   if (cpu >= NR_CPUS)
+   return -ENODEV;
+   ++cpucount;
/*
 * We can't use kernel_thread since we must avoid to
 * reschedule the child.
@@ -867,13 +872,16 @@ static int __devinit do_boot_cpu(int api
inquire_remote_apic(apicid);
}
}
-   x86_cpu_to_apicid[cpu] = apicid;
+
if (boot_error) {
/* Try to put things back the way they were before ... */
unmap_cpu_to_logical_apicid(cpu);
cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpucount--;
+   } else {
+   x86_cpu_to_apicid[cpu] = apicid;
+   cpu_set(cpu, cpu_present_map);
}
 
/* mark "stuck" area as not stuck */
@@ -882,6 +890,37 @@ static int __devinit do_boot_cpu(int api
return boot_error;
 }
 
+#ifdef CONFIG_STR_SMP
+extern void do_exit_idle(void);
+extern void cpu_uninit(void);
+void cpu_exit_clear(int cpu)
+{
+   int sibling;
+   cpucount --;
+
+   cpu_uninit();
+
+   irq_ctx_exit(cpu);
+
+   cpu_clear(cpu, cpu_callout_map);
+   cpu_clear(cpu, cpu_callin_map);
+   cpu_clear(cpu, cpu_present_map);
+
+   x86_cpu_to_apicid[cpu] = BAD_APICID;
+
+   for_each_cpu_mask(sibling, cpu_sibling_map[cpu])
+   cpu_clear(cpu, cpu_sibling_map[sibling]);
+   cpus_clear(cpu_sibling_map[cpu]);
+
+   phys_proc_id[cpu] = BAD_APICID;
+
+   cpu_clear(cpu, smp_commenced_mask);
+
+   unmap_cpu_to_logical_apicid(cpu);
+
+   do_exit_idle();
+}
+#endif
 static void smp_tune_scheduling (void)
 {
unsigned long cachesize;   /* kB   */
@@ -1104,6 +1143,7 @@ void __devinit smp_prepare_boot_cpu(void
 {
cpu_set(smp_processor_id(), cpu_online_map);
cpu_set(smp_processor_id(), cpu_callout_map);
+   cpu_set(smp_processor_id(), cpu_present_map);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff -puN arch/i386/kernel/cpu/common.c~cpu_state_clean arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~cpu_state_clean	2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-03-31 10:50:27.0 +0800
@@ -621,3 +621,15 @@ void __devinit cpu_init (void)
clear_used_math();
mxcsr_feature_mask_init();
 }
+
+#ifdef CONFIG_STR_SMP
+void 
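
The do_boot_cpu() change in this patch stops deriving the CPU number from ++cpucount: it takes the lowest number not in cpu_present_map, and sets that bit (along with x86_cpu_to_apicid[]) only once the AP has actually booted, while cpu_exit_clear() clears it again. A user-space sketch of that bookkeeping follows, assuming a 32-bit mask in place of cpumask_t; SKETCH_NR_CPUS and the sketch_* helpers are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 32   /* hypothetical: one bit per CPU in a uint32_t */

/*
 * Lowest CPU number not yet present, in the spirit of
 * cpus_complement() + first_cpu(); -ENODEV when the map is full.
 */
int sketch_alloc_cpu_id(uint32_t present_map)
{
	uint32_t free_map = ~present_map;        /* cpus_complement() */

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (free_map & (UINT32_C(1) << cpu))
			return cpu;                      /* first_cpu() */
	return -ENODEV;
}

/* Publish the CPU only if it booted, so a failed boot leaves the slot free. */
uint32_t sketch_boot_result(uint32_t present_map, int cpu, int boot_error)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (!boot_error)
		present_map |= UINT32_C(1) << cpu;
	return present_map;
}

/* Counterpart of cpu_clear(cpu, cpu_present_map) in cpu_exit_clear(). */
uint32_t sketch_remove_cpu(uint32_t present_map, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return present_map & ~(UINT32_C(1) << cpu);
}
```

In this model a CPU that is removed and then booted again gets the lowest free number back, and a new number can never collide with a CPU that is still present.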

[RFC 3/6]init call cleanup

2005-04-03 Thread Li Shaohua
Trivial patch for CPU hotplug. In the CPU identification code, I only did the cleanup for
Intel CPUs. The same needs to be done for other CPUs if they are to support S3 SMP.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/apic.c|   14 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/common.c  |   30 +++
 linux-2.6.11-root/arch/i386/kernel/cpu/intel.c   |   10 ++---
 linux-2.6.11-root/arch/i386/kernel/cpu/intel_cacheinfo.c |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/mce.c  |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p4.c   |4 +-
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p5.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/cpu/mcheck/p6.c   |2 -
 linux-2.6.11-root/arch/i386/kernel/process.c |2 -
 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   18 -
 linux-2.6.11-root/arch/i386/kernel/timers/timer_tsc.c|2 -
 11 files changed, 46 insertions(+), 46 deletions(-)

diff -puN arch/i386/kernel/process.c~init_call_cleanup arch/i386/kernel/process.c
--- linux-2.6.11/arch/i386/kernel/process.c~init_call_cleanup	2005-03-31 10:48:40.721107104 +0800
+++ linux-2.6.11-root/arch/i386/kernel/process.c	2005-03-31 10:48:40.745103456 +0800
@@ -242,7 +242,7 @@ static void mwait_idle(void)
}
 }
 
-void __init select_idle_routine(const struct cpuinfo_x86 *c)
+void __devinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
if (cpu_has(c, X86_FEATURE_MWAIT)) {
printk("monitor/mwait feature present.\n");
diff -puN arch/i386/kernel/smpboot.c~init_call_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~init_call_cleanup	2005-03-31 10:48:40.722106952 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-03-31 10:48:40.746103304 +0800
@@ -59,7 +59,7 @@
 #include 
 
 /* Set if we find a B stepping CPU */
-static int __initdata smp_b_stepping;
+static int __devinitdata smp_b_stepping;
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
@@ -103,7 +103,7 @@ DEFINE_PER_CPU(int, cpu_state) = { 0 };
  * has made sure it's suitably aligned.
  */
 
-static unsigned long __init setup_trampoline(void)
+static unsigned long __devinit setup_trampoline(void)
 {
memcpy(trampoline_base, trampoline_data, trampoline_end - trampoline_data);
return virt_to_phys(trampoline_base);
@@ -133,7 +133,7 @@ void __init smp_alloc_memory(void)
  * a given CPU
  */
 
-static void __init smp_store_cpu_info(int id)
+static void __devinit smp_store_cpu_info(int id)
 {
struct cpuinfo_x86 *c = cpu_data + id;
 
@@ -327,7 +327,7 @@ extern void calibrate_delay(void);
 
 static atomic_t init_deasserted;
 
-static void __init smp_callin(void)
+static void __devinit smp_callin(void)
 {
int cpuid, phys_id;
unsigned long timeout;
@@ -423,7 +423,7 @@ extern void enable_sep_cpu(void *);
 /*
  * Activate a secondary processor.
  */
-static void __init start_secondary(void *unused)
+static void __devinit start_secondary(void *unused)
 {
int siblings = 0;
int i;
@@ -486,7 +486,7 @@ static void __init start_secondary(void 
  * from the task structure
  * This function must not return.
  */
-void __init initialize_secondary(void)
+void __devinit initialize_secondary(void)
 {
/*
 * We don't actually need to load the full TSS,
@@ -600,7 +600,7 @@ static inline void __inquire_remote_apic
  * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
  * won't ... remember to clear down the APIC, etc later.
  */
-static int __init
+static int __devinit
 wakeup_secondary_cpu(int logical_apicid, unsigned long start_eip)
 {
unsigned long send_status = 0, accept_status = 0;
@@ -646,7 +646,7 @@ wakeup_secondary_cpu(int logical_apicid,
 #endif /* WAKE_SECONDARY_VIA_NMI */
 
 #ifdef WAKE_SECONDARY_VIA_INIT
-static int __init
+static int __devinit
 wakeup_secondary_cpu(int phys_apicid, unsigned long start_eip)
 {
unsigned long send_status = 0, accept_status = 0;
@@ -782,7 +782,7 @@ wakeup_secondary_cpu(int phys_apicid, un
 
 extern cpumask_t cpu_initialized;
 
-static int __init do_boot_cpu(int apicid)
+static int __devinit do_boot_cpu(int apicid)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
diff -puN arch/i386/kernel/cpu/common.c~init_call_cleanup arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~init_call_cleanup	2005-03-31 10:48:40.724106648 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c	2005-03-31 10:48:40.747103152 +0800
@@ -21,9 +21,9 @@
 DEFINE_PER_CPU(struct desc_struct, cpu_gdt_table[GDT_ENTRIES]);
 EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
 
-static int cachesize_override __initdata = -1;
-static int disable_x86_fxsr __initdata = 0;
-static int disable_x86_serial_nr __initdata = 1;
+static int cachesize_override __devinitdata = 

[RFC 4/6]Add kconfig for S3 SMP

2005-04-03 Thread Li Shaohua
Add kconfig for IA32 S3 SMP.

Thanks,
Shaohua

---

 linux-2.6.11-root/kernel/power/Kconfig |7 +++
 1 files changed, 7 insertions(+)

diff -puN kernel/power/Kconfig~smp_s3_kconfig kernel/power/Kconfig
--- linux-2.6.11/kernel/power/Kconfig~smp_s3_kconfig	2005-03-31 10:49:57.156487160 +0800
+++ linux-2.6.11-root/kernel/power/Kconfig	2005-03-31 10:49:57.158486856 +0800
@@ -72,3 +72,10 @@ config PM_STD_PARTITION
  suspended image to. It will simply pick the first available swap 
  device.
 
+config STR_SMP
+   bool "Suspend to RAM SMP support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL && ACPI_SLEEP && !X86_64
+   depends on HOTPLUG_CPU
+   default y
+   ---help---
+enable Suspend to RAM SMP support. Some HT systems require this.
_




[RFC 2/6]cpu_sibling_map rework

2005-04-03 Thread Li Shaohua

Make sibling map initialization per-CPU, since CPU hotplug may change the map
at runtime. The cpu hotplug semaphore should be used to protect the map.


Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   56 +--
 1 files changed, 29 insertions(+), 27 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~sibling_map_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sibling_map_init_cleanup	2005-03-28 16:29:55.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c	2005-03-31 10:46:51.572700184 +0800
@@ -63,9 +63,12 @@ static int __initdata smp_b_stepping;
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
-int phys_proc_id[NR_CPUS]; /* Package ID of each logical CPU */
+/* Package ID of each logical CPU */
+int phys_proc_id[NR_CPUS] = {[0 ... NR_CPUS-1] = BAD_APICID};
 EXPORT_SYMBOL(phys_proc_id);
 
+cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
+
 /* bitmap of online cpus */
 cpumask_t cpu_online_map;
 
@@ -422,6 +425,9 @@ extern void enable_sep_cpu(void *);
  */
 static void __init start_secondary(void *unused)
 {
+   int siblings = 0;
+   int i;
+   int self = smp_processor_id();
/*
 * Dont put anything before smp_callin(), SMP
 * booting is too fragile that we want to limit the
@@ -443,6 +449,27 @@ static void __init start_secondary(void 
 * the local TLBs too.
 */
local_flush_tlb();
+
+   /* This must be done before setting cpu_online_map */
+   if (smp_num_siblings > 1) {
+   for (i = 0; i < NR_CPUS; i++) {
+   if (!cpu_isset(i, cpu_callout_map))
+   continue;
+   if (phys_proc_id[self] == phys_proc_id[i]) {
+   siblings ++;
+   cpu_set(i, cpu_sibling_map[self]);
+   cpu_set(self, cpu_sibling_map[i]);
+   }
+   }
+   } else {
+   siblings ++;
+   cpu_set(self, cpu_sibling_map[self]);
+   }
+
+   if (siblings != smp_num_siblings)
+   printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, self, smp_num_siblings);
+   wmb();
+
cpu_set(smp_processor_id(), cpu_online_map);
 
/* We can take interrupts now: we're officially "up". */
@@ -893,8 +920,6 @@ static int boot_cpu_logical_apicid;
 /* Where the IO area was mapped on multiquad, always 0 otherwise */
 void *xquad_portio;
 
-cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
-
 static void __init smp_boot_cpus(unsigned int max_cpus)
 {
int apicid, cpu, bit, kicked;
@@ -1049,30 +1074,7 @@ static void __init smp_boot_cpus(unsigne
 */
for (cpu = 0; cpu < NR_CPUS; cpu++)
cpus_clear(cpu_sibling_map[cpu]);
-
-   for (cpu = 0; cpu < NR_CPUS; cpu++) {
-   int siblings = 0;
-   int i;
-   if (!cpu_isset(cpu, cpu_callout_map))
-   continue;
-
-   if (smp_num_siblings > 1) {
-   for (i = 0; i < NR_CPUS; i++) {
-   if (!cpu_isset(i, cpu_callout_map))
-   continue;
-   if (phys_proc_id[cpu] == phys_proc_id[i]) {
-   siblings++;
-   cpu_set(i, cpu_sibling_map[cpu]);
-   }
-   }
-   } else {
-   siblings++;
-   cpu_set(cpu, cpu_sibling_map[cpu]);
-   }
-
-   if (siblings != smp_num_siblings)
-   printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, cpu, smp_num_siblings);
-   }
+   cpu_set(0, cpu_sibling_map[0]);
 
if (nmi_watchdog == NMI_LOCAL_APIC)
check_nmi_watchdog();
_
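
This patch builds the sibling map in start_secondary(): each CPU, as it comes up, links itself with every already-started CPU that has the same package ID, setting both directions, and cpu_exit_clear() in RFC 5/6 undoes that on removal. A hypothetical user-space model of the two operations follows; the struct, the sketch_* names and the use of -1 for "no package" are assumptions (the patch uses BAD_APICID), and only the smp_num_siblings > 1 path is modeled.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS    8
#define SKETCH_NO_PACKAGE (-1)   /* hypothetical stand-in for BAD_APICID */

/* Hypothetical topology model: one sibling bitmask per logical CPU. */
struct sketch_topology {
	int      phys_proc_id[SKETCH_NR_CPUS];  /* package ID of each CPU */
	uint32_t callout_map;                   /* CPUs that have been started */
	uint32_t sibling_map[SKETCH_NR_CPUS];
};

/* Link 'self' with started CPUs in its package; returns the sibling count. */
int sketch_add_siblings(struct sketch_topology *t, int self)
{
	int siblings = 0;

	assert(self >= 0 && self < SKETCH_NR_CPUS);
	for (int i = 0; i < SKETCH_NR_CPUS; i++) {
		if (!(t->callout_map & (UINT32_C(1) << i)))
			continue;
		if (t->phys_proc_id[self] == t->phys_proc_id[i]) {
			siblings++;                          /* counts self too */
			t->sibling_map[self] |= UINT32_C(1) << i;
			t->sibling_map[i] |= UINT32_C(1) << self;
		}
	}
	return siblings;
}

/* Unlink 'cpu' from every sibling, as cpu_exit_clear() does. */
void sketch_remove_siblings(struct sketch_topology *t, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		if (t->sibling_map[cpu] & (UINT32_C(1) << i))
			t->sibling_map[i] &= ~(UINT32_C(1) << cpu);
	t->sibling_map[cpu] = 0;
	t->callout_map &= ~(UINT32_C(1) << cpu);
	t->phys_proc_id[cpu] = SKETCH_NO_PACKAGE;
}
```

Setting both directions at online time is what lets the map be built incrementally: the first thread of a package only sees itself, and its sibling's bit is added when that sibling comes up later.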




[RFC 0/6] S3 SMP support with physical CPU hotplug

2005-04-03 Thread Li Shaohua
Hi,
The following 6 patches try to add suspend-to-RAM (S3) SMP support
for IA32. Currently they target suspend/resume on HT-based systems, and
most of the code is also useful for physical CPU hotplug.

On an SMP system, after S3 resume the BP starts executing at the ACPI
wakeup address, just as in the UP case, while the APs may be sitting in a
BIOS busy loop. This looks just like the boot-time case, so we must use a
SIPI cycle to wake up the APs.

We use the CPU hotplug infrastructure. To reuse the SMP boot code, we
clean up all of a CPU's state after it is dead, including its idle
thread and runqueue. Since each AP is in its idle thread before suspend,
we don't need to save and restore most of its state across resume.

The S3 sequence is now:
1. Hot-remove all APs and put them into the idle thread.
2. Follow the UP S3 code path.
3. Warm-boot all APs.
4. Bring all APs up.
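
The four steps above can be walked through with a small state model. This is a hypothetical user-space sketch, not the patch's code: the enum, sketch_s3_cycle() and its stuck_mask parameter (APs that never answer the wake-up, which smp_boot_cpus() reports as "not responding") are invented for illustration, and the UP suspend path itself is reduced to a comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU states for the model. */
enum sketch_cpu_state {
	SKETCH_CPU_REMOVED,   /* hot-removed: state cleared, halted */
	SKETCH_CPU_PRESENT,   /* warm-booted by INIT/SIPI, not yet online */
	SKETCH_CPU_ONLINE,
};

/*
 * Run one suspend/resume cycle over st[0..ncpus-1]; CPU 0 is the BP.
 * APs whose bit is set in stuck_mask never answer the wake-up and stay
 * removed. Returns the number of CPUs online after resume.
 */
int sketch_s3_cycle(enum sketch_cpu_state *st, int ncpus, uint32_t stuck_mask)
{
	int cpu, online = 0;

	assert(ncpus >= 1 && ncpus <= 32 && st[0] == SKETCH_CPU_ONLINE);

	/* 1. hot-remove every AP */
	for (cpu = 1; cpu < ncpus; cpu++)
		st[cpu] = SKETCH_CPU_REMOVED;

	/* 2. the BP alone follows the UP S3 path and, on resume, restarts
	 *    at the ACPI wakeup address (not modeled here) */

	/* 3. warm-boot each AP, as at boot time */
	for (cpu = 1; cpu < ncpus; cpu++)
		if (!(stuck_mask & (UINT32_C(1) << cpu)))
			st[cpu] = SKETCH_CPU_PRESENT;

	/* 4. bring the booted APs online through the hotplug path */
	for (cpu = 1; cpu < ncpus; cpu++)
		if (st[cpu] == SKETCH_CPU_PRESENT)
			st[cpu] = SKETCH_CPU_ONLINE;

	for (cpu = 0; cpu < ncpus; cpu++)
		online += (st[cpu] == SKETCH_CPU_ONLINE);
	return online;
}
```

Separating "warm-booted" from "online" in the model mirrors the split between step 3 and step 4: an AP that fails to answer is simply never brought up, and the BP continues with the remaining CPUs.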

The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch from the -mm
tree. To test SMP S3, please don't enable the MTRR driver (its SMP support is
broken for suspend/resume). Also, please kill syslogd: there is a bug in the
suspend/resume refrigerator mechanism, which can be fixed by the swsusp2
refrigerator.
I'm looking forward to your comments. Thanks in advance!

Thanks,
Shaohua



[RFC 1/6]SEP initialization rework

2005-04-03 Thread Li Shaohua

Make SEP initialization per-CPU, so that it is hotplug safe.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c   |6 ++
 linux-2.6.11-root/arch/i386/kernel/sysenter.c  |   10 ++
 linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c |6 ++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff -puN arch/i386/kernel/sysenter.c~sep_init_cleanup arch/i386/kernel/sysenter.c
--- linux-2.6.11/arch/i386/kernel/sysenter.c~sep_init_cleanup   2005-03-28 09:32:30.936304248 +0800
+++ linux-2.6.11-root/arch/i386/kernel/sysenter.c   2005-03-28 09:58:20.703703792 +0800
@@ -26,6 +26,11 @@ void enable_sep_cpu(void *info)
int cpu = get_cpu();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
 
+   if (!boot_cpu_has(X86_FEATURE_SEP)) {
+   put_cpu();
+   return;
+   }
+
tss->ss1 = __KERNEL_CS;
tss->esp1 = sizeof(struct tss_struct) + (unsigned long) tss;
wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
@@ -41,7 +46,7 @@ void enable_sep_cpu(void *info)
 extern const char vsyscall_int80_start, vsyscall_int80_end;
 extern const char vsyscall_sysenter_start, vsyscall_sysenter_end;
 
-static int __init sysenter_setup(void)
+int __init sysenter_setup(void)
 {
void *page = (void *)get_zeroed_page(GFP_ATOMIC);
 
@@ -58,8 +63,5 @@ static int __init sysenter_setup(void)
   &vsyscall_sysenter_start,
   &vsyscall_sysenter_end - &vsyscall_sysenter_start);
 
-   on_each_cpu(enable_sep_cpu, NULL, 1, 1);
return 0;
 }
-
-__initcall(sysenter_setup);
diff -puN arch/i386/kernel/smpboot.c~sep_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sep_init_cleanup    2005-03-28 09:33:49.972288952 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c    2005-03-28 09:46:01.814032096 +0800
@@ -415,6 +415,8 @@ static void __init smp_callin(void)
 
 static int cpucount;
 
+extern int sysenter_setup(void);
+extern void enable_sep_cpu(void *);
 /*
  * Activate a secondary processor.
  */
@@ -445,6 +447,7 @@ static void __init start_secondary(void 
 
/* We can take interrupts now: we're officially "up". */
local_irq_enable();
+   enable_sep_cpu(NULL);
 
wmb();
cpu_idle();
@@ -913,6 +916,9 @@ static void __init smp_boot_cpus(unsigne
cpus_clear(cpu_sibling_map[0]);
cpu_set(0, cpu_sibling_map[0]);
 
+   sysenter_setup();
+   enable_sep_cpu(NULL);
+
/*
 * If we couldn't find an SMP configuration at boot time,
 * get out of here now!
diff -puN arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup arch/i386/mach-voyager/voyager_smp.c
--- linux-2.6.11/arch/i386/mach-voyager/voyager_smp.c~sep_init_cleanup  2005-03-28 09:48:27.909822160 +0800
+++ linux-2.6.11-root/arch/i386/mach-voyager/voyager_smp.c  2005-03-28 09:51:37.896939728 +0800
@@ -441,6 +441,8 @@ setup_trampoline(void)
return virt_to_phys((__u8 *)trampoline_base);
 }
 
+extern void enable_sep_cpu(void *);
+extern int sysenter_setup(void);
 /* Routine initially called when a non-boot CPU is brought online */
 static void __init
 start_secondary(void *unused)
@@ -499,6 +501,7 @@ start_secondary(void *unused)
while (!cpu_isset(cpuid, smp_commenced_mask))
rep_nop();
local_irq_enable();
+   enable_sep_cpu(NULL);
 
local_flush_tlb();
 
@@ -696,6 +699,9 @@ smp_boot_cpus(void)
printk("CPU%d: ", boot_cpu_id);
print_cpu_info(&cpu_data[boot_cpu_id]);
 
+   sysenter_setup();
+   enable_sep_cpu(NULL);
+
if(is_cpu_quad()) {
/* booting on a Quad CPU */
printk("VOYAGER SMP: Boot CPU is Quad\n");
_
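Before this patch, sysenter_setup() ran once as an initcall and called on_each_cpu(enable_sep_cpu, ...), so only the CPUs online at that moment had SYSENTER set up; a CPU booted later by hotplug or S3 resume would be missed. With the patch, each CPU calls enable_sep_cpu() on its own bring-up path. The toy model below sketches that idea; NCPUS, sep_programmed[] and all functions are hypothetical, and the assumption that a CPU's setup is lost while it is down is part of the model, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: NCPUS, the flags and all functions are hypothetical.
 * sep_programmed[cpu] stands in for "this CPU's SYSENTER MSRs are set". */
#define NCPUS 4

static bool boot_cpu_has_sep = true;	/* like boot_cpu_has(X86_FEATURE_SEP) */
static bool sep_programmed[NCPUS];

/* Like the patched enable_sep_cpu(): return early if the boot CPU has no
 * SEP, otherwise set up only the CPU that is running this code. */
static void enable_sep_this_cpu(int cpu)
{
	if (!boot_cpu_has_sep)
		return;
	sep_programmed[cpu] = true;
}

/* Each CPU's own bring-up path (start_secondary() for APs, smp_boot_cpus()
 * for the BP) now does the per-CPU setup itself. */
void bring_up_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	enable_sep_this_cpu(cpu);
}

/* Model assumption: a CPU that was hot-removed or powered down in S3
 * must be set up again when it comes back. */
void take_down_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	sep_programmed[cpu] = false;
}

bool sep_ready(int cpu)
{
	return sep_programmed[cpu];
}
```

Because the setup is tied to bring-up rather than to a one-time broadcast, a CPU that goes down and comes back (for example after S3) is set up again without any extra code.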




[RFC 2/6]cpu_sibling_map rework

2005-04-03 Thread Li Shaohua

Make sibling-map initialization per-CPU. CPU hotplug may change the map
at runtime, so the cpu hotplug semaphore should be used to protect it.


Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   56 +--
 1 files changed, 29 insertions(+), 27 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~sibling_map_init_cleanup arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~sibling_map_init_cleanup    2005-03-28 16:29:55.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c    2005-03-31 10:46:51.572700184 +0800
@@ -63,9 +63,12 @@ static int __initdata smp_b_stepping;
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
-int phys_proc_id[NR_CPUS]; /* Package ID of each logical CPU */
+/* Package ID of each logical CPU */
+int phys_proc_id[NR_CPUS] = {[0 ... NR_CPUS-1] = BAD_APICID};
 EXPORT_SYMBOL(phys_proc_id);
 
+cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
+
 /* bitmap of online cpus */
 cpumask_t cpu_online_map;
 
@@ -422,6 +425,9 @@ extern void enable_sep_cpu(void *);
  */
 static void __init start_secondary(void *unused)
 {
+   int siblings = 0;
+   int i;
+   int self = smp_processor_id();
/*
 * Dont put anything before smp_callin(), SMP
 * booting is too fragile that we want to limit the
@@ -443,6 +449,27 @@ static void __init start_secondary(void 
 * the local TLBs too.
 */
local_flush_tlb();
+
+   /* This must be done before setting cpu_online_map */
+   if (smp_num_siblings > 1) {
+   for (i = 0; i < NR_CPUS; i++) {
+   if (!cpu_isset(i, cpu_callout_map))
+   continue;
+   if (phys_proc_id[self] == phys_proc_id[i]) {
+   siblings ++;
+   cpu_set(i, cpu_sibling_map[self]);
+   cpu_set(self, cpu_sibling_map[i]);
+   }
+   }
+   } else {
+   siblings ++;
+   cpu_set(self, cpu_sibling_map[self]);
+   }
+
+   if (siblings != smp_num_siblings)
+   printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, self, smp_num_siblings);
+   wmb();
+
cpu_set(smp_processor_id(), cpu_online_map);
 
/* We can take interrupts now: we're officially "up". */
@@ -893,8 +920,6 @@ static int boot_cpu_logical_apicid;
 /* Where the IO area was mapped on multiquad, always 0 otherwise */
 void *xquad_portio;
 
-cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
-
 static void __init smp_boot_cpus(unsigned int max_cpus)
 {
int apicid, cpu, bit, kicked;
@@ -1049,30 +1074,7 @@ static void __init smp_boot_cpus(unsigne
 */
for (cpu = 0; cpu < NR_CPUS; cpu++)
cpus_clear(cpu_sibling_map[cpu]);
-
-   for (cpu = 0; cpu < NR_CPUS; cpu++) {
-   int siblings = 0;
-   int i;
-   if (!cpu_isset(cpu, cpu_callout_map))
-   continue;
-
-   if (smp_num_siblings > 1) {
-   for (i = 0; i < NR_CPUS; i++) {
-   if (!cpu_isset(i, cpu_callout_map))
-   continue;
-   if (phys_proc_id[cpu] == phys_proc_id[i]) {
-   siblings++;
-   cpu_set(i, cpu_sibling_map[cpu]);
-   }
-   }
-   } else {
-   siblings++;
-   cpu_set(cpu, cpu_sibling_map[cpu]);
-   }
-
-   if (siblings != smp_num_siblings)
-   printk(KERN_WARNING "WARNING: %d siblings found for CPU%d, should be %d\n", siblings, cpu, smp_num_siblings);
-   }
+   cpu_set(0, cpu_sibling_map[0]);
 
if (nmi_watchdog == NMI_LOCAL_APIC)
check_nmi_watchdog();
_
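The loop added to start_secondary() lets each CPU, as it comes up, link itself with every already-called-out CPU in the same physical package, setting the bits on both sides so the maps stay symmetric. Below is a userspace sketch under stated assumptions: uint32_t bitmasks stand in for cpumask_t, and NCPUS, phys_id[], callout_map and link_siblings() are hypothetical names. The model assumes the new CPU's own callout bit is already set when it runs the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: uint32_t bitmasks stand in for cpumask_t,
 * phys_id[] for phys_proc_id[], and callout_map for cpu_callout_map. */
#define NCPUS 8

int phys_id[NCPUS];
uint32_t callout_map;
uint32_t sibling_map[NCPUS];

/* Called by the CPU that is coming up (self), before it marks itself
 * online. Returns how many siblings were found, self included; the
 * kernel code warns when this differs from smp_num_siblings. */
int link_siblings(int self, int num_siblings)
{
	int found = 0;

	assert(self >= 0 && self < NCPUS);
	if (num_siblings > 1) {
		for (int i = 0; i < NCPUS; i++) {
			if (!(callout_map & (1u << i)))
				continue;
			if (phys_id[self] == phys_id[i]) {
				found++;
				sibling_map[self] |= 1u << i;	/* my view */
				sibling_map[i] |= 1u << self;	/* their view */
			}
		}
	} else {
		found++;
		sibling_map[self] |= 1u << self;
	}
	return found;
}
```

Because the arriving CPU updates its siblings' maps as well as its own, CPUs that were already up learn about it without rerunning the old global loop in smp_boot_cpus().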




[RFC 5/6]clean cpu state after hotremove CPU

2005-04-03 Thread Li Shaohua
Clean up all CPU state, including the runqueue and idle thread, so that
the boot-time code can be reused without any changes.
Note that this makes /sys/devices/system/cpu/cpux/online unworkable.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/cpu/common.c |   12 
 linux-2.6.11-root/arch/i386/kernel/irq.c|5 +
 linux-2.6.11-root/arch/i386/kernel/process.c|   20 +++
 linux-2.6.11-root/arch/i386/kernel/smpboot.c|   44 -
 linux-2.6.11-root/include/asm-i386/irq.h|2 
 linux-2.6.11-root/kernel/exit.c |   59 +++
 linux-2.6.11-root/kernel/sched.c|   61 +---
 7 files changed, 195 insertions(+), 8 deletions(-)

diff -puN arch/i386/kernel/process.c~cpu_state_clean arch/i386/kernel/process.c
--- linux-2.6.11/arch/i386/kernel/process.c~cpu_state_clean 2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/process.c    2005-04-04 09:07:29.172936768 +0800
@@ -144,12 +144,32 @@ static void poll_idle (void)
 
 #ifdef CONFIG_HOTPLUG_CPU
 #include <asm/nmi.h>
+
+#ifdef CONFIG_STR_SMP
+extern void cpu_exit_clear(int);
+#endif
+
 /* We don't actually take CPU down, just spin without interrupts. */
 static inline void play_dead(void)
 {
+#ifdef CONFIG_STR_SMP
+   cpu_exit_clear(_smp_processor_id());
+#endif
+
/* Ack it */
__get_cpu_var(cpu_state) = CPU_DEAD;
 
+#ifdef CONFIG_STR_SMP
+   /*
+* With physical CPU hotplug, we should halt the CPU
+* Note: release idle task struct requires the CPU doesn't
+* touch stack or anything else.
+*/
+   local_irq_disable();
+   while (1)
+   __asm__ __volatile__ ("hlt": : :"memory");
+#endif
+
/* We shouldn't have to disable interrupts while dead, but
 * some interrupts just don't seem to go away, and this makes
 * it work for testing purposes. */
diff -puN arch/i386/kernel/smpboot.c~cpu_state_clean arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~cpu_state_clean 2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c    2005-04-04 09:05:41.699275248 +0800
@@ -794,8 +794,13 @@ static int __devinit do_boot_cpu(int api
int timeout, cpu;
unsigned long start_eip;
unsigned short nmi_high = 0, nmi_low = 0;
+   cpumask_t   tmp_map;
 
-   cpu = ++cpucount;
+   cpus_complement(tmp_map, cpu_present_map);
+   cpu = first_cpu(tmp_map);
+   if (cpu >= NR_CPUS)
+   return -ENODEV;
+   ++cpucount;
/*
 * We can't use kernel_thread since we must avoid to
 * reschedule the child.
@@ -867,13 +872,16 @@ static int __devinit do_boot_cpu(int api
inquire_remote_apic(apicid);
}
}
-   x86_cpu_to_apicid[cpu] = apicid;
+
if (boot_error) {
/* Try to put things back the way they were before ... */
unmap_cpu_to_logical_apicid(cpu);
cpu_clear(cpu, cpu_callout_map); /* was set here (do_boot_cpu()) */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpucount--;
+   } else {
+   x86_cpu_to_apicid[cpu] = apicid;
+   cpu_set(cpu, cpu_present_map);
}
 
/* mark stuck area as not stuck */
@@ -882,6 +890,37 @@ static int __devinit do_boot_cpu(int api
return boot_error;
 }
 
+#ifdef CONFIG_STR_SMP
+extern void do_exit_idle(void);
+extern void cpu_uninit(void);
+void cpu_exit_clear(int cpu)
+{
+   int sibling;
+   cpucount --;
+
+   cpu_uninit();
+
+   irq_ctx_exit(cpu);
+
+   cpu_clear(cpu, cpu_callout_map);
+   cpu_clear(cpu, cpu_callin_map);
+   cpu_clear(cpu, cpu_present_map);
+
+   x86_cpu_to_apicid[cpu] = BAD_APICID;
+
+   for_each_cpu_mask(sibling, cpu_sibling_map[cpu])
+   cpu_clear(cpu, cpu_sibling_map[sibling]);
+   cpus_clear(cpu_sibling_map[cpu]);
+
+   phys_proc_id[cpu] = BAD_APICID;
+
+   cpu_clear(cpu, smp_commenced_mask);
+
+   unmap_cpu_to_logical_apicid(cpu);
+
+   do_exit_idle();
+}
+#endif
 static void smp_tune_scheduling (void)
 {
unsigned long cachesize;   /* kB   */
@@ -1104,6 +1143,7 @@ void __devinit smp_prepare_boot_cpu(void
 {
cpu_set(smp_processor_id(), cpu_online_map);
cpu_set(smp_processor_id(), cpu_callout_map);
+   cpu_set(smp_processor_id(), cpu_present_map);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff -puN arch/i386/kernel/cpu/common.c~cpu_state_clean arch/i386/kernel/cpu/common.c
--- linux-2.6.11/arch/i386/kernel/cpu/common.c~cpu_state_clean  2005-03-31 10:50:27.0 +0800
+++ linux-2.6.11-root/arch/i386/kernel/cpu/common.c 2005-03-31 10:50:27.0 +0800
@@ -621,3 +621,15 @@ void __devinit cpu_init (void)
clear_used_math();
mxcsr_feature_mask_init();
 }
+
+#ifdef CONFIG_STR_SMP
+void 

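RFC 5/6 makes CPU numbers reusable: do_boot_cpu() now takes the lowest number missing from cpu_present_map and sets the present bit only when the boot succeeds, while cpu_exit_clear() unlinks the dead CPU from its siblings and clears its present bit. A userspace sketch of just that bookkeeping, assuming uint32_t bitmasks in place of cpumask_t and hypothetical names (NCPUS, alloc_cpu_number(), mark_booted(), exit_clear()):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: uint32_t bitmasks stand in for cpumask_t, and
 * NCPUS for NR_CPUS. CPU 0 (the BP) starts out present. */
#define NCPUS 8

uint32_t present_map = 1u;
uint32_t sibling_map[NCPUS];

/* Like the new code in do_boot_cpu(): pick the lowest CPU number that is
 * not in the present map (cpus_complement() + first_cpu()), or -1 where
 * the patch returns -ENODEV. */
int alloc_cpu_number(void)
{
	uint32_t free_map = ~present_map;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (free_map & (1u << cpu))
			return cpu;
	return -1;
}

/* The patch sets the present bit only once the boot succeeded. */
void mark_booted(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	present_map |= 1u << cpu;
}

/* Like the map handling in cpu_exit_clear(): unlink the dead CPU from each
 * sibling's map, empty its own map, and drop it from the present map so a
 * later warm boot can reuse the number. */
void exit_clear(int cpu)
{
	uint32_t sibs;

	assert(cpu >= 0 && cpu < NCPUS);
	sibs = sibling_map[cpu];	/* snapshot before modifying */
	for (int s = 0; s < NCPUS; s++)
		if (sibs & (1u << s))
			sibling_map[s] &= ~(1u << cpu);
	sibling_map[cpu] = 0;
	present_map &= ~(1u << cpu);
}
```

After exit_clear(1), the next alloc_cpu_number() returns 1 again, which is what lets the unmodified boot path bring the same logical CPU back after resume.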
[RFC 6/6]Physical CPU hotadd and S3 SMP support

2005-04-03 Thread Li Shaohua
Boot a CPU at runtime and use it to support S3 SMP.

Thanks,
Shaohua

---

 linux-2.6.11-root/arch/i386/kernel/smpboot.c |   79 +++
 linux-2.6.11-root/include/asm-i386/smp.h |4 +
 linux-2.6.11-root/kernel/power/main.c|   30 ++
 3 files changed, 104 insertions(+), 9 deletions(-)

diff -puN arch/i386/kernel/smpboot.c~warmboot_cpu arch/i386/kernel/smpboot.c
--- linux-2.6.11/arch/i386/kernel/smpboot.c~warmboot_cpu    2005-04-04 09:13:48.600255048 +0800
+++ linux-2.6.11-root/arch/i386/kernel/smpboot.c    2005-04-04 09:13:48.607253984 +0800
@@ -76,6 +76,12 @@ cpumask_t cpu_callin_map;
 cpumask_t cpu_callout_map;
 static cpumask_t smp_commenced_mask;
 
+/* This is ugly, but the TSC's upper 32 bits can't be written on earlier CPUs
+ * (before Prescott), so there is no way to resync one AP against the BP.
+ * TBD: for Prescott and above, we should use IA64's algorithm.
+ */
+static int __devinit tsc_sync_disabled;
+
 /* Per CPU bogomips and other parameters */
 struct cpuinfo_x86 cpu_data[NR_CPUS] __cacheline_aligned;
 
@@ -412,7 +418,7 @@ static void __devinit smp_callin(void)
/*
 *  Synchronize the TSC with the BP
 */
-   if (cpu_has_tsc && cpu_khz)
+   if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
synchronize_tsc_ap();
 }
 
@@ -781,8 +787,19 @@ wakeup_secondary_cpu(int phys_apicid, un
 #endif /* WAKE_SECONDARY_VIA_INIT */
 
 extern cpumask_t cpu_initialized;
+static inline int alloc_cpu_id(void)
+{
+   cpumask_t   tmp_map;
+   int cpu;
 
-static int __devinit do_boot_cpu(int apicid)
+   cpus_complement(tmp_map, cpu_present_map);
+   cpu = first_cpu(tmp_map);
+   if (cpu >= NR_CPUS)
+   return -ENODEV;
+   return cpu;
+}
+
+static int __devinit do_boot_cpu(int apicid, int cpu)
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -791,15 +808,10 @@ static int __devinit do_boot_cpu(int api
 {
struct task_struct *idle;
unsigned long boot_error;
-   int timeout, cpu;
+   int timeout;
unsigned long start_eip;
unsigned short nmi_high = 0, nmi_low = 0;
-   cpumask_t   tmp_map;
 
-   cpus_complement(tmp_map, cpu_present_map);
-   cpu = first_cpu(tmp_map);
-   if (cpu >= NR_CPUS)
-   return -ENODEV;
++cpucount;
/*
 * We can't use kernel_thread since we must avoid to
@@ -920,6 +932,53 @@ void cpu_exit_clear(int cpu)
 
do_exit_idle();
 }
+
+struct warm_boot_cpu_info {
+   struct completion *complete;
+   int apicid;
+   int cpu;
+};
+
+static void __devinit do_warm_boot_cpu(void *p)
+{
+   struct warm_boot_cpu_info *info = p;
+   do_boot_cpu(info->apicid, info->cpu);
+   complete(info->complete);
+}
+
+int __devinit smp_prepare_cpu(int apicid)
+{
+   DECLARE_COMPLETION(done);
+   struct warm_boot_cpu_info info;
+   struct work_struct task;
+   int cpu;
+
+   lock_cpu_hotplug();
+   cpu = alloc_cpu_id();
+
+   if (cpu < 0)
+   goto exit;
+
+   info.complete = &done;
+   info.apicid = apicid;
+   info.cpu = cpu;
+   INIT_WORK(&task, do_warm_boot_cpu, &info);
+
+   tsc_sync_disabled = 1;
+
+   /* init low mem mapping */
+   memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+   sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+   flush_tlb_all();
+   schedule_work(&task);
+   wait_for_completion(&done);
+
+   tsc_sync_disabled = 0;
+   zap_low_mappings();
+exit:
+   unlock_cpu_hotplug();
+   return cpu;
+}
 #endif
 static void smp_tune_scheduling (void)
 {
@@ -1064,7 +1123,7 @@ static void __init smp_boot_cpus(unsigne
if (max_cpus <= cpucount+1)
continue;
 
-   if (do_boot_cpu(apicid))
+   if (((cpu = alloc_cpu_id())  0)  do_boot_cpu(apicid, cpu))
printk("CPU #%d not responding - cannot use it.\n",
apicid);
else
@@ -1253,10 +1312,12 @@ void __init smp_cpus_done(unsigned int m
setup_ioapic_dest();
 #endif
zap_low_mappings();
+#ifndef CONFIG_STR_SMP
/*
 * Disable executability of the SMP trampoline:
 */
set_kernel_exec((unsigned long)trampoline_base, trampoline_exec);
+#endif
 }
 
 void __init smp_intr_init(void)
diff -puN kernel/power/main.c~warmboot_cpu kernel/power/main.c
--- linux-2.6.11/kernel/power/main.c~warmboot_cpu   2005-04-04 09:13:48.601254896 +0800
+++ linux-2.6.11-root/kernel/power/main.c   2005-04-04 09:13:48.607253984 +0800
@@ -15,6 +15,7 @@
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/pm.h>
+#include <linux/cpu.h>
 
 
 #include "power.h"
@@ -137,6 +138,24 @@ static char * pm_states[] = {
 static int enter_state(suspend_state_t state)
 

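smp_prepare_cpu() in RFC 6/6 does not call do_boot_cpu() directly: it packs the arguments into struct warm_boot_cpu_info, queues do_warm_boot_cpu() with schedule_work(), and sleeps in wait_for_completion() until the worker calls complete(). The patch does not say why the boot is run from the workqueue rather than inline. The sketch below reproduces only the hand-off pattern in userspace, assuming POSIX threads; the completion type is a hypothetical mutex/condvar stand-in for the kernel's, and fake_boot_cpu() is a made-up placeholder for do_boot_cpu().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct completion (hypothetical). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Same shape as struct warm_boot_cpu_info, plus a result field. */
struct warm_boot_info {
	struct completion *complete;
	int apicid;
	int cpu;
	int result;
};

/* Hypothetical stand-in for do_boot_cpu(): APIC id 99 "never answers". */
static int fake_boot_cpu(int apicid, int cpu)
{
	(void)cpu;
	return apicid == 99 ? -1 : 0;
}

/* Worker side, like do_warm_boot_cpu(). */
static void *warm_boot_worker(void *p)
{
	struct warm_boot_info *info = p;

	assert(info->complete != NULL);
	info->result = fake_boot_cpu(info->apicid, info->cpu);
	complete(info->complete);
	return NULL;
}

/* Caller side, like smp_prepare_cpu(): hand the job to another thread and
 * sleep until it reports completion. */
int prepare_cpu(int apicid, int cpu)
{
	struct completion done;
	struct warm_boot_info info = { &done, apicid, cpu, 0 };
	pthread_t worker;
	int err;

	init_completion(&done);
	err = pthread_create(&worker, NULL, warm_boot_worker, &info);
	if (err == 0) {
		wait_for_completion(&done);
		pthread_join(worker, NULL);
	}
	pthread_cond_destroy(&done.cond);
	pthread_mutex_destroy(&done.lock);
	return err == 0 ? info.result : -1;
}
```

As in the patch, both the completion and the info struct live on the caller's stack; that is safe only because the caller does not return until the worker has signalled.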
Re: [RFC 0/6] S3 SMP support with physical CPU hotplug

2005-04-03 Thread Li Shaohua
On Mon, 2005-04-04 at 10:37, Andrew Morton wrote:
> Li Shaohua [EMAIL PROTECTED] wrote:
> >
> > The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in -mm
> >  tree.
>
> Should I merge that thing into mainline?  It seems that a few people are
> needing it.
I'd like to hear some comments first. There are still some things
I'm not sure about, such as do_exit_idle.

Thanks,
Shaohua



Re: [RFC 0/6] S3 SMP support with physical CPU hotplug

2005-04-03 Thread Li Shaohua
On Mon, 2005-04-04 at 10:48, Andrew Morton wrote:
> Li Shaohua [EMAIL PROTECTED] wrote:
> >
> > On Mon, 2005-04-04 at 10:37, Andrew Morton wrote:
> > > Li Shaohua [EMAIL PROTECTED] wrote:
> > > >
> > > > The patches are against 2.6.11-rc1 with Zwane's CPU hotplug patch in -mm
> > > >  tree.
> > >
> > > Should I merge that thing into mainline?  It seems that a few people are
> > > needing it.
> > I'd like to hear some comments first. There are still some things
> > I'm not sure about, such as do_exit_idle.
> >
>
> I was referring to Zwane's i386-cpu-hotplug-updated-for-mm.patch
Yep, great. Pavel's swsusp also needs it.

Thanks,
Shaohua



Re: [ACPI] Re: [RFC 5/6]clean cpu state after hotremove CPU

2005-04-03 Thread Li Shaohua
Hi,
On Mon, 2005-04-04 at 13:28, Nathan Lynch wrote:
> On Mon, Apr 04, 2005 at 10:07:02AM +0800, Li Shaohua wrote:
> > Clean up all CPU states including its runqueue and idle thread, 
> > so we can use boot time code without any changes.
> > Note this makes /sys/devices/system/cpu/cpux/online unworkable.
>
> In what sense does it make the online attribute unworkable?
I removed the idle thread and other CPU state, and put the dead CPU
into a 'halt' busy loop.

>
> > diff -puN kernel/exit.c~cpu_state_clean kernel/exit.c
> > --- linux-2.6.11/kernel/exit.c~cpu_state_clean  2005-03-31 10:50:27.0 +0800
> > +++ linux-2.6.11-root/kernel/exit.c 2005-03-31 10:50:27.0 +0800
> > @@ -845,6 +845,65 @@ fastcall NORET_TYPE void do_exit(long co
> >  for (;;) ;
> >  }
> >  
> > +#ifdef CONFIG_STR_SMP
> > +void do_exit_idle(void)
> > +{
> > +   struct task_struct *tsk = current;
> > +   int group_dead;
> > +
> > +   BUG_ON(tsk->pid);
> > +   BUG_ON(tsk->mm);
> > +
> > +   if (tsk->io_context)
> > +   exit_io_context();
> > +   tsk->flags |= PF_EXITING;
> > +   tsk->it_virt_expires = cputime_zero;
> > +   tsk->it_prof_expires = cputime_zero;
> > +   tsk->it_sched_expires = 0;
> > +
> > +   acct_update_integrals(tsk);
> > +   update_mem_hiwater(tsk);
> > +   group_dead = atomic_dec_and_test(&tsk->signal->live);
> > +   if (group_dead) {
> > +   del_timer_sync(&tsk->signal->real_timer);
> > +   acct_process(-1);
> > +   }
> > +   exit_mm(tsk);
> > +
> > +   exit_sem(tsk);
> > +   __exit_files(tsk);
> > +   __exit_fs(tsk);
> > +   exit_namespace(tsk);
> > +   exit_thread();
> > +   exit_keys(tsk);
> > +
> > +   if (group_dead && tsk->signal->leader)
> > +   disassociate_ctty(1);
> > +
> > +   module_put(tsk->thread_info->exec_domain->module);
> > +   if (tsk->binfmt)
> > +   module_put(tsk->binfmt->module);
> > +
> > +   tsk->exit_code = -1;
> > +   tsk->exit_state = EXIT_DEAD;
> > +
> > +   /* in release_task */
> > +   atomic_dec(&tsk->user->processes);
> > +   write_lock_irq(&tasklist_lock);
> > +   __exit_signal(tsk);
> > +   __exit_sighand(tsk);
> > +   write_unlock_irq(&tasklist_lock);
> > +   release_thread(tsk);
> > +   put_task_struct(tsk);
> > +
> > +   tsk->flags |= PF_DEAD;
> > +#ifdef CONFIG_NUMA
> > +   mpol_free(tsk->mempolicy);
> > +   tsk->mempolicy = NULL;
> > +#endif
> > +}
> > +#endif
>
> I don't understand why this is needed at all.  It looks like a fair
> amount of code from do_exit is being duplicated here.
Yes, exactly. I'd appreciate help from someone who understands do_exit
in cleaning up this code. I want to remove the idle thread because the
smpboot code will create a new one.

> We've been
> doing cpu removal on ppc64 logical partitions for a while and never
> needed to do anything like this.
Does it remove the idle thread, or does the dead CPU just stay in the
idle busy loop?

> Maybe idle_task_exit would suffice?
idle_task_exit seems to just drop the mm. We need to destroy the idle
task for physical CPU hotplug, right?

>
>
> > diff -puN kernel/sched.c~cpu_state_clean kernel/sched.c
> > --- linux-2.6.11/kernel/sched.c~cpu_state_clean 2005-03-31 10:50:27.0 +0800
> > +++ linux-2.6.11-root/kernel/sched.c    2005-04-04 09:06:40.362357104 +0800
> > @@ -4028,6 +4028,58 @@ void __devinit init_idle(task_t *idle, i
> >  }
> >  
> >  /*
> > + * Initial dummy domain for early boot and for hotplug cpu. Being static,
> > + * it is initialized to zero, so all balancing flags are cleared which is
> > + * what we want.
> > + */
> > +static struct sched_domain sched_domain_dummy;
> > +
> > +#ifdef CONFIG_STR_SMP
> > +static void __devinit exit_idle(int cpu)
> > +{
> > +   runqueue_t *rq = cpu_rq(cpu);
> > +   struct task_struct *p = rq->idle;
> > +   int j, k;
> > +   prio_array_t *array;
> > +
> > +   /* init runqueue */
> > +   spin_lock_init(&rq->lock);
> > +   rq->active = rq->arrays;
> > +   rq->expired = rq->arrays + 1;
> > +   rq->best_expired_prio = MAX_PRIO;
> > +
> > +   rq->prev_mm = NULL;
> > +   rq->curr = rq->idle = NULL;
> > +   rq->expired_timestamp = 0;
> > +
> > +   rq->sd = &sched_domain_dummy;
> > +   rq->cpu_load = 0;
> > +   rq->active_balance = 0;
> > +   rq->push_cpu = 0;
> > +   rq->migration_thread = NULL;
> > +   INIT_LIST_HEAD(&rq->migration_queue);
> > +   atomic_set(&rq->nr_iowait, 0);
> > +
> > +   for (j = 0; j < 2; j++) {
> > +   array = rq->arrays + j;
> > +   for (k = 0; k < MAX_PRIO; k++) {
> > +   INIT_LIST_HEAD(array->queue + k);
> > +   __clear_bit(k, array->bitmap);
> > +   }
> > +   // delimiter for bitsearch
> > +   __set_bit(MAX_PRIO, array->bitmap);
> > +   }
> > +   /* Destroy IDLE thread.
> > +    * it's safe now, the CPU is in busy loop
> > +    */
> > +   if (p->active_mm)
> > +   mmdrop(p->active_mm);
> > +   p->active_mm = NULL;
> > +   put_task_struct(p);
> > +}
> > +#endif
> > +
> > +/*
> >   * In a system that switches off the HZ timer nohz_cpu_mask
> >   * indicates which cpus entered this state. This is used
> >   * in the rcu update to wait only for active cpus. For system
> > @@ -4432,6 +4484,9 @@ static int migration_call(struct notifie
> >  complete(&req->done);
> >  }
> >  spin_unlock_irq(&rq->lock);
> > +#ifdef

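The quoted exit_idle() reinitializes each priority array the same way sched_init() does, including the "delimiter for bitsearch": bit MAX_PRIO is always set, so a search for the highest-priority runnable level can never run off the end of the bitmap. A simplified userspace sketch of that trick follows; MAX_PRIO is 140 as in the 2.6 O(1) scheduler, but struct prio_bitmap and all helper names are hypothetical, and the linear scan replaces the kernel's optimized sched_find_first_bit().

```c
#include <assert.h>
#include <limits.h>

/* MAX_PRIO is 140 in the 2.6 scheduler; one extra bit (index MAX_PRIO)
 * is reserved for the search delimiter. Everything else is hypothetical. */
#define MAX_PRIO 140
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((MAX_PRIO + 1 + WORD_BITS - 1) / WORD_BITS)

struct prio_bitmap {
	unsigned long bits[BITMAP_WORDS];
};

static void bm_set(int nr, struct prio_bitmap *b)
{
	b->bits[nr / WORD_BITS] |= 1UL << (nr % WORD_BITS);
}

static void bm_clear(int nr, struct prio_bitmap *b)
{
	b->bits[nr / WORD_BITS] &= ~(1UL << (nr % WORD_BITS));
}

static int bm_test(int nr, const struct prio_bitmap *b)
{
	return (b->bits[nr / WORD_BITS] >> (nr % WORD_BITS)) & 1UL;
}

/* Same idea as the loop in exit_idle() and sched_init(): clear every
 * priority bit, then set bit MAX_PRIO as the delimiter. */
void init_prio_bitmap(struct prio_bitmap *b)
{
	for (int k = 0; k < MAX_PRIO; k++)
		bm_clear(k, b);
	bm_set(MAX_PRIO, b);
}

/* Mark priority level prio as having runnable tasks. */
void mark_prio(struct prio_bitmap *b, int prio)
{
	assert(prio >= 0 && prio < MAX_PRIO);
	bm_set(prio, b);
}

/* Linear scan for the first set bit. Because the delimiter is always set,
 * the loop needs no bound check: an empty array simply yields MAX_PRIO. */
int first_prio(const struct prio_bitmap *b)
{
	int nr = 0;

	while (!bm_test(nr, b))
		nr++;
	return nr;
}
```

This is why the delimiter has to be restored whenever a runqueue is rebuilt for a CPU coming back: without it, searching an empty array would scan past the end of the bitmap.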
Re: 2.6.12-rc1-mm3: box hangs solid on resume from disk while resuming device drivers

2005-03-27 Thread Li Shaohua
On Sun, 2005-03-27 at 02:23, Rafael J. Wysocki wrote:
> Hi,
> 
> On Friday, 25 of March 2005 15:19, Rafael J. Wysocki wrote: 
> > On Friday, 25 of March 2005 13:54, you wrote:
> > ]--snip--[
> > > >My box is still hanged solid on resume (swsusp) by the drivers:
> > > >
> > > >ohci_hcd
> > > >ehci_hcd
> > > >yenta_socket
> > > >
> > > >possibly others, too.  To avoid this, I had to revert the following
> > > patch from the Len's tree:
> > > >
> > > >diff -Naru a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
> > > >--- a/drivers/acpi/pci_link.c2005-03-24 04:57:27 -08:00
> > > >+++ b/drivers/acpi/pci_link.c2005-03-24 04:57:27 -08:00
> > > >@@ -72,10 +72,12 @@
> > > > u8  active; /* Current IRQ
> > > */
> > > > u8  edge_level; /* All IRQs */
> > > > u8  active_high_low;/* All IRQs */
> > > >-u8  initialized;
> > > > u8  resource_type;
> > > > u8  possible_count;
> > > > u8  possible[ACPI_PCI_LINK_MAX_POSSIBLE];
> > > >+u8  initialized:1;
> > > >+u8  suspend_resume:1;
> > > >+u8  reserved:6;
> > > > };
> > > >
> > > > struct acpi_pci_link {
> > > >@@ -530,6 +532,10 @@
> > > >
> > > > ACPI_FUNCTION_TRACE("acpi_pci_link_allocate");
> > > >
> > > >+if (link->irq.suspend_resume) {
> > > >+acpi_pci_link_set(link, link->irq.active);
> > > >+link->irq.suspend_resume = 0;
> > > >+}
> > > > if (link->irq.initialized)
> > > > return_VALUE(0);
> > > 
> > > How about just remove below line:
> > > >+acpi_pci_link_set(link, link->irq.active);
> > 
> > You mean apply the patch again and remove just the single
> > line?  No effect (ie hangs).
> 
> It looks like removing this line couldn't help.
> 
> Apparently, acpi_pci_link_set(link, link->irq.active) must be called
> _before_ the call to pci_write_config_word() in
> drivers/pci/pci.c:pci_set_power_state(), because the box hangs
> otherwise.  However, with the patch applied,
> acpi_pci_link_set(link, link->irq.active) is only called through
> pcibios_enable_irq() in pcibios_enable_device(), which is _after_
> the call to pci_set_power_state() in pci_enable_device_bars(),
> so it's too late.
> 
> Hence, it seems, if you really want to get rid of the
> irqrouter_resume(), whatever the reason, the simplest fix
> seems to be to change the order of calls to pci_set_power_state()
> and pcibios_enable_device() in pci_enable_device_bars():
> 
> --- old/drivers/pci/pci.c 2005-03-26 19:10:09.0 +0100
> +++ linux-2.6.12-rc1-mm2/drivers/pci/pci.c    2005-03-26 19:10:54.0 +0100
> @@ -442,9 +442,9 @@ pci_enable_device_bars(struct pci_dev *d
>  {
>   int err;
>  
> - pci_set_power_state(dev, PCI_D0);
>   if ((err = pcibios_enable_device(dev, bars)) < 0)
>   return err;
> + pci_set_power_state(dev, PCI_D0);
>   return 0;
>  }
>  
> though I'm not sure if that's legal.
Hmm, no. pci_set_power_state should be called before
pcibios_enable_device; otherwise enabling the device may fail. This is very
strange. At boot time there are also uninitialized link devices, so I
wonder why the call to pci_enable_device_bars doesn't fail at boot time.
Do you see the bug only on a specific system?

Could you please file a bug in bugzilla? I don't want to lose the
context of this thread. Please also attach your acpidmp output to the bug.

Thanks,
Shaohua



RE: 2.6.12-rc1-mm3: box hangs solid on resume from disk while resuming device drivers

2005-03-25 Thread Li, Shaohua
Hi,
>On Friday, 25 of March 2005 09:21, Andrew Morton wrote:
>>
>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc1/2.6.12-rc1-mm3/
>>
>> - Mainly a bunch of fixes relative to 2.6.12-rc1-mm2.
>
>First, rmmod works again (thanks ;-)).
>
>> - Again, we'd like people who have had recent DRM and USB resume problems to
>>   test and report, please.
>
>My box is still hanged solid on resume (swsusp) by the drivers:
>
>ohci_hcd
>ehci_hcd
>yenta_socket
>
>possibly others, too.  To avoid this, I had to revert the following patch from
>the Len's tree:
>
>diff -Naru a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
>--- a/drivers/acpi/pci_link.c  2005-03-24 04:57:27 -08:00
>+++ b/drivers/acpi/pci_link.c  2005-03-24 04:57:27 -08:00
>@@ -72,10 +72,12 @@
>   u8  active; /* Current IRQ */
>   u8  edge_level; /* All IRQs */
>   u8  active_high_low;/* All IRQs */
>-  u8  initialized;
>   u8  resource_type;
>   u8  possible_count;
>   u8  possible[ACPI_PCI_LINK_MAX_POSSIBLE];
>+  u8  initialized:1;
>+  u8  suspend_resume:1;
>+  u8  reserved:6;
> };
>
> struct acpi_pci_link {
>@@ -530,6 +532,10 @@
>
>   ACPI_FUNCTION_TRACE("acpi_pci_link_allocate");
>
>+  if (link->irq.suspend_resume) {
>+  acpi_pci_link_set(link, link->irq.active);
>+  link->irq.suspend_resume = 0;
>+  }
>   if (link->irq.initialized)
>   return_VALUE(0);

How about just removing the line below:
>+  acpi_pci_link_set(link, link->irq.active);
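To make the effect of that suggestion concrete, here is a small model of the patched allocate path. It is a sketch under assumptions. The flag layout follows the quoted diff, but it uses `unsigned int` bit-fields instead of `u8` for portability. The not-yet-initialized branch is reduced to "program the link once". The hypothetical `toy_link_set()` only counts calls and stands in for `acpi_pci_link_set()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: flag layout from the diff, 1 + 1 + 6 bits. */
struct toy_link_irq {
	unsigned int active;		/* current IRQ */
	unsigned int initialized:1;
	unsigned int suspend_resume:1;
	unsigned int reserved:6;
};

struct toy_link {
	struct toy_link_irq irq;
	int set_calls;			/* times the link was (re)programmed */
};

/* Stand-in for acpi_pci_link_set(): records the call only. */
static void toy_link_set(struct toy_link *link, unsigned int irq)
{
	(void)irq;
	link->set_calls++;
}

/*
 * Patched allocate path. reprogram == true keeps the
 * acpi_pci_link_set() call on resume; false models removing that line.
 */
static int toy_link_allocate(struct toy_link *link, bool reprogram)
{
	if (link->irq.suspend_resume) {
		if (reprogram)
			toy_link_set(link, link->irq.active);
		link->irq.suspend_resume = 0;
	}
	assert(!link->irq.suspend_resume);
	if (link->irq.initialized)
		return 0;

	toy_link_set(link, link->irq.active);	/* first-time setup */
	link->irq.initialized = 1;
	return 0;
}
```

Consider a link that was already initialized and then marked on suspend. With `reprogram == true`, the next allocate programs it exactly once. With `reprogram == false`, the allocate only clears the flag, so the link keeps whatever state it had after resume.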

Thanks,
Shaohua


Re: 2.6.12-rc1-mm1: resume regression [update] (was: Re:2.6.12-rc1-mm1: Kernel BUG at pci:389)

2005-03-24 Thread Li Shaohua
On Thu, 2005-03-24 at 21:42, Rafael J. Wysocki wrote:
> Hi,
> 
> On Thursday, 24 of March 2005 02:27, Li Shaohua wrote:
> > On Thu, 2005-03-24 at 09:03, Len Brown wrote:
> > > On Wed, 2005-03-23 at 18:49, Rafael J. Wysocki wrote:
> > > > Hi,
> > > > 
> > > > On Wednesday, 23 of March 2005 23:39, Pavel Machek wrote:
> > > > > Hi!
> > > > >
> > > > > > > > > Will this do it for the moment?
> > > > > > > >
> > > > > > > > Its certainly better.
> > > > > > >
> > > > > > > With the Len's patch applied I have to unload the modules:
> > > > > > >
> > > > > > > ohci_hcd
> > > > > > > ehci_hcd
> > > > > > > yenta_socket
> > > > > > >
> > > > > > > before suspend as each of them hangs the box solid during either
> > > > > > > suspend or resume.  Moreover, when I tried to load the ehci_hcd
> > > > > > > module back after resume, it hanged the box solid too.
> > > 
> > > Is this failure with suspend to RAM or to disk?
> > > 
> > > How about if you try this patch?
> > > 
> > > http://linux-acpi.bkbits.net:8080/to-akpm/[EMAIL PROTECTED]
> > > 
> > > patch -Rp1 from 2.6.12-rc1-mm1 and see if it stops being broken
> > > or patch -Np1 to 2.6.12-rc and see if it starts being broken.
> > > 
> > > This one removes an earlier attempt at resuming PCI links -- now
> > > putting the onus on the drivers to be properly written
> > > to release and acquire their interrupt for a successful
> > > suspend/resume.
> > > 
> > > 
> > > In theory, this is taken care of something like this:
> > > driver.resume
> > > pci_enable_device
> > > pci_enable_device_bars
> > > pcibios_enable_device
> > > pcibios_enable_irq
> > > acpi_pci_irq_enable
> > > 
> > > but if the patch above makes a difference, then theory != practice:-)
> 
> It looks like that. ;-)
> 
> > > I'd believe that ohci_hcd and ehci_hcd are fragile since glancing
> > > at their lengthy .resume routines it isn't immediately obvious
> > > that they do this.  But yenta_dev_resume has a pci_enable_device(),
> > > so that failure may be less straightforward.
> > > 
> > > cheers,
> > > -Len
> > > 
> > > ps. if point me to a full dmesg -s64000 from 2.6.12-rc1 acpi-enabled
> > > boot, that would help -- for it will show if we're even using pci
> > > interrupt links (and programming them) for these devices on this box.
> > Yes, we changed the behavior of device suspend/resume. Every PCI device
> > should call 'pci_disable_device' at suspend and call 'pci_enable_device'
> > at resume. It fixes a bug and more important thing is it's safer (Eg. it
> > disable interrupts, bus master and etc).
> > I actually added such calls in uhci, ehci and yenta. It's ok for S3 (and
> > definitely required for S3). Unclear if it's ok for S4, so please try
> > revert the patch.
> 
> 2.6.11-rc1-mm1 with the patch reverted works fine. :-)
So just removing the pci_enable/disable_device calls from the drivers makes
the system work? Strange. I tried them on two laptops (an HP nx5000 and a
Toshiba M2N), and both work for S3/S4: no hang, and the USB mouse works after
S3/S4. (I didn't try yenta, since I have no PC Card.) Is it possible this is
another bug, or just a different BIOS?

Thanks,
Shaohua



Re: 2.6.12-rc1-mm1: Kernel BUG at pci:389

2005-03-23 Thread Li Shaohua
On Tue, 2005-03-22 at 20:20, Pavel Machek wrote:
> Hi!
> 
> > >> > Yes, but it is needed. There are many drivers, and they look at
> > >> > numerical value of PMSG_*. I'm proceeding in steps. I hopefully killed
> > >> > all direct accesses to the constants, and will switch constants to
> > >> > something else... But that is going to be tommorow (need some sleep).
> > >> The patches are going to acquire correct PCI device sleep state for
> > >> suspend/resume. We discussed the issue several months ago. My plan is we
> > >> first introduce 'platform_pci_set_power_state', then merge the
> > >> 'platform_pci_choose_state' patch after Pavel's pm_message_t conversion
> > >> finished. Maybe Len mislead my comments.
> > >>
> > >> Anyway for the callback, my intend is platform_pci_choose_state accept
> > >> the pm_message_t parameter, and it return an 'int', since platform
> > >> method possibly failed and then pci_choose_state translate the return
> > >> value to pci_power_t.
> > >
> > >You can't just retype around like that. You may want it take
> > >pci_power_t * as an argument, and then return 0/-ENODEV or something
> > >like that. But you can't retype between int and pm_message_t...
> > No, taking pci_power_t as an argument is meaningless. For ACPI, we
> > should know the exact sleep state, pm_message_t will tell us. But I'm ok
> > to let it return a pci_power_t, and the failure case returns -ENODEV.
> 
> You can't put -ENODEV into pci_power_t ... but maybe we should create
> PCI_ERROR and pass it in cases like this one?
That makes sense, please do it.
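Here is a sketch of the sentinel idea, with hypothetical names. The power-state type gains an explicit error member, so a platform hook can say "no answer" without pushing `-ENODEV` through a `pci_power_t`. The D2 and D3hot choices below are illustrative only.

```c
#include <assert.h>

/* Hypothetical power-state type with an explicit error value. */
typedef enum {
	TOY_PCI_ERROR = -1,	/* platform could not decide */
	TOY_PCI_D0 = 0,
	TOY_PCI_D1,
	TOY_PCI_D2,
	TOY_PCI_D3HOT,
} toy_pci_power_t;

/* Hypothetical platform hook: pretend firmware asks for D2 when it knows. */
static toy_pci_power_t toy_platform_choose_state(int has_firmware_info)
{
	return has_firmware_info ? TOY_PCI_D2 : TOY_PCI_ERROR;
}

/* Generic caller: fall back to a default when the hook reports an error. */
static toy_pci_power_t toy_choose_state(int has_firmware_info)
{
	toy_pci_power_t s = toy_platform_choose_state(has_firmware_info);

	if (s == TOY_PCI_ERROR)
		s = TOY_PCI_D3HOT;	/* assumed default for a suspend request */
	assert(s != TOY_PCI_ERROR);
	return s;
}
```

The error value never leaves the generic layer. The caller either gets the platform's answer or the default.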

> 
> > >> > Could you just revert those two patches? First one is very
> > >> > wrong. Second one might be fixed, but... See comments below.
> > >> I think the platform_pci_set_power_state should be ok, did you see it
> > >> causes oops?
> > >
> > >No its just ugly and uses __force in "creative" way. That one can be
> > >recovered.
> > Do you mean this?
> > 
> > > +   static int state_conv[] = {
> > > +   [0] = 0,
> > > +   [1] = 1,
> > > +   [2] = 2,
> > > +   [3] = 3,
> > > +   [4] = 3
> > > +   };
> > > +   int acpi_state = state_conv[(int __force) state];
> > 
> > The table should be
> >   [PCI_D0] = 0,
> > 
> > I'm not sure, but then could we use state_conv[state] directly? It seems
> 
> I think so. Of course it is wrong, but it is less wrong than forcing
> it to integer than index, without using macros at all.
> 
> Or perhaps you should do
> 
> switch (state) {
> case PCI_D0: ...
> }
> 
> ...and handle default case somehow.
That's ok for me. I'll change it later.
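The switch-based conversion suggested above might look like the following sketch. The enum is a hypothetical local stand-in for the PCI power states in the quoted table (D0 through D3cold), and the mapping copies that table, so both D3 variants become ACPI D3. Returning `-EINVAL` is just one way to "handle the default case".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the PCI power states in the quoted table. */
enum toy_pci_power { TOY_D0, TOY_D1, TOY_D2, TOY_D3HOT, TOY_D3COLD };

/* Explicit mapping: no reliance on the enum's numeric values. */
static int toy_pci_to_acpi_dstate(enum toy_pci_power state)
{
	switch (state) {
	case TOY_D0:
		return 0;
	case TOY_D1:
		return 1;
	case TOY_D2:
		return 2;
	case TOY_D3HOT:
	case TOY_D3COLD:
		return 3;	/* both D3 variants map to ACPI D3 */
	default:
		return -EINVAL;	/* unknown or corrupted state */
	}
}
```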

Thanks,
Shaohua



Re: 2.6.12-rc1-mm1: resume regression [update] (was: Re:2.6.12-rc1-mm1: Kernel BUG at pci:389)

2005-03-23 Thread Li Shaohua
On Thu, 2005-03-24 at 09:03, Len Brown wrote:
> On Wed, 2005-03-23 at 18:49, Rafael J. Wysocki wrote:
> > Hi,
> > 
> > On Wednesday, 23 of March 2005 23:39, Pavel Machek wrote:
> > > Hi!
> > >
> > > > > > > Will this do it for the moment?
> > > > > >
> > > > > > Its certainly better.
> > > > >
> > > > > With the Len's patch applied I have to unload the modules:
> > > > >
> > > > > ohci_hcd
> > > > > ehci_hcd
> > > > > yenta_socket
> > > > >
> > > > > before suspend as each of them hangs the box solid during either
> > > > > suspend or resume.  Moreover, when I tried to load the ehci_hcd
> > > > > module back after resume, it hanged the box solid too.
> 
> Is this failure with suspend to RAM or to disk?
> 
> How about if you try this patch?
> 
> http://linux-acpi.bkbits.net:8080/to-akpm/[EMAIL PROTECTED]
> 
> patch -Rp1 from 2.6.12-rc1-mm and see if it stops being broken
> or patch -Np1 to 2.6.12-rc and see if it starts being broken.
> 
> This one removes an earlier attempt at resuming PCI links -- now
> putting the onus on the drivers to be properly written
> to release and acquire their interrupt for a successful
> suspend/resume.
> 
> 
> In theory, this is taken care of something like this:
> driver.resume
> pci_enable_device
> pci_enable_device_bars
> pcibios_enable_device
> pcibios_enable_irq
> acpi_pci_irq_enable
> 
> but if the patch above makes a difference, then theory != practice:-)
> 
> I'd believe that ohci_hcd and ehci_hcd are fragile since glancing
> at their lengthy .resume routines it isn't immediately obvious
> that they do this.  But yenta_dev_resume has a pci_enable_device(),
> so that failure may be less straightforward.
> 
> cheers,
> -Len
> 
> ps. if point me to a full dmesg -s64000 from 2.6.12-rc1 acpi-enabled
> boot, that would help -- for it will show if we're even using pci
> interrupt links (and programming them) for these devices on this box.
Yes, we changed the behavior of device suspend/resume. Every PCI device
driver should call pci_disable_device() at suspend and pci_enable_device()
at resume. This fixes a bug and, more importantly, it's safer (e.g. it
disables interrupts, bus mastering, etc.).
I added such calls to uhci, ehci and yenta. That works for S3 (and is
definitely required for S3). It's unclear whether it's OK for S4, so please
try reverting the patch.
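Here is a toy model of the rule above, with hypothetical names throughout. The suspend callback undoes what enable did. The resume callback goes back through the enable path, which, per the call chain quoted above, is where the interrupt gets set up again. Only the effects mentioned in this thread are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model, not kernel API. */
struct toy_pci_dev {
	bool enabled;
	bool bus_master;
	bool irq_enabled;	/* modeled stand-in for the routed interrupt */
};

/* Stand-in for the enable path ending in acpi_pci_irq_enable(). */
static void toy_enable_device(struct toy_pci_dev *dev)
{
	dev->irq_enabled = true;
	dev->enabled = true;
}

/* Stand-in for the disable path: stop bus mastering and interrupts. */
static void toy_disable_device(struct toy_pci_dev *dev)
{
	dev->bus_master = false;
	dev->irq_enabled = false;
	dev->enabled = false;
}

/* Driver callbacks following the rule: disable on suspend, enable on resume. */
static void toy_driver_suspend(struct toy_pci_dev *dev)
{
	toy_disable_device(dev);
}

static void toy_driver_resume(struct toy_pci_dev *dev, bool was_master)
{
	toy_enable_device(dev);
	assert(dev->enabled);
	if (was_master)
		dev->bus_master = true;	/* driver restores mastering itself */
}
```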

Thanks,
Shaohua



Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB

2005-03-22 Thread Li Shaohua
On Wed, 2005-03-23 at 04:57, Bjorn Helgaas wrote:
> > Your patch applied with some problems:
> >
> > patching file arch/i386/pci/irq.c
> > Hunk #2 succeeded at 1081 with fuzz 2 (offset 1 line).
> > patching file drivers/acpi/pci_irq.c
> > patching file drivers/pci/quirks.c
> > Hunk #1 succeeded at 678 (offset -5 lines).
> 
> These indicate minor differences in these files between upstream BK
> (which is what my patch was against) and the kernel you're building.
> You can ignore them.
> 
> > Then I tested it and it works (at least my speedtouch still works).
> 
> Great.  Shaohua, where should we go from here?  Do you have more
> concerns with the current patch, or should we ask Andrew to put it
> in -mm?  If you do have concerns, would you like to propose an
> alternate patch that fixes the problem for Grzegorz?
No, the patch looks good to me.

Thanks,
Shaohua



RE: 2.6.12-rc1-mm1: Kernel BUG at pci:389

2005-03-22 Thread Li, Shaohua
>
>> > Yes, but it is needed. There are many drivers, and they look at
>> > numerical value of PMSG_*. I'm proceeding in steps. I hopefully killed
>> > all direct accesses to the constants, and will switch constants to
>> > something else... But that is going to be tommorow (need some sleep).
>> The patches are going to acquire correct PCI device sleep state for
>> suspend/resume. We discussed the issue several months ago. My plan is we
>> first introduce 'platform_pci_set_power_state', then merge the
>> 'platform_pci_choose_state' patch after Pavel's pm_message_t conversion
>> finished. Maybe Len mislead my comments.
>>
>> Anyway for the callback, my intend is platform_pci_choose_state accept
>> the pm_message_t parameter, and it return an 'int', since platform
>> method possibly failed and then pci_choose_state translate the return
>> value to pci_power_t.
>
>You can't just retype around like that. You may want it take
>pci_power_t * as an argument, and then return 0/-ENODEV or something
>like that. But you can't retype between int and pm_message_t...
No, taking pci_power_t as an argument is meaningless. For ACPI we need
to know the exact sleep state, and pm_message_t tells us that. But I'm OK
with having it return a pci_power_t, with -ENODEV in the failure case.
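For comparison, the out-parameter shape proposed in the quoted text keeps pm_message_t as the input and hands the state back through a pointer. That way the error code and the power state never share a type. The struct, enum and event values below are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for pm_message_t and pci_power_t. */
struct toy_pm_message { int event; };
enum toy_pci_power { TOY_D0, TOY_D1, TOY_D2, TOY_D3HOT };

#define TOY_EVENT_ON      0
#define TOY_EVENT_SUSPEND 2

/* 0 on success with *out filled in; -ENODEV (and *out untouched) otherwise. */
static int toy_platform_choose_state(int has_handle, struct toy_pm_message msg,
				     enum toy_pci_power *out)
{
	assert(out);
	if (!has_handle)
		return -ENODEV;
	switch (msg.event) {
	case TOY_EVENT_ON:
		*out = TOY_D0;
		return 0;
	case TOY_EVENT_SUSPEND:
		*out = TOY_D3HOT;
		return 0;
	default:
		return -ENODEV;
	}
}
```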

>
>Plus that function should have a documentation somewhere!
I will add it.

>
>> > Could you just revert those two patches? First one is very
>> > wrong. Second one might be fixed, but... See comments below.
>> I think the platform_pci_set_power_state should be ok, did you see it
>> causes oops?
>
>No its just ugly and uses __force in "creative" way. That one can be
>recovered.
Do you mean this?

> + static int state_conv[] = {
> + [0] = 0,
> + [1] = 1,
> + [2] = 2,
> + [3] = 3,
> + [4] = 3
> + };
> + int acpi_state = state_conv[(int __force) state];

The table should be
[PCI_D0] = 0,

I'm not sure, but then could we use state_conv[state] directly? It seems
wrong to me (an array indexed by a pci_power_t?)
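In plain C, the table can be keyed by the state names with designated initializers and guarded by a bounds check. The sketch below does this with hypothetical names. In the kernel there is an extra wrinkle. Judging by the `__force` cast in the quoted code, pci_power_t is a sparse-checked type, so indexing with it directly would probably still need a cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the PCI power states. */
enum toy_pci_power { TOY_D0, TOY_D1, TOY_D2, TOY_D3HOT, TOY_D3COLD };

/* Designated initializers tie each entry to its state name, so the
 * table does not depend on the order in which entries are written. */
static const int toy_state_conv[] = {
	[TOY_D0]     = 0,
	[TOY_D1]     = 1,
	[TOY_D2]     = 2,
	[TOY_D3HOT]  = 3,
	[TOY_D3COLD] = 3,
};

#define TOY_NSTATES (sizeof(toy_state_conv) / sizeof(toy_state_conv[0]))

/* Bounds-checked lookup; returns -1 for states outside the table. */
static int toy_acpi_state(enum toy_pci_power state)
{
	size_t idx = (size_t)state;

	assert(TOY_NSTATES == (size_t)TOY_D3COLD + 1);	/* table covers every state */
	if (idx >= TOY_NSTATES)
		return -1;
	return toy_state_conv[idx];
}
```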

Thanks,
Shaohua


Re: 2.6.12-rc1-mm1: Kernel BUG at pci:389

2005-03-21 Thread Li Shaohua
On Tue, 2005-03-22 at 09:35, Pavel Machek wrote:
> Hi!
> 
> > > and that says:
> > > 
> > > #define PMSG_FREEZE ((__force pm_message_t) 3)
> > > 
> > > ... I certainly have _FREEZE defined as 1 in my local tree, but I do
> > > not see that change in -mm yet.
> > 
> > Both 2.6.12-rc1-mm1 and 2.6.12-rc1 have:
> > 
> > #define PMSG_FREEZE ((__force pm_message_t) 3)
> > #define PMSG_SUSPEND((__force pm_message_t) 3)
> > #define PMSG_ON ((__force pm_message_t) 0)
> > 
> > which looks odd.
> 
> Yes, but it is needed. There are many drivers, and they look at
> numerical value of PMSG_*. I'm proceeding in steps. I hopefully killed
> all direct accesses to the constants, and will switch constants to
> something else... But that is going to be tommorow (need some sleep).
The patches are meant to obtain the correct PCI device sleep state for
suspend/resume. We discussed this issue several months ago. My plan is to
first introduce 'platform_pci_set_power_state', then merge the
'platform_pci_choose_state' patch after Pavel's pm_message_t conversion is
finished. Maybe Len misread my comments.

Anyway, for the callback, my intent is that platform_pci_choose_state accepts
the pm_message_t parameter and returns an 'int', since the platform method
may fail; pci_choose_state then translates the return value to a
pci_power_t.
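The division of labour described above, where the platform hook returns a plain int and the generic code turns it into a power state, can be sketched as follows. Everything here is hypothetical. Sleep states are passed as bare numbers, and D3hot as the fallback is an assumption, not something stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for pci_power_t. */
enum toy_pci_power { TOY_D0, TOY_D1, TOY_D2, TOY_D3HOT };

/* Platform hook: ACPI D-state number (0..3) or a negative errno. */
typedef int (*toy_platform_choose_fn)(int sleep_state);

/* Example hook: pretend firmware only describes S3 and asks for D2. */
static int toy_acpi_choose(int sleep_state)
{
	return sleep_state == 3 ? 2 : -ENODEV;
}

/* Generic layer: translate the int, or fall back when the hook fails. */
static enum toy_pci_power toy_pci_choose_state(toy_platform_choose_fn hook,
					       int sleep_state)
{
	int ret;

	assert(sleep_state >= 0);
	if (sleep_state == 0)
		return TOY_D0;		/* "on" always means D0 */
	ret = hook ? hook(sleep_state) : -ENODEV;
	if (ret < 0 || ret > TOY_D3HOT)
		return TOY_D3HOT;	/* assumed default */
	return (enum toy_pci_power)ret;
}
```

The int-to-enum conversion happens in one place, after the range check. That keeps the hook free to report errors without retyping.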

> > > I reproduced it here.. I do not know who introduced
> > > platform_pci_choose_state, but it is *very* wrong. It returns
> > > it. Should it return pci_power_t? It probably should to match
> > > pci_choose_state, but that int is retyped to pm_message_t. Oops.
> > 
> > That change came from Len.  I've appended the two relevant patches below.
> > 
> > So hm.  We have incompatible changes in flight.  That doesn't happen very
> > often.
> > 
> > Could I suggest that you prepare a fixup against 2.6.12-rc1-mm1 and send
> > that to Len and myself?  If that fixup is not suitable for a 2.6.12-rc1
> > based tree then I can look after it until things get flushed out.
> 
> Could you just revert those two patches? First one is very
> wrong. Second one might be fixed, but... See comments below.
I think platform_pci_set_power_state should be OK. Did you see it cause an
oops?

> 
> And they are both "dangerous" -- they introduce new and untested
> functionality while I'm trying to transition from int to
> pm_message_t. They also affect all the drivers.
> 
> Len, please Cc me on patches that affect suspend.
> 
> > @@ -17,6 +17,7 @@
> >  #include <acpi/acpi_bus.h>
> >  
> >  #include <linux/pci-acpi.h>
> > +#include "pci.h"
> 
> 
> Should be <linux/pci.h>?
I suppose it's not exported outside of PCI, so I used "pci.h".

> 
> > +static int acpi_pci_choose_state(struct pci_dev *pdev, pm_message_t state)
> > +{
> 
> Should return pci_power_t, probably.
Should return int as I said above.

> 
> > + char dstate_str[] = "_S0D";
> > + acpi_status status;
> > + unsigned long val;
> > + struct device *dev = &pdev->dev;
> > +
> > + /* Fixme: the check is wrong after pm_message_t is a struct */
> 
> Exactly.
> 
> > + if ((state >= PM_SUSPEND_MAX) || !DEVICE_ACPI_HANDLE(dev))
> 
> PM_SUSPEND_MAX and friends is going to disappear.
Yep, this should be fixed. 

> 
> > + return -EINVAL;
> > + dstate_str[2] += state; /* _S1D, _S2D, _S3D, _S4D */
> 
> Ugh, assumes numerical values of states actually meaning anything. It
> definitely should not. Should be switch(state.event), but that code
> is not merged, yet => I'll send code that switches pm_message_t to
> struct, tomorrow. But it may compile-time break some obscure
> drivers...
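To illustrate the switch(state.event) approach suggested above, here is a hypothetical sketch that maps each PM event explicitly to an ACPI _SxD method name instead of doing arithmetic on the event's numeric value. The event names, their values and the event-to-Sx mapping (including PM_EVENT_HIBERNATE) are assumptions made for this example only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event codes; only the names matter, not the values. */
enum pm_event { PM_EVENT_ON, PM_EVENT_FREEZE, PM_EVENT_SUSPEND, PM_EVENT_HIBERNATE };
typedef struct { enum pm_event event; } pm_message_t;

/* Map a PM event to an ACPI "_SxD" method name explicitly, instead of
 * adding the event's numeric value to the '0' in "_S0D". Returns NULL
 * when no sleep state should be queried. The mapping is an assumption. */
static const char *acpi_sxd_method(pm_message_t state)
{
	switch (state.event) {
	case PM_EVENT_SUSPEND:
		return "_S3D";   /* assumed: suspend-to-RAM targets S3 */
	case PM_EVENT_HIBERNATE:
		return "_S4D";   /* assumed: hibernation targets S4 */
	default:
		return NULL;     /* ON, FREEZE and unknown events */
	}
}
```

Because each case names its event, renumbering the constants (or turning pm_message_t into a struct) cannot silently change which ACPI method is evaluated.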
> 
> > diff -Nru a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > --- a/drivers/pci/pci-acpi.c  2005-03-21 17:02:38 -08:00
> > +++ b/drivers/pci/pci-acpi.c  2005-03-21 17:02:38 -08:00
> > @@ -253,6 +253,24 @@
> >   return -ENODEV;
> >  }
> >  
> > +static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
> > +{
> > + acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->dev);
> > + static int state_conv[] = {
> > + [0] = 0,
> > + [1] = 1,
> > + [2] = 2,
> > + [3] = 3,
> > + [4] = 3
> > + };
> > + int acpi_state = state_conv[(int __force) state];
> 
> The table should be
> [PCI_D0] = 0,
> ...
OK, please revert the 'platform_pci_choose_state' patch; I will add it back
after Pavel's conversion is finished. Or, once Pavel's conversion is done, I
can send a quick fix.
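A sketch of the conversion table written with designated initializers keyed by the state names, as suggested above. The enum and the ACPI_STATE_D* names are local stand-ins defined for this example (assumed to follow the same 0..4 order as the quoted table), and the bounds check is an addition that the quoted patch does not have.

```c
#include <assert.h>

/* Local mirror of the PCI power states; D3cold is the fifth entry,
 * matching the [4] slot in the quoted table. */
typedef enum {
	PCI_D0, PCI_D1, PCI_D2, PCI_D3hot, PCI_D3cold
} pci_power_t;

/* Stand-in names for the ACPI device power states D0..D3. */
enum { ACPI_STATE_D0, ACPI_STATE_D1, ACPI_STATE_D2, ACPI_STATE_D3 };

/* Indexed by the enum names rather than bare numbers, so the table
 * stays correct even if the numeric values of pci_power_t change. */
static const int state_conv[] = {
	[PCI_D0]     = ACPI_STATE_D0,
	[PCI_D1]     = ACPI_STATE_D1,
	[PCI_D2]     = ACPI_STATE_D2,
	[PCI_D3hot]  = ACPI_STATE_D3,
	[PCI_D3cold] = ACPI_STATE_D3,  /* both D3 variants map to ACPI D3 */
};

static int pci_to_acpi_dstate(pci_power_t state)
{
	/* Reject out-of-range values instead of indexing past the table. */
	if ((unsigned)state >= sizeof(state_conv) / sizeof(state_conv[0]))
		return -1;
	return state_conv[state];
}
```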

Thanks,
Shaohua



Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB

2005-03-17 Thread Li Shaohua
On Fri, 2005-03-18 at 02:08, Bjorn Helgaas wrote:
> On Thu, 2005-03-17 at 09:33 +0800, Li Shaohua wrote:
> > The comments in previous quirk said it's required only in PIC mode.
> ...
> > I feel we concerned too much. Changing the interrupt line isn't harmful,
> > right? Linux actually ignored interrupt line. Maybe just a
> > PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_ANY_ID, quirk_via_irq) is
> > sufficient.
> 
> I think it's good to limit the scope of the quirk as much as
> possible because that makes it easier to do future restructuring,
> such as device-specific interrupt routers.
> 
> The comment (before quirk_via_acpi(), nowhere near quirk_via_irqpic())
> says *on-chip devices* have this unusual behavior when the interrupt
> line is written.  That makes sense to me.
> 
> Writing the interrupt line on random plug-in Via PCI devices does
> not make sense to me, because for that to have any effect, an
> upstream bridge would have to be snooping the traffic going through
> it.  That doesn't sound plausible to me.
> 
> What about this:
Hmm, this looks like the previous solution. We removed the device-specific VIA
quirk because we don't know how many devices have this issue. Every time we
hit an IRQ problem on a VIA PCI device, we would suspect it needs the quirk
and keep trying. That is a big overhead.

Thanks,
Shaohua



[PATCH]fix oops when inserting ipmi_si module

2005-03-17 Thread Li Shaohua
Hi,
On one of the machines in our lab, spmi->addr.register_bit_width is 0 (so
the returned address is invalid). Without this check, inserting the module
causes an oops.

Thanks,
Shaohua

Signed-off-by: Li Shaohua <[EMAIL PROTECTED]>

--- a/drivers/char/ipmi/ipmi_si_intf.c  2005-03-03 10:56:51.0 +0800
+++ b/drivers/char/ipmi/ipmi_si_intf.c  2005-03-17 16:34:32.478606080 +0800
@@ -1466,6 +1466,11 @@ static int try_init_acpi(int intf_num, s
if (!is_new_interface(-1, addr_space, spmi->addr.address))
return -ENODEV;
 
+   if (!spmi->addr.register_bit_width) {
+   acpi_failure = 1;
+   return -ENODEV;
+   }
+
/* Figure out the interface type. */
switch (spmi->InterfaceType)
{
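A standalone sketch of the same guard, outside the driver: a generic address with a zero register width is treated as "no usable interface". The struct below is a hypothetical, trimmed-down stand-in for the SPMI address field, and check_spmi_addr() is an illustrative helper name, not part of ipmi_si.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the ACPI generic address in the SPMI table;
 * only the fields relevant to the check are modelled. */
struct generic_address {
	uint8_t  address_space_id;
	uint8_t  register_bit_width;
	uint64_t address;
};

static int acpi_failure;   /* mirrors the flag the patch sets */

/* A zero register width means the firmware did not describe a usable
 * register, so refuse the interface instead of using its address. */
static int check_spmi_addr(const struct generic_address *addr)
{
	if (!addr->register_bit_width) {
		acpi_failure = 1;
		return -ENODEV;
	}
	return 0;
}
```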




Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB

2005-03-16 Thread Li Shaohua
Hi,
On Thu, 2005-03-17 at 00:10, Bjorn Helgaas wrote:
> On Tue, 2005-03-15 at 16:02 -0700, Zwane Mwaikambo wrote:
> > On Tue, 15 Mar 2005, Bjorn Helgaas wrote:
> > > That seems awfully suspicious to me.  So the following is
> > > probably safe as far as it goes, but not sufficient for all
> > > cases.
> > 
> > VIA bridges allow for IRQ routing updates by programming
> > PCI_INTERRUPT_LINE, so it is supposed to work even if we do it for all the
> > devices, so it appears to be a board/bios specific problem.
> 
> This just feels like a sledgehammer approach, i.e., we're
> programming PCI_INTERRUPT_LINE in more cases than we actually
> need to.  I especially don't like that any Via device with
> devfn==0 triggers the quirk.  That doesn't seem like the
> right test if we're really looking for a Via bridge.
> 
> > > -static void __devinit quirk_via_bridge(struct pci_dev *pdev)
> > > +static void __devinit quirk_via_irqpic(struct pci_dev *dev)
> > >  {
> > > -   if(pdev->devfn == 0) {
> > > -   printk(KERN_INFO "PCI: Via IRQ fixup\n");
> > > -   via_interrupt_line_quirk = 1;
> > > +   u8 irq, new_irq = dev->irq & 0xf;
> > > +
> > > +   pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
> > > +   if (new_irq != irq) {
> > > +   printk(KERN_INFO "PCI: Via IRQ fixup for %s, from %d to %d\n",
> > > +   pci_name(dev), irq, new_irq);
> > > +   udelay(15);
> > > +   pci_write_config_byte(dev, PCI_INTERRUPT_LINE, new_irq);
> > > }
> > >  }
> > > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_VIA, PCI_ANY_ID, quirk_via_bridge );
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2, quirk_via_irqpic);
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_5, quirk_via_irqpic);
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686_6, quirk_via_irqpic);
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8233_5,   quirk_via_irqpic);
> > > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_8233_7,   quirk_via_irqpic);
> > 
> > This looks like it'll only affect the PCI device associated with the
> > listed south bridges, which might break systems which relied on the per
> > device setting. Your 'debug' patch actually made sense to me, that is,
> > moving the PCI_INTERRUPT_LINE fixup at gsi register.
> 
> Yes, that's what I meant by the above probably not being sufficient.
> 
> The main thing the debug patch did was to move the write to after
> the IOAPIC programming.  (And I think it added back the mysterious
> udelay().)  My point is that the write could just as easily be done
> in a pci_enable fixup, because that also happens after the IOAPIC
> update.
The comment on the previous quirk said it's only required in PIC mode.

> 
> The quirk would have to be something like this:
> 
> static void __devinit quirk_via_irq(struct pci_dev *dev)
> {
> if  (!via_interrupt_line_quirk)
> return;
> 
> /* update PCI_INTERRUPT_LINE */
> ...
> }
> DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID,
> quirk_via_irq);
> 
> with a PCI_FIXUP_HEADER quirk that sets via_interrupt_line_quirk when
> we find a Via bridge.
> 
> But I'm uneasy even about this -- what if there are multiple bridges,
> with only one of them being a Via?  Why would we want to apply this
> quirk to the devices under the non-Via bridges?  Wouldn't it be better
> to search up the hierarchy of each device, looking for a Via bridge,
> and apply the quirk only if we find one?
I feel we are worrying too much. Changing the interrupt line isn't harmful,
right? Linux actually ignores the interrupt line. Maybe just a
DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_ANY_ID, quirk_via_irq) is
sufficient, with

static void quirk_via_irq(struct pci_dev *dev)
{
	/* update PCI_INTERRUPT_LINE */
}
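One possible shape for the "update PCI_INTERRUPT_LINE" step, sketched in userspace under the assumption that it follows the quirk_via_irqpic() logic quoted above: keep the low 4 bits of dev->irq and write only when the value differs. fake_pci_dev and via_update_interrupt_line() are hypothetical names; no real config-space access is performed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a device and its PCI_INTERRUPT_LINE register. */
struct fake_pci_dev {
	unsigned int irq;        /* IRQ Linux assigned (dev->irq) */
	uint8_t interrupt_line;  /* value currently in config space */
};

/* Write the low 4 bits of dev->irq into the interrupt line, but only
 * if it differs. Returns 1 if a write happened, 0 otherwise. */
static int via_update_interrupt_line(struct fake_pci_dev *dev)
{
	uint8_t new_irq = dev->irq & 0xf;   /* same masking as the quirk */

	if (dev->interrupt_line == new_irq)
		return 0;
	/* The quoted quirk also does udelay(15) before this write. */
	dev->interrupt_line = new_irq;
	return 1;
}
```

For the "from 9 to 10" case seen in this thread, a device with irq 10 and line 9 gets one write; a second call is a no-op.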

Thanks,
Shaohua



Re: [ACPI] Re: Call for help: list of machines with working S3

2005-03-15 Thread Li Shaohua
Hi,
On Mon, 2005-03-14 at 16:00, Pavel Machek wrote:
> Hi!
> 
> >   * MySQL (hinders the actual suspension process and kicks the pc back to
> >     where it was)
> 
> Try this patch...
> Pavel
> 
> --- clean/kernel/signal.c   2005-02-03 22:27:26.0 +0100
> +++ linux/kernel/signal.c   2005-02-03 22:28:19.0 +0100
> @@ -,6 +,7 @@
> ret = -EINTR;
> }
>  
> +   try_to_freeze(1);
> return ret;
>  }
I also encountered a similar issue: syslogd can't be stopped. It is waiting
for kjournald to flush some work, but kjournald is stopped first. It looks
like kernel threads should be stopped after user threads, just as Nigel's
suspend2 patch does.
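A toy model of the ordering problem described above, for illustration only: a user task blocked on a kernel thread cannot freeze once that thread is already frozen, whereas freezing user tasks first and kernel threads second succeeds. The task structure and the freeze rule are simplified assumptions, not the kernel's refrigerator code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task model: a user task may be blocked waiting on a
 * kernel thread (e.g. syslogd waiting for kjournald). */
struct task {
	const char *name;
	int is_kernel_thread;
	int frozen;
	struct task *waits_on;   /* task this one is blocked on, or NULL */
};

/* Assumed rule: a task can freeze only if the thread it waits on is
 * still running and can finish the work it is waiting for. */
static int try_freeze(struct task *t)
{
	if (t->waits_on && t->waits_on->frozen)
		return 0;            /* stuck: its helper is already frozen */
	t->frozen = 1;
	return 1;
}

/* Two passes: user tasks first (pass 0), then kernel threads (pass 1).
 * Returns the number of tasks that failed to freeze. */
static int freeze_all(struct task *tasks, size_t n)
{
	int failed = 0;
	for (int pass = 0; pass < 2; pass++)
		for (size_t i = 0; i < n; i++)
			if (tasks[i].is_kernel_thread == pass && !tasks[i].frozen)
				failed += !try_freeze(&tasks[i]);
	return failed;
}
```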

Thanks,
Shaohua



RE: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB

2005-03-14 Thread Li, Shaohua
Hi,
This issue is quite interesting. We recently removed all the device-specific
VIA quirks and applied a generic VIA quirk instead. But in this case the MCH
(00:00.0) is from AMD, while the ISA bridge and the built-in devices are from
VIA, so the VIA quirk is useless: it only takes effect when the MCH is from
VIA. We should probably enable the VIA quirk when a VIA ISA bridge is found
rather than when a VIA MCH is found, but Bjorn's method seems OK.
If you want to put the patch into the kernel, please also change the
'pirq_enable_irq' case.
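A sketch of the alternative detection rule mentioned above, checking for a VIA ISA bridge rather than a VIA device at devfn 0. The vendor IDs and the ISA bridge class code are the standard PCI values; struct pci_fn and both helper functions are hypothetical, and the sample topology in the test mirrors the lspci output quoted below (AMD host bridge, VIA devices at 00:07.x).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_VIA       0x1106  /* PCI vendor ID of VIA Technologies */
#define VENDOR_AMD       0x1022  /* PCI vendor ID of AMD */
#define CLASS_BRIDGE_ISA 0x0601  /* base class 06 (bridge), subclass 01 (ISA) */

/* Hypothetical, minimal view of an enumerated PCI function. */
struct pci_fn {
	uint16_t vendor;
	uint16_t class_code;   /* base class << 8 | subclass */
	uint8_t devfn;
};

/* Rule described in the thread for 2.6.11: enable the quirk only when a
 * VIA device has devfn 0 (normally the host bridge/MCH). */
static int via_quirk_by_mch(const struct pci_fn *f, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (f[i].vendor == VENDOR_VIA && f[i].devfn == 0)
			return 1;
	return 0;
}

/* Suggested rule: enable the quirk if any VIA ISA bridge is present. */
static int via_quirk_by_isa_bridge(const struct pci_fn *f, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (f[i].vendor == VENDOR_VIA && f[i].class_code == CLASS_BRIDGE_ISA)
			return 1;
	return 0;
}
```

On a system like the reporter's, the devfn-0 rule finds only the AMD host bridge and never fires, while the ISA-bridge rule does.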

Thanks,
Shaohua

>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:acpi-devel-
>[EMAIL PROTECTED] On Behalf Of Grzegorz Kulewski
>Sent: Sunday, March 13, 2005 11:15 PM
>To: Bjorn Helgaas
>Cc: Andrew Morton; ACPI List; lkml
>Subject: Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB
>
>On Fri, 11 Mar 2005, Bjorn Helgaas wrote:
>
>> Can you do an "lspci -vvn"?  I'm looking at quirk_via_irqpic() in
>> 2.6.9, which is what printed this:
>>
PCI: Via IRQ fixup for :00:07.2, from 9 to 10
PCI: Via IRQ fixup for :00:07.3, from 9 to 10
>>
>> but it looks like it should only run for PCI_DEVICE_ID_VIA_82C586_2,
>> PCI_DEVICE_ID_VIA_82C686_5, and PCI_DEVICE_ID_VIA_82C686_6.
>>
>> You have:
>>
>> :00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
>> :00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235 PIPC Bus Master IDE (rev 06)
>> :00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 1a) (prog-if 00 [UHCI])
>> :00:07.3 USB Controller: VIA Technologies, Inc. USB (rev 1a) (prog-if 00 [UHCI])
>>
>> and we apparently ran the quirk for 07.2 and 07.3.  I wouldn't
>> have thought those would have one of the above device IDs.  The
>> "lspci -vvn" should tell us for sure.
>>
>> 2.6.11 removed that quirk and runs quirk_via_bridge() for
>> all VIA devices, but only sets via_interrupt_line_quirk if
>> (pdev->devfn == 0), which you don't have.  So that's why
>> my patch didn't do anything.
>>
>>> Also two more questions:
>>>
>>> 1. What is VIA fixup? Is it some hardware bug? Or BIOS problem? Why is it
>>> needed? On what hardware / software it is needed?
>>
>> I really don't know much about the VIA fixup.  I just noticed
>> that we seem to be doing it slightly differently in 2.6.11 than
>> we did in 2.6.9, and thought maybe it was related to your problem.
>> Here's a changeset that has a couple pointers:
>>
>>http://linux.bkbits.net:8080/linux-2.5/cset%4041cb9d48DRV4TYe77gvstTawuZFYyQ
>>
>>> 2. Why this patch shrinked bzImage that much:
>>>
>>> -rw-r--r--  1 root root 1828186 mar 11 23:33 vmlinuz-2.6.11-cko1
>>> -rw-r--r--  1 root root 1828355 mar  2 20:48 vmlinuz-2.6.11-cko1.old
>>
>> I have no idea about this.  But it's only a couple hundred bytes.
>>
>> So here's another patch to try (revert the first one, then apply this).
>>
>> ===== drivers/acpi/pci_irq.c 1.37 vs edited =====
>> --- 1.37/drivers/acpi/pci_irq.c	2005-03-01 09:57:29 -07:00
>> +++ edited/drivers/acpi/pci_irq.c	2005-03-11 15:13:49 -07:00
>> @@ -30,6 +30,7 @@
>> #include <linux/module.h>
>> #include <linux/init.h>
>> #include <linux/types.h>
>> +#include <linux/delay.h>
>> #include <linux/proc_fs.h>
>> #include <linux/spinlock.h>
>> #include <linux/pm.h>
>> @@ -438,10 +439,17 @@
>>  }
>>  }
>>
>> -if (via_interrupt_line_quirk)
>> -pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq & 15);
>> -
>>  dev->irq = acpi_register_gsi(irq, edge_level, active_high_low);
>> +
>> +if (dev->vendor == PCI_VENDOR_ID_VIA) {
>> +u8 old_irq, new_irq = dev->irq & 0xf;
>> +
>> +pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &old_irq);
>> +printk(KERN_INFO PREFIX "Via IRQ fixup for %s, from %d "
>> +"to %d\n", pci_name(dev), old_irq, new_irq);
>> +udelay(15);
>> +pci_write_config_byte(dev, PCI_INTERRUPT_LINE, new_irq);
>> +}
>>
>>  printk(KERN_INFO PREFIX "PCI interrupt %s[%c] -> GSI %u "
>>  "(%s, %s) -> IRQ %d\n",
>>
>
>Ok, this patch works. Here is the log:
>
>Mar 13 17:16:17 kangur Linux version 2.6.11-cko1 ([EMAIL PROTECTED]) (gcc version 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)) #3 Sun Mar 13 17:10:10 CET 2005
>Mar 13 17:16:17 kangur BIOS-provided physical RAM map:
>Mar 13 17:16:17 kangur BIOS-e820:  - 0009fc00 (usable)
>Mar 13 17:16:17 kangur BIOS-e820: 0009fc00 - 000a (reserved)
>Mar 13 17:16:17 kangur BIOS-e820: 000f - 0010 (reserved)
>Mar 13 17:16:17 kangur BIOS-e820: 0010 - 1fff (usable)
>Mar 13 17:16:17 kangur BIOS-e820: 1fff - 1fff3000 (ACPI NVS)
>Mar 13 17:16:17 kangur BIOS-e820: 1fff3000 - 2000 (ACPI data)
>Mar 13 17:16:17 kangur BIOS-e820:  - 0001 (reserved)
>Mar 13 17:16:17 kangur 511MB LOWMEM available.
>Mar 13 17:16:17 kangur On node 0 totalpages: 131056
>Mar 13 17:16:17 kangur DMA zone: 4096 pages, LIFO batch:1
>Mar 13 17:16:17 kangur Normal zone: 126960 pages, LIFO batch:16
>Mar 13 17:16:17 kangur HighMem zone: 0 pages, LIFO 

RE: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB

2005-03-14 Thread Li, Shaohua
Hi,
This issue is quite interesting. We removed all specific VIA quirk
recently and apply a generic VIA quirk. But in this case, the MCH 00:0.0
is from AMD, and the ISA bridge and built-in devices are from VIA, this
means VIA quirk is useless, since it takes action only when the MCH is
from VIA. We possibly should enable VIA quirk if a VIA ISA bridge is
found instead of a VIA MCH found, but Bjorn's method seems ok.
If you want to put the patch into kernel, please also change the '
pirq_enable_irq' case.

Thanks,
Shaohua

-Original Message-
From: [EMAIL PROTECTED] [mailto:acpi-devel-
[EMAIL PROTECTED] On Behalf Of Grzegorz Kulewski
Sent: Sunday, March 13, 2005 11:15 PM
To: Bjorn Helgaas
Cc: Andrew Morton; ACPI List; lkml
Subject: Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI
breaks
USB

On Fri, 11 Mar 2005, Bjorn Helgaas wrote:

 Can you do an lspci -vvn?  I'm looking at quirk_via_irqpic() in
 2.6.9, which is what printed this:

PCI: Via IRQ fixup for :00:07.2, from 9 to 10
PCI: Via IRQ fixup for :00:07.3, from 9 to 10

 but it looks like it should only run for PCI_DEVICE_ID_VIA_82C586_2,
 PCI_DEVICE_ID_VIA_82C686_5, and PCI_DEVICE_ID_VIA_82C686_6.

 You have:

 :00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo
Super
South] (rev 40)
 :00:07.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235 PIPC Bus Master IDE (rev 06)
 :00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 1a)
(prog-if
00 [UHCI])
 :00:07.3 USB Controller: VIA Technologies, Inc. USB (rev 1a)
(prog-if
00 [UHCI])

 and we apparently ran the quirk for 07.2 and 07.3.  I wouldn't
 have thought those would have one of the above device IDs.  The
 lspci -vvn should tell us for sure.

 2.6.11 removed that quirk and runs quirk_via_bridge() for
 all VIA devices, but only sets via_interrupt_line_quirk if
 (pdev-devfn == 0), which you don't have.  So that's why
 my patch didn't do anything.

 Also two more questions:

 1. What is VIA fixup? Is it some hardware bug? Or BIOS problem? Why
is
it
 needed? On what hardware / software it is needed?

 I really don't know much about the VIA fixup.  I just noticed
 that we seem to be doing it slightly differently in 2.6.11 than
 we did in 2.6.9, and thought maybe it was related to your problem.
 Here's a changeset that has a couple pointers:

http://linux.bkbits.net:8080/linux-
2.5/cset%4041cb9d48DRV4TYe77gvstTawuZFYyQ

 2. Why this patch shrinked bzImage that much:

 -rw-r--r--  1 root root 1828186 mar 11 23:33 vmlinuz-2.6.11-cko1
 -rw-r--r--  1 root root 1828355 mar  2 20:48 vmlinuz-2.6.11-cko1.old

 I have no idea about this.  But it's only a couple hundred bytes.

 So here's another patch to try (revert the first one, then apply
this).

 = drivers/acpi/pci_irq.c 1.37 vs edited =
 --- 1.37/drivers/acpi/pci_irq.c  2005-03-01 09:57:29 -07:00
 +++ edited/drivers/acpi/pci_irq.c2005-03-11 15:13:49 -07:00
 @@ -30,6 +30,7 @@
 #include linux/module.h
 #include linux/init.h
 #include linux/types.h
 +#include linux/delay.h
 #include linux/proc_fs.h
 #include linux/spinlock.h
 #include linux/pm.h
 @@ -438,10 +439,17 @@
  }
  }

 -if (via_interrupt_line_quirk)
 -pci_write_config_byte(dev, PCI_INTERRUPT_LINE, irq 
15);
 -
  dev-irq = acpi_register_gsi(irq, edge_level, active_high_low);
 +
 +if (dev-vendor == PCI_VENDOR_ID_VIA) {
 +u8 old_irq, new_irq = dev-irq  0xf;
 +
 +pci_read_config_byte(dev, PCI_INTERRUPT_LINE, old_irq);
 +printk(KERN_INFO PREFIX Via IRQ fixup for %s, from %d 
 +to %d\n, pci_name(dev), old_irq, new_irq);
 +udelay(15);
 +pci_write_config_byte(dev, PCI_INTERRUPT_LINE, new_irq);
 +}

  printk(KERN_INFO PREFIX PCI interrupt %s[%c] - GSI %u 
  (%s, %s) - IRQ %d\n,


Ok, this patch works. Here is the log:

Mar 13 17:16:17 kangur Linux version 2.6.11-cko1 ([EMAIL PROTECTED]) (gcc
version 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6))
#3
Sun Mar 13 17:10:10 CET 2005
Mar 13 17:16:17 kangur BIOS-provided physical RAM map:
Mar 13 17:16:17 kangur BIOS-e820:  - 0009fc00
(usable)
Mar 13 17:16:17 kangur BIOS-e820: 0009fc00 - 000a
(reserved)
Mar 13 17:16:17 kangur BIOS-e820: 000f - 0010
(reserved)
Mar 13 17:16:17 kangur BIOS-e820: 0010 - 1fff
(usable)
Mar 13 17:16:17 kangur BIOS-e820: 1fff - 1fff3000
(ACPI NVS)
Mar 13 17:16:17 kangur BIOS-e820: 1fff3000 - 2000
(ACPI data)
Mar 13 17:16:17 kangur BIOS-e820:  - 0001
(reserved)
Mar 13 17:16:17 kangur 511MB LOWMEM available.
Mar 13 17:16:17 kangur On node 0 totalpages: 131056
Mar 13 17:16:17 kangur DMA zone: 4096 pages, LIFO batch:1
Mar 13 17:16:17 kangur Normal zone: 126960 pages, LIFO batch:16
Mar 13 17:16:17 kangur HighMem zone: 0 pages, LIFO 
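
The arithmetic at the heart of the VIA fixup above is small enough to check in isolation. As a sketch only: via_interrupt_line_value() is a hypothetical user-space helper, not a kernel function, and it models nothing but the "dev->irq & 0xf" masking; the config-space reads and writes and the udelay(15) are left out.

```c
#include <stdint.h>

/* Hypothetical helper (not a kernel function) mirroring
 * "new_irq = dev->irq & 0xf" from the patch: only the low nibble of the
 * IRQ returned by acpi_register_gsi() is written back to the
 * PCI_INTERRUPT_LINE config register of VIA devices. */
static uint8_t via_interrupt_line_value(unsigned int irq)
{
	return (uint8_t)(irq & 0xf);
}
```

For example, IRQ 11 would be written back unchanged, while an IRQ of 17 would be written as 1 (17 & 0xf). Note that the patch masks dev->irq, the value returned by acpi_register_gsi(), whereas the removed quirk wrote irq & 15 before that call.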

RE: [ACPI] s4bios: does anyone use it?

2005-03-07 Thread Li, Shaohua
Hi,
>> >
>> > Is there a single user of s4bios? It used to work for me 4 notebooks
>> > ago, but I never really used it.
>>
>> I no longer have the Toshiba laptop where S4 bios was first
>> implemented.
>>
>> > I think I'm the only person who has ever
>> > seen it working, but I could be wrong.
>>
>> You are indeed wrong.
>
>Okay, so we had 2 users in the past but have 0 users now? :-).
I wonder how anyone could use S4BIOS in 2.6.11. Both S4 and S4bios go
through 'enter_state', and in acpi_sleep_init:

	if (i == ACPI_STATE_S4) {
		if (acpi_gbl_FACS->S4bios_f) {
			sleep_states[i] = 1;
			printk(" S4bios");
			acpi_pm_ops.pm_disk_mode = PM_DISK_FIRMWARE;
		}
		if (sleep_states[i])
			acpi_pm_ops.pm_disk_mode = PM_DISK_PLATFORM;
	}
That means we can never actually end up with PM_DISK_FIRMWARE: whenever
S4bios_f is set, sleep_states[i] is set too, so PM_DISK_PLATFORM always
overwrites it. Is this intended? If not, .pm_disk_mode should be a mask.
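
One way to read the mask suggestion, as a sketch only: give each S4 entry method its own bit, so that enabling the platform method ORs in a flag instead of overwriting the firmware one. The DISK_MODE_* constants and s4_disk_modes() are hypothetical names made up for illustration; they are not the kernel's PM_DISK_* definitions.

```c
#include <stdbool.h>

/* Hypothetical bit flags, one per S4 entry method.  These are NOT the
 * kernel's PM_DISK_* values; they only illustrate the "mask" idea. */
#define DISK_MODE_FIRMWARE (1u << 0)   /* S4BIOS */
#define DISK_MODE_PLATFORM (1u << 1)   /* platform (ACPI) S4 */

/* Sketch of the S4 branch of acpi_sleep_init() using a mask.
 * s4_supported stands in for sleep_states[ACPI_STATE_S4] and s4bios_f
 * for acpi_gbl_FACS->S4bios_f. */
static unsigned int s4_disk_modes(bool s4_supported, bool s4bios_f)
{
	unsigned int modes = 0;

	if (s4bios_f) {
		s4_supported = true;          /* mirrors sleep_states[i] = 1 */
		modes |= DISK_MODE_FIRMWARE;
	}
	if (s4_supported)
		modes |= DISK_MODE_PLATFORM;  /* ORs in a bit, no overwrite */
	return modes;
}
```

With the quoted code, a machine that advertises S4BIOS ends up with PM_DISK_PLATFORM only; with a mask, both methods stay visible and a later policy can choose between them.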

Thanks,
Shaohua


Re: [PATCH] fixup debug warnings during ACPI S3 resume from ram

2005-01-17 Thread Li Shaohua
On Sat, 2005-01-15 at 08:24, Christian Borntraeger wrote:
> During the wakeup from suspend-to-ram I get several warnings (see below).
> This patch fixes the warnings for me, but I am not an expert in ACPI. Please
> read the patch and consider applying it.
Thanks for looking at this issue. We (the Intel ACPI team) have had many
discussions about it, and the problem isn't that simple. The warning
appears because PCI link devices are resumed with interrupts disabled. A
more important issue is that suspend/resume runs with all processes
frozen, which causes problems with semaphores, memory mapping, kmalloc and
so on. A real solution is in progress; I'll let you know when it's ready.

Thanks,
Shaohua



Re: [PATCH 0/4]Bind physical devices with ACPI devices - take 2

2005-01-17 Thread Li Shaohua
On Mon, 2005-01-17 at 19:28, Pavel Machek wrote:
> Hi!
> 
> > > This series of patches implements binding physical devices with ACPI
> > > devices. With it, device drivers can use methods provided by
> > > firmware (ACPI). These patches are against 2.6.10; please give your
> > > comments.
> 
> > These are the updated patches, following the latest discussion.
> > Changes from the last version:
> > 1. Introduce a new field, 'firmware_data', in 'struct device', since people
> > objected to renaming 'platform_data'. Greg, could you please check whether
> > the comments I added to 'struct device' are correct?
> > 2. Align with Pavel's latest PCI state convention work.
> > 3. Some cleanups and more comments.
> > One remaining issue is that 'platform_pci_choose_state' doesn't get called
> > yet; it should be once Pavel updates the parameter of 'pci_choose_state'.
> 
> diff -puN drivers/pci/pci.c~acpi-pci-get-suspend-state-callback drivers/pci/pci.c
> --- 2.5/drivers/pci/pci.c~acpi-pci-get-suspend-state-callback	2005-01-17 12:54:05.357547072 +0800
> +++ 2.5-root/drivers/pci/pci.c  2005-01-17 13:08:50.835933896 +0800
> @@ -317,6 +317,7 @@ pci_set_power_state(struct pci_dev *dev,
>   * Returns PCI power state suitable for given device and given system
>   * message.
>   */
> +int (*platform_pci_choose_state)(struct pci_dev *, pm_message_t) = 0;
> 
>  pci_power_t pci_choose_state(struct pci_dev *dev, u32 state)
>  {
> 
> Perhaps you want this to be "= NULL"?
I must have been asleep :). I will fix it soon.
> 
> 
> > @@ -208,6 +209,25 @@ acpi_status pci_osc_control_set(u32 flag
> >  }
> >  EXPORT_SYMBOL(pci_osc_control_set);
> >  
> > +static int acpi_pci_choose_state(struct pci_dev *pdev,
> > +   pm_message_t state)
> > +{
> > +   char dstate_str[] = "_S0D";
> > +   acpi_status status;
> > +   unsigned long val;
> > +   struct device *dev = &pdev->dev;
> > +
> > +   /* state is PM_SUSPEND_* */
> > +   if ((state >= PM_SUSPEND_MAX) || !DEVICE_ACPI_HANDLE(dev))
> > +   return -EINVAL;
> > +   dstate_str[2] += (int __force)state;
> 
> When I'm done, you will not be able to just retype state to
> integer... Perhaps you want to do pci_choose_state first; that gets
> you pci_power_t and that one *is* okay to retype to int?
Firmware may not be able to return a useful suspend state (either the
firmware doesn't describe the device or the evaluation fails); that's why
I return an int. I suppose pci_choose_state will do something like:

	ret = firmware_pci_choose_state(dev, state);
	if (ret >= 0)
		pci_state = ret;
	switch (pci_state) {
	case 0: return PCI_D0;
	...
	}
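
The fallback described above can be sketched in user-space C. Everything here is a stand-in: enum dstate only mimics pci_power_t, sxd_method_name() reproduces the "_S0D" plus dstate_str[2] += state trick from acpi_pci_choose_state() under an assumed valid range of 0..4, and pick_dstate() is a hypothetical version of the proposed pci_choose_state() logic in which a negative firmware result means "no usable answer".

```c
#include <string.h>

/* Stand-in for pci_power_t (hypothetical names). */
enum dstate { D0 = 0, D1, D2, D3HOT };

/* Build the ACPI "_SxD" method name for sleep state 0..4 the way the
 * patch does: start from "_S0D" and add the state to the digit.
 * Returns 0 on success, -1 for an out-of-range state (assumed range). */
static int sxd_method_name(int sleep_state, char out[5])
{
	if (sleep_state < 0 || sleep_state > 4)
		return -1;
	strcpy(out, "_S0D");
	out[2] += sleep_state;   /* '0' + 3 == '3', giving "_S3D" */
	return 0;
}

/* Use the firmware answer when it is >= 0, otherwise keep the
 * default choice, then map the number to a D-state. */
static enum dstate pick_dstate(int firmware_ret, enum dstate fallback)
{
	int state = fallback;

	if (firmware_ret >= 0)
		state = firmware_ret;
	switch (state) {
	case 0: return D0;
	case 1: return D1;
	case 2: return D2;
	default: return D3HOT;   /* assumption: 3 and above map to D3hot */
	}
}
```

Returning a plain int from the firmware hook keeps "no answer" (a negative value) distinct from every valid D-state, which is the reason given above for not returning pci_power_t directly.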

Thanks,
Shaohua



Re: [PATCH 0/4]Bind physical devices with ACPI devices - take 2

2005-01-17 Thread Li Shaohua
On Wed, 2005-01-05 at 10:50, Li Shaohua wrote:
> Hi,
> This series of patches implements binding physical devices with ACPI
> devices. With it, device drivers can use methods provided by
> firmware (ACPI). These patches are against 2.6.10; please give your
> comments.
Hi,
These are the updated patches, following the latest discussion.
Changes from the last version:
1. Introduce a new field, 'firmware_data', in 'struct device', since people
objected to renaming 'platform_data'. Greg, could you please check whether the
comments I added to 'struct device' are correct?
2. Align with Pavel's latest PCI state convention work.
3. Some cleanups and more comments.
One remaining issue is that 'platform_pci_choose_state' doesn't get called
yet; it should be once Pavel updates the parameter of 'pci_choose_state'.

Thanks,
Shaohua


This patch implements the framework for binding physical devices with ACPI
devices. A physical bus such as PCI should create and register an
'acpi_bus_type'. The methods in 'acpi_bus_type' are:
.find_device:
	For devices that have a parent, such as normal PCI devices.
.find_bridge:
	For special devices, such as the PCI root bridge and IDE controllers.
Such devices generally have no parent or ->bus, so we use this special
method to get their ACPI handle.
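
The registration scheme described here (a list of per-bus handlers, looked up by bus type, plus a scan of find_bridge methods) can be sketched in user space. This is a simplified, single-threaded illustration: struct acpi_bus_handler, the singly linked list and the helper names are hypothetical stand-ins for struct acpi_bus_type, bus_type_list, register_acpi_bus_type(), acpi_get_bus_type() and acpi_find_bridge_device(); the kernel version also takes bus_type_sem around every list access.

```c
#include <stddef.h>
#include <errno.h>

struct bus { const char *name; };
struct dev;   /* opaque device, only passed by pointer */

/* Hypothetical stand-in for struct acpi_bus_type. */
struct acpi_bus_handler {
	struct bus *bus;                                   /* bus it serves */
	int (*find_device)(struct dev *d, void **handle);  /* required */
	int (*find_bridge)(struct dev *d, void **handle);  /* optional */
	struct acpi_bus_handler *next;
};

static struct acpi_bus_handler *handlers;   /* registry head */

/* Like register_acpi_bus_type(): refuse entries without a bus or a
 * find_device method, otherwise append at the tail (list_add_tail). */
static int register_handler(struct acpi_bus_handler *h)
{
	struct acpi_bus_handler **pp = &handlers;

	if (!h || !h->bus || !h->find_device)
		return -ENODEV;
	while (*pp)
		pp = &(*pp)->next;
	h->next = NULL;
	*pp = h;
	return 0;
}

/* Like acpi_get_bus_type(): first handler registered for this bus. */
static struct acpi_bus_handler *get_handler(const struct bus *b)
{
	struct acpi_bus_handler *h;

	for (h = handlers; h; h = h->next)
		if (h->bus == b)
			return h;
	return NULL;
}

/* Like acpi_find_bridge_device(): ask every handler that has a
 * find_bridge method; the first one returning 0 wins. */
static int find_bridge_handle(struct dev *d, void **handle)
{
	struct acpi_bus_handler *h;

	for (h = handlers; h; h = h->next)
		if (h->find_bridge && !h->find_bridge(d, handle))
			return 0;
	return -ENODEV;
}
```

Keeping find_bridge optional lets an ordinary bus register only find_device, while buses with parentless devices (root bridges, IDE controllers) add the extra hook.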

---

 2.5-root/drivers/acpi/Makefile   |    2 
 2.5-root/drivers/acpi/glue.c     |  360 +++
 2.5-root/drivers/acpi/ibm_acpi.c |    4 
 2.5-root/include/acpi/acpi_bus.h |   21 ++
 2.5-root/include/linux/device.h  |    6 
 5 files changed, 388 insertions(+), 5 deletions(-)

diff -puN /dev/null drivers/acpi/glue.c
--- /dev/null	2004-02-24 05:02:56.0 +0800
+++ 2.5-root/drivers/acpi/glue.c	2005-01-17 12:52:16.825046520 +0800
@@ -0,0 +1,360 @@
+/*
+ * Link physical devices with ACPI devices support
+ */
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/device.h>
+#include <linux/rwsem.h>
+#include <linux/acpi.h>
+
+#define ACPI_GLUE_DEBUG	0
+#if ACPI_GLUE_DEBUG
+#define DBG(x...) printk(PREFIX x)
+#else
+#define DBG(x...)
+#endif
+static LIST_HEAD(bus_type_list);
+static DECLARE_RWSEM(bus_type_sem);
+
+int register_acpi_bus_type(struct acpi_bus_type *type)
+{
+	if (acpi_disabled)
+		return -ENODEV;
+	if (type && type->bus && type->find_device) {
+		down_write(&bus_type_sem);
+		list_add_tail(&type->list, &bus_type_list);
+		up_write(&bus_type_sem);
+		DBG("ACPI bus type %s registered\n", type->bus->name);
+		return 0;
+	}
+	return -ENODEV;
+}
+EXPORT_SYMBOL(register_acpi_bus_type);
+
+int unregister_acpi_bus_type(struct acpi_bus_type *type)
+{
+	if (acpi_disabled)
+		return 0;
+	if (type) {
+		down_write(&bus_type_sem);
+		list_del_init(&type->list);
+		up_write(&bus_type_sem);
+		DBG("ACPI bus type %s unregistered\n", type->bus->name);
+		return 0;
+	}
+	return -ENODEV;
+}
+EXPORT_SYMBOL(unregister_acpi_bus_type);
+
+static struct acpi_bus_type *
+acpi_get_bus_type(struct bus_type *type)
+{
+	struct acpi_bus_type *tmp, *ret = NULL;
+
+	down_read(&bus_type_sem);
+	list_for_each_entry(tmp, &bus_type_list, list) {
+		if (tmp->bus == type) {
+			ret = tmp;
+			break;
+		}
+	}
+	up_read(&bus_type_sem);
+	return ret;
+}
+
+static int
+acpi_find_bridge_device(struct device *dev, acpi_handle *handle)
+{
+	struct acpi_bus_type *tmp;
+	int	ret = -ENODEV;
+
+	down_read(&bus_type_sem);
+	list_for_each_entry(tmp, &bus_type_list, list) {
+		if (tmp->find_bridge && !tmp->find_bridge(dev, handle)) {
+			ret = 0;
+			break;
+		}
+	}
+	up_read(&bus_type_sem);
+	return ret;
+}
+
+/* Get PCI root bridge's handle from its segment and bus number */
+struct acpi_find_pci_root {
+	unsigned int seg;
+	unsigned int bus;
+	acpi_handle handle;
+};
+
+static acpi_status
+do_root_bridge_busnr_callback (struct acpi_resource *resource, void *data)
+{
+	int *busnr = (int *)data;
+	struct acpi_resource_address64 address;
+
+	if (resource->id != ACPI_RSTYPE_ADDRESS16 &&
+	resource->id != ACPI_RSTYPE_ADDRESS32 &&
+	resource->id != ACPI_RSTYPE_ADDRESS64)
+		return AE_OK;
+
+	acpi_resource_to_address64(resource, &address);
+	if ((address.address_length > 0) &&
+	   (address.resource_type == ACPI_BUS_NUMBER_RANGE))
+		*busnr = address.min_address_range;
+
+	return AE_OK;
+}
+
+static int
+get_root_bridge_busnr(acpi_handle handle)
+{
+	acpi_status status;
+	int bus, bbn;
+	struct acpi_buffer	buffer = {ACPI_ALLOCATE_BUFFER, NULL};
+
+	acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);
+
+	status = acpi_evaluate_integer(handle, METHOD_NAME__BBN, NULL,
+		(unsigned long *)&bbn);
+	if (status == AE_NOT_FOUND) {
+		/* Assume bus = 0 */
+		printk(KERN_INFO PREFIX
+			"Assume root bridge [%s] bus is 0\n",
+			(char *)buffer.pointer);
+		status = AE_OK;
+		bbn = 0;
+	}
+	if (ACPI_FAILURE(status)) {
+		bbn = -ENODEV;
+		goto exit;
+	}
+	if (bbn > 0)
+		goto exit;
+
+	/* _BBN in some systems return 0 for all root bridges */
+	bus = -1;
+	status = acpi_walk_resources(handle, METHOD_NAME__CRS,
+		do_root_bridge_busnr_callback, &bus);
+	/* If _CRS failed, we just use _BBN */
+	if (ACPI_FAILURE(status) || (bus == -1))
+		goto exit;
+	/* We select _CRS */
+	if (bbn
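
The bus-number policy of get_root_bridge_busnr() can be restated as a small pure function. This is a hedged sketch: root_bridge_busnr() and its parameters are hypothetical stand-ins for the ACPI evaluations (_BBN status and value, the result of walking _CRS, and the bus number do_root_bridge_busnr_callback() would extract). Because the patch text is cut off right after "We select _CRS", the final comparison against _BBN is not visible; the sketch simply prefers the _CRS value whenever one was found, which may not match the real code.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical inputs standing in for the ACPI calls:
 *   bbn_status: 0 = _BBN evaluated, 1 = _BBN not found, -1 = other failure
 *   bbn:        value returned by _BBN when bbn_status == 0
 *   crs_ok:     whether walking _CRS succeeded
 *   crs_bus:    start of a bus-number range found in _CRS, or -1 if none */
static int root_bridge_busnr(int bbn_status, int bbn, bool crs_ok, int crs_bus)
{
	if (bbn_status == 1)
		bbn = 0;                 /* no _BBN: assume bus 0 */
	else if (bbn_status != 0)
		return -ENODEV;          /* _BBN evaluation failed */

	if (bbn > 0)
		return bbn;              /* a non-zero _BBN is trusted */

	/* _BBN is 0, but some firmware returns 0 for every root bridge,
	 * so consult the bus-number range in _CRS instead. */
	if (!crs_ok || crs_bus == -1)
		return bbn;              /* _CRS unusable: keep _BBN */
	return crs_bus;              /* assumption: prefer _CRS here */
}
```

For instance, a second root bridge whose _BBN wrongly reports 0 but whose _CRS describes a bus range starting at 5 would be resolved to bus 5 under this sketch.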
