subject:"Re\: \[BUG\] Code reordering in swsusp breaks suspend on SMP systems"

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-25 Thread Rafael J. Wysocki

On Sunday, 25 March 2007 17:10, Maxim wrote:
> On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote:
> > On Sunday, 25 March 2007 01:40, Maxim wrote:
> > > On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
> > > > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> > > > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > > > > > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: 
> > > Hi,
> > >   I confirm that the above patch works,
> > > 
> > >   At least system didn't hang on resume with microcode driver loaded,
> > > 
> > >   I can't really test whenever it did update microcode because I almost 
> > > sure there is nothing to update
> > >   (I use core 2 duo that I bought a month ago, and an intel motherboard 
> > > with latest bios ( updated yesterday) )
> > >   I selected this driver just in case when I compiled kernel.
> > 
> > OK, thanks for testing.
> > 
> 
> 
> Hello,
>   I tested this patch, and it does work too,
>   System suspended /resumed correctly, no errors in kernel log

Thanks for the confirmation.

I wonder if someone who actually needs the microcode to be updated can test
it ...

> > -   first_cpu = first_cpu(cpu_present_map);
> > -   if (!cpu_online(first_cpu)) {
> > -   error = _cpu_up(first_cpu);
> > -   if (error) {
> > -   printk(KERN_ERR "Could not bring CPU%d up.\n",
> > -   first_cpu);
> > -   goto out;
> > -   }
> > -   }
> > -
> 
> Nice, I once have seen those lines too, and they look ridiculous, but
> could that be that they are still necessary on some systems, 

I think this was a mistake from the beginning.

Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-25 Thread Maxim

On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote:
> On Sunday, 25 March 2007 01:40, Maxim wrote:
> > On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
> > > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> > > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > > > > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: 
> > Hi,
> > I confirm that the above patch works,
> > 
> > At least system didn't hang on resume with microcode driver loaded,
> > 
> > I can't really test whenever it did update microcode because I almost 
> > sure there is nothing to update
> > (I use core 2 duo that I bought a month ago, and an intel motherboard 
> > with latest bios ( updated yesterday) )
> > I selected this driver just in case when I compiled kernel.
> 
> OK, thanks for testing.
> 


Hello,
I tested this patch, and it does work too,
System suspended /resumed correctly, no errors in kernel log



> - first_cpu = first_cpu(cpu_present_map);
> - if (!cpu_online(first_cpu)) {
> - error = _cpu_up(first_cpu);
> - if (error) {
> - printk(KERN_ERR "Could not bring CPU%d up.\n",
> - first_cpu);
> - goto out;
> - }
> - }
> -

Nice, I once have seen those lines too, and they look ridiculous, but could 
that be that they are still necessary on some systems,


Best regards,
Maxim Levitsky

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-25 Thread Rafael J. Wysocki

On Sunday, 25 March 2007 01:40, Maxim wrote:
> On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
> > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > > > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: 
> Hi,
>   I confirm that the above patch works,
> 
>   At least system didn't hang on resume with microcode driver loaded,
> 
>   I can't really test whenever it did update microcode because I almost 
> sure there is nothing to update
>   (I use core 2 duo that I bought a month ago, and an intel motherboard 
> with latest bios ( updated yesterday) )
>   I selected this driver just in case when I compiled kernel.

OK, thanks for testing.

In the meantime I've prepared the more sophisticated version of the patch that 
is
appended.  Could you please check if it still works for you?

Greetings,
Rafael


---
 arch/i386/kernel/microcode.c |   71 ---
 include/linux/cpu.h  |2 +
 kernel/cpu.c |   32 +--
 3 files changed, 85 insertions(+), 20 deletions(-)

Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c  2007-03-23 
22:57:28.0 +0100
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c   2007-03-23 
23:43:27.0 +0100
@@ -567,6 +567,53 @@ static int cpu_request_microcode(int cpu
return error;
 }
 
+static int apply_microcode_on_cpu(int cpu)
+{
+   struct cpuinfo_x86 *c = cpu_data + cpu;
+   struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
+   cpumask_t old;
+   unsigned int val[2];
+   int err = 0;
+
+   if (!uci->mc)
+   return -EINVAL;
+
+   old = current->cpus_allowed;
+   set_cpus_allowed(current, cpumask_of_cpu(cpu));
+
+   /* Check if the microcode we have in memory matches the CPU */
+   if (c->x86_vendor != X86_VENDOR_INTEL || c->x86 < 6 ||
+   cpu_has(c, X86_FEATURE_IA64) || uci->sig != cpuid_eax(0x0001))
+   err = -EINVAL;
+
+   if (!err && ((c->x86_model >= 5) || (c->x86 > 6))) {
+   /* get processor flags from MSR 0x17 */
+   rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
+   if (uci->pf != (1 << ((val[1] >> 18) & 7)))
+   err = -EINVAL;
+   }
+
+   if (!err) {
+   wrmsr(MSR_IA32_UCODE_REV, 0, 0);
+   /* see notes above for revision 1.07.  Apparent chip bug */
+   sync_core();
+   /* get the current revision from MSR 0x8B */
+   rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+   if (uci->rev != val[1])
+   err = -EINVAL;
+   }
+
+   if (!err)
+   apply_microcode(cpu);
+   else
+   printk(KERN_ERR "microcode: Could not apply microcode to CPU%d:"
+   " sig=0x%x, pf=0x%x, rev=0x%x\n",
+   cpu, uci->sig, uci->pf, uci->rev);
+
+   set_cpus_allowed(current, old);
+   return err;
+}
+
 static void microcode_init_cpu(int cpu)
 {
cpumask_t old;
@@ -577,7 +624,8 @@ static void microcode_init_cpu(int cpu)
set_cpus_allowed(current, cpumask_of_cpu(cpu));
mutex_lock(_mutex);
collect_cpu_info(cpu);
-   if (uci->valid && system_state == SYSTEM_RUNNING)
+   if (uci->valid && system_state == SYSTEM_RUNNING &&
+   !suspend_cpu_hotplug)
cpu_request_microcode(cpu);
mutex_unlock(_mutex);
set_cpus_allowed(current, old);
@@ -663,13 +711,24 @@ static int mc_sysdev_add(struct sys_devi
return 0;
 
pr_debug("Microcode:CPU %d added\n", cpu);
-   memset(uci, 0, sizeof(*uci));
+   /* If suspend_cpu_hotplug is set, the system is resuming and we should
+* use the data from before the suspend.
+*/
+   if (suspend_cpu_hotplug) {
+   err = apply_microcode_on_cpu(cpu);
+   if (err)
+   microcode_fini_cpu(cpu);
+   }
+   if (!uci->valid)
+   memset(uci, 0, sizeof(*uci));
 
err = sysfs_create_group(_dev->kobj, _attr_group);
if (err)
return err;
 
-   microcode_init_cpu(cpu);
+   if (!uci->valid)
+   microcode_init_cpu(cpu);
+
return 0;
 }
 
@@ -680,7 +739,11 @@ static int mc_sysdev_remove(struct sys_d
if (!cpu_online(cpu))
return 0;
pr_debug("Microcode:CPU %d removed\n", cpu);
-   microcode_fini_cpu(cpu);
+   /* If suspend_cpu_hotplug is set, the system is

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-25 Thread Rafael J. Wysocki

On Sunday, 25 March 2007 01:40, Maxim wrote:
 On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
  On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
   On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
On Thursday, 22 March 2007 00:39, Maxim wrote:
 On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
  On Thursday, 22 March 2007 00:09, Maxim wrote:
   On Thursday 22 March 2007 00:39:02 you wrote:
On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: 
 Hi,
   I confirm that the above patch works,
 
   At least system didn't hang on resume with microcode driver loaded,
 
   I can't really test whenever it did update microcode because I almost 
 sure there is nothing to update
   (I use core 2 duo that I bought a month ago, and an intel motherboard 
 with latest bios ( updated yesterday) )
   I selected this driver just in case when I compiled kernel.

OK, thanks for testing.

In the meantime I've prepared the more sophisticated version of the patch that 
is
appended.  Could you please check if it still works for you?

Greetings,
Rafael


---
 arch/i386/kernel/microcode.c |   71 ---
 include/linux/cpu.h  |2 +
 kernel/cpu.c |   32 +--
 3 files changed, 85 insertions(+), 20 deletions(-)

Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c  2007-03-23 
22:57:28.0 +0100
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c   2007-03-23 
23:43:27.0 +0100
@@ -567,6 +567,53 @@ static int cpu_request_microcode(int cpu
return error;
 }
 
+static int apply_microcode_on_cpu(int cpu)
+{
+   struct cpuinfo_x86 *c = cpu_data + cpu;
+   struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
+   cpumask_t old;
+   unsigned int val[2];
+   int err = 0;
+
+   if (!uci-mc)
+   return -EINVAL;
+
+   old = current-cpus_allowed;
+   set_cpus_allowed(current, cpumask_of_cpu(cpu));
+
+   /* Check if the microcode we have in memory matches the CPU */
+   if (c-x86_vendor != X86_VENDOR_INTEL || c-x86  6 ||
+   cpu_has(c, X86_FEATURE_IA64) || uci-sig != cpuid_eax(0x0001))
+   err = -EINVAL;
+
+   if (!err  ((c-x86_model = 5) || (c-x86  6))) {
+   /* get processor flags from MSR 0x17 */
+   rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]);
+   if (uci-pf != (1  ((val[1]  18)  7)))
+   err = -EINVAL;
+   }
+
+   if (!err) {
+   wrmsr(MSR_IA32_UCODE_REV, 0, 0);
+   /* see notes above for revision 1.07.  Apparent chip bug */
+   sync_core();
+   /* get the current revision from MSR 0x8B */
+   rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]);
+   if (uci-rev != val[1])
+   err = -EINVAL;
+   }
+
+   if (!err)
+   apply_microcode(cpu);
+   else
+   printk(KERN_ERR microcode: Could not apply microcode to CPU%d:
+sig=0x%x, pf=0x%x, rev=0x%x\n,
+   cpu, uci-sig, uci-pf, uci-rev);
+
+   set_cpus_allowed(current, old);
+   return err;
+}
+
 static void microcode_init_cpu(int cpu)
 {
cpumask_t old;
@@ -577,7 +624,8 @@ static void microcode_init_cpu(int cpu)
set_cpus_allowed(current, cpumask_of_cpu(cpu));
mutex_lock(microcode_mutex);
collect_cpu_info(cpu);
-   if (uci-valid  system_state == SYSTEM_RUNNING)
+   if (uci-valid  system_state == SYSTEM_RUNNING 
+   !suspend_cpu_hotplug)
cpu_request_microcode(cpu);
mutex_unlock(microcode_mutex);
set_cpus_allowed(current, old);
@@ -663,13 +711,24 @@ static int mc_sysdev_add(struct sys_devi
return 0;
 
pr_debug(Microcode:CPU %d added\n, cpu);
-   memset(uci, 0, sizeof(*uci));
+   /* If suspend_cpu_hotplug is set, the system is resuming and we should
+* use the data from before the suspend.
+*/
+   if (suspend_cpu_hotplug) {
+   err = apply_microcode_on_cpu(cpu);
+   if (err)
+   microcode_fini_cpu(cpu);
+   }
+   if (!uci-valid)
+   memset(uci, 0, sizeof(*uci));
 
err = sysfs_create_group(sys_dev-kobj, mc_attr_group);
if (err)
return err;
 
-   microcode_init_cpu(cpu);
+   if (!uci-valid)
+   microcode_init_cpu(cpu);
+
return 0;
 }
 
@@ -680,7 +739,11 @@ static int mc_sysdev_remove(struct sys_d
if (!cpu_online(cpu))
return 0;
pr_debug(Microcode:CPU %d removed\n, cpu);
-   microcode_fini_cpu(cpu);
+   /* If suspend_cpu_hotplug is set, the system is suspending and we should
+* keep the microcode in

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-25 Thread Maxim

On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote:
 On Sunday, 25 March 2007 01:40, Maxim wrote:
  On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
   On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
 On Thursday, 22 March 2007 00:39, Maxim wrote:
  On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
   On Thursday, 22 March 2007 00:09, Maxim wrote:
On Thursday 22 March 2007 00:39:02 you wrote:
 On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: 
  Hi,
  I confirm that the above patch works,
  
  At least system didn't hang on resume with microcode driver loaded,
  
  I can't really test whenever it did update microcode because I almost 
  sure there is nothing to update
  (I use core 2 duo that I bought a month ago, and an intel motherboard 
  with latest bios ( updated yesterday) )
  I selected this driver just in case when I compiled kernel.
 
 OK, thanks for testing.
 


Hello,
I tested this patch, and it does work too,
System suspended /resumed correctly, no errors in kernel log



 - first_cpu = first_cpu(cpu_present_map);
 - if (!cpu_online(first_cpu)) {
 - error = _cpu_up(first_cpu);
 - if (error) {
 - printk(KERN_ERR Could not bring CPU%d up.\n,
 - first_cpu);
 - goto out;
 - }
 - }
 -

Nice, I once have seen those lines too, and they look ridiculous, but could 
that be that they are still necessary on some systems,


Best regards,
Maxim Levitsky

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-25 Thread Rafael J. Wysocki

On Sunday, 25 March 2007 17:10, Maxim wrote:
 On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote:
  On Sunday, 25 March 2007 01:40, Maxim wrote:
   On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
 On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
  On Thursday, 22 March 2007 00:39, Maxim wrote:
   On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
On Thursday, 22 March 2007 00:09, Maxim wrote:
 On Thursday 22 March 2007 00:39:02 you wrote:
  On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: 
   Hi,
 I confirm that the above patch works,
   
 At least system didn't hang on resume with microcode driver loaded,
   
 I can't really test whenever it did update microcode because I almost 
   sure there is nothing to update
 (I use core 2 duo that I bought a month ago, and an intel motherboard 
   with latest bios ( updated yesterday) )
 I selected this driver just in case when I compiled kernel.
  
  OK, thanks for testing.
  
 
 
 Hello,
   I tested this patch, and it does work too,
   System suspended /resumed correctly, no errors in kernel log

Thanks for the confirmation.

I wonder if someone who actually needs the microcode to be updated can test
it ...

  -   first_cpu = first_cpu(cpu_present_map);
  -   if (!cpu_online(first_cpu)) {
  -   error = _cpu_up(first_cpu);
  -   if (error) {
  -   printk(KERN_ERR Could not bring CPU%d up.\n,
  -   first_cpu);
  -   goto out;
  -   }
  -   }
  -
 
 Nice, I once have seen those lines too, and they look ridiculous, but
 could that be that they are still necessary on some systems, 

I think this was a mistake from the beginning.

Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-24 Thread Maxim

On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
> On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > > > > Hi!
> > > > > > > > 
> > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
> > > > > > > > > anymore on my system.
> > > > > > > > > 
> > > > > > > > > I did a git-bisect and found that those commits break it:
> > > > > > > > > 
> > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] 
> > > > > > > > > PM: Change code ordering in main.c
> > > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] 
> > > > > > > > > swsusp: Change code ordering in disk.c
> > > > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] 
> > > > > > > > > swsusp: Change code ordering in user.c
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > (Yep, it was in my "to analyze" queue).
> > > > > > > > 
> > > > > > > > > I already reported about it, but now i know the reason why 
> > > > > > > > > suspend breaks.
> > > > > > > > > 
> > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to 
> > > > > > > > > sleep until now, 
> > > > > > > > > and it did work because those functions could be called only 
> > > > > > > > > in process context
> > > > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) 
> > > > > > > > > or  idle thread  that does smp_init()).
> > > > > > > > > 
> > > > > > > > > But now they are called _after_ all tasks were suspended, so 
> > > > > > > > > if cpu_down tries for example to take a lock
> > > > > > > > > that is taken by different process, it can't since the 
> > > > > > > > > different proccess is frozen and can't release the lock.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Thanks for detailed explanation.
> > > > > > > > 
> > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing 
> > > > > > > > this.
> > > > > > > > 
> > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen 
> > > > > > > > task
> > > > > > > > holds a lock, that's a bug.
> > > > > > > > 
> > > > > > > > > Or, it is also possible to revert this change.
> > > > > > > > 
> > > > > > > > Are you using xfs?
> > > > > > > 
> > > > > > > Well, this is the only case that can trigger it.  There are no 
> > > > > > > other freezable
> > > > > > > workqueues.
> > > > > > > 
> > > > > > > Greetings,
> > > > > > > Rafael
> > > > > > > 
> > > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > Yes, you are right and it is XFS
> > > > > > 
> > > > > > System suspends and resumes with xfs and your patch correctly,
> > > > > 
> > > > > Could you please sent this information to the list?  I'd like it to 
> > > > > reach all
> > > > > of the CCed parites. ;-)
> > > > 
> > > > I did now ( sorry I just keep using this Answer command, instead of 
> > > > Answer to everybody)
> > > > I didn't intend to send private email.
> > > > > 
> > > > > > Of course I need to mention that I had to unload microcode 
> > > > > > update driver because it prevented resume,
> > > > > > because it calls firmware loader helper, and again sleeps on 
> > > > > > lock
> > > > > 
> > > > > This is interesting.  Did it happen before or is it a regression?
> > > > 
> > > > It is from the same group of bugs , I mean hang because cpu_up/down is 
> > > > called with frozen tasks
> > > > Of course it didn't happen before those reordering commits were 
> > > > introduced
> > > 
> > > Well, we want cpu_up/down to be called after processes have been frozen, 
> > > for
> > > various reasons (one of them being that applications shouldn't see us 
> > > playing
> > > with the CPUs).
> > > 
> > > Thanks for reporting this, I'll have a look at the microcode update 
> > > driver.
> > 
> > Well, I have invented the appended workaround, but I'm not sure how much
> > sense it makes with respect to the microcode driver.  At least, it doesn't
> > break my AMD64 SMP setup. ;-)
> 
> Modified version of the patch is appended.  Unfortunately I have no hardware
> supporting the microcode updates.
> 
> Greetings,
> Rafael
> 
> 
> ---
>  arch/i386/kernel/microcode.c |   28 +---
>  include/linux/cpu.h  |2 ++
>  kernel/cpu.c |   32 
>  3 files changed, 43 insertions(+), 19 deletions(-)
> 
> Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
> ===
> --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c
> +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c
> @@ -567,6 +567,16 @@ static int

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-24 Thread Maxim

On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote:
 On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
  On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
   On Thursday, 22 March 2007 00:39, Maxim wrote:
On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
 On Thursday, 22 March 2007 00:09, Maxim wrote:
  On Thursday 22 March 2007 00:39:02 you wrote:
   On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
Hi!

 Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
 anymore on my system.
 
 I did a git-bisect and found that those commits break it:
 
 e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] 
 PM: Change code ordering in main.c
 ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] 
 swsusp: Change code ordering in disk.c
 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] 
 swsusp: Change code ordering in user.c
 

(Yep, it was in my to analyze queue).

 I already reported about it, but now i know the reason why 
 suspend breaks.
 
 The problem is that both cpu_up/cpu_down were allowed to 
 sleep until now, 
 and it did work because those functions could be called only 
 in process context
 (the one that writes to /sys/devices/system/cpu/cpu*/online) 
 or  idle thread  that does smp_init()).
 
 But now they are called _after_ all tasks were suspended, so 
 if cpu_down tries for example to take a lock
 that is taken by different process, it can't since the 
 different proccess is frozen and can't release the lock.
 

Thanks for detailed explanation.

...but, on my machine suspend works ok in -rc4. I'm not seeing 
this.

...by design, frozen tasks must not hold any locks. If frozen 
task
holds a lock, that's a bug.

 Or, it is also possible to revert this change.

Are you using xfs?
   
   Well, this is the only case that can trigger it.  There are no 
   other freezable
   workqueues.
   
   Greetings,
   Rafael
   
  
  Hello,
  
  Yes, you are right and it is XFS
  
  System suspends and resumes with xfs and your patch correctly,
 
 Could you please sent this information to the list?  I'd like it to 
 reach all
 of the CCed parites. ;-)

I did now ( sorry I just keep using this Answer command, instead of 
Answer to everybody)
I didn't intend to send private email.
 
  Of course I need to mention that I had to unload microcode 
  update driver because it prevented resume,
  because it calls firmware loader helper, and again sleeps on 
  lock
 
 This is interesting.  Did it happen before or is it a regression?

It is from the same group of bugs , I mean hang because cpu_up/down is 
called with frozen tasks
Of course it didn't happen before those reordering commits were 
introduced
   
   Well, we want cpu_up/down to be called after processes have been frozen, 
   for
   various reasons (one of them being that applications shouldn't see us 
   playing
   with the CPUs).
   
   Thanks for reporting this, I'll have a look at the microcode update 
   driver.
  
  Well, I have invented the appended workaround, but I'm not sure how much
  sense it makes with respect to the microcode driver.  At least, it doesn't
  break my AMD64 SMP setup. ;-)
 
 Modified version of the patch is appended.  Unfortunately I have no hardware
 supporting the microcode updates.
 
 Greetings,
 Rafael
 
 
 ---
  arch/i386/kernel/microcode.c |   28 +---
  include/linux/cpu.h  |2 ++
  kernel/cpu.c |   32 
  3 files changed, 43 insertions(+), 19 deletions(-)
 
 Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
 ===
 --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c
 +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c
 @@ -567,6 +567,16 @@ static int cpu_request_microcode(int cpu
   return error;
  }
  
 +static void apply_microcode_on_cpu(int cpu)
 +{
 + cpumask_t old;
 +
 + old = current-cpus_allowed;
 + set_cpus_allowed(current, cpumask_of_cpu(cpu));
 + apply_microcode(cpu);
 + set_cpus_allowed(current, old);
 +}
 +
  static void microcode_init_cpu(int cpu)
  {
   cpumask_t old;
 @@ -663,13 +673,21 @@ static int mc_sysdev_add(struct sys_devi
   return 0;
  
   pr_debug(Microcode:CPU %d added\n, cpu);
 - memset(uci, 0, sizeof(*uci));
 + /* If suspend_cpu_hotplug is set, the system is resuming and we should
 +  * use the data from before the

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-23 Thread Rafael J. Wysocki

On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
> On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> > On Thursday, 22 March 2007 00:39, Maxim wrote:
> > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > > > Hi!
> > > > > > > 
> > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
> > > > > > > > anymore on my system.
> > > > > > > > 
> > > > > > > > I did a git-bisect and found that those commits break it:
> > > > > > > > 
> > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
> > > > > > > > Change code ordering in main.c
> > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] 
> > > > > > > > swsusp: Change code ordering in disk.c
> > > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] 
> > > > > > > > swsusp: Change code ordering in user.c
> > > > > > > > 
> > > > > > > 
> > > > > > > (Yep, it was in my "to analyze" queue).
> > > > > > > 
> > > > > > > > I already reported about it, but now i know the reason why 
> > > > > > > > suspend breaks.
> > > > > > > > 
> > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep 
> > > > > > > > until now, 
> > > > > > > > and it did work because those functions could be called only in 
> > > > > > > > process context
> > > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or 
> > > > > > > >  idle thread  that does smp_init()).
> > > > > > > > 
> > > > > > > > But now they are called _after_ all tasks were suspended, so if 
> > > > > > > > cpu_down tries for example to take a lock
> > > > > > > > that is taken by different process, it can't since the 
> > > > > > > > different proccess is frozen and can't release the lock.
> > > > > > > > 
> > > > > > > 
> > > > > > > Thanks for detailed explanation.
> > > > > > > 
> > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing 
> > > > > > > this.
> > > > > > > 
> > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen 
> > > > > > > task
> > > > > > > holds a lock, that's a bug.
> > > > > > > 
> > > > > > > > Or, it is also possible to revert this change.
> > > > > > > 
> > > > > > > Are you using xfs?
> > > > > > 
> > > > > > Well, this is the only case that can trigger it.  There are no 
> > > > > > other freezable
> > > > > > workqueues.
> > > > > > 
> > > > > > Greetings,
> > > > > > Rafael
> > > > > > 
> > > > > 
> > > > > Hello,
> > > > > 
> > > > >   Yes, you are right and it is XFS
> > > > > 
> > > > >   System suspends and resumes with xfs and your patch correctly,
> > > > 
> > > > Could you please sent this information to the list?  I'd like it to 
> > > > reach all
> > > > of the CCed parites. ;-)
> > > 
> > > I did now ( sorry I just keep using this Answer command, instead of 
> > > Answer to everybody)
> > > I didn't intend to send private email.
> > > > 
> > > > >   Of course I need to mention that I had to unload microcode 
> > > > > update driver because it prevented resume,
> > > > >   because it calls firmware loader helper, and again sleeps on 
> > > > > lock
> > > > 
> > > > This is interesting.  Did it happen before or is it a regression?
> > > 
> > > It is from the same group of bugs , I mean hang because cpu_up/down is 
> > > called with frozen tasks
> > > Of course it didn't happen before those reordering commits were introduced
> > 
> > Well, we want cpu_up/down to be called after processes have been frozen, for
> > various reasons (one of them being that applications shouldn't see us 
> > playing
> > with the CPUs).
> > 
> > Thanks for reporting this, I'll have a look at the microcode update driver.
> 
> Well, I have invented the appended workaround, but I'm not sure how much
> sense it makes with respect to the microcode driver.  At least, it doesn't
> break my AMD64 SMP setup. ;-)

Modified version of the patch is appended.  Unfortunately I have no hardware
supporting the microcode updates.

Greetings,
Rafael


---
 arch/i386/kernel/microcode.c |   28 +---
 include/linux/cpu.h  |2 ++
 kernel/cpu.c |   32 
 3 files changed, 43 insertions(+), 19 deletions(-)

Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c
@@ -567,6 +567,16 @@ static int cpu_request_microcode(int cpu
return error;
 }
 
+static void apply_microcode_on_cpu(int cpu)
+{
+   cpumask_t old;
+
+   old = current->cpus_allowed;
+   set_cpus_allowed(current, cpumask_of_cpu(cpu));
+   apply_microcode(cpu);
+   set_cpus_allowed(current, old);
+}
+
 static

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-23 Thread Rafael J. Wysocki

On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote:
 On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
  On Thursday, 22 March 2007 00:39, Maxim wrote:
   On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
On Thursday, 22 March 2007 00:09, Maxim wrote:
 On Thursday 22 March 2007 00:39:02 you wrote:
  On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
   Hi!
   
Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
anymore on my system.

I did a git-bisect and found that those commits break it:

e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
Change code ordering in main.c
ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] 
swsusp: Change code ordering in disk.c
259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] 
swsusp: Change code ordering in user.c

   
   (Yep, it was in my to analyze queue).
   
I already reported about it, but now i know the reason why 
suspend breaks.

The problem is that both cpu_up/cpu_down were allowed to sleep 
until now, 
and it did work because those functions could be called only in 
process context
(the one that writes to /sys/devices/system/cpu/cpu*/online) or 
 idle thread  that does smp_init()).

But now they are called _after_ all tasks were suspended, so if 
cpu_down tries for example to take a lock
that is taken by different process, it can't since the 
different proccess is frozen and can't release the lock.

   
   Thanks for detailed explanation.
   
   ...but, on my machine suspend works ok in -rc4. I'm not seeing 
   this.
   
   ...by design, frozen tasks must not hold any locks. If frozen 
   task
   holds a lock, that's a bug.
   
Or, it is also possible to revert this change.
   
   Are you using xfs?
  
  Well, this is the only case that can trigger it.  There are no 
  other freezable
  workqueues.
  
  Greetings,
  Rafael
  
 
 Hello,
 
   Yes, you are right and it is XFS
 
   System suspends and resumes with xfs and your patch correctly,

Could you please sent this information to the list?  I'd like it to 
reach all
of the CCed parites. ;-)
   
   I did now ( sorry I just keep using this Answer command, instead of 
   Answer to everybody)
   I didn't intend to send private email.

   Of course I need to mention that I had to unload microcode 
 update driver because it prevented resume,
   because it calls firmware loader helper, and again sleeps on 
 lock

This is interesting.  Did it happen before or is it a regression?
   
   It is from the same group of bugs , I mean hang because cpu_up/down is 
   called with frozen tasks
   Of course it didn't happen before those reordering commits were introduced
  
  Well, we want cpu_up/down to be called after processes have been frozen, for
  various reasons (one of them being that applications shouldn't see us 
  playing
  with the CPUs).
  
  Thanks for reporting this, I'll have a look at the microcode update driver.
 
 Well, I have invented the appended workaround, but I'm not sure how much
 sense it makes with respect to the microcode driver.  At least, it doesn't
 break my AMD64 SMP setup. ;-)

Modified version of the patch is appended.  Unfortunately I have no hardware
supporting the microcode updates.

Greetings,
Rafael


---
 arch/i386/kernel/microcode.c |   28 +---
 include/linux/cpu.h  |2 ++
 kernel/cpu.c |   32 
 3 files changed, 43 insertions(+), 19 deletions(-)

Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c
@@ -567,6 +567,16 @@ static int cpu_request_microcode(int cpu
return error;
 }
 
+static void apply_microcode_on_cpu(int cpu)
+{
+   cpumask_t old;
+
+   old = current-cpus_allowed;
+   set_cpus_allowed(current, cpumask_of_cpu(cpu));
+   apply_microcode(cpu);
+   set_cpus_allowed(current, old);
+}
+
 static void microcode_init_cpu(int cpu)
 {
cpumask_t old;
@@ -663,13 +673,21 @@ static int mc_sysdev_add(struct sys_devi
return 0;
 
pr_debug(Microcode:CPU %d added\n, cpu);
-   memset(uci, 0, sizeof(*uci));
+   /* If suspend_cpu_hotplug is set, the system is resuming and we should
+* use the data from before the suspend.
+*/
+   if (!suspend_cpu_hotplug)
+   memset(uci, 0, sizeof(*uci));
 
err = sysfs_create_group(sys_dev-kobj, mc_attr_group);
if

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-22 Thread Rafael J. Wysocki

On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
> On Thursday, 22 March 2007 00:39, Maxim wrote:
> > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > > Hi!
> > > > > > 
> > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
> > > > > > > anymore on my system.
> > > > > > > 
> > > > > > > I did a git-bisect and found that those commits break it:
> > > > > > > 
> > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
> > > > > > > Change code ordering in main.c
> > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] 
> > > > > > > swsusp: Change code ordering in disk.c
> > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] 
> > > > > > > swsusp: Change code ordering in user.c
> > > > > > > 
> > > > > > 
> > > > > > (Yep, it was in my "to analyze" queue).
> > > > > > 
> > > > > > > I already reported about it, but now i know the reason why 
> > > > > > > suspend breaks.
> > > > > > > 
> > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep 
> > > > > > > until now, 
> > > > > > > and it did work because those functions could be called only in 
> > > > > > > process context
> > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  
> > > > > > > idle thread  that does smp_init()).
> > > > > > > 
> > > > > > > But now they are called _after_ all tasks were suspended, so if 
> > > > > > > cpu_down tries for example to take a lock
> > > > > > > that is taken by different process, it can't since the different 
> > > > > > > proccess is frozen and can't release the lock.
> > > > > > > 
> > > > > > 
> > > > > > Thanks for detailed explanation.
> > > > > > 
> > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > > 
> > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > > holds a lock, that's a bug.
> > > > > > 
> > > > > > > Or, it is also possible to revert this change.
> > > > > > 
> > > > > > Are you using xfs?
> > > > > 
> > > > > Well, this is the only case that can trigger it.  There are no other 
> > > > > freezable
> > > > > workqueues.
> > > > > 
> > > > > Greetings,
> > > > > Rafael
> > > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > Yes, you are right and it is XFS
> > > > 
> > > > System suspends and resumes with xfs and your patch correctly,
> > > 
> > > Could you please sent this information to the list?  I'd like it to reach 
> > > all
> > > of the CCed parites. ;-)
> > 
> > I did now ( sorry I just keep using this Answer command, instead of Answer 
> > to everybody)
> > I didn't intend to send private email.
> > > 
> > > > Of course I need to mention that I had to unload microcode 
> > > > update driver because it prevented resume,
> > > > because it calls firmware loader helper, and again sleeps on 
> > > > lock
> > > 
> > > This is interesting.  Did it happen before or is it a regression?
> > 
> > It is from the same group of bugs , I mean hang because cpu_up/down is 
> > called with frozen tasks
> > Of course it didn't happen before those reordering commits were introduced
> 
> Well, we want cpu_up/down to be called after processes have been frozen, for
> various reasons (one of them being that applications shouldn't see us playing
> with the CPUs).
> 
> Thanks for reporting this, I'll have a look at the microcode update driver.

Well, I have invented the appended workaround, but I'm not sure how much
sense it makes with respect to the microcode driver.  At least, it doesn't
break my AMD64 SMP setup. ;-)

Greetings,
Rafael

---
 arch/i386/kernel/microcode.c |   23 +--
 include/linux/cpu.h  |2 ++
 kernel/cpu.c |   20 +++-
 3 files changed, 38 insertions(+), 7 deletions(-)

Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c  2007-03-16 
21:51:23.0 +0100
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c   2007-03-22 
23:55:40.0 +0100
@@ -703,6 +703,16 @@ static struct sysdev_driver mc_sysdev_dr
.resume = mc_sysdev_resume,
 };
 
+static __cpuinit void apply_microcode_on_cpu(int cpu)
+{
+   cpumask_t old;
+
+   old = current->cpus_allowed;
+   set_cpus_allowed(current, cpumask_of_cpu(cpu));
+   apply_microcode(cpu);
+   set_cpus_allowed(current, old);
+}
+
 static __cpuinit int
 mc_cpu_callback(struct notifier_block *nb, unsigned long action, void *hcpu)
 {
@@ -713,10 +723,19 @@ mc_cpu_callback(struct notifier_block *n
switch (action) {
case CPU_ONLINE:
case CPU_DOWN_FAILED:
-   mc_sysdev_add(sys_dev);
+

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-22 Thread Rafael J. Wysocki

On Thursday, 22 March 2007 08:31, Andrew Morton wrote:
> On Thu, 22 Mar 2007 08:23:39 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> 
> wrote:
> 
> > On Thursday, 22 March 2007 05:51, David Chinner wrote:
> > > On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
> > > > 
> > > > I think this is the XFS problem with freezable workqueues.
> > > > 
> > > > Maxim, please try to apply the appended patch and see if it helps.
> > > > 
> > > > Greetings,
> > > > Rafael
> > > > 
> > > > 
> > > > ---
> > > > Since freezable workqueues are broken in 2.6.21-rc
> > > > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755,
> > > > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2)
> > > > it's better to remove them altogether for 2.6.21 and change the only 
> > > > user of
> > > > them (XFS) accordingly.
> > > > 
> > > > ---
> > > >  fs/xfs/linux-2.6/xfs_buf.c |4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > > > ===
> > > > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> > > > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > > > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> > > > if (!xfs_buf_zone)
> > > > goto out_free_trace_buf;
> > > >  
> > > > -   xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > > > +   xfslogd_workqueue = create_workqueue("xfslogd");
> > > > if (!xfslogd_workqueue)
> > > > goto out_free_buf_zone;
> > > >  
> > > > -   xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > > > +   xfsdatad_workqueue = create_workqueue("xfsdatad");
> > > > if (!xfsdatad_workqueue)
> > > > goto out_destroy_xfslogd_workqueue;
> > > >  
> > > 
> > > Acked-by: Dave Chinner <[EMAIL PROTECTED]>
> > > 
> > > Rafael, it sounds like this really needs to go into the next -rc
> > > kernel. Can you push it to Linus?
> > 
> > I tried, but Andrew resent it to you. :-)
> > 
> > Andrew, could we please merge it?
> 
> Sure.
> 
> It shouldn't be this hard :(

Thanks!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-22 Thread Andrew Morton

On Thu, 22 Mar 2007 08:23:39 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> 
wrote:

> On Thursday, 22 March 2007 05:51, David Chinner wrote:
> > On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
> > > 
> > > I think this is the XFS problem with freezable workqueues.
> > > 
> > > Maxim, please try to apply the appended patch and see if it helps.
> > > 
> > > Greetings,
> > > Rafael
> > > 
> > > 
> > > ---
> > > Since freezable workqueues are broken in 2.6.21-rc
> > > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755,
> > > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2)
> > > it's better to remove them altogether for 2.6.21 and change the only user 
> > > of
> > > them (XFS) accordingly.
> > > 
> > > ---
> > >  fs/xfs/linux-2.6/xfs_buf.c |4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > > ===
> > > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> > > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> > >   if (!xfs_buf_zone)
> > >   goto out_free_trace_buf;
> > >  
> > > - xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > > + xfslogd_workqueue = create_workqueue("xfslogd");
> > >   if (!xfslogd_workqueue)
> > >   goto out_free_buf_zone;
> > >  
> > > - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > > + xfsdatad_workqueue = create_workqueue("xfsdatad");
> > >   if (!xfsdatad_workqueue)
> > >   goto out_destroy_xfslogd_workqueue;
> > >  
> > 
> > Acked-by: Dave Chinner <[EMAIL PROTECTED]>
> > 
> > Rafael, it sounds like this really needs to go into the next -rc
> > kernel. Can you push it to Linus?
> 
> I tried, but Andrew resent it to you. :-)
> 
> Andrew, could we please merge it?

Sure.

It shouldn't be this hard :(
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-22 Thread Rafael J. Wysocki

On Thursday, 22 March 2007 05:51, David Chinner wrote:
> On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
> > 
> > I think this is the XFS problem with freezable workqueues.
> > 
> > Maxim, please try to apply the appended patch and see if it helps.
> > 
> > Greetings,
> > Rafael
> > 
> > 
> > ---
> > Since freezable workqueues are broken in 2.6.21-rc
> > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755,
> > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2)
> > it's better to remove them altogether for 2.6.21 and change the only user of
> > them (XFS) accordingly.
> > 
> > ---
> >  fs/xfs/linux-2.6/xfs_buf.c |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > ===
> > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> > if (!xfs_buf_zone)
> > goto out_free_trace_buf;
> >  
> > -   xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > +   xfslogd_workqueue = create_workqueue("xfslogd");
> > if (!xfslogd_workqueue)
> > goto out_free_buf_zone;
> >  
> > -   xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > +   xfsdatad_workqueue = create_workqueue("xfsdatad");
> > if (!xfsdatad_workqueue)
> > goto out_destroy_xfslogd_workqueue;
> >  
> 
> Acked-by: Dave Chinner <[EMAIL PROTECTED]>
> 
> Rafael, it sounds like this really needs to go into the next -rc
> kernel. Can you push it to Linus?

I tried, but Andrew resent it to you. :-)

Andrew, could we please merge it?

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-22 Thread Rafael J. Wysocki

On Thursday, 22 March 2007 05:51, David Chinner wrote:
 On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
  
  I think this is the XFS problem with freezable workqueues.
  
  Maxim, please try to apply the appended patch and see if it helps.
  
  Greetings,
  Rafael
  
  
  ---
  Since freezable workqueues are broken in 2.6.21-rc
  (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755,
  http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2)
  it's better to remove them altogether for 2.6.21 and change the only user of
  them (XFS) accordingly.
  
  ---
   fs/xfs/linux-2.6/xfs_buf.c |4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
  
  Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
  ===
  --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
  +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
  @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
  if (!xfs_buf_zone)
  goto out_free_trace_buf;
   
  -   xfslogd_workqueue = create_freezeable_workqueue(xfslogd);
  +   xfslogd_workqueue = create_workqueue(xfslogd);
  if (!xfslogd_workqueue)
  goto out_free_buf_zone;
   
  -   xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad);
  +   xfsdatad_workqueue = create_workqueue(xfsdatad);
  if (!xfsdatad_workqueue)
  goto out_destroy_xfslogd_workqueue;
   
 
 Acked-by: Dave Chinner [EMAIL PROTECTED]
 
 Rafael, it sounds like this really needs to go into the next -rc
 kernel. Can you push it to Linus?

I tried, but Andrew resent it to you. :-)

Andrew, could we please merge it?

Greetings,
Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-22 Thread Andrew Morton

On Thu, 22 Mar 2007 08:23:39 +0100 Rafael J. Wysocki [EMAIL PROTECTED] 
wrote:

 On Thursday, 22 March 2007 05:51, David Chinner wrote:
  On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
   
   I think this is the XFS problem with freezable workqueues.
   
   Maxim, please try to apply the appended patch and see if it helps.
   
   Greetings,
   Rafael
   
   
   ---
   Since freezable workqueues are broken in 2.6.21-rc
   (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755,
   http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2)
   it's better to remove them altogether for 2.6.21 and change the only user 
   of
   them (XFS) accordingly.
   
   ---
fs/xfs/linux-2.6/xfs_buf.c |4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
   
   Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
   ===
   --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
   +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
   @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
 if (!xfs_buf_zone)
 goto out_free_trace_buf;

   - xfslogd_workqueue = create_freezeable_workqueue(xfslogd);
   + xfslogd_workqueue = create_workqueue(xfslogd);
 if (!xfslogd_workqueue)
 goto out_free_buf_zone;

   - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad);
   + xfsdatad_workqueue = create_workqueue(xfsdatad);
 if (!xfsdatad_workqueue)
 goto out_destroy_xfslogd_workqueue;

  
  Acked-by: Dave Chinner [EMAIL PROTECTED]
  
  Rafael, it sounds like this really needs to go into the next -rc
  kernel. Can you push it to Linus?
 
 I tried, but Andrew resent it to you. :-)
 
 Andrew, could we please merge it?

Sure.

It shouldn't be this hard :(
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-22 Thread Rafael J. Wysocki

On Thursday, 22 March 2007 08:31, Andrew Morton wrote:
 On Thu, 22 Mar 2007 08:23:39 +0100 Rafael J. Wysocki [EMAIL PROTECTED] 
 wrote:
 
  On Thursday, 22 March 2007 05:51, David Chinner wrote:
   On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:

I think this is the XFS problem with freezable workqueues.

Maxim, please try to apply the appended patch and see if it helps.

Greetings,
Rafael


---
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2)
it's better to remove them altogether for 2.6.21 and change the only 
user of
them (XFS) accordingly.

---
 fs/xfs/linux-2.6/xfs_buf.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
+++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
@@ -1829,11 +1829,11 @@ xfs_buf_init(void)
if (!xfs_buf_zone)
goto out_free_trace_buf;
 
-   xfslogd_workqueue = create_freezeable_workqueue(xfslogd);
+   xfslogd_workqueue = create_workqueue(xfslogd);
if (!xfslogd_workqueue)
goto out_free_buf_zone;
 
-   xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad);
+   xfsdatad_workqueue = create_workqueue(xfsdatad);
if (!xfsdatad_workqueue)
goto out_destroy_xfslogd_workqueue;
 
   
   Acked-by: Dave Chinner [EMAIL PROTECTED]
   
   Rafael, it sounds like this really needs to go into the next -rc
   kernel. Can you push it to Linus?
  
  I tried, but Andrew resent it to you. :-)
  
  Andrew, could we please merge it?
 
 Sure.
 
 It shouldn't be this hard :(

Thanks!
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-22 Thread Rafael J. Wysocki

On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote:
 On Thursday, 22 March 2007 00:39, Maxim wrote:
  On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
   On Thursday, 22 March 2007 00:09, Maxim wrote:
On Thursday 22 March 2007 00:39:02 you wrote:
 On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
  Hi!
  
   Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
   anymore on my system.
   
   I did a git-bisect and found that those commits break it:
   
   e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
   Change code ordering in main.c
   ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] 
   swsusp: Change code ordering in disk.c
   259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] 
   swsusp: Change code ordering in user.c
   
  
  (Yep, it was in my to analyze queue).
  
   I already reported about it, but now i know the reason why 
   suspend breaks.
   
   The problem is that both cpu_up/cpu_down were allowed to sleep 
   until now, 
   and it did work because those functions could be called only in 
   process context
   (the one that writes to /sys/devices/system/cpu/cpu*/online) or  
   idle thread  that does smp_init()).
   
   But now they are called _after_ all tasks were suspended, so if 
   cpu_down tries for example to take a lock
   that is taken by different process, it can't since the different 
   proccess is frozen and can't release the lock.
   
  
  Thanks for detailed explanation.
  
  ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
  
  ...by design, frozen tasks must not hold any locks. If frozen task
  holds a lock, that's a bug.
  
   Or, it is also possible to revert this change.
  
  Are you using xfs?
 
 Well, this is the only case that can trigger it.  There are no other 
 freezable
 workqueues.
 
 Greetings,
 Rafael
 

Hello,

Yes, you are right and it is XFS

System suspends and resumes with xfs and your patch correctly,
   
   Could you please sent this information to the list?  I'd like it to reach 
   all
   of the CCed parites. ;-)
  
  I did now ( sorry I just keep using this Answer command, instead of Answer 
  to everybody)
  I didn't intend to send private email.
   
Of course I need to mention that I had to unload microcode 
update driver because it prevented resume,
because it calls firmware loader helper, and again sleeps on 
lock
   
   This is interesting.  Did it happen before or is it a regression?
  
  It is from the same group of bugs , I mean hang because cpu_up/down is 
  called with frozen tasks
  Of course it didn't happen before those reordering commits were introduced
 
 Well, we want cpu_up/down to be called after processes have been frozen, for
 various reasons (one of them being that applications shouldn't see us playing
 with the CPUs).
 
 Thanks for reporting this, I'll have a look at the microcode update driver.

Well, I have invented the appended workaround, but I'm not sure how much
sense it makes with respect to the microcode driver.  At least, it doesn't
break my AMD64 SMP setup. ;-)

Greetings,
Rafael

---
 arch/i386/kernel/microcode.c |   23 +--
 include/linux/cpu.h  |2 ++
 kernel/cpu.c |   20 +++-
 3 files changed, 38 insertions(+), 7 deletions(-)

Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c
===
--- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c  2007-03-16 
21:51:23.0 +0100
+++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c   2007-03-22 
23:55:40.0 +0100
@@ -703,6 +703,16 @@ static struct sysdev_driver mc_sysdev_dr
.resume = mc_sysdev_resume,
 };
 
+static __cpuinit void apply_microcode_on_cpu(int cpu)
+{
+   cpumask_t old;
+
+   old = current-cpus_allowed;
+   set_cpus_allowed(current, cpumask_of_cpu(cpu));
+   apply_microcode(cpu);
+   set_cpus_allowed(current, old);
+}
+
 static __cpuinit int
 mc_cpu_callback(struct notifier_block *nb, unsigned long action, void *hcpu)
 {
@@ -713,10 +723,19 @@ mc_cpu_callback(struct notifier_block *n
switch (action) {
case CPU_ONLINE:
case CPU_DOWN_FAILED:
-   mc_sysdev_add(sys_dev);
+   if (suspend_cpu_hotplug)
+   /* This means we have been called during the resume.
+* This is not the real CPU hotplug and we don't need
+* to register the sysdev
+*/
+   apply_microcode_on_cpu(cpu);
+   else
+   mc_sysdev_add(sys_dev);
break;
case

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread David Chinner

On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
> 
> I think this is the XFS problem with freezable workqueues.
> 
> Maxim, please try to apply the appended patch and see if it helps.
> 
> Greetings,
> Rafael
> 
> 
> ---
> Since freezable workqueues are broken in 2.6.21-rc
> (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755,
> http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2)
> it's better to remove them altogether for 2.6.21 and change the only user of
> them (XFS) accordingly.
> 
> ---
>  fs/xfs/linux-2.6/xfs_buf.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> ===
> --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
>   if (!xfs_buf_zone)
>   goto out_free_trace_buf;
>  
> - xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> + xfslogd_workqueue = create_workqueue("xfslogd");
>   if (!xfslogd_workqueue)
>   goto out_free_buf_zone;
>  
> - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> + xfsdatad_workqueue = create_workqueue("xfsdatad");
>   if (!xfsdatad_workqueue)
>   goto out_destroy_xfslogd_workqueue;
>  

Acked-by: Dave Chinner <[EMAIL PROTECTED]>

Rafael, it sounds like this really needs to go into the next -rc
kernel. Can you push it to Linus?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:14:05 Maxim wrote:
> On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote:
> > Hi.
> > 
> > On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
> > > Hi,
> > > 
> > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on 
> > > my system.
> > > 
> > > I did a git-bisect and found that those commits break it:
> > > 
> > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change 
> > > code ordering in main.c
> > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
> > > code ordering in disk.c
> > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
> > > code ordering in user.c
> > > 
> > > I already reported about it, but now i know the reason why suspend breaks.
> > > 
> > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > and it did work because those functions could be called only in process 
> > > context
> > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
> > > thread  that does smp_init()).
> > > 
> > > But now they are called _after_ all tasks were suspended, so if cpu_down 
> > > tries for example to take a lock
> > > that is taken by different process, it can't since the different proccess 
> > > is frozen and can't release the lock.
> > > 
> > > I tested this and all results are positive:
> > > 
> > > I disabled 2nd cpu by hand, and then suspend to ram was successfull.
> > > Suspend to disk went correctly, but it hang on resume, and I know why.
> > > It hang in old kernel trying to disable 2nd cpu that was enabled by it.
> > > 
> > > I was able using kdb to confirm that this is true because it was still 
> > > possible to enter kdb, and see that
> > > idle thread (swapper) was active, and uswsusp was waiting on mutex inside 
> > > workqueue_cpu_callback.
> > > 
> > > The solution for this problem seems to be ether complete audit of code 
> > > that uses register_cpu_notifier,
> > > to ensure that it doesn't sleep. 
> > > Also documentation should be changed to note about it.
> > > 
> > > Or, it is also possible to revert this change.
> > 
> > Do you know exactly which mutex was being waited on and where it was
> > taken? If you can say that, it would be much more helpful.
> > 
> > Regards,
> > 
> > Nigel
> > 
> > 
> 
> Hello,
> 
>   It is workqueue_mutex
>   and it is taken in kernel/workqueue.c:797
> 
>   this is guilt of freezable work queues , and XFS uses it (and I use XFS)
> 
>   Thanks to Rafael J. Wysocki for pointing it out to me.
> 
>   Regards,
>   Maxim Levitsky
> 


I think that I made a mistake,

I now reverted the patch that fixes the above error, and I wrote down 
back-trace from this hang,
and it appears that system hangs in kthread_stop:

This I have written in paper, using kdb:

workqueue_cpu_callback ->
cleanup_workqueue_thread ->
kthread_stop ->

and then wake_up_process (I think , I didn't wrote this on paper, I will check 
again)


Regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:47:05 Nigel Cunningham wrote:
> Hi.
> 
> On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote:
> > > Do you know exactly which mutex was being waited on and where it was
> > > taken? If you can say that, it would be much more helpful.
> 
> Yeah, me too, but assuming too much sometimes bites me :)
> 
> > I think this is the XFS problem with freezable workqueues.
> > 
> > Maxim, please try to apply the appended patch and see if it helps.
> 
> Thanks for your subsequent messages, Maxim. Could you confirm for us
> that the patch Rafael attached fixes it?
> 
> Regards,
> 
> Nigel
> 
> > ---
> > Since freezable workqueues are broken in 2.6.21-rc
> > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755,
> > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2)
> > it's better to remove them altogether for 2.6.21 and change the only user of
> > them (XFS) accordingly.
> > 
> > ---
> >  fs/xfs/linux-2.6/xfs_buf.c |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > ===
> > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> > @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
> > if (!xfs_buf_zone)
> > goto out_free_trace_buf;
> >  
> > -   xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> > +   xfslogd_workqueue = create_workqueue("xfslogd");
> > if (!xfslogd_workqueue)
> > goto out_free_buf_zone;
> >  
> > -   xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> > +   xfsdatad_workqueue = create_workqueue("xfsdatad");
> > if (!xfsdatad_workqueue)
> > goto out_destroy_xfslogd_workqueue;
> >  
> 
> 

Hello,

I can confirm now that the above patch work,

First as I said I did try to suspend with this patch and without XFS, and it 
did work,
Now I reverted it and now system still suspends correctly without xfs module 
loaded ( I didn't tell you that i use now ext3,
and that I generally compile everything in kernel, so i put XFS too, because I 
used it once, and I still have a XFS disk image)

But system hangs with XFS loaded, so this patch works.

Regards,
Maxim Levitsky


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:53:54 Rafael J. Wysocki wrote:
> On Thursday, 22 March 2007 00:39, Maxim wrote:
> > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > > Hi!
> > > > > > 
> > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
> > > > > > > anymore on my system.
> > > > > > > 
> > > > > > > I did a git-bisect and found that those commits break it:
> > > > > > > 
> > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
> > > > > > > Change code ordering in main.c
> > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] 
> > > > > > > swsusp: Change code ordering in disk.c
> > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] 
> > > > > > > swsusp: Change code ordering in user.c
> > > > > > > 
> > > > > > 
> > > > > > (Yep, it was in my "to analyze" queue).
> > > > > > 
> > > > > > > I already reported about it, but now i know the reason why 
> > > > > > > suspend breaks.
> > > > > > > 
> > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep 
> > > > > > > until now, 
> > > > > > > and it did work because those functions could be called only in 
> > > > > > > process context
> > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  
> > > > > > > idle thread  that does smp_init()).
> > > > > > > 
> > > > > > > But now they are called _after_ all tasks were suspended, so if 
> > > > > > > cpu_down tries for example to take a lock
> > > > > > > that is taken by different process, it can't since the different 
> > > > > > > proccess is frozen and can't release the lock.
> > > > > > > 
> > > > > > 
> > > > > > Thanks for detailed explanation.
> > > > > > 
> > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > > 
> > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > > holds a lock, that's a bug.
> > > > > > 
> > > > > > > Or, it is also possible to revert this change.
> > > > > > 
> > > > > > Are you using xfs?
> > > > > 
> > > > > Well, this is the only case that can trigger it.  There are no other 
> > > > > freezable
> > > > > workqueues.
> > > > > 
> > > > > Greetings,
> > > > > Rafael
> > > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > Yes, you are right and it is XFS
> > > > 
> > > > System suspends and resumes with xfs and your patch correctly,
> > > 
> > > Could you please sent this information to the list?  I'd like it to reach 
> > > all
> > > of the CCed parites. ;-)
> > 
> > I did now ( sorry I just keep using this Answer command, instead of Answer 
> > to everybody)
> > I didn't intend to send private email.
> > > 
> > > > Of course I need to mention that I had to unload microcode 
> > > > update driver because it prevented resume,
> > > > because it calls firmware loader helper, and again sleeps on 
> > > > lock
> > > 
> > > This is interesting.  Did it happen before or is it a regression?
> > 
> > It is from the same group of bugs , I mean hang because cpu_up/down is 
> > called with frozen tasks
> > Of course it didn't happen before those reordering commits were introduced
> 
> Well, we want cpu_up/down to be called after processes have been frozen, for
> various reasons (one of them being that applications shouldn't see us playing
> with the CPUs).
> 
> Thanks for reporting this, I'll have a look at the microcode update driver.
> 
> > > > And also I noticed now that system oopses on second attempt to 
> > > > suspend ether to ram or disk
> > > > in pci_restore_msi_state which is called indirectly by 
> > > > ahci_pci_device_resume, I will investigate this soon.
> > > 
> > > Thanks.  We've had such reports earlier, but I think the problem is still 
> > > unresolved.  Any
> > > additional information will be valuable.
> > 
> > I will do my best,
> > Also I want to note that the above problem is 100% repeatable, and happens 
> > independently whenever suspend to disk
> > or suspend to ram was used in first successful try ( or at least, I got 
> > back-trace using kdb, after suspend to disk, after
> > suspend to ram system hang, so I assume, that this it is same problem , 
> > because it didn't hang of first try)
> 
> Thanks for the information.
> 
> BTW, what's the last kernel you have tested?
> 
> Rafael
> 

Hello, 
Thanks for quick response, 
I will continue to test my system

I use literally latest Linus's git tree.

The kernel that works is 2.6.20 , and except very weird hang that happens 
sometimes (1 in 5~6 times) on resume from ram, everything works
I described it in http://lkml.org/lkml/2007/3/16/126

Regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Rafael J. Wysocki

On Thursday, 22 March 2007 00:39, Maxim wrote:
> On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > Hi!
> > > > > 
> > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
> > > > > > anymore on my system.
> > > > > > 
> > > > > > I did a git-bisect and found that those commits break it:
> > > > > > 
> > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
> > > > > > Change code ordering in main.c
> > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: 
> > > > > > Change code ordering in disk.c
> > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: 
> > > > > > Change code ordering in user.c
> > > > > > 
> > > > > 
> > > > > (Yep, it was in my "to analyze" queue).
> > > > > 
> > > > > > I already reported about it, but now i know the reason why suspend 
> > > > > > breaks.
> > > > > > 
> > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep 
> > > > > > until now, 
> > > > > > and it did work because those functions could be called only in 
> > > > > > process context
> > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  
> > > > > > idle thread  that does smp_init()).
> > > > > > 
> > > > > > But now they are called _after_ all tasks were suspended, so if 
> > > > > > cpu_down tries for example to take a lock
> > > > > > that is taken by different process, it can't since the different 
> > > > > > proccess is frozen and can't release the lock.
> > > > > > 
> > > > > 
> > > > > Thanks for detailed explanation.
> > > > > 
> > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > 
> > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > holds a lock, that's a bug.
> > > > > 
> > > > > > Or, it is also possible to revert this change.
> > > > > 
> > > > > Are you using xfs?
> > > > 
> > > > Well, this is the only case that can trigger it.  There are no other 
> > > > freezable
> > > > workqueues.
> > > > 
> > > > Greetings,
> > > > Rafael
> > > > 
> > > 
> > > Hello,
> > > 
> > >   Yes, you are right and it is XFS
> > > 
> > >   System suspends and resumes with xfs and your patch correctly,
> > 
> > Could you please sent this information to the list?  I'd like it to reach 
> > all
> > of the CCed parites. ;-)
> 
> I did now ( sorry I just keep using this Answer command, instead of Answer to 
> everybody)
> I didn't intend to send private email.
> > 
> > >   Of course I need to mention that I had to unload microcode update 
> > > driver because it prevented resume,
> > >   because it calls firmware loader helper, and again sleeps on lock
> > 
> > This is interesting.  Did it happen before or is it a regression?
> 
> It is from the same group of bugs , I mean hang because cpu_up/down is called 
> with frozen tasks
> Of course it didn't happen before those reordering commits were introduced

Well, we want cpu_up/down to be called after processes have been frozen, for
various reasons (one of them being that applications shouldn't see us playing
with the CPUs).

Thanks for reporting this, I'll have a look at the microcode update driver.

> > >   And also I noticed now that system oopses on second attempt to suspend 
> > > ether to ram or disk
> > >   in pci_restore_msi_state which is called indirectly by 
> > > ahci_pci_device_resume, I will investigate this soon.
> > 
> > Thanks.  We've had such reports earlier, but I think the problem is still 
> > unresolved.  Any
> > additional information will be valuable.
> 
> I will do my best,
> Also I want to note that the above problem is 100% repeatable, and happens 
> independently whenever suspend to disk
> or suspend to ram was used in first successful try ( or at least, I got 
> back-trace using kdb, after suspend to disk, after
> suspend to ram system hang, so I assume, that this it is same problem , 
> because it didn't hang of first try)

Thanks for the information.

BTW, what's the last kernel you have tested?

Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Nigel Cunningham

Hi.

On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote:
> > Do you know exactly which mutex was being waited on and where it was
> > taken? If you can say that, it would be much more helpful.

Yeah, me too, but assuming too much sometimes bites me :)

> I think this is the XFS problem with freezable workqueues.
> 
> Maxim, please try to apply the appended patch and see if it helps.

Thanks for your subsequent messages, Maxim. Could you confirm for us
that the patch Rafael attached fixes it?

Regards,

Nigel

> ---
> Since freezable workqueues are broken in 2.6.21-rc
> (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755,
> http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2)
> it's better to remove them altogether for 2.6.21 and change the only user of
> them (XFS) accordingly.
> 
> ---
>  fs/xfs/linux-2.6/xfs_buf.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> ===
> --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
> +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
> @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
>   if (!xfs_buf_zone)
>   goto out_free_trace_buf;
>  
> - xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
> + xfslogd_workqueue = create_workqueue("xfslogd");
>   if (!xfslogd_workqueue)
>   goto out_free_buf_zone;
>  
> - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
> + xfsdatad_workqueue = create_workqueue("xfsdatad");
>   if (!xfsdatad_workqueue)
>   goto out_destroy_xfslogd_workqueue;
>  

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:39:24 Maxim wrote:
> On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> > On Thursday, 22 March 2007 00:09, Maxim wrote:
> > > On Thursday 22 March 2007 00:39:02 you wrote:
> > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > > Hi!
> > > > > 
> > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
> > > > > > anymore on my system.
> > > > > > 
> > > > > > I did a git-bisect and found that those commits break it:
> > > > > > 
> > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
> > > > > > Change code ordering in main.c
> > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: 
> > > > > > Change code ordering in disk.c
> > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: 
> > > > > > Change code ordering in user.c
> > > > > > 
> > > > > 
> > > > > (Yep, it was in my "to analyze" queue).
> > > > > 
> > > > > > I already reported about it, but now i know the reason why suspend 
> > > > > > breaks.
> > > > > > 
> > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep 
> > > > > > until now, 
> > > > > > and it did work because those functions could be called only in 
> > > > > > process context
> > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  
> > > > > > idle thread  that does smp_init()).
> > > > > > 
> > > > > > But now they are called _after_ all tasks were suspended, so if 
> > > > > > cpu_down tries for example to take a lock
> > > > > > that is taken by different process, it can't since the different 
> > > > > > proccess is frozen and can't release the lock.
> > > > > > 
> > > > > 
> > > > > Thanks for detailed explanation.
> > > > > 
> > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > > 
> > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > > holds a lock, that's a bug.
> > > > > 
> > > > > > Or, it is also possible to revert this change.
> > > > > 
> > > > > Are you using xfs?
> > > > 
> > > > Well, this is the only case that can trigger it.  There are no other 
> > > > freezable
> > > > workqueues.
> > > > 
> > > > Greetings,
> > > > Rafael
> > > > 
> > > 
> > > Hello,
> > > 
> > >   Yes, you are right and it is XFS
> > > 
> > >   System suspends and resumes with xfs and your patch correctly,
> > 
> > Could you please sent this information to the list?  I'd like it to reach 
> > all
> > of the CCed parites. ;-)
> 
> I did now ( sorry I just keep using this Answer command, instead of Answer to 
> everybody)
> I didn't intend to send private email.
> > 
> > >   Of course I need to mention that I had to unload microcode update 
> > > driver because it prevented resume,
> > >   because it calls firmware loader helper, and again sleeps on lock
> > 
> > This is interesting.  Did it happen before or is it a regression?
> 
> It is from the same group of bugs , I mean hang because cpu_up/down is called 
> with frozen tasks
> Of course it didn't happen before those reordering commits were introduced
> > 
> > >   And also I noticed now that system oopses on second attempt to suspend 
> > > ether to ram or disk
> > >   in pci_restore_msi_state which is called indirectly by 
> > > ahci_pci_device_resume, I will investigate this soon.
> > 
> > Thanks.  We've had such reports earlier, but I think the problem is still 
> > unresolved.  Any
> > additional information will be valuable.
> 
> I will do my best,
> Also I want to note that the above problem is 100% repeatable, and happens 
> independently whenever suspend to disk
> or suspend to ram was used in first successful try ( or at least, I got 
> back-trace using kdb, after suspend to disk, after suspend to ram system 
> hang, 
> so I assume, that this it is same problem , because it didn't hang of first 
> try)
> 
> > 
> > Greetings,
> > Rafael
> > 
> 
> 
> 
> 
> 
 
And I forgot signature too (what happens with me today,...  please sorry)

Regards
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
> On Thursday, 22 March 2007 00:09, Maxim wrote:
> > On Thursday 22 March 2007 00:39:02 you wrote:
> > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > > Hi!
> > > > 
> > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore 
> > > > > on my system.
> > > > > 
> > > > > I did a git-bisect and found that those commits break it:
> > > > > 
> > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change 
> > > > > code ordering in main.c
> > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: 
> > > > > Change code ordering in disk.c
> > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: 
> > > > > Change code ordering in user.c
> > > > > 
> > > > 
> > > > (Yep, it was in my "to analyze" queue).
> > > > 
> > > > > I already reported about it, but now i know the reason why suspend 
> > > > > breaks.
> > > > > 
> > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until 
> > > > > now, 
> > > > > and it did work because those functions could be called only in 
> > > > > process context
> > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
> > > > > thread  that does smp_init()).
> > > > > 
> > > > > But now they are called _after_ all tasks were suspended, so if 
> > > > > cpu_down tries for example to take a lock
> > > > > that is taken by different process, it can't since the different 
> > > > > proccess is frozen and can't release the lock.
> > > > > 
> > > > 
> > > > Thanks for detailed explanation.
> > > > 
> > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > > 
> > > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > > holds a lock, that's a bug.
> > > > 
> > > > > Or, it is also possible to revert this change.
> > > > 
> > > > Are you using xfs?
> > > 
> > > Well, this is the only case that can trigger it.  There are no other 
> > > freezable
> > > workqueues.
> > > 
> > > Greetings,
> > > Rafael
> > > 
> > 
> > Hello,
> > 
> > Yes, you are right and it is XFS
> > 
> > System suspends and resumes with xfs and your patch correctly,
> 
> Could you please sent this information to the list?  I'd like it to reach all
> of the CCed parites. ;-)

I did now ( sorry I just keep using this Answer command, instead of Answer to 
everybody)
I didn't intend to send private email.
> 
> > Of course I need to mention that I had to unload microcode update 
> > driver because it prevented resume,
> > because it calls firmware loader helper, and again sleeps on lock
> 
> This is interesting.  Did it happen before or is it a regression?

It is from the same group of bugs , I mean hang because cpu_up/down is called 
with frozen tasks
Of course it didn't happen before those reordering commits were introduced
> 
> > And also I noticed now that system oopses on second attempt to suspend 
> > ether to ram or disk
> > in pci_restore_msi_state which is called indirectly by 
> > ahci_pci_device_resume, I will investigate this soon.
> 
> Thanks.  We've had such reports earlier, but I think the problem is still 
> unresolved.  Any
> additional information will be valuable.

I will do my best,
Also I want to note that the above problem is 100% repeatable, and happens 
independently whenever suspend to disk
or suspend to ram was used in first successful try ( or at least, I got 
back-trace using kdb, after suspend to disk, after suspend to ram system hang, 
so I assume, that this it is same problem , because it didn't hang of first try)

> 
> Greetings,
> Rafael
> 




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:09:54 Maxim wrote:
> On Thursday 22 March 2007 00:39:02 you wrote:
> > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore 
> > > > on my system.
> > > > 
> > > > I did a git-bisect and found that those commits break it:
> > > > 
> > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change 
> > > > code ordering in main.c
> > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: 
> > > > Change code ordering in disk.c
> > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: 
> > > > Change code ordering in user.c
> > > > 
> > > 
> > > (Yep, it was in my "to analyze" queue).
> > > 
> > > > I already reported about it, but now i know the reason why suspend 
> > > > breaks.
> > > > 
> > > > The problem is that both cpu_up/cpu_down were allowed to sleep until 
> > > > now, 
> > > > and it did work because those functions could be called only in process 
> > > > context
> > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
> > > > thread  that does smp_init()).
> > > > 
> > > > But now they are called _after_ all tasks were suspended, so if 
> > > > cpu_down tries for example to take a lock
> > > > that is taken by different process, it can't since the different 
> > > > proccess is frozen and can't release the lock.
> > > > 
> > > 
> > > Thanks for detailed explanation.
> > > 
> > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> > > 
> > > ...by design, "frozen" tasks must not hold any locks. If frozen task
> > > holds a lock, that's a bug.
> > > 
> > > > Or, it is also possible to revert this change.
> > > 
> > > Are you using xfs?
> > 
> > Well, this is the only case that can trigger it.  There are no other 
> > freezable
> > workqueues.
> > 
> > Greetings,
> > Rafael
> > 
> 
> Hello,
> 
>   Yes, you are right and it is XFS
> 
>   System suspends and resumes with xfs and your patch correctly,
> 
>   Of course I need to mention that I had to unload microcode update 
> driver because it prevented resume,
>   because it calls firmware loader helper, and again sleeps on lock
> 
>   And also I noticed now that system oopses on second attempt to suspend 
> ether to ram or disk
>   in pci_restore_msi_state which is called indirectly by 
> ahci_pci_device_resume, I will investigate this soon.
> 
>   Thanks a lot,
>   Regards,
>   Maxim Levitsky
> 
> 

And I forgot to cc here too, I didn't intend to send private email.

Regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:14:05 Maxim wrote:
> On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote:
> > Hi.
> > 
> > On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
> > > Hi,
> > > 
> > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on 
> > > my system.
> > > 
> > > I did a git-bisect and found that those commits break it:
> > > 
> > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change 
> > > code ordering in main.c
> > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
> > > code ordering in disk.c
> > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
> > > code ordering in user.c
> > > 
> > > I already reported about it, but now i know the reason why suspend breaks.
> > > 
> > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > > and it did work because those functions could be called only in process 
> > > context
> > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
> > > thread  that does smp_init()).
> > > 
> > > But now they are called _after_ all tasks were suspended, so if cpu_down 
> > > tries for example to take a lock
> > > that is taken by different process, it can't since the different proccess 
> > > is frozen and can't release the lock.
> > > 
> > > I tested this and all results are positive:
> > > 
> > > I disabled 2nd cpu by hand, and then suspend to ram was successfull.
> > > Suspend to disk went correctly, but it hang on resume, and I know why.
> > > It hang in old kernel trying to disable 2nd cpu that was enabled by it.
> > > 
> > > I was able using kdb to confirm that this is true because it was still 
> > > possible to enter kdb, and see that
> > > idle thread (swapper) was active, and uswsusp was waiting on mutex inside 
> > > workqueue_cpu_callback.
> > > 
> > > The solution for this problem seems to be ether complete audit of code 
> > > that uses register_cpu_notifier,
> > > to ensure that it doesn't sleep. 
> > > Also documentation should be changed to note about it.
> > > 
> > > Or, it is also possible to revert this change.
> > 
> > Do you know exactly which mutex was being waited on and where it was
> > taken? If you can say that, it would be much more helpful.
> > 
> > Regards,
> > 
> > Nigel
> > 
> > 
> 
> Hello,
> 
>   It is workqueue_mutex
>   and it is taken in kernel/workqueue.c:797
> 
>   this is guilt of freezable work queues , and XFS uses it (and I use XFS)
> 
>   Thanks to Rafael J. Wysocki for pointing it out to me.
> 
>   Regards,
>   Maxim Levitsky
> 

oops, Forgot to cc

Regards,
Maxim Levitsky

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Rafael J. Wysocki

On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
> Hi!
> 
> > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my 
> > system.
> > 
> > I did a git-bisect and found that those commits break it:
> > 
> > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code 
> > ordering in main.c
> > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
> > code ordering in disk.c
> > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
> > code ordering in user.c
> > 
> 
> (Yep, it was in my "to analyze" queue).
> 
> > I already reported about it, but now i know the reason why suspend breaks.
> > 
> > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > and it did work because those functions could be called only in process 
> > context
> > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
> > thread  that does smp_init()).
> > 
> > But now they are called _after_ all tasks were suspended, so if cpu_down 
> > tries for example to take a lock
> > that is taken by different process, it can't since the different proccess 
> > is frozen and can't release the lock.
> > 
> 
> Thanks for detailed explanation.
> 
> ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
> 
> ...by design, "frozen" tasks must not hold any locks. If frozen task
> holds a lock, that's a bug.
> 
> > Or, it is also possible to revert this change.
> 
> Are you using xfs?

Well, this is the only case that can trigger it.  There are no other freezable
workqueues.

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Pavel Machek

Hi!

> Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my 
> system.
> 
> I did a git-bisect and found that those commits break it:
> 
> e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code 
> ordering in main.c
> ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
> code ordering in disk.c
> 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
> code ordering in user.c
> 

(Yep, it was in my "to analyze" queue).

> I already reported about it, but now i know the reason why suspend breaks.
> 
> The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> and it did work because those functions could be called only in process 
> context
> (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  
> that does smp_init()).
> 
> But now they are called _after_ all tasks were suspended, so if cpu_down 
> tries for example to take a lock
> that is taken by different process, it can't since the different proccess is 
> frozen and can't release the lock.
> 

Thanks for detailed explanation.

...but, on my machine suspend works ok in -rc4. I'm not seeing this.

...by design, "frozen" tasks must not hold any locks. If frozen task
holds a lock, that's a bug.

> Or, it is also possible to revert this change.

Are you using xfs?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Rafael J. Wysocki

On Wednesday, 21 March 2007 22:22, Nigel Cunningham wrote:
> Hi.
> 
> On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
> > Hi,
> > 
> > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my 
> > system.
> > 
> > I did a git-bisect and found that those commits break it:
> > 
> > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code 
> > ordering in main.c
> > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
> > code ordering in disk.c
> > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
> > code ordering in user.c
> > 
> > I already reported about it, but now i know the reason why suspend breaks.
> > 
> > The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> > and it did work because those functions could be called only in process 
> > context
> > (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
> > thread  that does smp_init()).
> > 
> > But now they are called _after_ all tasks were suspended, so if cpu_down 
> > tries for example to take a lock
> > that is taken by different process, it can't since the different proccess 
> > is frozen and can't release the lock.
> > 
> > I tested this and all results are positive:
> > 
> > I disabled 2nd cpu by hand, and then suspend to ram was successfull.
> > Suspend to disk went correctly, but it hang on resume, and I know why.
> > It hang in old kernel trying to disable 2nd cpu that was enabled by it.
> > 
> > I was able using kdb to confirm that this is true because it was still 
> > possible to enter kdb, and see that
> > idle thread (swapper) was active, and uswsusp was waiting on mutex inside 
> > workqueue_cpu_callback.
> > 
> > The solution for this problem seems to be ether complete audit of code that 
> > uses register_cpu_notifier,
> > to ensure that it doesn't sleep. 
> > Also documentation should be changed to note about it.
> > 
> > Or, it is also possible to revert this change.
> 
> Do you know exactly which mutex was being waited on and where it was
> taken? If you can say that, it would be much more helpful.

I think this is the XFS problem with freezable workqueues.

Maxim, please try to apply the appended patch and see if it helps.

Greetings,
Rafael


---
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2)
it's better to remove them altogether for 2.6.21 and change the only user of
them (XFS) accordingly.

---
 fs/xfs/linux-2.6/xfs_buf.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
+++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
@@ -1829,11 +1829,11 @@ xfs_buf_init(void)
if (!xfs_buf_zone)
goto out_free_trace_buf;
 
-   xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
+   xfslogd_workqueue = create_workqueue("xfslogd");
if (!xfslogd_workqueue)
goto out_free_buf_zone;
 
-   xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
+   xfsdatad_workqueue = create_workqueue("xfsdatad");
if (!xfsdatad_workqueue)
goto out_destroy_xfslogd_workqueue;
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Nigel Cunningham

Hi.

On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
> Hi,
> 
> Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my 
> system.
> 
> I did a git-bisect and found that those commits break it:
> 
> e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code 
> ordering in main.c
> ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
> code ordering in disk.c
> 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
> code ordering in user.c
> 
> I already reported about it, but now i know the reason why suspend breaks.
> 
> The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
> and it did work because those functions could be called only in process 
> context
> (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  
> that does smp_init()).
> 
> But now they are called _after_ all tasks were suspended, so if cpu_down 
> tries for example to take a lock
> that is taken by different process, it can't since the different proccess is 
> frozen and can't release the lock.
> 
> I tested this and all results are positive:
> 
> I disabled 2nd cpu by hand, and then suspend to ram was successfull.
> Suspend to disk went correctly, but it hang on resume, and I know why.
> It hang in old kernel trying to disable 2nd cpu that was enabled by it.
> 
> I was able using kdb to confirm that this is true because it was still 
> possible to enter kdb, and see that
> idle thread (swapper) was active, and uswsusp was waiting on mutex inside 
> workqueue_cpu_callback.
> 
> The solution for this problem seems to be ether complete audit of code that 
> uses register_cpu_notifier,
> to ensure that it doesn't sleep. 
> Also documentation should be changed to note about it.
> 
> Or, it is also possible to revert this change.

Do you know exactly which mutex was being waited on and where it was
taken? If you can say that, it would be much more helpful.

Regards,

Nigel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Nigel Cunningham

Hi.

On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
 Hi,
 
 Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my 
 system.
 
 I did a git-bisect and found that those commits break it:
 
 e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code 
 ordering in main.c
 ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
 code ordering in disk.c
 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
 code ordering in user.c
 
 I already reported about it, but now i know the reason why suspend breaks.
 
 The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
 and it did work because those functions could be called only in process 
 context
 (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  
 that does smp_init()).
 
 But now they are called _after_ all tasks were suspended, so if cpu_down 
 tries for example to take a lock
 that is taken by different process, it can't since the different proccess is 
 frozen and can't release the lock.
 
 I tested this and all results are positive:
 
 I disabled 2nd cpu by hand, and then suspend to ram was successfull.
 Suspend to disk went correctly, but it hang on resume, and I know why.
 It hang in old kernel trying to disable 2nd cpu that was enabled by it.
 
 I was able using kdb to confirm that this is true because it was still 
 possible to enter kdb, and see that
 idle thread (swapper) was active, and uswsusp was waiting on mutex inside 
 workqueue_cpu_callback.
 
 The solution for this problem seems to be ether complete audit of code that 
 uses register_cpu_notifier,
 to ensure that it doesn't sleep. 
 Also documentation should be changed to note about it.
 
 Or, it is also possible to revert this change.

Do you know exactly which mutex was being waited on and where it was
taken? If you can say that, it would be much more helpful.

Regards,

Nigel

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Rafael J. Wysocki

On Wednesday, 21 March 2007 22:22, Nigel Cunningham wrote:
 Hi.
 
 On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
  Hi,
  
  Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my 
  system.
  
  I did a git-bisect and found that those commits break it:
  
  e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code 
  ordering in main.c
  ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
  code ordering in disk.c
  259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
  code ordering in user.c
  
  I already reported about it, but now i know the reason why suspend breaks.
  
  The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
  and it did work because those functions could be called only in process 
  context
  (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
  thread  that does smp_init()).
  
  But now they are called _after_ all tasks were suspended, so if cpu_down 
  tries for example to take a lock
  that is taken by different process, it can't since the different proccess 
  is frozen and can't release the lock.
  
  I tested this and all results are positive:
  
  I disabled 2nd cpu by hand, and then suspend to ram was successfull.
  Suspend to disk went correctly, but it hang on resume, and I know why.
  It hang in old kernel trying to disable 2nd cpu that was enabled by it.
  
  I was able using kdb to confirm that this is true because it was still 
  possible to enter kdb, and see that
  idle thread (swapper) was active, and uswsusp was waiting on mutex inside 
  workqueue_cpu_callback.
  
  The solution for this problem seems to be ether complete audit of code that 
  uses register_cpu_notifier,
  to ensure that it doesn't sleep. 
  Also documentation should be changed to note about it.
  
  Or, it is also possible to revert this change.
 
 Do you know exactly which mutex was being waited on and where it was
 taken? If you can say that, it would be much more helpful.

I think this is the XFS problem with freezable workqueues.

Maxim, please try to apply the appended patch and see if it helps.

Greetings,
Rafael


---
Since freezable workqueues are broken in 2.6.21-rc
(cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755,
http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2)
it's better to remove them altogether for 2.6.21 and change the only user of
them (XFS) accordingly.

---
 fs/xfs/linux-2.6/xfs_buf.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
+++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
@@ -1829,11 +1829,11 @@ xfs_buf_init(void)
if (!xfs_buf_zone)
goto out_free_trace_buf;
 
-   xfslogd_workqueue = create_freezeable_workqueue(xfslogd);
+   xfslogd_workqueue = create_workqueue(xfslogd);
if (!xfslogd_workqueue)
goto out_free_buf_zone;
 
-   xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad);
+   xfsdatad_workqueue = create_workqueue(xfsdatad);
if (!xfsdatad_workqueue)
goto out_destroy_xfslogd_workqueue;
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Pavel Machek

Hi!

 Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my 
 system.
 
 I did a git-bisect and found that those commits break it:
 
 e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code 
 ordering in main.c
 ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
 code ordering in disk.c
 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
 code ordering in user.c
 

(Yep, it was in my to analyze queue).

 I already reported about it, but now i know the reason why suspend breaks.
 
 The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
 and it did work because those functions could be called only in process 
 context
 (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle thread  
 that does smp_init()).
 
 But now they are called _after_ all tasks were suspended, so if cpu_down 
 tries for example to take a lock
 that is taken by different process, it can't since the different proccess is 
 frozen and can't release the lock.
 

Thanks for detailed explanation.

...but, on my machine suspend works ok in -rc4. I'm not seeing this.

...by design, frozen tasks must not hold any locks. If frozen task
holds a lock, that's a bug.

 Or, it is also possible to revert this change.

Are you using xfs?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Rafael J. Wysocki

On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
 Hi!
 
  Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my 
  system.
  
  I did a git-bisect and found that those commits break it:
  
  e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code 
  ordering in main.c
  ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
  code ordering in disk.c
  259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
  code ordering in user.c
  
 
 (Yep, it was in my to analyze queue).
 
  I already reported about it, but now i know the reason why suspend breaks.
  
  The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
  and it did work because those functions could be called only in process 
  context
  (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
  thread  that does smp_init()).
  
  But now they are called _after_ all tasks were suspended, so if cpu_down 
  tries for example to take a lock
  that is taken by different process, it can't since the different proccess 
  is frozen and can't release the lock.
  
 
 Thanks for detailed explanation.
 
 ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
 
 ...by design, frozen tasks must not hold any locks. If frozen task
 holds a lock, that's a bug.
 
  Or, it is also possible to revert this change.
 
 Are you using xfs?

Well, this is the only case that can trigger it.  There are no other freezable
workqueues.

Greetings,
Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:14:05 Maxim wrote:
 On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote:
  Hi.
  
  On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
   Hi,
   
   Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on 
   my system.
   
   I did a git-bisect and found that those commits break it:
   
   e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change 
   code ordering in main.c
   ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
   code ordering in disk.c
   259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
   code ordering in user.c
   
   I already reported about it, but now i know the reason why suspend breaks.
   
   The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
   and it did work because those functions could be called only in process 
   context
   (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
   thread  that does smp_init()).
   
   But now they are called _after_ all tasks were suspended, so if cpu_down 
   tries for example to take a lock
   that is taken by different process, it can't since the different proccess 
   is frozen and can't release the lock.
   
   I tested this and all results are positive:
   
   I disabled 2nd cpu by hand, and then suspend to ram was successfull.
   Suspend to disk went correctly, but it hang on resume, and I know why.
   It hang in old kernel trying to disable 2nd cpu that was enabled by it.
   
   I was able using kdb to confirm that this is true because it was still 
   possible to enter kdb, and see that
   idle thread (swapper) was active, and uswsusp was waiting on mutex inside 
   workqueue_cpu_callback.
   
   The solution for this problem seems to be ether complete audit of code 
   that uses register_cpu_notifier,
   to ensure that it doesn't sleep. 
   Also documentation should be changed to note about it.
   
   Or, it is also possible to revert this change.
  
  Do you know exactly which mutex was being waited on and where it was
  taken? If you can say that, it would be much more helpful.
  
  Regards,
  
  Nigel
  
  
 
 Hello,
 
   It is workqueue_mutex
   and it is taken in kernel/workqueue.c:797
 
   this is guilt of freezable work queues , and XFS uses it (and I use XFS)
 
   Thanks to Rafael J. Wysocki for pointing it out to me.
 
   Regards,
   Maxim Levitsky
 

oops, Forgot to cc

Regards,
Maxim Levitsky

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:09:54 Maxim wrote:
 On Thursday 22 March 2007 00:39:02 you wrote:
  On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
   Hi!
   
Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore 
on my system.

I did a git-bisect and found that those commits break it:

e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change 
code ordering in main.c
ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: 
Change code ordering in disk.c
259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: 
Change code ordering in user.c

   
   (Yep, it was in my to analyze queue).
   
I already reported about it, but now i know the reason why suspend 
breaks.

The problem is that both cpu_up/cpu_down were allowed to sleep until 
now, 
and it did work because those functions could be called only in process 
context
(the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
thread  that does smp_init()).

But now they are called _after_ all tasks were suspended, so if 
cpu_down tries for example to take a lock
that is taken by different process, it can't since the different 
proccess is frozen and can't release the lock.

   
   Thanks for detailed explanation.
   
   ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
   
   ...by design, frozen tasks must not hold any locks. If frozen task
   holds a lock, that's a bug.
   
Or, it is also possible to revert this change.
   
   Are you using xfs?
  
  Well, this is the only case that can trigger it.  There are no other 
  freezable
  workqueues.
  
  Greetings,
  Rafael
  
 
 Hello,
 
   Yes, you are right and it is XFS
 
   System suspends and resumes with xfs and your patch correctly,
 
   Of course I need to mention that I had to unload microcode update 
 driver because it prevented resume,
   because it calls firmware loader helper, and again sleeps on lock
 
   And also I noticed now that system oopses on second attempt to suspend 
 ether to ram or disk
   in pci_restore_msi_state which is called indirectly by 
 ahci_pci_device_resume, I will investigate this soon.
 
   Thanks a lot,
   Regards,
   Maxim Levitsky
 
 

And I forgot to cc here too, I didn't intend to send private email.

Regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
 On Thursday, 22 March 2007 00:09, Maxim wrote:
  On Thursday 22 March 2007 00:39:02 you wrote:
   On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
Hi!

 Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore 
 on my system.
 
 I did a git-bisect and found that those commits break it:
 
 e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change 
 code ordering in main.c
 ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: 
 Change code ordering in disk.c
 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: 
 Change code ordering in user.c
 

(Yep, it was in my to analyze queue).

 I already reported about it, but now i know the reason why suspend 
 breaks.
 
 The problem is that both cpu_up/cpu_down were allowed to sleep until 
 now, 
 and it did work because those functions could be called only in 
 process context
 (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
 thread  that does smp_init()).
 
 But now they are called _after_ all tasks were suspended, so if 
 cpu_down tries for example to take a lock
 that is taken by different process, it can't since the different 
 proccess is frozen and can't release the lock.
 

Thanks for detailed explanation.

...but, on my machine suspend works ok in -rc4. I'm not seeing this.

...by design, frozen tasks must not hold any locks. If frozen task
holds a lock, that's a bug.

 Or, it is also possible to revert this change.

Are you using xfs?
   
   Well, this is the only case that can trigger it.  There are no other 
   freezable
   workqueues.
   
   Greetings,
   Rafael
   
  
  Hello,
  
  Yes, you are right and it is XFS
  
  System suspends and resumes with xfs and your patch correctly,
 
 Could you please sent this information to the list?  I'd like it to reach all
 of the CCed parites. ;-)

I did now ( sorry I just keep using this Answer command, instead of Answer to 
everybody)
I didn't intend to send private email.
 
  Of course I need to mention that I had to unload microcode update 
  driver because it prevented resume,
  because it calls firmware loader helper, and again sleeps on lock
 
 This is interesting.  Did it happen before or is it a regression?

It is from the same group of bugs , I mean hang because cpu_up/down is called 
with frozen tasks
Of course it didn't happen before those reordering commits were introduced
 
  And also I noticed now that system oopses on second attempt to suspend 
  ether to ram or disk
  in pci_restore_msi_state which is called indirectly by 
  ahci_pci_device_resume, I will investigate this soon.
 
 Thanks.  We've had such reports earlier, but I think the problem is still 
 unresolved.  Any
 additional information will be valuable.

I will do my best,
Also I want to note that the above problem is 100% repeatable, and happens 
independently whenever suspend to disk
or suspend to ram was used in first successful try ( or at least, I got 
back-trace using kdb, after suspend to disk, after suspend to ram system hang, 
so I assume, that this it is same problem , because it didn't hang of first try)

 
 Greetings,
 Rafael
 




-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:39:24 Maxim wrote:
 On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
  On Thursday, 22 March 2007 00:09, Maxim wrote:
   On Thursday 22 March 2007 00:39:02 you wrote:
On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
 Hi!
 
  Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
  anymore on my system.
  
  I did a git-bisect and found that those commits break it:
  
  e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
  Change code ordering in main.c
  ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: 
  Change code ordering in disk.c
  259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: 
  Change code ordering in user.c
  
 
 (Yep, it was in my to analyze queue).
 
  I already reported about it, but now i know the reason why suspend 
  breaks.
  
  The problem is that both cpu_up/cpu_down were allowed to sleep 
  until now, 
  and it did work because those functions could be called only in 
  process context
  (the one that writes to /sys/devices/system/cpu/cpu*/online) or  
  idle thread  that does smp_init()).
  
  But now they are called _after_ all tasks were suspended, so if 
  cpu_down tries for example to take a lock
  that is taken by different process, it can't since the different 
  proccess is frozen and can't release the lock.
  
 
 Thanks for detailed explanation.
 
 ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
 
 ...by design, frozen tasks must not hold any locks. If frozen task
 holds a lock, that's a bug.
 
  Or, it is also possible to revert this change.
 
 Are you using xfs?

Well, this is the only case that can trigger it.  There are no other 
freezable
workqueues.

Greetings,
Rafael

   
   Hello,
   
 Yes, you are right and it is XFS
   
 System suspends and resumes with xfs and your patch correctly,
  
  Could you please sent this information to the list?  I'd like it to reach 
  all
  of the CCed parites. ;-)
 
 I did now ( sorry I just keep using this Answer command, instead of Answer to 
 everybody)
 I didn't intend to send private email.
  
 Of course I need to mention that I had to unload microcode update 
   driver because it prevented resume,
 because it calls firmware loader helper, and again sleeps on lock
  
  This is interesting.  Did it happen before or is it a regression?
 
 It is from the same group of bugs , I mean hang because cpu_up/down is called 
 with frozen tasks
 Of course it didn't happen before those reordering commits were introduced
  
 And also I noticed now that system oopses on second attempt to suspend 
   ether to ram or disk
 in pci_restore_msi_state which is called indirectly by 
   ahci_pci_device_resume, I will investigate this soon.
  
  Thanks.  We've had such reports earlier, but I think the problem is still 
  unresolved.  Any
  additional information will be valuable.
 
 I will do my best,
 Also I want to note that the above problem is 100% repeatable, and happens 
 independently whenever suspend to disk
 or suspend to ram was used in first successful try ( or at least, I got 
 back-trace using kdb, after suspend to disk, after suspend to ram system 
 hang, 
 so I assume, that this it is same problem , because it didn't hang of first 
 try)
 
  
  Greetings,
  Rafael
  
 
 
 
 
 
 
And I forgot signature too (what happens with me today,...  please sorry)

Regards
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Nigel Cunningham

Hi.

On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote:
  Do you know exactly which mutex was being waited on and where it was
  taken? If you can say that, it would be much more helpful.

Yeah, me too, but assuming too much sometimes bites me :)

 I think this is the XFS problem with freezable workqueues.
 
 Maxim, please try to apply the appended patch and see if it helps.

Thanks for your subsequent messages, Maxim. Could you confirm for us
that the patch Rafael attached fixes it?

Regards,

Nigel

 ---
 Since freezable workqueues are broken in 2.6.21-rc
 (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755,
 http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2)
 it's better to remove them altogether for 2.6.21 and change the only user of
 them (XFS) accordingly.
 
 ---
  fs/xfs/linux-2.6/xfs_buf.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
 ===
 --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
 +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
 @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
   if (!xfs_buf_zone)
   goto out_free_trace_buf;
  
 - xfslogd_workqueue = create_freezeable_workqueue(xfslogd);
 + xfslogd_workqueue = create_workqueue(xfslogd);
   if (!xfslogd_workqueue)
   goto out_free_buf_zone;
  
 - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad);
 + xfsdatad_workqueue = create_workqueue(xfsdatad);
   if (!xfsdatad_workqueue)
   goto out_destroy_xfslogd_workqueue;
  

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Rafael J. Wysocki

On Thursday, 22 March 2007 00:39, Maxim wrote:
 On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
  On Thursday, 22 March 2007 00:09, Maxim wrote:
   On Thursday 22 March 2007 00:39:02 you wrote:
On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
 Hi!
 
  Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
  anymore on my system.
  
  I did a git-bisect and found that those commits break it:
  
  e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
  Change code ordering in main.c
  ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: 
  Change code ordering in disk.c
  259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: 
  Change code ordering in user.c
  
 
 (Yep, it was in my to analyze queue).
 
  I already reported about it, but now i know the reason why suspend 
  breaks.
  
  The problem is that both cpu_up/cpu_down were allowed to sleep 
  until now, 
  and it did work because those functions could be called only in 
  process context
  (the one that writes to /sys/devices/system/cpu/cpu*/online) or  
  idle thread  that does smp_init()).
  
  But now they are called _after_ all tasks were suspended, so if 
  cpu_down tries for example to take a lock
  that is taken by different process, it can't since the different 
  proccess is frozen and can't release the lock.
  
 
 Thanks for detailed explanation.
 
 ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
 
 ...by design, frozen tasks must not hold any locks. If frozen task
 holds a lock, that's a bug.
 
  Or, it is also possible to revert this change.
 
 Are you using xfs?

Well, this is the only case that can trigger it.  There are no other 
freezable
workqueues.

Greetings,
Rafael

   
   Hello,
   
 Yes, you are right and it is XFS
   
 System suspends and resumes with xfs and your patch correctly,
  
  Could you please sent this information to the list?  I'd like it to reach 
  all
  of the CCed parites. ;-)
 
 I did now ( sorry I just keep using this Answer command, instead of Answer to 
 everybody)
 I didn't intend to send private email.
  
 Of course I need to mention that I had to unload microcode update 
   driver because it prevented resume,
 because it calls firmware loader helper, and again sleeps on lock
  
  This is interesting.  Did it happen before or is it a regression?
 
 It is from the same group of bugs , I mean hang because cpu_up/down is called 
 with frozen tasks
 Of course it didn't happen before those reordering commits were introduced

Well, we want cpu_up/down to be called after processes have been frozen, for
various reasons (one of them being that applications shouldn't see us playing
with the CPUs).

Thanks for reporting this, I'll have a look at the microcode update driver.

 And also I noticed now that system oopses on second attempt to suspend 
   ether to ram or disk
 in pci_restore_msi_state which is called indirectly by 
   ahci_pci_device_resume, I will investigate this soon.
  
  Thanks.  We've had such reports earlier, but I think the problem is still 
  unresolved.  Any
  additional information will be valuable.
 
 I will do my best,
 Also I want to note that the above problem is 100% repeatable, and happens 
 independently whenever suspend to disk
 or suspend to ram was used in first successful try ( or at least, I got 
 back-trace using kdb, after suspend to disk, after
 suspend to ram system hang, so I assume, that this it is same problem , 
 because it didn't hang of first try)

Thanks for the information.

BTW, what's the last kernel you have tested?

Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:53:54 Rafael J. Wysocki wrote:
 On Thursday, 22 March 2007 00:39, Maxim wrote:
  On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote:
   On Thursday, 22 March 2007 00:09, Maxim wrote:
On Thursday 22 March 2007 00:39:02 you wrote:
 On Wednesday, 21 March 2007 23:21, Pavel Machek wrote:
  Hi!
  
   Starting with 2.6.21-rc1 suspend to ram and disk doesn't work 
   anymore on my system.
   
   I did a git-bisect and found that those commits break it:
   
   e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: 
   Change code ordering in main.c
   ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] 
   swsusp: Change code ordering in disk.c
   259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] 
   swsusp: Change code ordering in user.c
   
  
  (Yep, it was in my to analyze queue).
  
   I already reported about it, but now i know the reason why 
   suspend breaks.
   
   The problem is that both cpu_up/cpu_down were allowed to sleep 
   until now, 
   and it did work because those functions could be called only in 
   process context
   (the one that writes to /sys/devices/system/cpu/cpu*/online) or  
   idle thread  that does smp_init()).
   
   But now they are called _after_ all tasks were suspended, so if 
   cpu_down tries for example to take a lock
   that is taken by different process, it can't since the different 
   proccess is frozen and can't release the lock.
   
  
  Thanks for detailed explanation.
  
  ...but, on my machine suspend works ok in -rc4. I'm not seeing this.
  
  ...by design, frozen tasks must not hold any locks. If frozen task
  holds a lock, that's a bug.
  
   Or, it is also possible to revert this change.
  
  Are you using xfs?
 
 Well, this is the only case that can trigger it.  There are no other 
 freezable
 workqueues.
 
 Greetings,
 Rafael
 

Hello,

Yes, you are right and it is XFS

System suspends and resumes with xfs and your patch correctly,
   
   Could you please sent this information to the list?  I'd like it to reach 
   all
   of the CCed parites. ;-)
  
  I did now ( sorry I just keep using this Answer command, instead of Answer 
  to everybody)
  I didn't intend to send private email.
   
Of course I need to mention that I had to unload microcode 
update driver because it prevented resume,
because it calls firmware loader helper, and again sleeps on 
lock
   
   This is interesting.  Did it happen before or is it a regression?
  
  It is from the same group of bugs , I mean hang because cpu_up/down is 
  called with frozen tasks
  Of course it didn't happen before those reordering commits were introduced
 
 Well, we want cpu_up/down to be called after processes have been frozen, for
 various reasons (one of them being that applications shouldn't see us playing
 with the CPUs).
 
 Thanks for reporting this, I'll have a look at the microcode update driver.
 
And also I noticed now that system oopses on second attempt to 
suspend ether to ram or disk
in pci_restore_msi_state which is called indirectly by 
ahci_pci_device_resume, I will investigate this soon.
   
   Thanks.  We've had such reports earlier, but I think the problem is still 
   unresolved.  Any
   additional information will be valuable.
  
  I will do my best,
  Also I want to note that the above problem is 100% repeatable, and happens 
  independently whenever suspend to disk
  or suspend to ram was used in first successful try ( or at least, I got 
  back-trace using kdb, after suspend to disk, after
  suspend to ram system hang, so I assume, that this it is same problem , 
  because it didn't hang of first try)
 
 Thanks for the information.
 
 BTW, what's the last kernel you have tested?
 
 Rafael
 

Hello, 
Thanks for quick response, 
I will continue to test my system

I use literally latest Linus's git tree.

The kernel that works is 2.6.20 , and except very weird hang that happens 
sometimes (1 in 5~6 times) on resume from ram, everything works
I described it in http://lkml.org/lkml/2007/3/16/126

Regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:47:05 Nigel Cunningham wrote:
 Hi.
 
 On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote:
   Do you know exactly which mutex was being waited on and where it was
   taken? If you can say that, it would be much more helpful.
 
 Yeah, me too, but assuming too much sometimes bites me :)
 
  I think this is the XFS problem with freezable workqueues.
  
  Maxim, please try to apply the appended patch and see if it helps.
 
 Thanks for your subsequent messages, Maxim. Could you confirm for us
 that the patch Rafael attached fixes it?
 
 Regards,
 
 Nigel
 
  ---
  Since freezable workqueues are broken in 2.6.21-rc
  (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755,
  http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2)
  it's better to remove them altogether for 2.6.21 and change the only user of
  them (XFS) accordingly.
  
  ---
   fs/xfs/linux-2.6/xfs_buf.c |4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
  
  Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
  ===
  --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
  +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
  @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
  if (!xfs_buf_zone)
  goto out_free_trace_buf;
   
  -   xfslogd_workqueue = create_freezeable_workqueue(xfslogd);
  +   xfslogd_workqueue = create_workqueue(xfslogd);
  if (!xfslogd_workqueue)
  goto out_free_buf_zone;
   
  -   xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad);
  +   xfsdatad_workqueue = create_workqueue(xfsdatad);
  if (!xfsdatad_workqueue)
  goto out_destroy_xfslogd_workqueue;
   
 
 

Hello,

I can confirm now that the above patch work,

First as I said I did try to suspend with this patch and without XFS, and it 
did work,
Now I reverted it and now system still suspends correctly without xfs module 
loaded ( I didn't tell you that i use now ext3,
and that I generally compile everything in kernel, so i put XFS too, because I 
used it once, and I still have a XFS disk image)

But system hangs with XFS loaded, so this patch works.

Regards,
Maxim Levitsky


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread Maxim

On Thursday 22 March 2007 01:14:05 Maxim wrote:
 On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote:
  Hi.
  
  On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote:
   Hi,
   
   Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on 
   my system.
   
   I did a git-bisect and found that those commits break it:
   
   e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change 
   code ordering in main.c
   ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change 
   code ordering in disk.c
   259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change 
   code ordering in user.c
   
   I already reported about it, but now i know the reason why suspend breaks.
   
   The problem is that both cpu_up/cpu_down were allowed to sleep until now, 
   and it did work because those functions could be called only in process 
   context
   (the one that writes to /sys/devices/system/cpu/cpu*/online) or  idle 
   thread  that does smp_init()).
   
   But now they are called _after_ all tasks were suspended, so if cpu_down 
   tries for example to take a lock
   that is taken by different process, it can't since the different proccess 
   is frozen and can't release the lock.
   
   I tested this and all results are positive:
   
   I disabled 2nd cpu by hand, and then suspend to ram was successfull.
   Suspend to disk went correctly, but it hang on resume, and I know why.
   It hang in old kernel trying to disable 2nd cpu that was enabled by it.
   
   I was able using kdb to confirm that this is true because it was still 
   possible to enter kdb, and see that
   idle thread (swapper) was active, and uswsusp was waiting on mutex inside 
   workqueue_cpu_callback.
   
   The solution for this problem seems to be ether complete audit of code 
   that uses register_cpu_notifier,
   to ensure that it doesn't sleep. 
   Also documentation should be changed to note about it.
   
   Or, it is also possible to revert this change.
  
  Do you know exactly which mutex was being waited on and where it was
  taken? If you can say that, it would be much more helpful.
  
  Regards,
  
  Nigel
  
  
 
 Hello,
 
   It is workqueue_mutex
   and it is taken in kernel/workqueue.c:797
 
   this is guilt of freezable work queues , and XFS uses it (and I use XFS)
 
   Thanks to Rafael J. Wysocki for pointing it out to me.
 
   Regards,
   Maxim Levitsky
 


I think that I made a mistake,

I now reverted the patch that fixes the above error, and I wrote down 
back-trace from this hang,
and it appears that system hangs in kthread_stop:

This I have written in paper, using kdb:

workqueue_cpu_callback -
cleanup_workqueue_thread -
kthread_stop -

and then wake_up_process (I think , I didn't wrote this on paper, I will check 
again)


Regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems

2007-03-21 Thread David Chinner

On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote:
 
 I think this is the XFS problem with freezable workqueues.
 
 Maxim, please try to apply the appended patch and see if it helps.
 
 Greetings,
 Rafael
 
 
 ---
 Since freezable workqueues are broken in 2.6.21-rc
 (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755,
 http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2)
 it's better to remove them altogether for 2.6.21 and change the only user of
 them (XFS) accordingly.
 
 ---
  fs/xfs/linux-2.6/xfs_buf.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
 ===
 --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c
 +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c
 @@ -1829,11 +1829,11 @@ xfs_buf_init(void)
   if (!xfs_buf_zone)
   goto out_free_trace_buf;
  
 - xfslogd_workqueue = create_freezeable_workqueue(xfslogd);
 + xfslogd_workqueue = create_workqueue(xfslogd);
   if (!xfslogd_workqueue)
   goto out_free_buf_zone;
  
 - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad);
 + xfsdatad_workqueue = create_workqueue(xfsdatad);
   if (!xfsdatad_workqueue)
   goto out_destroy_xfslogd_workqueue;
  

Acked-by: Dave Chinner [EMAIL PROTECTED]

Rafael, it sounds like this really needs to go into the next -rc
kernel. Can you push it to Linus?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

46 matches

Mail list logo