Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Sunday, 25 March 2007 17:10, Maxim wrote: > On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote: > > On Sunday, 25 March 2007 01:40, Maxim wrote: > > > On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote: > > > > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: > > > > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: > > > > > > On Thursday, 22 March 2007 00:39, Maxim wrote: > > > > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > > > > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote: > > > > > > > > > On Thursday 22 March 2007 00:39:02 you wrote: > > > > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > > Hi, > > > I confirm that the above patch works, > > > > > > At least system didn't hang on resume with microcode driver loaded, > > > > > > I can't really test whenever it did update microcode because I almost > > > sure there is nothing to update > > > (I use core 2 duo that I bought a month ago, and an intel motherboard > > > with latest bios ( updated yesterday) ) > > > I selected this driver just in case when I compiled kernel. > > > > OK, thanks for testing. > > > > > Hello, > I tested this patch, and it does work too, > System suspended /resumed correctly, no errors in kernel log Thanks for the confirmation. I wonder if someone who actually needs the microcode to be updated can test it ... > > - first_cpu = first_cpu(cpu_present_map); > > - if (!cpu_online(first_cpu)) { > > - error = _cpu_up(first_cpu); > > - if (error) { > > - printk(KERN_ERR "Could not bring CPU%d up.\n", > > - first_cpu); > > - goto out; > > - } > > - } > > - > > Nice, I once have seen those lines too, and they look ridiculous, but > could that be that they are still necessary on some systems, I think this was a mistake from the beginning. Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote: > On Sunday, 25 March 2007 01:40, Maxim wrote: > > On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote: > > > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: > > > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: > > > > > On Thursday, 22 March 2007 00:39, Maxim wrote: > > > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > > > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote: > > > > > > > > On Thursday 22 March 2007 00:39:02 you wrote: > > > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > Hi, > > I confirm that the above patch works, > > > > At least system didn't hang on resume with microcode driver loaded, > > > > I can't really test whenever it did update microcode because I almost > > sure there is nothing to update > > (I use core 2 duo that I bought a month ago, and an intel motherboard > > with latest bios ( updated yesterday) ) > > I selected this driver just in case when I compiled kernel. > > OK, thanks for testing. > Hello, I tested this patch, and it does work too, System suspended /resumed correctly, no errors in kernel log > - first_cpu = first_cpu(cpu_present_map); > - if (!cpu_online(first_cpu)) { > - error = _cpu_up(first_cpu); > - if (error) { > - printk(KERN_ERR "Could not bring CPU%d up.\n", > - first_cpu); > - goto out; > - } > - } > - Nice, I once have seen those lines too, and they look ridiculous, but could that be that they are still necessary on some systems, Best regards, Maxim Levitsky - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Sunday, 25 March 2007 01:40, Maxim wrote: > On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote: > > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: > > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: > > > > On Thursday, 22 March 2007 00:39, Maxim wrote: > > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote: > > > > > > > On Thursday 22 March 2007 00:39:02 you wrote: > > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > Hi, > I confirm that the above patch works, > > At least system didn't hang on resume with microcode driver loaded, > > I can't really test whenever it did update microcode because I almost > sure there is nothing to update > (I use core 2 duo that I bought a month ago, and an intel motherboard > with latest bios ( updated yesterday) ) > I selected this driver just in case when I compiled kernel. OK, thanks for testing. In the meantime I've prepared the more sophisticated version of the patch that is appended. Could you please check if it still works for you? Greetings, Rafael --- arch/i386/kernel/microcode.c | 71 --- include/linux/cpu.h |2 + kernel/cpu.c | 32 +-- 3 files changed, 85 insertions(+), 20 deletions(-) Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c === --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c 2007-03-23 22:57:28.0 +0100 +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c 2007-03-23 23:43:27.0 +0100 @@ -567,6 +567,53 @@ static int cpu_request_microcode(int cpu return error; } +static int apply_microcode_on_cpu(int cpu) +{ + struct cpuinfo_x86 *c = cpu_data + cpu; + struct ucode_cpu_info *uci = ucode_cpu_info + cpu; + cpumask_t old; + unsigned int val[2]; + int err = 0; + + if (!uci->mc) + return -EINVAL; + + old = current->cpus_allowed; + set_cpus_allowed(current, cpumask_of_cpu(cpu)); + + /* Check if the microcode we have in memory matches the CPU */ + if (c->x86_vendor != X86_VENDOR_INTEL || c->x86 < 6 || + cpu_has(c, X86_FEATURE_IA64) || uci->sig != cpuid_eax(0x0001)) + err = -EINVAL; + + if (!err && ((c->x86_model >= 5) || (c->x86 > 6))) { + /* get processor flags from MSR 0x17 */ + rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]); + if (uci->pf != (1 << ((val[1] >> 18) & 7))) + err = -EINVAL; + } + + if (!err) { + wrmsr(MSR_IA32_UCODE_REV, 0, 0); + /* see notes above for revision 1.07. Apparent chip bug */ + sync_core(); + /* get the current revision from MSR 0x8B */ + rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]); + if (uci->rev != val[1]) + err = -EINVAL; + } + + if (!err) + apply_microcode(cpu); + else + printk(KERN_ERR "microcode: Could not apply microcode to CPU%d:" + " sig=0x%x, pf=0x%x, rev=0x%x\n", + cpu, uci->sig, uci->pf, uci->rev); + + set_cpus_allowed(current, old); + return err; +} + static void microcode_init_cpu(int cpu) { cpumask_t old; @@ -577,7 +624,8 @@ static void microcode_init_cpu(int cpu) set_cpus_allowed(current, cpumask_of_cpu(cpu)); mutex_lock(_mutex); collect_cpu_info(cpu); - if (uci->valid && system_state == SYSTEM_RUNNING) + if (uci->valid && system_state == SYSTEM_RUNNING && + !suspend_cpu_hotplug) cpu_request_microcode(cpu); mutex_unlock(_mutex); set_cpus_allowed(current, old); @@ -663,13 +711,24 @@ static int mc_sysdev_add(struct sys_devi return 0; pr_debug("Microcode:CPU %d added\n", cpu); - memset(uci, 0, sizeof(*uci)); + /* If suspend_cpu_hotplug is set, the system is resuming and we should +* use the data from before the suspend. +*/ + if (suspend_cpu_hotplug) { + err = apply_microcode_on_cpu(cpu); + if (err) + microcode_fini_cpu(cpu); + } + if (!uci->valid) + memset(uci, 0, sizeof(*uci)); err = sysfs_create_group(_dev->kobj, _attr_group); if (err) return err; - microcode_init_cpu(cpu); + if (!uci->valid) + microcode_init_cpu(cpu); + return 0; } @@ -680,7 +739,11 @@ static int mc_sysdev_remove(struct sys_d if (!cpu_online(cpu)) return 0; pr_debug("Microcode:CPU %d removed\n", cpu); - microcode_fini_cpu(cpu); + /* If suspend_cpu_hotplug is set, the system is
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Sunday, 25 March 2007 01:40, Maxim wrote: On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote: On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:39, Maxim wrote: On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi, I confirm that the above patch works, At least system didn't hang on resume with microcode driver loaded, I can't really test whenever it did update microcode because I almost sure there is nothing to update (I use core 2 duo that I bought a month ago, and an intel motherboard with latest bios ( updated yesterday) ) I selected this driver just in case when I compiled kernel. OK, thanks for testing. In the meantime I've prepared the more sophisticated version of the patch that is appended. Could you please check if it still works for you? Greetings, Rafael --- arch/i386/kernel/microcode.c | 71 --- include/linux/cpu.h |2 + kernel/cpu.c | 32 +-- 3 files changed, 85 insertions(+), 20 deletions(-) Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c === --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c 2007-03-23 22:57:28.0 +0100 +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c 2007-03-23 23:43:27.0 +0100 @@ -567,6 +567,53 @@ static int cpu_request_microcode(int cpu return error; } +static int apply_microcode_on_cpu(int cpu) +{ + struct cpuinfo_x86 *c = cpu_data + cpu; + struct ucode_cpu_info *uci = ucode_cpu_info + cpu; + cpumask_t old; + unsigned int val[2]; + int err = 0; + + if (!uci-mc) + return -EINVAL; + + old = current-cpus_allowed; + set_cpus_allowed(current, cpumask_of_cpu(cpu)); + + /* Check if the microcode we have in memory matches the CPU */ + if (c-x86_vendor != X86_VENDOR_INTEL || c-x86 6 || + cpu_has(c, X86_FEATURE_IA64) || uci-sig != cpuid_eax(0x0001)) + err = -EINVAL; + + if (!err ((c-x86_model = 5) || (c-x86 6))) { + /* get processor flags from MSR 0x17 */ + rdmsr(MSR_IA32_PLATFORM_ID, val[0], val[1]); + if (uci-pf != (1 ((val[1] 18) 7))) + err = -EINVAL; + } + + if (!err) { + wrmsr(MSR_IA32_UCODE_REV, 0, 0); + /* see notes above for revision 1.07. Apparent chip bug */ + sync_core(); + /* get the current revision from MSR 0x8B */ + rdmsr(MSR_IA32_UCODE_REV, val[0], val[1]); + if (uci-rev != val[1]) + err = -EINVAL; + } + + if (!err) + apply_microcode(cpu); + else + printk(KERN_ERR microcode: Could not apply microcode to CPU%d: +sig=0x%x, pf=0x%x, rev=0x%x\n, + cpu, uci-sig, uci-pf, uci-rev); + + set_cpus_allowed(current, old); + return err; +} + static void microcode_init_cpu(int cpu) { cpumask_t old; @@ -577,7 +624,8 @@ static void microcode_init_cpu(int cpu) set_cpus_allowed(current, cpumask_of_cpu(cpu)); mutex_lock(microcode_mutex); collect_cpu_info(cpu); - if (uci-valid system_state == SYSTEM_RUNNING) + if (uci-valid system_state == SYSTEM_RUNNING + !suspend_cpu_hotplug) cpu_request_microcode(cpu); mutex_unlock(microcode_mutex); set_cpus_allowed(current, old); @@ -663,13 +711,24 @@ static int mc_sysdev_add(struct sys_devi return 0; pr_debug(Microcode:CPU %d added\n, cpu); - memset(uci, 0, sizeof(*uci)); + /* If suspend_cpu_hotplug is set, the system is resuming and we should +* use the data from before the suspend. +*/ + if (suspend_cpu_hotplug) { + err = apply_microcode_on_cpu(cpu); + if (err) + microcode_fini_cpu(cpu); + } + if (!uci-valid) + memset(uci, 0, sizeof(*uci)); err = sysfs_create_group(sys_dev-kobj, mc_attr_group); if (err) return err; - microcode_init_cpu(cpu); + if (!uci-valid) + microcode_init_cpu(cpu); + return 0; } @@ -680,7 +739,11 @@ static int mc_sysdev_remove(struct sys_d if (!cpu_online(cpu)) return 0; pr_debug(Microcode:CPU %d removed\n, cpu); - microcode_fini_cpu(cpu); + /* If suspend_cpu_hotplug is set, the system is suspending and we should +* keep the microcode in
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote: On Sunday, 25 March 2007 01:40, Maxim wrote: On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote: On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:39, Maxim wrote: On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi, I confirm that the above patch works, At least system didn't hang on resume with microcode driver loaded, I can't really test whenever it did update microcode because I almost sure there is nothing to update (I use core 2 duo that I bought a month ago, and an intel motherboard with latest bios ( updated yesterday) ) I selected this driver just in case when I compiled kernel. OK, thanks for testing. Hello, I tested this patch, and it does work too, System suspended /resumed correctly, no errors in kernel log - first_cpu = first_cpu(cpu_present_map); - if (!cpu_online(first_cpu)) { - error = _cpu_up(first_cpu); - if (error) { - printk(KERN_ERR Could not bring CPU%d up.\n, - first_cpu); - goto out; - } - } - Nice, I once have seen those lines too, and they look ridiculous, but could that be that they are still necessary on some systems, Best regards, Maxim Levitsky - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Sunday, 25 March 2007 17:10, Maxim wrote: On Sunday 25 March 2007 14:13:07 Rafael J. Wysocki wrote: On Sunday, 25 March 2007 01:40, Maxim wrote: On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote: On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:39, Maxim wrote: On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi, I confirm that the above patch works, At least system didn't hang on resume with microcode driver loaded, I can't really test whenever it did update microcode because I almost sure there is nothing to update (I use core 2 duo that I bought a month ago, and an intel motherboard with latest bios ( updated yesterday) ) I selected this driver just in case when I compiled kernel. OK, thanks for testing. Hello, I tested this patch, and it does work too, System suspended /resumed correctly, no errors in kernel log Thanks for the confirmation. I wonder if someone who actually needs the microcode to be updated can test it ... - first_cpu = first_cpu(cpu_present_map); - if (!cpu_online(first_cpu)) { - error = _cpu_up(first_cpu); - if (error) { - printk(KERN_ERR Could not bring CPU%d up.\n, - first_cpu); - goto out; - } - } - Nice, I once have seen those lines too, and they look ridiculous, but could that be that they are still necessary on some systems, I think this was a mistake from the beginning. Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote: > On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: > > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: > > > On Thursday, 22 March 2007 00:39, Maxim wrote: > > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > > > > > On Thursday, 22 March 2007 00:09, Maxim wrote: > > > > > > On Thursday 22 March 2007 00:39:02 you wrote: > > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > > > > > > > Hi! > > > > > > > > > > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work > > > > > > > > > anymore on my system. > > > > > > > > > > > > > > > > > > I did a git-bisect and found that those commits break it: > > > > > > > > > > > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] > > > > > > > > > PM: Change code ordering in main.c > > > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] > > > > > > > > > swsusp: Change code ordering in disk.c > > > > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] > > > > > > > > > swsusp: Change code ordering in user.c > > > > > > > > > > > > > > > > > > > > > > > > > (Yep, it was in my "to analyze" queue). > > > > > > > > > > > > > > > > > I already reported about it, but now i know the reason why > > > > > > > > > suspend breaks. > > > > > > > > > > > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to > > > > > > > > > sleep until now, > > > > > > > > > and it did work because those functions could be called only > > > > > > > > > in process context > > > > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) > > > > > > > > > or idle thread that does smp_init()). > > > > > > > > > > > > > > > > > > But now they are called _after_ all tasks were suspended, so > > > > > > > > > if cpu_down tries for example to take a lock > > > > > > > > > that is taken by different process, it can't since the > > > > > > > > > different proccess is frozen and can't release the lock. > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for detailed explanation. > > > > > > > > > > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing > > > > > > > > this. > > > > > > > > > > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen > > > > > > > > task > > > > > > > > holds a lock, that's a bug. > > > > > > > > > > > > > > > > > Or, it is also possible to revert this change. > > > > > > > > > > > > > > > > Are you using xfs? > > > > > > > > > > > > > > Well, this is the only case that can trigger it. There are no > > > > > > > other freezable > > > > > > > workqueues. > > > > > > > > > > > > > > Greetings, > > > > > > > Rafael > > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > Yes, you are right and it is XFS > > > > > > > > > > > > System suspends and resumes with xfs and your patch correctly, > > > > > > > > > > Could you please sent this information to the list? I'd like it to > > > > > reach all > > > > > of the CCed parites. ;-) > > > > > > > > I did now ( sorry I just keep using this Answer command, instead of > > > > Answer to everybody) > > > > I didn't intend to send private email. > > > > > > > > > > > Of course I need to mention that I had to unload microcode > > > > > > update driver because it prevented resume, > > > > > > because it calls firmware loader helper, and again sleeps on > > > > > > lock > > > > > > > > > > This is interesting. Did it happen before or is it a regression? > > > > > > > > It is from the same group of bugs , I mean hang because cpu_up/down is > > > > called with frozen tasks > > > > Of course it didn't happen before those reordering commits were > > > > introduced > > > > > > Well, we want cpu_up/down to be called after processes have been frozen, > > > for > > > various reasons (one of them being that applications shouldn't see us > > > playing > > > with the CPUs). > > > > > > Thanks for reporting this, I'll have a look at the microcode update > > > driver. > > > > Well, I have invented the appended workaround, but I'm not sure how much > > sense it makes with respect to the microcode driver. At least, it doesn't > > break my AMD64 SMP setup. ;-) > > Modified version of the patch is appended. Unfortunately I have no hardware > supporting the microcode updates. > > Greetings, > Rafael > > > --- > arch/i386/kernel/microcode.c | 28 +--- > include/linux/cpu.h |2 ++ > kernel/cpu.c | 32 > 3 files changed, 43 insertions(+), 19 deletions(-) > > Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c > === > --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c > +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c > @@ -567,6 +567,16 @@ static int
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Friday 23 March 2007 16:42:44 Rafael J. Wysocki wrote: On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:39, Maxim wrote: On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael Hello, Yes, you are right and it is XFS System suspends and resumes with xfs and your patch correctly, Could you please sent this information to the list? I'd like it to reach all of the CCed parites. ;-) I did now ( sorry I just keep using this Answer command, instead of Answer to everybody) I didn't intend to send private email. Of course I need to mention that I had to unload microcode update driver because it prevented resume, because it calls firmware loader helper, and again sleeps on lock This is interesting. Did it happen before or is it a regression? It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks Of course it didn't happen before those reordering commits were introduced Well, we want cpu_up/down to be called after processes have been frozen, for various reasons (one of them being that applications shouldn't see us playing with the CPUs). Thanks for reporting this, I'll have a look at the microcode update driver. Well, I have invented the appended workaround, but I'm not sure how much sense it makes with respect to the microcode driver. At least, it doesn't break my AMD64 SMP setup. ;-) Modified version of the patch is appended. Unfortunately I have no hardware supporting the microcode updates. Greetings, Rafael --- arch/i386/kernel/microcode.c | 28 +--- include/linux/cpu.h |2 ++ kernel/cpu.c | 32 3 files changed, 43 insertions(+), 19 deletions(-) Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c === --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c @@ -567,6 +567,16 @@ static int cpu_request_microcode(int cpu return error; } +static void apply_microcode_on_cpu(int cpu) +{ + cpumask_t old; + + old = current-cpus_allowed; + set_cpus_allowed(current, cpumask_of_cpu(cpu)); + apply_microcode(cpu); + set_cpus_allowed(current, old); +} + static void microcode_init_cpu(int cpu) { cpumask_t old; @@ -663,13 +673,21 @@ static int mc_sysdev_add(struct sys_devi return 0; pr_debug(Microcode:CPU %d added\n, cpu); - memset(uci, 0, sizeof(*uci)); + /* If suspend_cpu_hotplug is set, the system is resuming and we should + * use the data from before the
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: > On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: > > On Thursday, 22 March 2007 00:39, Maxim wrote: > > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > > > > On Thursday, 22 March 2007 00:09, Maxim wrote: > > > > > On Thursday 22 March 2007 00:39:02 you wrote: > > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > > > > > > Hi! > > > > > > > > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work > > > > > > > > anymore on my system. > > > > > > > > > > > > > > > > I did a git-bisect and found that those commits break it: > > > > > > > > > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: > > > > > > > > Change code ordering in main.c > > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] > > > > > > > > swsusp: Change code ordering in disk.c > > > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] > > > > > > > > swsusp: Change code ordering in user.c > > > > > > > > > > > > > > > > > > > > > > (Yep, it was in my "to analyze" queue). > > > > > > > > > > > > > > > I already reported about it, but now i know the reason why > > > > > > > > suspend breaks. > > > > > > > > > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep > > > > > > > > until now, > > > > > > > > and it did work because those functions could be called only in > > > > > > > > process context > > > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or > > > > > > > > idle thread that does smp_init()). > > > > > > > > > > > > > > > > But now they are called _after_ all tasks were suspended, so if > > > > > > > > cpu_down tries for example to take a lock > > > > > > > > that is taken by different process, it can't since the > > > > > > > > different proccess is frozen and can't release the lock. > > > > > > > > > > > > > > > > > > > > > > Thanks for detailed explanation. > > > > > > > > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing > > > > > > > this. > > > > > > > > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen > > > > > > > task > > > > > > > holds a lock, that's a bug. > > > > > > > > > > > > > > > Or, it is also possible to revert this change. > > > > > > > > > > > > > > Are you using xfs? > > > > > > > > > > > > Well, this is the only case that can trigger it. There are no > > > > > > other freezable > > > > > > workqueues. > > > > > > > > > > > > Greetings, > > > > > > Rafael > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > Yes, you are right and it is XFS > > > > > > > > > > System suspends and resumes with xfs and your patch correctly, > > > > > > > > Could you please sent this information to the list? I'd like it to > > > > reach all > > > > of the CCed parites. ;-) > > > > > > I did now ( sorry I just keep using this Answer command, instead of > > > Answer to everybody) > > > I didn't intend to send private email. > > > > > > > > > Of course I need to mention that I had to unload microcode > > > > > update driver because it prevented resume, > > > > > because it calls firmware loader helper, and again sleeps on > > > > > lock > > > > > > > > This is interesting. Did it happen before or is it a regression? > > > > > > It is from the same group of bugs , I mean hang because cpu_up/down is > > > called with frozen tasks > > > Of course it didn't happen before those reordering commits were introduced > > > > Well, we want cpu_up/down to be called after processes have been frozen, for > > various reasons (one of them being that applications shouldn't see us > > playing > > with the CPUs). > > > > Thanks for reporting this, I'll have a look at the microcode update driver. > > Well, I have invented the appended workaround, but I'm not sure how much > sense it makes with respect to the microcode driver. At least, it doesn't > break my AMD64 SMP setup. ;-) Modified version of the patch is appended. Unfortunately I have no hardware supporting the microcode updates. Greetings, Rafael --- arch/i386/kernel/microcode.c | 28 +--- include/linux/cpu.h |2 ++ kernel/cpu.c | 32 3 files changed, 43 insertions(+), 19 deletions(-) Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c === --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c @@ -567,6 +567,16 @@ static int cpu_request_microcode(int cpu return error; } +static void apply_microcode_on_cpu(int cpu) +{ + cpumask_t old; + + old = current->cpus_allowed; + set_cpus_allowed(current, cpumask_of_cpu(cpu)); + apply_microcode(cpu); + set_cpus_allowed(current, old); +} + static
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Friday, 23 March 2007 00:30, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:39, Maxim wrote: On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael Hello, Yes, you are right and it is XFS System suspends and resumes with xfs and your patch correctly, Could you please sent this information to the list? I'd like it to reach all of the CCed parites. ;-) I did now ( sorry I just keep using this Answer command, instead of Answer to everybody) I didn't intend to send private email. Of course I need to mention that I had to unload microcode update driver because it prevented resume, because it calls firmware loader helper, and again sleeps on lock This is interesting. Did it happen before or is it a regression? It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks Of course it didn't happen before those reordering commits were introduced Well, we want cpu_up/down to be called after processes have been frozen, for various reasons (one of them being that applications shouldn't see us playing with the CPUs). Thanks for reporting this, I'll have a look at the microcode update driver. Well, I have invented the appended workaround, but I'm not sure how much sense it makes with respect to the microcode driver. At least, it doesn't break my AMD64 SMP setup. ;-) Modified version of the patch is appended. Unfortunately I have no hardware supporting the microcode updates. Greetings, Rafael --- arch/i386/kernel/microcode.c | 28 +--- include/linux/cpu.h |2 ++ kernel/cpu.c | 32 3 files changed, 43 insertions(+), 19 deletions(-) Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c === --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c @@ -567,6 +567,16 @@ static int cpu_request_microcode(int cpu return error; } +static void apply_microcode_on_cpu(int cpu) +{ + cpumask_t old; + + old = current-cpus_allowed; + set_cpus_allowed(current, cpumask_of_cpu(cpu)); + apply_microcode(cpu); + set_cpus_allowed(current, old); +} + static void microcode_init_cpu(int cpu) { cpumask_t old; @@ -663,13 +673,21 @@ static int mc_sysdev_add(struct sys_devi return 0; pr_debug(Microcode:CPU %d added\n, cpu); - memset(uci, 0, sizeof(*uci)); + /* If suspend_cpu_hotplug is set, the system is resuming and we should +* use the data from before the suspend. +*/ + if (!suspend_cpu_hotplug) + memset(uci, 0, sizeof(*uci)); err = sysfs_create_group(sys_dev-kobj, mc_attr_group); if
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: > On Thursday, 22 March 2007 00:39, Maxim wrote: > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > > > On Thursday, 22 March 2007 00:09, Maxim wrote: > > > > On Thursday 22 March 2007 00:39:02 you wrote: > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > > > > > Hi! > > > > > > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work > > > > > > > anymore on my system. > > > > > > > > > > > > > > I did a git-bisect and found that those commits break it: > > > > > > > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: > > > > > > > Change code ordering in main.c > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] > > > > > > > swsusp: Change code ordering in disk.c > > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] > > > > > > > swsusp: Change code ordering in user.c > > > > > > > > > > > > > > > > > > > (Yep, it was in my "to analyze" queue). > > > > > > > > > > > > > I already reported about it, but now i know the reason why > > > > > > > suspend breaks. > > > > > > > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep > > > > > > > until now, > > > > > > > and it did work because those functions could be called only in > > > > > > > process context > > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or > > > > > > > idle thread that does smp_init()). > > > > > > > > > > > > > > But now they are called _after_ all tasks were suspended, so if > > > > > > > cpu_down tries for example to take a lock > > > > > > > that is taken by different process, it can't since the different > > > > > > > proccess is frozen and can't release the lock. > > > > > > > > > > > > > > > > > > > Thanks for detailed explanation. > > > > > > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this. > > > > > > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task > > > > > > holds a lock, that's a bug. > > > > > > > > > > > > > Or, it is also possible to revert this change. > > > > > > > > > > > > Are you using xfs? > > > > > > > > > > Well, this is the only case that can trigger it. There are no other > > > > > freezable > > > > > workqueues. > > > > > > > > > > Greetings, > > > > > Rafael > > > > > > > > > > > > > Hello, > > > > > > > > Yes, you are right and it is XFS > > > > > > > > System suspends and resumes with xfs and your patch correctly, > > > > > > Could you please sent this information to the list? I'd like it to reach > > > all > > > of the CCed parites. ;-) > > > > I did now ( sorry I just keep using this Answer command, instead of Answer > > to everybody) > > I didn't intend to send private email. > > > > > > > Of course I need to mention that I had to unload microcode > > > > update driver because it prevented resume, > > > > because it calls firmware loader helper, and again sleeps on > > > > lock > > > > > > This is interesting. Did it happen before or is it a regression? > > > > It is from the same group of bugs , I mean hang because cpu_up/down is > > called with frozen tasks > > Of course it didn't happen before those reordering commits were introduced > > Well, we want cpu_up/down to be called after processes have been frozen, for > various reasons (one of them being that applications shouldn't see us playing > with the CPUs). > > Thanks for reporting this, I'll have a look at the microcode update driver. Well, I have invented the appended workaround, but I'm not sure how much sense it makes with respect to the microcode driver. At least, it doesn't break my AMD64 SMP setup. ;-) Greetings, Rafael --- arch/i386/kernel/microcode.c | 23 +-- include/linux/cpu.h |2 ++ kernel/cpu.c | 20 +++- 3 files changed, 38 insertions(+), 7 deletions(-) Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c === --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c 2007-03-16 21:51:23.0 +0100 +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c 2007-03-22 23:55:40.0 +0100 @@ -703,6 +703,16 @@ static struct sysdev_driver mc_sysdev_dr .resume = mc_sysdev_resume, }; +static __cpuinit void apply_microcode_on_cpu(int cpu) +{ + cpumask_t old; + + old = current->cpus_allowed; + set_cpus_allowed(current, cpumask_of_cpu(cpu)); + apply_microcode(cpu); + set_cpus_allowed(current, old); +} + static __cpuinit int mc_cpu_callback(struct notifier_block *nb, unsigned long action, void *hcpu) { @@ -713,10 +723,19 @@ mc_cpu_callback(struct notifier_block *n switch (action) { case CPU_ONLINE: case CPU_DOWN_FAILED: - mc_sysdev_add(sys_dev); +
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday, 22 March 2007 08:31, Andrew Morton wrote: > On Thu, 22 Mar 2007 08:23:39 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> > wrote: > > > On Thursday, 22 March 2007 05:51, David Chinner wrote: > > > On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote: > > > > > > > > I think this is the XFS problem with freezable workqueues. > > > > > > > > Maxim, please try to apply the appended patch and see if it helps. > > > > > > > > Greetings, > > > > Rafael > > > > > > > > > > > > --- > > > > Since freezable workqueues are broken in 2.6.21-rc > > > > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755, > > > > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2) > > > > it's better to remove them altogether for 2.6.21 and change the only > > > > user of > > > > them (XFS) accordingly. > > > > > > > > --- > > > > fs/xfs/linux-2.6/xfs_buf.c |4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > > > > === > > > > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c > > > > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > > > > @@ -1829,11 +1829,11 @@ xfs_buf_init(void) > > > > if (!xfs_buf_zone) > > > > goto out_free_trace_buf; > > > > > > > > - xfslogd_workqueue = create_freezeable_workqueue("xfslogd"); > > > > + xfslogd_workqueue = create_workqueue("xfslogd"); > > > > if (!xfslogd_workqueue) > > > > goto out_free_buf_zone; > > > > > > > > - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad"); > > > > + xfsdatad_workqueue = create_workqueue("xfsdatad"); > > > > if (!xfsdatad_workqueue) > > > > goto out_destroy_xfslogd_workqueue; > > > > > > > > > > Acked-by: Dave Chinner <[EMAIL PROTECTED]> > > > > > > Rafael, it sounds like this really needs to go into the next -rc > > > kernel. Can you push it to Linus? > > > > I tried, but Andrew resent it to you. :-) > > > > Andrew, could we please merge it? > > Sure. > > It shouldn't be this hard :( Thanks! - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thu, 22 Mar 2007 08:23:39 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > On Thursday, 22 March 2007 05:51, David Chinner wrote: > > On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote: > > > > > > I think this is the XFS problem with freezable workqueues. > > > > > > Maxim, please try to apply the appended patch and see if it helps. > > > > > > Greetings, > > > Rafael > > > > > > > > > --- > > > Since freezable workqueues are broken in 2.6.21-rc > > > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755, > > > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2) > > > it's better to remove them altogether for 2.6.21 and change the only user > > > of > > > them (XFS) accordingly. > > > > > > --- > > > fs/xfs/linux-2.6/xfs_buf.c |4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > > > === > > > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c > > > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > > > @@ -1829,11 +1829,11 @@ xfs_buf_init(void) > > > if (!xfs_buf_zone) > > > goto out_free_trace_buf; > > > > > > - xfslogd_workqueue = create_freezeable_workqueue("xfslogd"); > > > + xfslogd_workqueue = create_workqueue("xfslogd"); > > > if (!xfslogd_workqueue) > > > goto out_free_buf_zone; > > > > > > - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad"); > > > + xfsdatad_workqueue = create_workqueue("xfsdatad"); > > > if (!xfsdatad_workqueue) > > > goto out_destroy_xfslogd_workqueue; > > > > > > > Acked-by: Dave Chinner <[EMAIL PROTECTED]> > > > > Rafael, it sounds like this really needs to go into the next -rc > > kernel. Can you push it to Linus? > > I tried, but Andrew resent it to you. :-) > > Andrew, could we please merge it? Sure. It shouldn't be this hard :( - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday, 22 March 2007 05:51, David Chinner wrote: > On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote: > > > > I think this is the XFS problem with freezable workqueues. > > > > Maxim, please try to apply the appended patch and see if it helps. > > > > Greetings, > > Rafael > > > > > > --- > > Since freezable workqueues are broken in 2.6.21-rc > > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755, > > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2) > > it's better to remove them altogether for 2.6.21 and change the only user of > > them (XFS) accordingly. > > > > --- > > fs/xfs/linux-2.6/xfs_buf.c |4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > > === > > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c > > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > > @@ -1829,11 +1829,11 @@ xfs_buf_init(void) > > if (!xfs_buf_zone) > > goto out_free_trace_buf; > > > > - xfslogd_workqueue = create_freezeable_workqueue("xfslogd"); > > + xfslogd_workqueue = create_workqueue("xfslogd"); > > if (!xfslogd_workqueue) > > goto out_free_buf_zone; > > > > - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad"); > > + xfsdatad_workqueue = create_workqueue("xfsdatad"); > > if (!xfsdatad_workqueue) > > goto out_destroy_xfslogd_workqueue; > > > > Acked-by: Dave Chinner <[EMAIL PROTECTED]> > > Rafael, it sounds like this really needs to go into the next -rc > kernel. Can you push it to Linus? I tried, but Andrew resent it to you. :-) Andrew, could we please merge it? Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday, 22 March 2007 05:51, David Chinner wrote: On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote: I think this is the XFS problem with freezable workqueues. Maxim, please try to apply the appended patch and see if it helps. Greetings, Rafael --- Since freezable workqueues are broken in 2.6.21-rc (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755, http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2) it's better to remove them altogether for 2.6.21 and change the only user of them (XFS) accordingly. --- fs/xfs/linux-2.6/xfs_buf.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c @@ -1829,11 +1829,11 @@ xfs_buf_init(void) if (!xfs_buf_zone) goto out_free_trace_buf; - xfslogd_workqueue = create_freezeable_workqueue(xfslogd); + xfslogd_workqueue = create_workqueue(xfslogd); if (!xfslogd_workqueue) goto out_free_buf_zone; - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad); + xfsdatad_workqueue = create_workqueue(xfsdatad); if (!xfsdatad_workqueue) goto out_destroy_xfslogd_workqueue; Acked-by: Dave Chinner [EMAIL PROTECTED] Rafael, it sounds like this really needs to go into the next -rc kernel. Can you push it to Linus? I tried, but Andrew resent it to you. :-) Andrew, could we please merge it? Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thu, 22 Mar 2007 08:23:39 +0100 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Thursday, 22 March 2007 05:51, David Chinner wrote: On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote: I think this is the XFS problem with freezable workqueues. Maxim, please try to apply the appended patch and see if it helps. Greetings, Rafael --- Since freezable workqueues are broken in 2.6.21-rc (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755, http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2) it's better to remove them altogether for 2.6.21 and change the only user of them (XFS) accordingly. --- fs/xfs/linux-2.6/xfs_buf.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c @@ -1829,11 +1829,11 @@ xfs_buf_init(void) if (!xfs_buf_zone) goto out_free_trace_buf; - xfslogd_workqueue = create_freezeable_workqueue(xfslogd); + xfslogd_workqueue = create_workqueue(xfslogd); if (!xfslogd_workqueue) goto out_free_buf_zone; - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad); + xfsdatad_workqueue = create_workqueue(xfsdatad); if (!xfsdatad_workqueue) goto out_destroy_xfslogd_workqueue; Acked-by: Dave Chinner [EMAIL PROTECTED] Rafael, it sounds like this really needs to go into the next -rc kernel. Can you push it to Linus? I tried, but Andrew resent it to you. :-) Andrew, could we please merge it? Sure. It shouldn't be this hard :( - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday, 22 March 2007 08:31, Andrew Morton wrote: On Thu, 22 Mar 2007 08:23:39 +0100 Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Thursday, 22 March 2007 05:51, David Chinner wrote: On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote: I think this is the XFS problem with freezable workqueues. Maxim, please try to apply the appended patch and see if it helps. Greetings, Rafael --- Since freezable workqueues are broken in 2.6.21-rc (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755, http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2) it's better to remove them altogether for 2.6.21 and change the only user of them (XFS) accordingly. --- fs/xfs/linux-2.6/xfs_buf.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c @@ -1829,11 +1829,11 @@ xfs_buf_init(void) if (!xfs_buf_zone) goto out_free_trace_buf; - xfslogd_workqueue = create_freezeable_workqueue(xfslogd); + xfslogd_workqueue = create_workqueue(xfslogd); if (!xfslogd_workqueue) goto out_free_buf_zone; - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad); + xfsdatad_workqueue = create_workqueue(xfsdatad); if (!xfsdatad_workqueue) goto out_destroy_xfslogd_workqueue; Acked-by: Dave Chinner [EMAIL PROTECTED] Rafael, it sounds like this really needs to go into the next -rc kernel. Can you push it to Linus? I tried, but Andrew resent it to you. :-) Andrew, could we please merge it? Sure. It shouldn't be this hard :( Thanks! - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday, 22 March 2007 00:53, Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:39, Maxim wrote: On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael Hello, Yes, you are right and it is XFS System suspends and resumes with xfs and your patch correctly, Could you please sent this information to the list? I'd like it to reach all of the CCed parites. ;-) I did now ( sorry I just keep using this Answer command, instead of Answer to everybody) I didn't intend to send private email. Of course I need to mention that I had to unload microcode update driver because it prevented resume, because it calls firmware loader helper, and again sleeps on lock This is interesting. Did it happen before or is it a regression? It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks Of course it didn't happen before those reordering commits were introduced Well, we want cpu_up/down to be called after processes have been frozen, for various reasons (one of them being that applications shouldn't see us playing with the CPUs). Thanks for reporting this, I'll have a look at the microcode update driver. Well, I have invented the appended workaround, but I'm not sure how much sense it makes with respect to the microcode driver. At least, it doesn't break my AMD64 SMP setup. ;-) Greetings, Rafael --- arch/i386/kernel/microcode.c | 23 +-- include/linux/cpu.h |2 ++ kernel/cpu.c | 20 +++- 3 files changed, 38 insertions(+), 7 deletions(-) Index: linux-2.6.21-rc4/arch/i386/kernel/microcode.c === --- linux-2.6.21-rc4.orig/arch/i386/kernel/microcode.c 2007-03-16 21:51:23.0 +0100 +++ linux-2.6.21-rc4/arch/i386/kernel/microcode.c 2007-03-22 23:55:40.0 +0100 @@ -703,6 +703,16 @@ static struct sysdev_driver mc_sysdev_dr .resume = mc_sysdev_resume, }; +static __cpuinit void apply_microcode_on_cpu(int cpu) +{ + cpumask_t old; + + old = current-cpus_allowed; + set_cpus_allowed(current, cpumask_of_cpu(cpu)); + apply_microcode(cpu); + set_cpus_allowed(current, old); +} + static __cpuinit int mc_cpu_callback(struct notifier_block *nb, unsigned long action, void *hcpu) { @@ -713,10 +723,19 @@ mc_cpu_callback(struct notifier_block *n switch (action) { case CPU_ONLINE: case CPU_DOWN_FAILED: - mc_sysdev_add(sys_dev); + if (suspend_cpu_hotplug) + /* This means we have been called during the resume. +* This is not the real CPU hotplug and we don't need +* to register the sysdev +*/ + apply_microcode_on_cpu(cpu); + else + mc_sysdev_add(sys_dev); break; case
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote: > > I think this is the XFS problem with freezable workqueues. > > Maxim, please try to apply the appended patch and see if it helps. > > Greetings, > Rafael > > > --- > Since freezable workqueues are broken in 2.6.21-rc > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755, > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2) > it's better to remove them altogether for 2.6.21 and change the only user of > them (XFS) accordingly. > > --- > fs/xfs/linux-2.6/xfs_buf.c |4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > === > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > @@ -1829,11 +1829,11 @@ xfs_buf_init(void) > if (!xfs_buf_zone) > goto out_free_trace_buf; > > - xfslogd_workqueue = create_freezeable_workqueue("xfslogd"); > + xfslogd_workqueue = create_workqueue("xfslogd"); > if (!xfslogd_workqueue) > goto out_free_buf_zone; > > - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad"); > + xfsdatad_workqueue = create_workqueue("xfsdatad"); > if (!xfsdatad_workqueue) > goto out_destroy_xfslogd_workqueue; > Acked-by: Dave Chinner <[EMAIL PROTECTED]> Rafael, it sounds like this really needs to go into the next -rc kernel. Can you push it to Linus? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:14:05 Maxim wrote: > On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote: > > Hi. > > > > On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote: > > > Hi, > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on > > > my system. > > > > > > I did a git-bisect and found that those commits break it: > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change > > > code ordering in main.c > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change > > > code ordering in disk.c > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change > > > code ordering in user.c > > > > > > I already reported about it, but now i know the reason why suspend breaks. > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, > > > and it did work because those functions could be called only in process > > > context > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle > > > thread that does smp_init()). > > > > > > But now they are called _after_ all tasks were suspended, so if cpu_down > > > tries for example to take a lock > > > that is taken by different process, it can't since the different proccess > > > is frozen and can't release the lock. > > > > > > I tested this and all results are positive: > > > > > > I disabled 2nd cpu by hand, and then suspend to ram was successfull. > > > Suspend to disk went correctly, but it hang on resume, and I know why. > > > It hang in old kernel trying to disable 2nd cpu that was enabled by it. > > > > > > I was able using kdb to confirm that this is true because it was still > > > possible to enter kdb, and see that > > > idle thread (swapper) was active, and uswsusp was waiting on mutex inside > > > workqueue_cpu_callback. > > > > > > The solution for this problem seems to be ether complete audit of code > > > that uses register_cpu_notifier, > > > to ensure that it doesn't sleep. > > > Also documentation should be changed to note about it. > > > > > > Or, it is also possible to revert this change. > > > > Do you know exactly which mutex was being waited on and where it was > > taken? If you can say that, it would be much more helpful. > > > > Regards, > > > > Nigel > > > > > > Hello, > > It is workqueue_mutex > and it is taken in kernel/workqueue.c:797 > > this is guilt of freezable work queues , and XFS uses it (and I use XFS) > > Thanks to Rafael J. Wysocki for pointing it out to me. > > Regards, > Maxim Levitsky > I think that I made a mistake, I now reverted the patch that fixes the above error, and I wrote down back-trace from this hang, and it appears that system hangs in kthread_stop: This I have written in paper, using kdb: workqueue_cpu_callback -> cleanup_workqueue_thread -> kthread_stop -> and then wake_up_process (I think , I didn't wrote this on paper, I will check again) Regards, Maxim Levitsky - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:47:05 Nigel Cunningham wrote: > Hi. > > On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote: > > > Do you know exactly which mutex was being waited on and where it was > > > taken? If you can say that, it would be much more helpful. > > Yeah, me too, but assuming too much sometimes bites me :) > > > I think this is the XFS problem with freezable workqueues. > > > > Maxim, please try to apply the appended patch and see if it helps. > > Thanks for your subsequent messages, Maxim. Could you confirm for us > that the patch Rafael attached fixes it? > > Regards, > > Nigel > > > --- > > Since freezable workqueues are broken in 2.6.21-rc > > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755, > > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2) > > it's better to remove them altogether for 2.6.21 and change the only user of > > them (XFS) accordingly. > > > > --- > > fs/xfs/linux-2.6/xfs_buf.c |4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > > === > > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c > > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > > @@ -1829,11 +1829,11 @@ xfs_buf_init(void) > > if (!xfs_buf_zone) > > goto out_free_trace_buf; > > > > - xfslogd_workqueue = create_freezeable_workqueue("xfslogd"); > > + xfslogd_workqueue = create_workqueue("xfslogd"); > > if (!xfslogd_workqueue) > > goto out_free_buf_zone; > > > > - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad"); > > + xfsdatad_workqueue = create_workqueue("xfsdatad"); > > if (!xfsdatad_workqueue) > > goto out_destroy_xfslogd_workqueue; > > > > Hello, I can confirm now that the above patch work, First as I said I did try to suspend with this patch and without XFS, and it did work, Now I reverted it and now system still suspends correctly without xfs module loaded ( I didn't tell you that i use now ext3, and that I generally compile everything in kernel, so i put XFS too, because I used it once, and I still have a XFS disk image) But system hangs with XFS loaded, so this patch works. Regards, Maxim Levitsky - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:53:54 Rafael J. Wysocki wrote: > On Thursday, 22 March 2007 00:39, Maxim wrote: > > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > > > On Thursday, 22 March 2007 00:09, Maxim wrote: > > > > On Thursday 22 March 2007 00:39:02 you wrote: > > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > > > > > Hi! > > > > > > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work > > > > > > > anymore on my system. > > > > > > > > > > > > > > I did a git-bisect and found that those commits break it: > > > > > > > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: > > > > > > > Change code ordering in main.c > > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] > > > > > > > swsusp: Change code ordering in disk.c > > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] > > > > > > > swsusp: Change code ordering in user.c > > > > > > > > > > > > > > > > > > > (Yep, it was in my "to analyze" queue). > > > > > > > > > > > > > I already reported about it, but now i know the reason why > > > > > > > suspend breaks. > > > > > > > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep > > > > > > > until now, > > > > > > > and it did work because those functions could be called only in > > > > > > > process context > > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or > > > > > > > idle thread that does smp_init()). > > > > > > > > > > > > > > But now they are called _after_ all tasks were suspended, so if > > > > > > > cpu_down tries for example to take a lock > > > > > > > that is taken by different process, it can't since the different > > > > > > > proccess is frozen and can't release the lock. > > > > > > > > > > > > > > > > > > > Thanks for detailed explanation. > > > > > > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this. > > > > > > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task > > > > > > holds a lock, that's a bug. > > > > > > > > > > > > > Or, it is also possible to revert this change. > > > > > > > > > > > > Are you using xfs? > > > > > > > > > > Well, this is the only case that can trigger it. There are no other > > > > > freezable > > > > > workqueues. > > > > > > > > > > Greetings, > > > > > Rafael > > > > > > > > > > > > > Hello, > > > > > > > > Yes, you are right and it is XFS > > > > > > > > System suspends and resumes with xfs and your patch correctly, > > > > > > Could you please sent this information to the list? I'd like it to reach > > > all > > > of the CCed parites. ;-) > > > > I did now ( sorry I just keep using this Answer command, instead of Answer > > to everybody) > > I didn't intend to send private email. > > > > > > > Of course I need to mention that I had to unload microcode > > > > update driver because it prevented resume, > > > > because it calls firmware loader helper, and again sleeps on > > > > lock > > > > > > This is interesting. Did it happen before or is it a regression? > > > > It is from the same group of bugs , I mean hang because cpu_up/down is > > called with frozen tasks > > Of course it didn't happen before those reordering commits were introduced > > Well, we want cpu_up/down to be called after processes have been frozen, for > various reasons (one of them being that applications shouldn't see us playing > with the CPUs). > > Thanks for reporting this, I'll have a look at the microcode update driver. > > > > > And also I noticed now that system oopses on second attempt to > > > > suspend ether to ram or disk > > > > in pci_restore_msi_state which is called indirectly by > > > > ahci_pci_device_resume, I will investigate this soon. > > > > > > Thanks. We've had such reports earlier, but I think the problem is still > > > unresolved. Any > > > additional information will be valuable. > > > > I will do my best, > > Also I want to note that the above problem is 100% repeatable, and happens > > independently whenever suspend to disk > > or suspend to ram was used in first successful try ( or at least, I got > > back-trace using kdb, after suspend to disk, after > > suspend to ram system hang, so I assume, that this it is same problem , > > because it didn't hang of first try) > > Thanks for the information. > > BTW, what's the last kernel you have tested? > > Rafael > Hello, Thanks for quick response, I will continue to test my system I use literally latest Linus's git tree. The kernel that works is 2.6.20 , and except very weird hang that happens sometimes (1 in 5~6 times) on resume from ram, everything works I described it in http://lkml.org/lkml/2007/3/16/126 Regards, Maxim Levitsky - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday, 22 March 2007 00:39, Maxim wrote: > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > > On Thursday, 22 March 2007 00:09, Maxim wrote: > > > On Thursday 22 March 2007 00:39:02 you wrote: > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > > > > Hi! > > > > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work > > > > > > anymore on my system. > > > > > > > > > > > > I did a git-bisect and found that those commits break it: > > > > > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: > > > > > > Change code ordering in main.c > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: > > > > > > Change code ordering in disk.c > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: > > > > > > Change code ordering in user.c > > > > > > > > > > > > > > > > (Yep, it was in my "to analyze" queue). > > > > > > > > > > > I already reported about it, but now i know the reason why suspend > > > > > > breaks. > > > > > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep > > > > > > until now, > > > > > > and it did work because those functions could be called only in > > > > > > process context > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or > > > > > > idle thread that does smp_init()). > > > > > > > > > > > > But now they are called _after_ all tasks were suspended, so if > > > > > > cpu_down tries for example to take a lock > > > > > > that is taken by different process, it can't since the different > > > > > > proccess is frozen and can't release the lock. > > > > > > > > > > > > > > > > Thanks for detailed explanation. > > > > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this. > > > > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task > > > > > holds a lock, that's a bug. > > > > > > > > > > > Or, it is also possible to revert this change. > > > > > > > > > > Are you using xfs? > > > > > > > > Well, this is the only case that can trigger it. There are no other > > > > freezable > > > > workqueues. > > > > > > > > Greetings, > > > > Rafael > > > > > > > > > > Hello, > > > > > > Yes, you are right and it is XFS > > > > > > System suspends and resumes with xfs and your patch correctly, > > > > Could you please sent this information to the list? I'd like it to reach > > all > > of the CCed parites. ;-) > > I did now ( sorry I just keep using this Answer command, instead of Answer to > everybody) > I didn't intend to send private email. > > > > > Of course I need to mention that I had to unload microcode update > > > driver because it prevented resume, > > > because it calls firmware loader helper, and again sleeps on lock > > > > This is interesting. Did it happen before or is it a regression? > > It is from the same group of bugs , I mean hang because cpu_up/down is called > with frozen tasks > Of course it didn't happen before those reordering commits were introduced Well, we want cpu_up/down to be called after processes have been frozen, for various reasons (one of them being that applications shouldn't see us playing with the CPUs). Thanks for reporting this, I'll have a look at the microcode update driver. > > > And also I noticed now that system oopses on second attempt to suspend > > > ether to ram or disk > > > in pci_restore_msi_state which is called indirectly by > > > ahci_pci_device_resume, I will investigate this soon. > > > > Thanks. We've had such reports earlier, but I think the problem is still > > unresolved. Any > > additional information will be valuable. > > I will do my best, > Also I want to note that the above problem is 100% repeatable, and happens > independently whenever suspend to disk > or suspend to ram was used in first successful try ( or at least, I got > back-trace using kdb, after suspend to disk, after > suspend to ram system hang, so I assume, that this it is same problem , > because it didn't hang of first try) Thanks for the information. BTW, what's the last kernel you have tested? Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
Hi. On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote: > > Do you know exactly which mutex was being waited on and where it was > > taken? If you can say that, it would be much more helpful. Yeah, me too, but assuming too much sometimes bites me :) > I think this is the XFS problem with freezable workqueues. > > Maxim, please try to apply the appended patch and see if it helps. Thanks for your subsequent messages, Maxim. Could you confirm for us that the patch Rafael attached fixes it? Regards, Nigel > --- > Since freezable workqueues are broken in 2.6.21-rc > (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755, > http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2) > it's better to remove them altogether for 2.6.21 and change the only user of > them (XFS) accordingly. > > --- > fs/xfs/linux-2.6/xfs_buf.c |4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > === > --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c > +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c > @@ -1829,11 +1829,11 @@ xfs_buf_init(void) > if (!xfs_buf_zone) > goto out_free_trace_buf; > > - xfslogd_workqueue = create_freezeable_workqueue("xfslogd"); > + xfslogd_workqueue = create_workqueue("xfslogd"); > if (!xfslogd_workqueue) > goto out_free_buf_zone; > > - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad"); > + xfsdatad_workqueue = create_workqueue("xfsdatad"); > if (!xfsdatad_workqueue) > goto out_destroy_xfslogd_workqueue; > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:39:24 Maxim wrote: > On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > > On Thursday, 22 March 2007 00:09, Maxim wrote: > > > On Thursday 22 March 2007 00:39:02 you wrote: > > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > > > > Hi! > > > > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work > > > > > > anymore on my system. > > > > > > > > > > > > I did a git-bisect and found that those commits break it: > > > > > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: > > > > > > Change code ordering in main.c > > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: > > > > > > Change code ordering in disk.c > > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: > > > > > > Change code ordering in user.c > > > > > > > > > > > > > > > > (Yep, it was in my "to analyze" queue). > > > > > > > > > > > I already reported about it, but now i know the reason why suspend > > > > > > breaks. > > > > > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep > > > > > > until now, > > > > > > and it did work because those functions could be called only in > > > > > > process context > > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or > > > > > > idle thread that does smp_init()). > > > > > > > > > > > > But now they are called _after_ all tasks were suspended, so if > > > > > > cpu_down tries for example to take a lock > > > > > > that is taken by different process, it can't since the different > > > > > > proccess is frozen and can't release the lock. > > > > > > > > > > > > > > > > Thanks for detailed explanation. > > > > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this. > > > > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task > > > > > holds a lock, that's a bug. > > > > > > > > > > > Or, it is also possible to revert this change. > > > > > > > > > > Are you using xfs? > > > > > > > > Well, this is the only case that can trigger it. There are no other > > > > freezable > > > > workqueues. > > > > > > > > Greetings, > > > > Rafael > > > > > > > > > > Hello, > > > > > > Yes, you are right and it is XFS > > > > > > System suspends and resumes with xfs and your patch correctly, > > > > Could you please sent this information to the list? I'd like it to reach > > all > > of the CCed parites. ;-) > > I did now ( sorry I just keep using this Answer command, instead of Answer to > everybody) > I didn't intend to send private email. > > > > > Of course I need to mention that I had to unload microcode update > > > driver because it prevented resume, > > > because it calls firmware loader helper, and again sleeps on lock > > > > This is interesting. Did it happen before or is it a regression? > > It is from the same group of bugs , I mean hang because cpu_up/down is called > with frozen tasks > Of course it didn't happen before those reordering commits were introduced > > > > > And also I noticed now that system oopses on second attempt to suspend > > > ether to ram or disk > > > in pci_restore_msi_state which is called indirectly by > > > ahci_pci_device_resume, I will investigate this soon. > > > > Thanks. We've had such reports earlier, but I think the problem is still > > unresolved. Any > > additional information will be valuable. > > I will do my best, > Also I want to note that the above problem is 100% repeatable, and happens > independently whenever suspend to disk > or suspend to ram was used in first successful try ( or at least, I got > back-trace using kdb, after suspend to disk, after suspend to ram system > hang, > so I assume, that this it is same problem , because it didn't hang of first > try) > > > > > Greetings, > > Rafael > > > > > > > And I forgot signature too (what happens with me today,... please sorry) Regards Maxim Levitsky - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: > On Thursday, 22 March 2007 00:09, Maxim wrote: > > On Thursday 22 March 2007 00:39:02 you wrote: > > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > > > Hi! > > > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore > > > > > on my system. > > > > > > > > > > I did a git-bisect and found that those commits break it: > > > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change > > > > > code ordering in main.c > > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: > > > > > Change code ordering in disk.c > > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: > > > > > Change code ordering in user.c > > > > > > > > > > > > > (Yep, it was in my "to analyze" queue). > > > > > > > > > I already reported about it, but now i know the reason why suspend > > > > > breaks. > > > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until > > > > > now, > > > > > and it did work because those functions could be called only in > > > > > process context > > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle > > > > > thread that does smp_init()). > > > > > > > > > > But now they are called _after_ all tasks were suspended, so if > > > > > cpu_down tries for example to take a lock > > > > > that is taken by different process, it can't since the different > > > > > proccess is frozen and can't release the lock. > > > > > > > > > > > > > Thanks for detailed explanation. > > > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this. > > > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task > > > > holds a lock, that's a bug. > > > > > > > > > Or, it is also possible to revert this change. > > > > > > > > Are you using xfs? > > > > > > Well, this is the only case that can trigger it. There are no other > > > freezable > > > workqueues. > > > > > > Greetings, > > > Rafael > > > > > > > Hello, > > > > Yes, you are right and it is XFS > > > > System suspends and resumes with xfs and your patch correctly, > > Could you please sent this information to the list? I'd like it to reach all > of the CCed parites. ;-) I did now ( sorry I just keep using this Answer command, instead of Answer to everybody) I didn't intend to send private email. > > > Of course I need to mention that I had to unload microcode update > > driver because it prevented resume, > > because it calls firmware loader helper, and again sleeps on lock > > This is interesting. Did it happen before or is it a regression? It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks Of course it didn't happen before those reordering commits were introduced > > > And also I noticed now that system oopses on second attempt to suspend > > ether to ram or disk > > in pci_restore_msi_state which is called indirectly by > > ahci_pci_device_resume, I will investigate this soon. > > Thanks. We've had such reports earlier, but I think the problem is still > unresolved. Any > additional information will be valuable. I will do my best, Also I want to note that the above problem is 100% repeatable, and happens independently whenever suspend to disk or suspend to ram was used in first successful try ( or at least, I got back-trace using kdb, after suspend to disk, after suspend to ram system hang, so I assume, that this it is same problem , because it didn't hang of first try) > > Greetings, > Rafael > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:09:54 Maxim wrote: > On Thursday 22 March 2007 00:39:02 you wrote: > > On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > > > Hi! > > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore > > > > on my system. > > > > > > > > I did a git-bisect and found that those commits break it: > > > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change > > > > code ordering in main.c > > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: > > > > Change code ordering in disk.c > > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: > > > > Change code ordering in user.c > > > > > > > > > > (Yep, it was in my "to analyze" queue). > > > > > > > I already reported about it, but now i know the reason why suspend > > > > breaks. > > > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until > > > > now, > > > > and it did work because those functions could be called only in process > > > > context > > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle > > > > thread that does smp_init()). > > > > > > > > But now they are called _after_ all tasks were suspended, so if > > > > cpu_down tries for example to take a lock > > > > that is taken by different process, it can't since the different > > > > proccess is frozen and can't release the lock. > > > > > > > > > > Thanks for detailed explanation. > > > > > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this. > > > > > > ...by design, "frozen" tasks must not hold any locks. If frozen task > > > holds a lock, that's a bug. > > > > > > > Or, it is also possible to revert this change. > > > > > > Are you using xfs? > > > > Well, this is the only case that can trigger it. There are no other > > freezable > > workqueues. > > > > Greetings, > > Rafael > > > > Hello, > > Yes, you are right and it is XFS > > System suspends and resumes with xfs and your patch correctly, > > Of course I need to mention that I had to unload microcode update > driver because it prevented resume, > because it calls firmware loader helper, and again sleeps on lock > > And also I noticed now that system oopses on second attempt to suspend > ether to ram or disk > in pci_restore_msi_state which is called indirectly by > ahci_pci_device_resume, I will investigate this soon. > > Thanks a lot, > Regards, > Maxim Levitsky > > And I forgot to cc here too, I didn't intend to send private email. Regards, Maxim Levitsky - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:14:05 Maxim wrote: > On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote: > > Hi. > > > > On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote: > > > Hi, > > > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on > > > my system. > > > > > > I did a git-bisect and found that those commits break it: > > > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change > > > code ordering in main.c > > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change > > > code ordering in disk.c > > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change > > > code ordering in user.c > > > > > > I already reported about it, but now i know the reason why suspend breaks. > > > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, > > > and it did work because those functions could be called only in process > > > context > > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle > > > thread that does smp_init()). > > > > > > But now they are called _after_ all tasks were suspended, so if cpu_down > > > tries for example to take a lock > > > that is taken by different process, it can't since the different proccess > > > is frozen and can't release the lock. > > > > > > I tested this and all results are positive: > > > > > > I disabled 2nd cpu by hand, and then suspend to ram was successfull. > > > Suspend to disk went correctly, but it hang on resume, and I know why. > > > It hang in old kernel trying to disable 2nd cpu that was enabled by it. > > > > > > I was able using kdb to confirm that this is true because it was still > > > possible to enter kdb, and see that > > > idle thread (swapper) was active, and uswsusp was waiting on mutex inside > > > workqueue_cpu_callback. > > > > > > The solution for this problem seems to be ether complete audit of code > > > that uses register_cpu_notifier, > > > to ensure that it doesn't sleep. > > > Also documentation should be changed to note about it. > > > > > > Or, it is also possible to revert this change. > > > > Do you know exactly which mutex was being waited on and where it was > > taken? If you can say that, it would be much more helpful. > > > > Regards, > > > > Nigel > > > > > > Hello, > > It is workqueue_mutex > and it is taken in kernel/workqueue.c:797 > > this is guilt of freezable work queues , and XFS uses it (and I use XFS) > > Thanks to Rafael J. Wysocki for pointing it out to me. > > Regards, > Maxim Levitsky > oops, Forgot to cc Regards, Maxim Levitsky - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: > Hi! > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my > > system. > > > > I did a git-bisect and found that those commits break it: > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code > > ordering in main.c > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change > > code ordering in disk.c > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change > > code ordering in user.c > > > > (Yep, it was in my "to analyze" queue). > > > I already reported about it, but now i know the reason why suspend breaks. > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, > > and it did work because those functions could be called only in process > > context > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle > > thread that does smp_init()). > > > > But now they are called _after_ all tasks were suspended, so if cpu_down > > tries for example to take a lock > > that is taken by different process, it can't since the different proccess > > is frozen and can't release the lock. > > > > Thanks for detailed explanation. > > ...but, on my machine suspend works ok in -rc4. I'm not seeing this. > > ...by design, "frozen" tasks must not hold any locks. If frozen task > holds a lock, that's a bug. > > > Or, it is also possible to revert this change. > > Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
Hi! > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my > system. > > I did a git-bisect and found that those commits break it: > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code > ordering in main.c > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change > code ordering in disk.c > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change > code ordering in user.c > (Yep, it was in my "to analyze" queue). > I already reported about it, but now i know the reason why suspend breaks. > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, > and it did work because those functions could be called only in process > context > (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread > that does smp_init()). > > But now they are called _after_ all tasks were suspended, so if cpu_down > tries for example to take a lock > that is taken by different process, it can't since the different proccess is > frozen and can't release the lock. > Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, "frozen" tasks must not hold any locks. If frozen task holds a lock, that's a bug. > Or, it is also possible to revert this change. Are you using xfs? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Wednesday, 21 March 2007 22:22, Nigel Cunningham wrote: > Hi. > > On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote: > > Hi, > > > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my > > system. > > > > I did a git-bisect and found that those commits break it: > > > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code > > ordering in main.c > > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change > > code ordering in disk.c > > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change > > code ordering in user.c > > > > I already reported about it, but now i know the reason why suspend breaks. > > > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, > > and it did work because those functions could be called only in process > > context > > (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle > > thread that does smp_init()). > > > > But now they are called _after_ all tasks were suspended, so if cpu_down > > tries for example to take a lock > > that is taken by different process, it can't since the different proccess > > is frozen and can't release the lock. > > > > I tested this and all results are positive: > > > > I disabled 2nd cpu by hand, and then suspend to ram was successfull. > > Suspend to disk went correctly, but it hang on resume, and I know why. > > It hang in old kernel trying to disable 2nd cpu that was enabled by it. > > > > I was able using kdb to confirm that this is true because it was still > > possible to enter kdb, and see that > > idle thread (swapper) was active, and uswsusp was waiting on mutex inside > > workqueue_cpu_callback. > > > > The solution for this problem seems to be ether complete audit of code that > > uses register_cpu_notifier, > > to ensure that it doesn't sleep. > > Also documentation should be changed to note about it. > > > > Or, it is also possible to revert this change. > > Do you know exactly which mutex was being waited on and where it was > taken? If you can say that, it would be much more helpful. I think this is the XFS problem with freezable workqueues. Maxim, please try to apply the appended patch and see if it helps. Greetings, Rafael --- Since freezable workqueues are broken in 2.6.21-rc (cf. http://marc.theaimsgroup.com/?l=linux-kernel=116855740612755, http://marc.theaimsgroup.com/?l=linux-kernel=117261312523921=2) it's better to remove them altogether for 2.6.21 and change the only user of them (XFS) accordingly. --- fs/xfs/linux-2.6/xfs_buf.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c @@ -1829,11 +1829,11 @@ xfs_buf_init(void) if (!xfs_buf_zone) goto out_free_trace_buf; - xfslogd_workqueue = create_freezeable_workqueue("xfslogd"); + xfslogd_workqueue = create_workqueue("xfslogd"); if (!xfslogd_workqueue) goto out_free_buf_zone; - xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad"); + xfsdatad_workqueue = create_workqueue("xfsdatad"); if (!xfsdatad_workqueue) goto out_destroy_xfslogd_workqueue; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
Hi. On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote: > Hi, > > Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my > system. > > I did a git-bisect and found that those commits break it: > > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code > ordering in main.c > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change > code ordering in disk.c > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change > code ordering in user.c > > I already reported about it, but now i know the reason why suspend breaks. > > The problem is that both cpu_up/cpu_down were allowed to sleep until now, > and it did work because those functions could be called only in process > context > (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread > that does smp_init()). > > But now they are called _after_ all tasks were suspended, so if cpu_down > tries for example to take a lock > that is taken by different process, it can't since the different proccess is > frozen and can't release the lock. > > I tested this and all results are positive: > > I disabled 2nd cpu by hand, and then suspend to ram was successfull. > Suspend to disk went correctly, but it hang on resume, and I know why. > It hang in old kernel trying to disable 2nd cpu that was enabled by it. > > I was able using kdb to confirm that this is true because it was still > possible to enter kdb, and see that > idle thread (swapper) was active, and uswsusp was waiting on mutex inside > workqueue_cpu_callback. > > The solution for this problem seems to be ether complete audit of code that > uses register_cpu_notifier, > to ensure that it doesn't sleep. > Also documentation should be changed to note about it. > > Or, it is also possible to revert this change. Do you know exactly which mutex was being waited on and where it was taken? If you can say that, it would be much more helpful. Regards, Nigel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
Hi. On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote: Hi, Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. I tested this and all results are positive: I disabled 2nd cpu by hand, and then suspend to ram was successfull. Suspend to disk went correctly, but it hang on resume, and I know why. It hang in old kernel trying to disable 2nd cpu that was enabled by it. I was able using kdb to confirm that this is true because it was still possible to enter kdb, and see that idle thread (swapper) was active, and uswsusp was waiting on mutex inside workqueue_cpu_callback. The solution for this problem seems to be ether complete audit of code that uses register_cpu_notifier, to ensure that it doesn't sleep. Also documentation should be changed to note about it. Or, it is also possible to revert this change. Do you know exactly which mutex was being waited on and where it was taken? If you can say that, it would be much more helpful. Regards, Nigel - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Wednesday, 21 March 2007 22:22, Nigel Cunningham wrote: Hi. On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote: Hi, Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. I tested this and all results are positive: I disabled 2nd cpu by hand, and then suspend to ram was successfull. Suspend to disk went correctly, but it hang on resume, and I know why. It hang in old kernel trying to disable 2nd cpu that was enabled by it. I was able using kdb to confirm that this is true because it was still possible to enter kdb, and see that idle thread (swapper) was active, and uswsusp was waiting on mutex inside workqueue_cpu_callback. The solution for this problem seems to be ether complete audit of code that uses register_cpu_notifier, to ensure that it doesn't sleep. Also documentation should be changed to note about it. Or, it is also possible to revert this change. Do you know exactly which mutex was being waited on and where it was taken? If you can say that, it would be much more helpful. I think this is the XFS problem with freezable workqueues. Maxim, please try to apply the appended patch and see if it helps. Greetings, Rafael --- Since freezable workqueues are broken in 2.6.21-rc (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755, http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2) it's better to remove them altogether for 2.6.21 and change the only user of them (XFS) accordingly. --- fs/xfs/linux-2.6/xfs_buf.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c @@ -1829,11 +1829,11 @@ xfs_buf_init(void) if (!xfs_buf_zone) goto out_free_trace_buf; - xfslogd_workqueue = create_freezeable_workqueue(xfslogd); + xfslogd_workqueue = create_workqueue(xfslogd); if (!xfslogd_workqueue) goto out_free_buf_zone; - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad); + xfsdatad_workqueue = create_workqueue(xfsdatad); if (!xfsdatad_workqueue) goto out_destroy_xfslogd_workqueue; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:14:05 Maxim wrote: On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote: Hi. On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote: Hi, Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. I tested this and all results are positive: I disabled 2nd cpu by hand, and then suspend to ram was successfull. Suspend to disk went correctly, but it hang on resume, and I know why. It hang in old kernel trying to disable 2nd cpu that was enabled by it. I was able using kdb to confirm that this is true because it was still possible to enter kdb, and see that idle thread (swapper) was active, and uswsusp was waiting on mutex inside workqueue_cpu_callback. The solution for this problem seems to be ether complete audit of code that uses register_cpu_notifier, to ensure that it doesn't sleep. Also documentation should be changed to note about it. Or, it is also possible to revert this change. Do you know exactly which mutex was being waited on and where it was taken? If you can say that, it would be much more helpful. Regards, Nigel Hello, It is workqueue_mutex and it is taken in kernel/workqueue.c:797 this is guilt of freezable work queues , and XFS uses it (and I use XFS) Thanks to Rafael J. Wysocki for pointing it out to me. Regards, Maxim Levitsky oops, Forgot to cc Regards, Maxim Levitsky - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:09:54 Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael Hello, Yes, you are right and it is XFS System suspends and resumes with xfs and your patch correctly, Of course I need to mention that I had to unload microcode update driver because it prevented resume, because it calls firmware loader helper, and again sleeps on lock And also I noticed now that system oopses on second attempt to suspend ether to ram or disk in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon. Thanks a lot, Regards, Maxim Levitsky And I forgot to cc here too, I didn't intend to send private email. Regards, Maxim Levitsky - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael Hello, Yes, you are right and it is XFS System suspends and resumes with xfs and your patch correctly, Could you please sent this information to the list? I'd like it to reach all of the CCed parites. ;-) I did now ( sorry I just keep using this Answer command, instead of Answer to everybody) I didn't intend to send private email. Of course I need to mention that I had to unload microcode update driver because it prevented resume, because it calls firmware loader helper, and again sleeps on lock This is interesting. Did it happen before or is it a regression? It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks Of course it didn't happen before those reordering commits were introduced And also I noticed now that system oopses on second attempt to suspend ether to ram or disk in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon. Thanks. We've had such reports earlier, but I think the problem is still unresolved. Any additional information will be valuable. I will do my best, Also I want to note that the above problem is 100% repeatable, and happens independently whenever suspend to disk or suspend to ram was used in first successful try ( or at least, I got back-trace using kdb, after suspend to disk, after suspend to ram system hang, so I assume, that this it is same problem , because it didn't hang of first try) Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:39:24 Maxim wrote: On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael Hello, Yes, you are right and it is XFS System suspends and resumes with xfs and your patch correctly, Could you please sent this information to the list? I'd like it to reach all of the CCed parites. ;-) I did now ( sorry I just keep using this Answer command, instead of Answer to everybody) I didn't intend to send private email. Of course I need to mention that I had to unload microcode update driver because it prevented resume, because it calls firmware loader helper, and again sleeps on lock This is interesting. Did it happen before or is it a regression? It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks Of course it didn't happen before those reordering commits were introduced And also I noticed now that system oopses on second attempt to suspend ether to ram or disk in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon. Thanks. We've had such reports earlier, but I think the problem is still unresolved. Any additional information will be valuable. I will do my best, Also I want to note that the above problem is 100% repeatable, and happens independently whenever suspend to disk or suspend to ram was used in first successful try ( or at least, I got back-trace using kdb, after suspend to disk, after suspend to ram system hang, so I assume, that this it is same problem , because it didn't hang of first try) Greetings, Rafael And I forgot signature too (what happens with me today,... please sorry) Regards Maxim Levitsky - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
Hi. On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote: Do you know exactly which mutex was being waited on and where it was taken? If you can say that, it would be much more helpful. Yeah, me too, but assuming too much sometimes bites me :) I think this is the XFS problem with freezable workqueues. Maxim, please try to apply the appended patch and see if it helps. Thanks for your subsequent messages, Maxim. Could you confirm for us that the patch Rafael attached fixes it? Regards, Nigel --- Since freezable workqueues are broken in 2.6.21-rc (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755, http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2) it's better to remove them altogether for 2.6.21 and change the only user of them (XFS) accordingly. --- fs/xfs/linux-2.6/xfs_buf.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c @@ -1829,11 +1829,11 @@ xfs_buf_init(void) if (!xfs_buf_zone) goto out_free_trace_buf; - xfslogd_workqueue = create_freezeable_workqueue(xfslogd); + xfslogd_workqueue = create_workqueue(xfslogd); if (!xfslogd_workqueue) goto out_free_buf_zone; - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad); + xfsdatad_workqueue = create_workqueue(xfsdatad); if (!xfsdatad_workqueue) goto out_destroy_xfslogd_workqueue; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday, 22 March 2007 00:39, Maxim wrote: On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael Hello, Yes, you are right and it is XFS System suspends and resumes with xfs and your patch correctly, Could you please sent this information to the list? I'd like it to reach all of the CCed parites. ;-) I did now ( sorry I just keep using this Answer command, instead of Answer to everybody) I didn't intend to send private email. Of course I need to mention that I had to unload microcode update driver because it prevented resume, because it calls firmware loader helper, and again sleeps on lock This is interesting. Did it happen before or is it a regression? It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks Of course it didn't happen before those reordering commits were introduced Well, we want cpu_up/down to be called after processes have been frozen, for various reasons (one of them being that applications shouldn't see us playing with the CPUs). Thanks for reporting this, I'll have a look at the microcode update driver. And also I noticed now that system oopses on second attempt to suspend ether to ram or disk in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon. Thanks. We've had such reports earlier, but I think the problem is still unresolved. Any additional information will be valuable. I will do my best, Also I want to note that the above problem is 100% repeatable, and happens independently whenever suspend to disk or suspend to ram was used in first successful try ( or at least, I got back-trace using kdb, after suspend to disk, after suspend to ram system hang, so I assume, that this it is same problem , because it didn't hang of first try) Thanks for the information. BTW, what's the last kernel you have tested? Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:53:54 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:39, Maxim wrote: On Thursday 22 March 2007 01:24:25 Rafael J. Wysocki wrote: On Thursday, 22 March 2007 00:09, Maxim wrote: On Thursday 22 March 2007 00:39:02 you wrote: On Wednesday, 21 March 2007 23:21, Pavel Machek wrote: Hi! Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (Yep, it was in my to analyze queue). I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. Thanks for detailed explanation. ...but, on my machine suspend works ok in -rc4. I'm not seeing this. ...by design, frozen tasks must not hold any locks. If frozen task holds a lock, that's a bug. Or, it is also possible to revert this change. Are you using xfs? Well, this is the only case that can trigger it. There are no other freezable workqueues. Greetings, Rafael Hello, Yes, you are right and it is XFS System suspends and resumes with xfs and your patch correctly, Could you please sent this information to the list? I'd like it to reach all of the CCed parites. ;-) I did now ( sorry I just keep using this Answer command, instead of Answer to everybody) I didn't intend to send private email. Of course I need to mention that I had to unload microcode update driver because it prevented resume, because it calls firmware loader helper, and again sleeps on lock This is interesting. Did it happen before or is it a regression? It is from the same group of bugs , I mean hang because cpu_up/down is called with frozen tasks Of course it didn't happen before those reordering commits were introduced Well, we want cpu_up/down to be called after processes have been frozen, for various reasons (one of them being that applications shouldn't see us playing with the CPUs). Thanks for reporting this, I'll have a look at the microcode update driver. And also I noticed now that system oopses on second attempt to suspend ether to ram or disk in pci_restore_msi_state which is called indirectly by ahci_pci_device_resume, I will investigate this soon. Thanks. We've had such reports earlier, but I think the problem is still unresolved. Any additional information will be valuable. I will do my best, Also I want to note that the above problem is 100% repeatable, and happens independently whenever suspend to disk or suspend to ram was used in first successful try ( or at least, I got back-trace using kdb, after suspend to disk, after suspend to ram system hang, so I assume, that this it is same problem , because it didn't hang of first try) Thanks for the information. BTW, what's the last kernel you have tested? Rafael Hello, Thanks for quick response, I will continue to test my system I use literally latest Linus's git tree. The kernel that works is 2.6.20 , and except very weird hang that happens sometimes (1 in 5~6 times) on resume from ram, everything works I described it in http://lkml.org/lkml/2007/3/16/126 Regards, Maxim Levitsky - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:47:05 Nigel Cunningham wrote: Hi. On Wed, 2007-03-21 at 22:38 +0100, Rafael J. Wysocki wrote: Do you know exactly which mutex was being waited on and where it was taken? If you can say that, it would be much more helpful. Yeah, me too, but assuming too much sometimes bites me :) I think this is the XFS problem with freezable workqueues. Maxim, please try to apply the appended patch and see if it helps. Thanks for your subsequent messages, Maxim. Could you confirm for us that the patch Rafael attached fixes it? Regards, Nigel --- Since freezable workqueues are broken in 2.6.21-rc (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755, http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2) it's better to remove them altogether for 2.6.21 and change the only user of them (XFS) accordingly. --- fs/xfs/linux-2.6/xfs_buf.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c @@ -1829,11 +1829,11 @@ xfs_buf_init(void) if (!xfs_buf_zone) goto out_free_trace_buf; - xfslogd_workqueue = create_freezeable_workqueue(xfslogd); + xfslogd_workqueue = create_workqueue(xfslogd); if (!xfslogd_workqueue) goto out_free_buf_zone; - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad); + xfsdatad_workqueue = create_workqueue(xfsdatad); if (!xfsdatad_workqueue) goto out_destroy_xfslogd_workqueue; Hello, I can confirm now that the above patch work, First as I said I did try to suspend with this patch and without XFS, and it did work, Now I reverted it and now system still suspends correctly without xfs module loaded ( I didn't tell you that i use now ext3, and that I generally compile everything in kernel, so i put XFS too, because I used it once, and I still have a XFS disk image) But system hangs with XFS loaded, so this patch works. Regards, Maxim Levitsky - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Thursday 22 March 2007 01:14:05 Maxim wrote: On Wednesday 21 March 2007 23:22:40 Nigel Cunningham wrote: Hi. On Wed, 2007-03-21 at 18:40 +0200, Maxim Levitsky wrote: Hi, Starting with 2.6.21-rc1 suspend to ram and disk doesn't work anymore on my system. I did a git-bisect and found that those commits break it: e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c I already reported about it, but now i know the reason why suspend breaks. The problem is that both cpu_up/cpu_down were allowed to sleep until now, and it did work because those functions could be called only in process context (the one that writes to /sys/devices/system/cpu/cpu*/online) or idle thread that does smp_init()). But now they are called _after_ all tasks were suspended, so if cpu_down tries for example to take a lock that is taken by different process, it can't since the different proccess is frozen and can't release the lock. I tested this and all results are positive: I disabled 2nd cpu by hand, and then suspend to ram was successfull. Suspend to disk went correctly, but it hang on resume, and I know why. It hang in old kernel trying to disable 2nd cpu that was enabled by it. I was able using kdb to confirm that this is true because it was still possible to enter kdb, and see that idle thread (swapper) was active, and uswsusp was waiting on mutex inside workqueue_cpu_callback. The solution for this problem seems to be ether complete audit of code that uses register_cpu_notifier, to ensure that it doesn't sleep. Also documentation should be changed to note about it. Or, it is also possible to revert this change. Do you know exactly which mutex was being waited on and where it was taken? If you can say that, it would be much more helpful. Regards, Nigel Hello, It is workqueue_mutex and it is taken in kernel/workqueue.c:797 this is guilt of freezable work queues , and XFS uses it (and I use XFS) Thanks to Rafael J. Wysocki for pointing it out to me. Regards, Maxim Levitsky I think that I made a mistake, I now reverted the patch that fixes the above error, and I wrote down back-trace from this hang, and it appears that system hangs in kthread_stop: This I have written in paper, using kdb: workqueue_cpu_callback - cleanup_workqueue_thread - kthread_stop - and then wake_up_process (I think , I didn't wrote this on paper, I will check again) Regards, Maxim Levitsky - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] Code reordering in swsusp breaks suspend on SMP systems
On Wed, Mar 21, 2007 at 10:38:33PM +0100, Rafael J. Wysocki wrote: I think this is the XFS problem with freezable workqueues. Maxim, please try to apply the appended patch and see if it helps. Greetings, Rafael --- Since freezable workqueues are broken in 2.6.21-rc (cf. http://marc.theaimsgroup.com/?l=linux-kernelm=116855740612755, http://marc.theaimsgroup.com/?l=linux-kernelm=117261312523921w=2) it's better to remove them altogether for 2.6.21 and change the only user of them (XFS) accordingly. --- fs/xfs/linux-2.6/xfs_buf.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.21-rc4.orig/fs/xfs/linux-2.6/xfs_buf.c +++ linux-2.6.21-rc4/fs/xfs/linux-2.6/xfs_buf.c @@ -1829,11 +1829,11 @@ xfs_buf_init(void) if (!xfs_buf_zone) goto out_free_trace_buf; - xfslogd_workqueue = create_freezeable_workqueue(xfslogd); + xfslogd_workqueue = create_workqueue(xfslogd); if (!xfslogd_workqueue) goto out_free_buf_zone; - xfsdatad_workqueue = create_freezeable_workqueue(xfsdatad); + xfsdatad_workqueue = create_workqueue(xfsdatad); if (!xfsdatad_workqueue) goto out_destroy_xfslogd_workqueue; Acked-by: Dave Chinner [EMAIL PROTECTED] Rafael, it sounds like this really needs to go into the next -rc kernel. Can you push it to Linus? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/