Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi.

On Wed, 2007-05-30 at 16:04 +0200, Rafael J. Wysocki wrote:
> On Wednesday, 30 May 2007 15:17, Nigel Cunningham wrote:
> > On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
> > > On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:
> > >
> > > (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond
> > > the original point)
> > >
> > > > Notice that we want to be able to suspend while hibernating -- for
> > > > suspend to both behaviour. So drivers may _not_ rely on system being
> > > > runnable.
> > >
> > > So keep the driver layers read-only and unfreeze the processes after
> > > doing the atomic copy.
> >
> > I know you probably won't care, but that's not an option for Suspend2 -
> > I get the possibility of a full image by overwriting LRU pages that were
> > saved prior to the atomic copy.
>
> This generally is a problem, not only for suspend2. :-)
>
> Once you've unfrozen the user land, we can't rely on the hibernation image any
> more, because some tasks may cause the on-disk filesystems' state to change.

True. I understood, perhaps wrongly, that when Matthew spoke of keeping
the driver layers read-only, he meant stopping filesystem changes by
some other means.

Regards,

Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wed, 30 May 2007, Rafael J. Wysocki wrote:
>
> Very true. And I think the right order should be to make the midlayers do
> this and then remove the freezer from the STR code path, not the other way
> around. :-)

Yes. After all, STR simply shouldn't _care_. The rule should be that in a
well-written setup, STR "just works" whether user processes are suspended
or not.

In other words, the whole freezing part isn't about STR. It should be
totally immaterial.

(Of course, that assumes that the freezing is _sane_: ie the core kernel
threads shouldn't all be frozen. I think Rafael's patch to turn the
defaults around is a big step in the right direction).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi!

> (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond
> the original point)
>
> > Notice that we want to be able to suspend while hibernating -- for
> > suspend to both behaviour. So drivers may _not_ rely on system being
> > runnable.
>
> So keep the driver layers read-only and unfreeze the processes after
> doing the atomic copy.

To read firmware you probably need to _write_ atimes.

Anyway, make-disks-read-only patch would be welcome. I just think it
is going to be more complex than freezer.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wed, 30 May 2007 12:26:57 +0200 Romano Giannetti wrote:

>
> On Tue, 2007-05-29 at 07:55 -0700, Linus Torvalds wrote:
> >
> > On Tue, 29 May 2007, Romano Giannetti wrote:
> > >
> > > - The good (?) news. I have made 7 suspend/resume cycles (to ram, I
> > > haven't tested hibernation) with a 2.6.21.2 with that patch, applied
> > > manually. The system did suspend and resume nicely even compiling a
> > > kernel and opening openoffice. Normally (let me stress _normally_) no
> > > delay was apparent on resume. I do not know how dangerous this is... :-)
> > >
> > > - The bad (?) news. One time out of 7 I had the 60 seconds delay.
> >
> > Interesting. If you can re-create it, please do the sysrq-T thing again,
> > to see what's up. (Also, you might do "sysrq-p", which gives the current
> > process data, which sysrq-T does not).
>
>
> I've got it, but I had a problem: I filled the dmesg buffer. I will try
> to find where to enlarge it. I have posted the partial result to:

use 'dmesg -s 10' if it's just dmesg(8) that needs help.

If it's the kernel buffer filling up, you can rebuild the kernel after
changing CONFIG_LOG_BUF_SHIFT, but it's easier just to boot using this:

	log_buf_len=n	Sets the size of the printk ring buffer, in bytes.
			Format: { n | nk | nM }
			n must be a power of two. The default size
			is set in the kernel config file.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
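For readers following along, the relationship between the two knobs Randy mentions can be sketched in plain C. This is a user-space illustration only, not kernel code: the helper names `log_buf_bytes` and `is_valid_log_buf_len` are made up for this note, and the sketch assumes the compiled-in buffer size is `1 << CONFIG_LOG_BUF_SHIFT` bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of a ring buffer built with the
 * given shift, assuming size = 1 << CONFIG_LOG_BUF_SHIFT. */
static size_t log_buf_bytes(unsigned int shift)
{
	/* Refuse shifts that would overflow size_t. */
	assert(shift < sizeof(size_t) * 8);
	return (size_t)1 << shift;
}

/* Hypothetical helper: log_buf_len= wants a power of two, which means
 * exactly one bit is set in n. */
static bool is_valid_log_buf_len(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

Under that assumption, booting with `log_buf_len=1M` requests 1048576 bytes, the same size a rebuild with `CONFIG_LOG_BUF_SHIFT=20` would give, while a round decimal value such as 1000000 fails the power-of-two check.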
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wed, May 30, 2007 at 04:04:22PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, 30 May 2007 15:17, Nigel Cunningham wrote:
> > On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
> > > So keep the driver layers read-only and unfreeze the processes after
> > > doing the atomic copy.
> >
> > I know you probably won't care, but that's not an option for Suspend2 -
> > I get the possibility of a full image by overwriting LRU pages that were
> > saved prior to the atomic copy.
>
> This generally is a problem, not only for suspend2. :-)
>
> Once you've unfrozen the user land, we can't rely on the hibernation image any
> more, because some tasks may cause the on-disk filesystems' state to change.

Hence "keep the driver layers read-only" :)

--
Matthew Garrett | [EMAIL PROTECTED]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wednesday, 30 May 2007 15:29, Matthew Garrett wrote:
> On Wed, May 30, 2007 at 11:17:47PM +1000, Nigel Cunningham wrote:
>
> > That aside, keeping the driver layers read-only sounds more complicated
> > than just freezing processes.
>
> It's a problem that effectively has to be solved for STR anyway if
> we're going to suspend without freezing. The midlayers need to be able
> to block requests when the low-level devices are suspended,

Very true. And I think the right order should be to make the midlayers do
this and then remove the freezer from the STR code path, not the other way
around. :-)

> so we can just re-use that code.

Yes, that should be possible.

Greetings,
Rafael
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wednesday, 30 May 2007 15:17, Nigel Cunningham wrote:
> On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
> > On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:
> >
> > (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond
> > the original point)
> >
> > > Notice that we want to be able to suspend while hibernating -- for
> > > suspend to both behaviour. So drivers may _not_ rely on system being
> > > runnable.
> >
> > So keep the driver layers read-only and unfreeze the processes after
> > doing the atomic copy.
>
> I know you probably won't care, but that's not an option for Suspend2 -
> I get the possibility of a full image by overwriting LRU pages that were
> saved prior to the atomic copy.

This generally is a problem, not only for suspend2. :-)

Once you've unfrozen the user land, we can't rely on the hibernation image any
more, because some tasks may cause the on-disk filesystems' state to change.

Greetings,
Rafael
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wed, May 30, 2007 at 11:17:47PM +1000, Nigel Cunningham wrote:

> That aside, keeping the driver layers read-only sounds more complicated
> than just freezing processes.

It's a problem that effectively has to be solved for STR anyway if
we're going to suspend without freezing. The midlayers need to be able
to block requests when the low-level devices are suspended, so we can
just re-use that code.

--
Matthew Garrett | [EMAIL PROTECTED]
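The idea of a midlayer that blocks requests while the low-level device is suspended can be illustrated with a small user-space model. This is only a sketch of the concept under discussion, not how any kernel midlayer is implemented: `struct midlayer_queue`, `midlayer_submit`, `midlayer_suspend`, `midlayer_resume` and the fixed-size hold list are all hypothetical names and simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 16

/* Hypothetical midlayer state for one device. */
struct midlayer_queue {
	bool suspended;       /* low-level device is currently suspended */
	int held[MAX_HELD];   /* request ids parked while suspended */
	size_t nheld;         /* number of parked requests */
	size_t dispatched;    /* requests handed to the low-level driver */
};

/* Returns 0 if the request went straight to the device, 1 if it was
 * parked because the device is suspended, -1 if the hold list is full. */
static int midlayer_submit(struct midlayer_queue *q, int req_id)
{
	if (!q->suspended) {
		q->dispatched++;
		return 0;
	}
	if (q->nheld == MAX_HELD)
		return -1;
	q->held[q->nheld++] = req_id;
	return 1;
}

/* Close the gate: later submissions are parked instead of dispatched. */
static void midlayer_suspend(struct midlayer_queue *q)
{
	q->suspended = true;
}

/* Reopen the gate and hand every parked request to the device.
 * Returns how many requests were released. */
static size_t midlayer_resume(struct midlayer_queue *q)
{
	size_t released = q->nheld;
	size_t i;

	q->suspended = false;
	for (i = 0; i < released; i++)
		q->dispatched++;  /* stand-in for issuing held[i] to the driver */
	q->nheld = 0;
	return released;
}
```

In this model user processes never need to be frozen for the device's sake: a submitter that races with suspend simply has its request held until resume, which is the property the thread wants the midlayers to provide.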
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
> On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:
>
> (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond
> the original point)
>
> > Notice that we want to be able to suspend while hibernating -- for
> > suspend to both behaviour. So drivers may _not_ rely on system being
> > runnable.
>
> So keep the driver layers read-only and unfreeze the processes after
> doing the atomic copy.

I know you probably won't care, but that's not an option for Suspend2 -
I get the possibility of a full image by overwriting LRU pages that were
saved prior to the atomic copy.

That aside, keeping the driver layers read-only sounds more complicated
than just freezing processes.

Regards,

Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:

(Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond
the original point)

> Notice that we want to be able to suspend while hibernating -- for
> suspend to both behaviour. So drivers may _not_ rely on system being
> runnable.

So keep the driver layers read-only and unfreeze the processes after
doing the atomic copy.

--
Matthew Garrett | [EMAIL PROTECTED]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi!

> > > How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> > > the filesystem VFS callers? Or is that starting to get messy again?
> >
> > Yeah. Getting messy again :)
>
> Indeed. And also misses the point - the point being that we don't actually
> need to freeze anything at all most of the time. There's nothing wrong
> with making memory allocations etc.
>
> And yes, suspend is different from hibernate. I can see how hibernate
> people are worried about people writing to things after doing the
> snapshot, but those concerns don't exist with suspend. With suspend, the
> biggest concern is accessing a device after it has been suspended, but on
> the other hand, also the fact that we end up having driver writers used
> to the system being "runnable", so they do things that really do require a
> full-fledged system (and sometimes that means just some delayed action
> using a kernel thread, other times it seems to rely on more complex
> behaviour like firmware loading :^p )

Notice that we want to be able to suspend while hibernating -- for
suspend to both behaviour. So drivers may _not_ rely on system being
runnable.

(Suspend to both is: write image to disk, then suspend to RAM. If you
do not run out of battery, resume is from RAM and fast, if you do, you
still can do resume from disk, not losing your data).

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
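Pavel's description of suspend to both can be written down as a tiny decision model. The following is a user-space sketch of the described behaviour only. `struct s2both_state`, `suspend_to_both` and `pick_resume_path` are hypothetical names, and the cold-boot case is added here as the obvious fallback when neither copy of the state survives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what survives a "suspend to both" cycle. */
struct s2both_state {
	bool image_on_disk;  /* hibernation image was written successfully */
	bool ram_powered;    /* RAM contents survived (battery held out) */
};

enum resume_path { RESUME_FROM_RAM, RESUME_FROM_DISK, RESUME_COLD_BOOT };

/* Order from the description: write the image to disk first, then
 * suspend to RAM. RAM is still powered right after suspending. */
static struct s2both_state suspend_to_both(bool image_write_ok)
{
	struct s2both_state s = {
		.image_on_disk = image_write_ok,
		.ram_powered = true,
	};
	return s;
}

/* Prefer the fast path from RAM; fall back to the on-disk image. */
static enum resume_path pick_resume_path(const struct s2both_state *s)
{
	assert(s != NULL);
	if (s->ram_powered)
		return RESUME_FROM_RAM;
	if (s->image_on_disk)
		return RESUME_FROM_DISK;
	return RESUME_COLD_BOOT;
}
```

The model also shows why drivers cannot assume a runnable system in this mode: the suspend-to-RAM step happens after the image has been written, so anything that changes on-disk state in between would invalidate the fallback path.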
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Tue, 2007-05-29 at 07:55 -0700, Linus Torvalds wrote:
>
> On Tue, 29 May 2007, Romano Giannetti wrote:
> >
> > - The good (?) news. I have made 7 suspend/resume cycles (to ram, I
> > haven't tested hibernation) with a 2.6.21.2 with that patch, applied
> > manually. The system did suspend and resume nicely even compiling a
> > kernel and opening openoffice. Normally (let me stress _normally_) no
> > delay was apparent on resume. I do not know how dangerous this is... :-)
> >
> > - The bad (?) news. One time out of 7 I had the 60 seconds delay.
>
> Interesting. If you can re-create it, please do the sysrq-T thing again,
> to see what's up. (Also, you might do "sysrq-p", which gives the current
> process data, which sysrq-T does not).

I've got it, but I had a problem: I filled the dmesg buffer. I will try
to find where to enlarge it. I have posted the partial result to:

http://www.dea.icai.upcomillas.es/romano/linux/info/dmesg-resume-nofreeze.txt

in the hope that something can be used.

I am running 2.6.21.2, with the "no freeze kthreads at all" patch from
Matthew Garrett, with this add-on:

--- drivers/base/firmware_class.c.orig 2007-05-30 12:19:59.0 +0200
+++ drivers/base/firmware_class.c 2007-05-29 19:39:56.0 +0200
@@ -471,7 +471,11 @@
 		struct device *device)
 {
 	int uevent = 1;
-	return _request_firmware(firmware_p, name, device, uevent);
+	int rval;
+	printk(KERN_ERR "FW: requesting firmware (sync) for %s\n", name);
+	rval = _request_firmware(firmware_p, name, device, uevent);
+	printk(KERN_ERR "FW: return %d\n", rval);
+	return rval;
 }

 /**
@@ -545,7 +549,9 @@
 	struct task_struct *task;
 	struct firmware_work *fw_work = kmalloc(sizeof (struct firmware_work),
 						GFP_ATOMIC);
-
+
+	printk(KERN_ERR "FW: requesting firmware (async) for %s\n", name);
+
 	if (!fw_work)
 		return -ENOMEM;
 	if (!try_module_get(module)) {
@@ -569,8 +575,12 @@
 		fw_work->cont(NULL, fw_work->context);
 		module_put(fw_work->module);
 		kfree(fw_work);
+		printk(KERN_ERR "FW: failing return %d\n", PTR_ERR(task));
 		return PTR_ERR(task);
 	}
+
+	printk(KERN_ERR "FW: normal return\n");
+
 	return 0;
 }

--
Romano Giannetti --- [EMAIL PROTECTED]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi.

On Tue, 2007-05-29 at 14:33 -0700, Linus Torvalds wrote:
>
> On Wed, 30 May 2007, Nigel Cunningham wrote:
> >
> > On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote:
> > >
> > > How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> > > the filesystem VFS callers? Or is that starting to get messy again?
> >
> > Yeah. Getting messy again :)
>
> Indeed. And also misses the point - the point being that we don't actually
> need to freeze anything at all most of the time. There's nothing wrong
> with making memory allocations etc.
>
> And yes, suspend is different from hibernate. I can see how hibernate
> people are worried about people writing to things after doing the
> snapshot, but those concerns don't exist with suspend. With suspend, the
> biggest concern is accessing a device after it has been suspended, but on
> the other hand, also the fact that we end up having driver writers used
> to the system being "runnable", so they do things that really do require a
> full-fledged system (and sometimes that means just some delayed action
> using a kernel thread, other times it seems to rely on more complex
> behaviour like firmware loading :^p )

Yeah, but they can't. Even after the freezing of processes has been
removed from the normal suspend to ram path, we're still going to have
this issue with the suspend to ram after writing a hibernation image
path.

Regards,

Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wed, 30 May 2007, Nigel Cunningham wrote:
>
> On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote:
> >
> > How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> > the filesystem VFS callers? Or is that starting to get messy again?
>
> Yeah. Getting messy again :)

Indeed. And also misses the point - the point being that we don't actually
need to freeze anything at all most of the time. There's nothing wrong
with making memory allocations etc.

And yes, suspend is different from hibernate. I can see how hibernate
people are worried about people writing to things after doing the
snapshot, but those concerns don't exist with suspend. With suspend, the
biggest concern is accessing a device after it has been suspended, but on
the other hand, also the fact that we end up having driver writers used
to the system being "runnable", so they do things that really do require a
full-fledged system (and sometimes that means just some delayed action
using a kernel thread, other times it seems to rely on more complex
behaviour like firmware loading :^p )

		Linus
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi.

On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote:
> Nigel Cunningham wrote:
> >
> > I'm sorry to say it, but dropping process freezing still seems to me
> > like the better way though. I prefer it because of the reliability
> > aspect. With the current code, having frozen processes, I can look at
> > the state of memory, calculate how much I'll need for this or that and
> > know that I'll have sufficient memory for the atomic copy and for doing
> > the I/O (making assumptions about how much memory drivers will
> > allocate) before I start to do either. If we stop freezing processes,
> > that predictability will go away. There'll always be a possibility that
> > some process will get memory hungry and stop me from being able to get
> > the image on disk, and I'll have to either abort or give up and try
> > again and again until I can complete writing the image, the battery runs
> > out or whatever...
>
> How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> the filesystem VFS callers? Or is that starting to get messy again?

Yeah. Getting messy again :)

Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Tue, 29 May 2007, Romano Giannetti wrote:
>
> - The good (?) news. I have made 7 suspend/resume cycle (to ram, I
> haven't tested hibernation) with a 2.6.21.2 with that patch, applied
> manually. The system did suspend and resume nicely even compiling a
> kernel and opening openoffice. Normally (le me stress _normally_) no
> delay was apparent on resume. I do not know how dangerous is this... :-)
>
> - The bad (?) news. One time out of 7 I had the 60 seconds delay.

Interesting. If you can re-create it, please do the sysrq-T thing again,
to see what's up.

(Also, you might do "sysrq-p", which gives the current process data, which
sysrq-T does not).

		Linus
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Nigel Cunningham wrote:
>
> I'm sorry to say it, but dropping process freezing still seems to me
> like the better way though. I prefer it because of the reliability
> aspect. With the current code, having frozen processes, I can look at
> the state of memory, calculate how much I'll need for this or that and
> know that I'll have sufficient memory for the atomic copy and for doing
> the I/O (making assumptions about how much memory drivers will
> allocate) before I start to do either. If we stop freezing processes,
> that predictability will go away. There'll always be a possibility that
> some process will get memory hungry and stop me from being able to get
> the image on disk, and I'll have to either abort or give up and try
> again and again until I can complete writing the image, the battery runs
> out or whatever...

How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
the filesystem VFS callers? Or is that starting to get messy again?

Cheers
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Linus Torvalds wrote:
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> > Does that mean you never ever power off your laptop (assuming you have
> > one), and the battery never runs out? Surely you must power it off
> > completely sometimes?
>
> So? The bootup isn't that much worse than a disk suspend/resume, and it's
> reliable.

I very much prefer suspend (to RAM) over hibernate (to DISK).
But once in a while, primarily when travelling, I'll use hibernate.

And the "swsusp" in the kernel is just plain crappy and slow,
which leads many people (including our beloved chief penguin, it seems)
into thinking that hibernate *has* to be too slow to be useful.

But with Suspend2, it is very quick and usable by comparism.
Try it, you'll like it (at least a little).

Cheers
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Tue, 2007-05-29 at 13:00 +0100, Michael-Luke Jones wrote:
> Rafael J. Wysocki wrote:
> > On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
> >> The shiny userspace firmware loading causes problems since it exists,
> >> every second box has problems with it, in all sorts of situations. If
> >> people are still sold to the idea of userspace firmware loading, why
> >> don't we keep the data in the driver, instead of immediately
> >> discarding it after the first upload? Not to waste a few hundred
> >> kilobytes? That doesn't sound like a convincing deal, after all the
> >> years people try to work around the issues it causes.
> >
> > Agreed.
> >
> > Rafael
>
> Rather than most drivers being told to make this step, can this be added
> to the firmware_class such that firmware objects are cached in RAM and
> subsequent calls to request_firmware() don't have to query userspace.
>
> This seems the least intrusive solution to this problem.

Who is going to keep track of the data hiding in the firmware_class? On
driver unbind, module unload, you want to release the data.

Kay
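[Archive note, not part of the original message.] One way to read the exchange above is that a RAM cache inside firmware_class would need per-image reference counting, so that the last driver unbind or module unload releases the data. The following is only a userspace model of that idea in plain C, under stated assumptions: fw_cache_get(), fw_cache_put() and load_from_userspace() are hypothetical names invented for this sketch, the payload is a dummy, and none of it is the kernel's firmware_class code or its request_firmware() implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FW_CACHE_SLOTS 8
#define FW_NAME_MAX    64

/* One cached image; refs counts the drivers currently holding it. */
struct fw_entry {
    char name[FW_NAME_MAX];
    unsigned char *data;
    size_t size;
    int refs;
};

static struct fw_entry fw_cache[FW_CACHE_SLOTS];
static int userspace_loads;   /* how often the slow path was taken */

/* Hypothetical stand-in for the round trip to the userspace helper. */
static unsigned char *load_from_userspace(const char *name, size_t *size)
{
    unsigned char *buf;

    *size = strlen(name) + 1;
    buf = malloc(*size);
    if (buf) {
        memcpy(buf, name, *size);   /* dummy payload for the model */
        userspace_loads++;
    }
    return buf;
}

/* Return a cached image, loading it on first use; NULL on failure. */
const struct fw_entry *fw_cache_get(const char *name)
{
    struct fw_entry *free_slot = NULL;
    int i;

    if (strlen(name) >= FW_NAME_MAX)
        return NULL;

    for (i = 0; i < FW_CACHE_SLOTS; i++) {
        struct fw_entry *e = &fw_cache[i];

        if (e->refs > 0 && strcmp(e->name, name) == 0) {
            e->refs++;              /* cache hit: no userspace query */
            return e;
        }
        if (e->refs == 0 && !free_slot)
            free_slot = e;
    }
    if (!free_slot)
        return NULL;                /* cache full */

    free_slot->data = load_from_userspace(name, &free_slot->size);
    if (!free_slot->data)
        return NULL;
    strcpy(free_slot->name, name);
    free_slot->refs = 1;
    return free_slot;
}

/* Drop one reference (driver unbind, module unload); free on the last put. */
void fw_cache_put(const struct fw_entry *entry)
{
    struct fw_entry *e = (struct fw_entry *)entry;

    if (!e)
        return;
    assert(e->refs > 0);            /* a put without a matching get is a bug */
    if (--e->refs == 0) {
        free(e->data);
        e->data = NULL;
        e->size = 0;
    }
}
```

In this model a resume-time reload is a cache hit as long as the driver kept its reference across suspend, which is the property the thread is after; the cost is exactly the bookkeeping Kay asks about.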
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi!

> > I guess we should warn the driver authors, then; and decide what driver
> > authors should do.
>
> Drivers really shouldn't do anythign at all. *)

> > If I'm video4linux driver for grabbing screen, have been suspended, and
> > someone asks me to read a frame, should I
> >
> > a) return -ESORRYIMSUSPENDED
> >
> > b) just block the caller
>
> The "subsystem" thing should have stopped the queues, and the device
> should never even _see_ this.

Okay, _if_ there's a subsystem, subsystem should have stopped the
queues. End result should be that userspace is blocked when trying to
access suspended device/suspended subsystem.

I guess we are in violent agreement.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
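[Archive note, not part of the original message.] The agreed behaviour, option b) with the subsystem doing the work, can be modelled in a few lines of C. This is a single-threaded sketch under stated assumptions: struct subsys_queue and the queue_* functions are hypothetical names, and a "held" request stands in for a caller that would really be put to sleep. It is not code from video4linux or any other kernel subsystem.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 16

struct subsys_queue {
    int stopped;                /* set by the subsystem on suspend */
    int pending[QUEUE_DEPTH];   /* requests held while stopped */
    size_t npending;
    int device_seen;            /* requests that reached the driver */
};

/* Hypothetical driver entry point; must never run while suspended. */
static void device_handle(struct subsys_queue *q, int req)
{
    (void)req;
    assert(!q->stopped);        /* the device never sees a request when asleep */
    q->device_seen++;
}

/* Returns 1 if dispatched, 0 if held until resume, -1 if the backlog is full. */
int queue_submit(struct subsys_queue *q, int req)
{
    if (!q->stopped) {
        device_handle(q, req);
        return 1;
    }
    if (q->npending == QUEUE_DEPTH)
        return -1;
    q->pending[q->npending++] = req;
    return 0;
}

/* Subsystem-level suspend: stop the queue before the driver is suspended. */
void queue_suspend(struct subsys_queue *q)
{
    q->stopped = 1;
}

/* Subsystem-level resume: restart the queue and replay what was held. */
void queue_resume(struct subsys_queue *q)
{
    size_t i;

    q->stopped = 0;
    for (i = 0; i < q->npending; i++)
        device_handle(q, q->pending[i]);
    q->npending = 0;
}
```

The driver itself contains no suspend check at all; the only place that knows about the suspended state is the queue, which matches the "drivers really shouldn't do anything" position quoted above.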
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Tuesday, 29 May 2007 14:00, Michael-Luke Jones wrote:
> Rafael J. Wysocki wrote:
> > On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
> >> The shiny userspace firmware loading causes problems since it exists,
> >> every second box has problems with it, in all sorts of situations. If
> >> people are still sold to the idea of userspace firmware loading, why
> >> don't we keep the data in the driver, instead of immediately
> >> discarding it after the first upload? Not to waste a few hundred
> >> kilobytes? That doesn't sound like a convincing deal, after all the
> >> years people try to work around the issues it causes.
> >
> > Agreed.
> >
> > Rafael
>
> Rather than most drivers being told to make this step, can this be added
> to the firmware_class such that firmware objects are cached in RAM and
> subsequent calls to request_firmware() don't have to query userspace.
>
> This seems the least intrusive solution to this problem.

Agreed again. :-)

Greetings,
Rafael
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Rafael J. Wysocki wrote:
> On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
>> The shiny userspace firmware loading causes problems since it exists,
>> every second box has problems with it, in all sorts of situations. If
>> people are still sold to the idea of userspace firmware loading, why
>> don't we keep the data in the driver, instead of immediately
>> discarding it after the first upload? Not to waste a few hundred
>> kilobytes? That doesn't sound like a convincing deal, after all the
>> years people try to work around the issues it causes.
>
> Agreed.
>
> Rafael

Rather than most drivers being told to make this step, can this be added
to the firmware_class such that firmware objects are cached in RAM and
subsequent calls to request_firmware() don't have to query userspace.

This seems the least intrusive solution to this problem.

Thanks,

Michael-Luke
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
> On 5/25/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > On Fri, 25 May 2007, Pavel Machek wrote:
> > >
> > > 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> > > PEOPLE FOR FIVE YEARS NOW.
> >
> > And people aren't listening. Have you thought about _why_?
> >
> > The thing is, it should just work. Even without pre-loading.
> >
> > > Imageine we killed freezer. Also imagine Romano has IDE card his
> > > PCMCIA slot. Kaboom, we solved nothing.
> >
> > Don't be silly. We solved it. The firmware has to be loadable from
> > somewhere else, since otherwise his IDE card wouldn't have been accessible
> > in the first place!
>
> The shiny userspace firmware loading causes problems since it exists,
> every second box has problems with it, in all sorts of situations. If
> people are still sold to the idea of userspace firmware loading, why
> don't we keep the data in the driver, instead of immediately
> discarding it after the first upload? Not to waste a few hundred
> kilobytes? That doesn't sound like a convincing deal, after all the
> years people try to work around the issues it causes.

Agreed.

Rafael
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Sun, 2007-05-27 at 19:44 +0100, Matthew Garrett wrote:
>
> Anyway. I've tested the following patch on a dual-core x86. No obvious
> issues yet, but I'll try to put it through a few hundred cycles.

[patch to disable freezer deleted]

First of all, excuse me for being a quite lousy tester. Could not come
around to try bisecting, no time at all. Yesterday in the autobus I gave
a shot to this "wild test" and I report the results here.

- The good (?) news. I have made 7 suspend/resume cycle (to ram, I
haven't tested hibernation) with a 2.6.21.2 with that patch, applied
manually. The system did suspend and resume nicely even compiling a
kernel and opening openoffice. Normally (le me stress _normally_) no
delay was apparent on resume. I do not know how dangerous is this... :-)

- The bad (?) news. One time out of 7 I had the 60 seconds delay.

I attach here the dmesg(s) of the resumes, a good one, a delayed one,
and another good one after a reboot (where you can, by the way, see the
dancing serial effect... the card is sometime /dev/ttyS1,
sometime /dev/ttyS2).

[ 1112.984000] Back to C!
[ 1112.985000] Applying VIA southbridge workaround.
[ 1112.985000] PCI: Disabling Via external APIC routing
[ 1113.418000] PM: Writing back config space on device :00:00.0 at offset 1 (was 216, writing 1216)
[ 1113.418000] PM: Writing back config space on device :00:01.0 at offset 9 (was fff0, writing 38003800)
[ 1113.418000] PCI: Setting latency timer of device :00:01.0 to 64
[ 1114.408000] ACPI: PCI Interrupt :00:07.2[D] -> Link [LNKD] -> GSI 9 (level, low) -> IRQ 9
[ 1114.408000] PCI: Setting latency timer of device :00:07.2 to 64
[ 1114.408000] usb usb1: root hub lost power or was reset
[ 1114.481000] ACPI: PCI Interrupt :00:07.3[D] -> Link [LNKD] -> GSI 9 (level, low) -> IRQ 9
[ 1114.481000] PCI: Setting latency timer of device :00:07.3 to 64
[ 1114.481000] usb usb2: root hub lost power or was reset
[ 1114.657000] ACPI: PCI Interrupt :00:07.5[C] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
[ 1114.657000] PCI: Setting latency timer of device :00:07.5 to 64
[ 1115.347000] pccard: PCMCIA card inserted into slot 1
[ 1115.347000] pcmcia: registering new device pcmcia1.0
[ 1115.459000] pcmcia: request for exclusive IRQ could not be fulfilled.
[ 1115.459000] pcmcia: the driver needs updating to supported shared IRQ lines.
[ 1115.504000] eth0: 3Com 3c589, io 0x300, irq 3, hw_addr 00:00:86:1A:4E:A8
[ 1115.504000] 8K FIFO split 5:3 Rx:Tx, auto xcvr
[ 1115.504000] pcmcia: registering new device pcmcia1.1
[ 1115.504000] pcmcia: request for exclusive IRQ could not be fulfilled.
[ 1115.504000] pcmcia: the driver needs updating to supported shared IRQ lines.
[ 1115.545000] 1.1: ttyS2 at I/O 0x3e8 (irq = 3) is a 16550A
[ 1115.558000] PM: Writing back config space on device :00:0e.0 at offset 3 (was 8, writing 4008)
[ 1115.558000] PM: Writing back config space on device :00:0e.0 at offset 1 (was 2100012, writing 2100016)
[ 1115.609000] ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[9] MMIO=[e8004000-e80047ff] Max Packet=[2048] IR/IT contexts=[4/8]
[ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 5 (was 0, writing e8004800)
[ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 4 (was 1, writing 1801)
[ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 1 (was 290, writing 293)
[ 1115.67] pnp: Device 00:08 activated.
[ 1115.67] pnp: Failed to activate device 00:0a.
[ 1115.67] pnp: Failed to activate device 00:0b.
[ 1117.648000] 8139too Fast Ethernet driver 0.9.28
[ 1117.65] ACPI: PCI Interrupt :00:10.0[A] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
[ 1117.653000] eth0: RealTek RTL8139 at 0xe0a72800, 08:00:46:6e:93:a8, IRQ 10
[ 1117.653000] eth0: Identified 8139 chip type 'RTL-8139C'
[ 1118.403000] input: Power Button (FF) as /class/input/input13
[ 1118.404000] ACPI: Power Button (FF) [PWRF]
[ 1118.404000] input: Sleep Button (CM) as /class/input/input14
[ 1118.405000] ACPI: Sleep Button (CM) [SBTN]
[ 1118.406000] input: Lid Switch as /class/input/input15
[ 1118.407000] ACPI: Lid Switch [LID]
[ 1118.83] ACPI: Thermal Zone [THRM] (36 C)
[ 1119.431000] ACPI: AC Adapter [ACAD] (off-line)
[ 1119.899000] ACPI: Battery Slot [BAT1] (battery present)
[ 1119.904000] ACPI: Battery Slot [BAT2] (battery absent)
[ 1126.226000] eth1: no IPv6 routers present

suspend...

[ 2019.31] pccard: card ejected from slot 0
[ 2019.345000] PCMCIA: socket dc99bc28: *** DANGER *** unable to remove socket power
[ 2019.346000] pccard: card ejected from slot 1
[ 2020.041000] ACPI: PCI interrupt for device :00:10.0 disabled
[ 2024.641000] Suspending console(s)
[ 2024.656000] usbdev2.1: PM: suspend 0->2, parent usb2 already 2
[ 2024.656000] usbdev2.1_ep81: PM: suspend 0->2, parent 2-0:1.0 already 2
[ 2024.656000] hub 2-0:1.0: PM: suspend 2->2, parent usb2 already 2
[
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On 5/25/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> On Fri, 25 May 2007, Pavel Machek wrote:
> >
> > 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> > PEOPLE FOR FIVE YEARS NOW.
>
> And people aren't listening. Have you thought about _why_?
>
> The thing is, it should just work. Even without pre-loading.
>
> > Imageine we killed freezer. Also imagine Romano has IDE card his
> > PCMCIA slot. Kaboom, we solved nothing.
>
> Don't be silly. We solved it. The firmware has to be loadable from
> somewhere else, since otherwise his IDE card wouldn't have been accessible
> in the first place!

The shiny userspace firmware loading causes problems since it exists,
every second box has problems with it, in all sorts of situations. If
people are still sold to the idea of userspace firmware loading, why
don't we keep the data in the driver, instead of immediately
discarding it after the first upload? Not to waste a few hundred
kilobytes? That doesn't sound like a convincing deal, after all the
years people try to work around the issues it causes.

Kay
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On 5/25/07, Linus Torvalds [EMAIL PROTECTED] wrote: On Fri, 25 May 2007, Pavel Machek wrote: 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO PEOPLE FOR FIVE YEARS NOW. And people aren't listening. Have you thought about _why_? The thing is, it should just work. Even without pre-loading. Imageine we killed freezer. Also imagine Romano has IDE card his PCMCIA slot. Kaboom, we solved nothing. Don't be silly. We solved it. The firmware has to be loadable from somewhere else, since otherwise his IDE card wouldn't have been accessible in the first place! The shiny userspace firmware loading causes problems since it exists, every second box has problems with it, in all sorts of situations. If people are still sold to the idea of userspace firmware loading, why don't we keep the data in the driver, instead of immediately discarding it after the first upload? Not to waste a few hundred kilobytes? That doesn't sound like a convincing deal, after all the years people try to work around the issues it causes. Kay - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Sun, 2007-05-27 at 19:44 +0100, Matthew Garrett wrote: Anyway. I've tested the following patch on a dual-core x86. No obvious issues yet, but I'll try to put it through a few hundred cycles. [patch to disable freezer deleted] First of all, excuse me for being a quite lousy tester. Could not come around to try bisecting, no time at all. Yesterday in the autobus I gave a shot to this wild test and I report the results here. - The good (?) news. I have made 7 suspend/resume cycle (to ram, I haven't tested hibernation) with a 2.6.21.2 with that patch, applied manually. The system did suspend and resume nicely even compiling a kernel and opening openoffice. Normally (le me stress _normally_) no delay was apparent on resume. I do not know how dangerous is this... :-) - The bad (?) news. One time out of 7 I had the 60 seconds delay. I attach here the dmesg(s) of the resumes, a good one, a delayed one, and another good one after a reboot (where you can, by the way, see the dancing serial effect... the card is sometime /dev/ttyS1, sometime /dev/ttyS2). [ 1112.984000] Back to C! [ 1112.985000] Applying VIA southbridge workaround. 
[ 1112.985000] PCI: Disabling Via external APIC routing [ 1113.418000] PM: Writing back config space on device :00:00.0 at offset 1 (was 216, writing 1216) [ 1113.418000] PM: Writing back config space on device :00:01.0 at offset 9 (was fff0, writing 38003800) [ 1113.418000] PCI: Setting latency timer of device :00:01.0 to 64 [ 1114.408000] ACPI: PCI Interrupt :00:07.2[D] - Link [LNKD] - GSI 9 (level, low) - IRQ 9 [ 1114.408000] PCI: Setting latency timer of device :00:07.2 to 64 [ 1114.408000] usb usb1: root hub lost power or was reset [ 1114.481000] ACPI: PCI Interrupt :00:07.3[D] - Link [LNKD] - GSI 9 (level, low) - IRQ 9 [ 1114.481000] PCI: Setting latency timer of device :00:07.3 to 64 [ 1114.481000] usb usb2: root hub lost power or was reset [ 1114.657000] ACPI: PCI Interrupt :00:07.5[C] - Link [LNKC] - GSI 5 (level, low) - IRQ 5 [ 1114.657000] PCI: Setting latency timer of device :00:07.5 to 64 [ 1115.347000] pccard: PCMCIA card inserted into slot 1 [ 1115.347000] pcmcia: registering new device pcmcia1.0 [ 1115.459000] pcmcia: request for exclusive IRQ could not be fulfilled. [ 1115.459000] pcmcia: the driver needs updating to supported shared IRQ lines. [ 1115.504000] eth0: 3Com 3c589, io 0x300, irq 3, hw_addr 00:00:86:1A:4E:A8 [ 1115.504000] 8K FIFO split 5:3 Rx:Tx, auto xcvr [ 1115.504000] pcmcia: registering new device pcmcia1.1 [ 1115.504000] pcmcia: request for exclusive IRQ could not be fulfilled. [ 1115.504000] pcmcia: the driver needs updating to supported shared IRQ lines. 
[ 1115.545000] 1.1: ttyS2 at I/O 0x3e8 (irq = 3) is a 16550A [ 1115.558000] PM: Writing back config space on device :00:0e.0 at offset 3 (was 8, writing 4008) [ 1115.558000] PM: Writing back config space on device :00:0e.0 at offset 1 (was 2100012, writing 2100016) [ 1115.609000] ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[9] MMIO=[e8004000-e80047ff] Max Packet=[2048] IR/IT contexts=[4/8] [ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 5 (was 0, writing e8004800) [ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 4 (was 1, writing 1801) [ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 1 (was 290, writing 293) [ 1115.67] pnp: Device 00:08 activated. [ 1115.67] pnp: Failed to activate device 00:0a. [ 1115.67] pnp: Failed to activate device 00:0b. [ 1117.648000] 8139too Fast Ethernet driver 0.9.28 [ 1117.65] ACPI: PCI Interrupt :00:10.0[A] - Link [LNKB] - GSI 10 (level, low) - IRQ 10 [ 1117.653000] eth0: RealTek RTL8139 at 0xe0a72800, 08:00:46:6e:93:a8, IRQ 10 [ 1117.653000] eth0: Identified 8139 chip type 'RTL-8139C' [ 1118.403000] input: Power Button (FF) as /class/input/input13 [ 1118.404000] ACPI: Power Button (FF) [PWRF] [ 1118.404000] input: Sleep Button (CM) as /class/input/input14 [ 1118.405000] ACPI: Sleep Button (CM) [SBTN] [ 1118.406000] input: Lid Switch as /class/input/input15 [ 1118.407000] ACPI: Lid Switch [LID] [ 1118.83] ACPI: Thermal Zone [THRM] (36 C) [ 1119.431000] ACPI: AC Adapter [ACAD] (off-line) [ 1119.899000] ACPI: Battery Slot [BAT1] (battery present) [ 1119.904000] ACPI: Battery Slot [BAT2] (battery absent) [ 1126.226000] eth1: no IPv6 routers present suspend... 
[ 2019.31] pccard: card ejected from slot 0 [ 2019.345000] PCMCIA: socket dc99bc28: *** DANGER *** unable to remove socket power [ 2019.346000] pccard: card ejected from slot 1 [ 2020.041000] ACPI: PCI interrupt for device :00:10.0 disabled [ 2024.641000] Suspending console(s) [ 2024.656000] usbdev2.1: PM: suspend 0-2, parent usb2 already 2 [ 2024.656000] usbdev2.1_ep81: PM: suspend 0-2, parent 2-0:1.0 already 2 [ 2024.656000] hub 2-0:1.0: PM: suspend 2-2, parent usb2 already 2 [ 2024.656000]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Tuesday, 29 May 2007 08:55, Kay Sievers wrote: On 5/25/07, Linus Torvalds [EMAIL PROTECTED] wrote: On Fri, 25 May 2007, Pavel Machek wrote: 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO PEOPLE FOR FIVE YEARS NOW. And people aren't listening. Have you thought about _why_? The thing is, it should just work. Even without pre-loading. Imageine we killed freezer. Also imagine Romano has IDE card his PCMCIA slot. Kaboom, we solved nothing. Don't be silly. We solved it. The firmware has to be loadable from somewhere else, since otherwise his IDE card wouldn't have been accessible in the first place! The shiny userspace firmware loading causes problems since it exists, every second box has problems with it, in all sorts of situations. If people are still sold to the idea of userspace firmware loading, why don't we keep the data in the driver, instead of immediately discarding it after the first upload? Not to waste a few hundred kilobytes? That doesn't sound like a convincing deal, after all the years people try to work around the issues it causes. Agreed. Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Rafael J. Wysocki wrote: On Tuesday, 29 May 2007 08:55, Kay Sievers wrote: The shiny userspace firmware loading causes problems since it exists, every second box has problems with it, in all sorts of situations. If people are still sold to the idea of userspace firmware loading, why don't we keep the data in the driver, instead of immediately discarding it after the first upload? Not to waste a few hundred kilobytes? That doesn't sound like a convincing deal, after all the years people try to work around the issues it causes. Agreed. Rafael Rather than most drivers being told to make this step, can this be added to the firmware_class such that firmware objects are cached in RAM and subsequent calls to request_firmware() don't have to query userspace. This seems the least intrusive solution to this problem. Thanks, Michael-Luke - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Tuesday, 29 May 2007 14:00, Michael-Luke Jones wrote: Rafael J. Wysocki wrote: On Tuesday, 29 May 2007 08:55, Kay Sievers wrote: The shiny userspace firmware loading causes problems since it exists, every second box has problems with it, in all sorts of situations. If people are still sold to the idea of userspace firmware loading, why don't we keep the data in the driver, instead of immediately discarding it after the first upload? Not to waste a few hundred kilobytes? That doesn't sound like a convincing deal, after all the years people try to work around the issues it causes. Agreed. Rafael Rather than most drivers being told to make this step, can this be added to the firmware_class such that firmware objects are cached in RAM and subsequent calls to request_firmware() don't have to query userspace. This seems the least intrusive solution to this problem. Agreed again. :-) Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi! I guess we should warn the driver authors, then; and decide what driver authors should do. Drivers really shouldn't do anythign at all. *) If I'm video4linux driver for grabbing screen, have been suspended, and someone asks me to read a frame, should I a) return -ESORRYIMSUSPENDED b) just block the caller The subsystem thing should have stopped the queues, and the device should never even _see_ this. Okay, _if_ there's a subsystem, subsystem should have stopped the queues. End result should be that userspace is blocked when trying to access suspended device/suspended subsystem. I guess we are in violent agreement. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Tue, 2007-05-29 at 13:00 +0100, Michael-Luke Jones wrote: Rafael J. Wysocki wrote: On Tuesday, 29 May 2007 08:55, Kay Sievers wrote: The shiny userspace firmware loading causes problems since it exists, every second box has problems with it, in all sorts of situations. If people are still sold to the idea of userspace firmware loading, why don't we keep the data in the driver, instead of immediately discarding it after the first upload? Not to waste a few hundred kilobytes? That doesn't sound like a convincing deal, after all the years people try to work around the issues it causes. Agreed. Rafael Rather than most drivers being told to make this step, can this be added to the firmware_class such that firmware objects are cached in RAM and subsequent calls to request_firmware() don't have to query userspace. This seems the least intrusive solution to this problem. Who is going to keep track of the data hiding in the firmware_class? On driver unbind, module unload, you want to release the data. Kay - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Linus Torvalds wrote: On Fri, 25 May 2007, Nigel Cunningham wrote: Does that mean you never ever power off your laptop (assuming you have one), and the battery never runs out? Surely you must power it off completely sometimes? So? The bootup isn't that much worse than a disk suspend/resume, and it's reliable. I very much prefer suspend (to RAM) over hibernate (to DISK). But once in a while, primarily when travelling, I'll use hibernate. And the swsusp in the kernel is just plain crappy and slow, which leads many people (including our beloved chief penguin, it seems) into thinking that hibernate *has* to be too slow to be useful. But with Suspend2, it is very quick and usable by comparism. Try it, you'll like it (at least a little). Cheers - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Nigel Cunningham wrote: I'm sorry to say it, but dropping process freezing still seems to me like the better way though. I prefer it because of the reliability aspect. With the current code, having frozen processes, I can look at the state of memory, calculate how much I'll need for this or that and know that I'll have sufficient memory for the atomic copy and for doing the I/O (making assumptions about how much memory drivers will allocate) before I start to do either. If we stop freezing processes, that predictability will go away. There'll always be a possibility that some process will get memory hungry and stop me from being able to get the image on disk, and I'll have to either abort or give up and try again and again until I can complete writing the image, the battery runs out or whatever... How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to the filesystem VFS callers? Or is that starting to get messy again? Cheers
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Tue, 29 May 2007, Romano Giannetti wrote: - The good (?) news. I have made 7 suspend/resume cycles (to ram, I haven't tested hibernation) with a 2.6.21.2 with that patch, applied manually. The system did suspend and resume nicely even compiling a kernel and opening openoffice. Normally (let me stress _normally_) no delay was apparent on resume. I do not know how dangerous this is... :-) - The bad (?) news. One time out of 7 I had the 60 seconds delay. Interesting. If you can re-create it, please do the sysrq-T thing again, to see what's up. (Also, you might do sysrq-p, which gives the current process data, which sysrq-T does not). Linus
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi. On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote: Nigel Cunningham wrote: I'm sorry to say it, but dropping process freezing still seems to me like the better way though. I prefer it because of the reliability aspect. With the current code, having frozen processes, I can look at the state of memory, calculate how much I'll need for this or that and know that I'll have sufficient memory for the atomic copy and for doing the I/O (making assumptions about how much memory drivers will allocate) before I start to do either. If we stop freezing processes, that predictability will go away. There'll always be a possibility that some process will get memory hungry and stop me from being able to get the image on disk, and I'll have to either abort or give up and try again and again until I can complete writing the image, the battery runs out or whatever... How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to the filesystem VFS callers? Or is that starting to get messy again? Yeah. Getting messy again :) Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Wed, 30 May 2007, Nigel Cunningham wrote: On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote: How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to the filesystem VFS callers? Or is that starting to get messy again? Yeah. Getting messy again :) Indeed. And also misses the point - the point being that we don't actually need to freeze anything at all most of the time. There's nothing wrong with making memory allocations etc. And yes, suspend is different from hibernate. I can see how hibernate people are worried about people writing to things after doing the snapshot, but those concerns don't exist with suspend. With suspend, the biggest concern is accessing a device after it has been suspended, but on the other hand, also the fact that we end up having driver writers used to the system being runnable, so they do things that really do require a full-fledged system (and sometimes that means just some delayed action using a kernel thread, other times it seems to rely on more complex behaviour like firmware loading :^p ) Linus
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi. On Tue, 2007-05-29 at 14:33 -0700, Linus Torvalds wrote: On Wed, 30 May 2007, Nigel Cunningham wrote: On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote: How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to the filesystem VFS callers? Or is that starting to get messy again? Yeah. Getting messy again :) Indeed. And also misses the point - the point being that we don't actually need to freeze anything at all most of the time. There's nothing wrong with making memory allocations etc. And yes, suspend is different from hibernate. I can see how hibernate people are worried about people writing to things after doing the snapshot, but those concerns don't exist with suspend. With suspend, the biggest concern is accessing a device after it has been suspended, but on the other hand, also the fact that we end up having driver writers used to the system being runnable, so they do things that really do require a full-fledged system (and sometimes that means just some delayed action using a kernel thread, other times it seems to rely on more complex behaviour like firmware loading :^p ) Yeah, but they can't. Even after the freezing of processes has been removed from the normal suspend to ram path, we're still going to have this issue with the suspend to ram after writing a hibernation image path. Regards, Nigel
Re: [stable] pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Mon, May 28, 2007 at 09:53:50AM -0700, Linus Torvalds wrote: > > Before we suspend a device, we call the subsystem that that device has > been registered with. Ie, we have code like this: > > if (dev->class && dev->class->suspend) > error = dev->class->suspend(dev, state); > > which was very much designed so that individual devices wouldn't have to > always know - if the upper layer devices for that class can handle these > things, they should. > > Do people actually _do_ this, right now? No. But we do actually have the > infrastructure, and I think we have one or two classes that actually do > use it (at least the "rfkill_class" has a suspend member, dunno how well > this model actually works). > > I think Greg had some patches to make network drivers use this, for > example. Network drivers right now all end up doing stuff that really > doesn't belong in the driver at all when they suspend, and the > infrastructure _should_ just do it for them (ie do all the _network_ > related stuff, as opposed to the actual hardware-related stuff). Yes, I started to work on it, as it is the correct thing to do, but got sidetracked, sorry :( > (Examples of things that we probably _should_ do for network devices on a > class level: > > suspend: > netif_poll_disable() > if (netif_running(dev)) > dev->stop(dev); > > resume: > if (netif_running(dev)) > dev->start(dev); > netif_poll_enable(dev); > > or something similar). I'll try to hack something together later this week along this line and see how it works... thanks, greg k-h
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi. On Mon, 2007-05-28 at 14:03 +0100, Matthew Garrett wrote: > On Mon, May 28, 2007 at 02:55:07PM +0200, Pavel Machek wrote: > > > Well, PPC people are aware of this, and they think they can fix the > > drivers. We probably want to drop the freezer for suspend long-term, > > so. PPC machines use small subset of all the drivers, so it apparently > > is not big problem for them. > > I'm fairly certain that PPC uses USB. In any case, it's not limited to > PPC - APM has the same issue. Any driver that assumes processes will be > frozen during suspend to RAM is broken now, not the future. The converse is also true, though. Any process that assumes processes aren't frozen during suspend to RAM is also broken now, and will be while we allow the possibility of suspending to ram after writing a hibernation image. In short, drivers should be designed to work whether processes are frozen or not. Regards, Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Mon, 28 May 2007, Pavel Machek wrote: > > I guess we should warn the driver authors, then; and decide what driver > authors should do. Drivers really shouldn't do anything at all. > If I'm video4linux driver for grabbing screen, have been suspended, and > someone asks me to read a frame, should I > > a) return -ESORRYIMSUSPENDED > > b) just block the caller The "subsystem" thing should have stopped the queues, and the device should never even _see_ this. Before we suspend a device, we call the subsystem that that device has been registered with. Ie, we have code like this: if (dev->class && dev->class->suspend) error = dev->class->suspend(dev, state); which was very much designed so that individual devices wouldn't have to always know - if the upper layer devices for that class can handle these things, they should. Do people actually _do_ this, right now? No. But we do actually have the infrastructure, and I think we have one or two classes that actually do use it (at least the "rfkill_class" has a suspend member, dunno how well this model actually works). I think Greg had some patches to make network drivers use this, for example. Network drivers right now all end up doing stuff that really doesn't belong in the driver at all when they suspend, and the infrastructure _should_ just do it for them (ie do all the _network_ related stuff, as opposed to the actual hardware-related stuff). (Examples of things that we probably _should_ do for network devices on a class level: suspend: netif_poll_disable() if (netif_running(dev)) dev->stop(dev); resume: if (netif_running(dev)) dev->start(dev); netif_poll_enable(dev); or something similar). Linus
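The class-level dispatch described in the message above can be modelled in a few lines of userspace C. This is a rough sketch, not kernel code: the struct layouts are simplified stand-ins for struct device and struct class, the field is called cls rather than class, and net_class_suspend(), nic_hw_suspend() and the "class hook first on suspend, hardware hook first on resume" ordering are assumptions made for illustration.

```c
/* Userspace model of class-level suspend dispatch. The struct layouts are
 * simplified stand-ins, not the kernel's struct device / struct class. */
#include <assert.h>
#include <stddef.h>

struct device;

struct dev_class {
    const char *name;
    int (*suspend)(struct device *dev, int state);
    int (*resume)(struct device *dev);
};

struct device {
    struct dev_class *cls;                          /* subsystem the device registered with */
    int (*hw_suspend)(struct device *dev, int state);
    int (*hw_resume)(struct device *dev);
    int running;    /* interface is up (rough netif_running() analogue) */
    int started;    /* queue currently active */
    int powered;    /* hardware is on */
};

/* Network-class work: quiesce the queue before the hardware is touched. */
int net_class_suspend(struct device *dev, int state)
{
    (void)state;
    if (dev->running)
        dev->started = 0;
    return 0;
}

/* Network-class work: restart the queue once the hardware is back. */
int net_class_resume(struct device *dev)
{
    if (dev->running)
        dev->started = 1;
    return 0;
}

/* Driver part: hardware only. The class layer must already have stopped us. */
int nic_hw_suspend(struct device *dev, int state)
{
    (void)state;
    assert(!dev->started);
    dev->powered = 0;
    return 0;
}

int nic_hw_resume(struct device *dev)
{
    dev->powered = 1;
    return 0;
}

/* Core: call the class hook first, then the driver's hardware hook. */
int device_suspend(struct device *dev, int state)
{
    int error = 0;

    if (dev->cls && dev->cls->suspend)
        error = dev->cls->suspend(dev, state);
    if (!error && dev->hw_suspend)
        error = dev->hw_suspend(dev, state);
    return error;
}

/* Resume in the opposite order: hardware first, then the class. */
int device_resume(struct device *dev)
{
    int error = 0;

    if (dev->hw_resume)
        error = dev->hw_resume(dev);
    if (!error && dev->cls && dev->cls->resume)
        error = dev->cls->resume(dev);
    return error;
}
```

The point of the split is that the driver hooks only flip hardware state; everything that depends on the interface being up or down lives in the class hooks and is shared by every device in that class.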
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi! > > Well, PPC people are aware of this, and they think they can fix the > > drivers. We probably want to drop the freezer for suspend long-term, > > so. PPC machines use small subset of all the drivers, so it apparently > > is not big problem for them. > > I'm fairly certain that PPC uses USB. In any case, it's not limited to > PPC - APM has the same issue. Any driver that assumes processes will be > frozen during suspend to RAM is broken now, not the future. Yup, that's a possible view. Fixes welcome. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Mon, May 28, 2007 at 02:55:07PM +0200, Pavel Machek wrote: > Well, PPC people are aware of this, and they think they can fix the > drivers. We probably want to drop the freezer for suspend long-term, > so. PPC machines use small subset of all the drivers, so it apparently > is not big problem for them. I'm fairly certain that PPC uses USB. In any case, it's not limited to PPC - APM has the same issue. Any driver that assumes processes will be frozen during suspend to RAM is broken now, not the future. -- Matthew Garrett | [EMAIL PROTECTED]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi! > > > This /mostly/ works - I've had my test machine cycling through a suspend > > > cycle every 10 seconds for the past hour without any difficulties > > > providing I unload USB first. If USB is loaded, the suspend occasionally > > > fails with one of the devices returning -EBUSY and causing it to be > > > aborted. I haven't looked into this in any detail yet, but it's > > > presumably sufficiently generic code that it's potentially biting people > > > on PPC anyway. > > > > Most probably. > > > > Still, please take what I said in the other thread into consideration: We've > > been using the freezer for so long that at least some drivers started to > > rely > > on it being used. > > > > Even if there are no such drivers on your system, they can be used by other > > systems. > > Sure, but if any of these drivers run on PPC then they're broken anyway. > The assumption that processes will be frozen during suspend is true in > the specific case of ACPI and some of the ARM platforms, but not true on > PPC or APM systems. We either need to fix the drivers to stop assuming > this or add the process freezer to the other PM systems. Right now, > they're buggy. Well, PPC people are aware of this, and they think they can fix the drivers. We probably want to drop the freezer for suspend long-term, so. PPC machines use small subset of all the drivers, so it apparently is not big problem for them. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Mon, May 28, 2007 at 10:11:15AM +0200, Rafael J. Wysocki wrote: > On Monday, 28 May 2007 03:05, Matthew Garrett wrote: > > This /mostly/ works - I've had my test machine cycling through a suspend > > cycle every 10 seconds for the past hour without any difficulties > > providing I unload USB first. If USB is loaded, the suspend occasionally > > fails with one of the devices returning -EBUSY and causing it to be > > aborted. I haven't looked into this in any detail yet, but it's > > presumably sufficiently generic code that it's potentially biting people > > on PPC anyway. > > Most probably. > > Still, please take what I said in the other thread into consideration: We've > been using the freezer for so long that at least some drivers started to rely > on it being used. > > Even if there are no such drivers on your system, they can be used by other > systems. Sure, but if any of these drivers run on PPC then they're broken anyway. The assumption that processes will be frozen during suspend is true in the specific case of ACPI and some of the ARM platforms, but not true on PPC or APM systems. We either need to fix the drivers to stop assuming this or add the process freezer to the other PM systems. Right now, they're buggy. -- Matthew Garrett | [EMAIL PROTECTED]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi! > > As far as I can tell the PPC code simply shuts down the devices without > > worrying about userspace at all. If this was reliable, what prevents us > > from simply disabling the freezer for STR? > > Personally, I think that's the right thing to do. > > And by "disabling the freezer", I think we should just not call it at all. > However, sadly, right now it's called from common code. I'll happily take > a tested patch to split that code sequence up, and try to do it in 2.6.23, > if somebody has the energy (I'm getting to the point where I may just do > it myself, but my lazy nature still hopes for a STR person to step > forward). I guess we should warn the driver authors, then; and decide what driver authors should do. If I'm video4linux driver for grabbing screen, have been suspended, and someone asks me to read a frame, should I a) return -ESORRYIMSUSPENDED b) just block the caller ? a) seems ugly to my eyes (userspace should not know about suspend). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
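Option b) can be illustrated with a small single-threaded model in which a layer above the driver parks read requests while the device is suspended and completes them on resume, so userspace never sees a suspend-specific error. This is only a sketch under assumptions: capture_read(), capture_suspend(), capture_resume() and the structs are hypothetical and unrelated to the real video4linux API, and "parking" a request stands in for putting the calling process to sleep.

```c
/* Hypothetical capture subsystem that parks read requests while the device
 * is suspended (option b), so the driver never sees them. Not the v4l API. */
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

struct request {
    int frame;      /* frame number filled in by the driver */
    int done;       /* set once the request has completed */
};

struct capture_dev {
    int suspended;
    int next_frame;
    int driver_calls_while_suspended;   /* stays 0 if the layering works */
    struct request *pending[MAX_PENDING];
    size_t npending;
};

/* Driver-level read: the only function that touches the "hardware". */
static void driver_read(struct capture_dev *dev, struct request *req)
{
    if (dev->suspended)
        dev->driver_calls_while_suspended++;    /* would be a bug */
    req->frame = dev->next_frame++;
    req->done = 1;
}

/* Subsystem entry point: 0 if completed or parked, -1 if the queue is full. */
int capture_read(struct capture_dev *dev, struct request *req)
{
    req->done = 0;
    if (!dev->suspended) {
        driver_read(dev, req);
        return 0;
    }
    if (dev->npending == MAX_PENDING)
        return -1;
    dev->pending[dev->npending++] = req;    /* caller "blocks" here */
    return 0;
}

void capture_suspend(struct capture_dev *dev)
{
    dev->suspended = 1;
}

/* On resume, replay parked requests in arrival order. */
void capture_resume(struct capture_dev *dev)
{
    dev->suspended = 0;
    for (size_t i = 0; i < dev->npending; i++)
        driver_read(dev, dev->pending[i]);
    dev->npending = 0;
}
```

The driver_calls_while_suspended counter is there only to make the invariant checkable: if the layer above does its job, the driver's read path is never entered while the device is down.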
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Monday, 28 May 2007 03:05, Matthew Garrett wrote: > On Sun, May 27, 2007 at 07:44:02PM +0100, Matthew Garrett wrote: > > > Anyway. I've tested the following patch on a dual-core x86. No obvious > > issues yet, but I'll try to put it through a few hundred cycles. > > This /mostly/ works - I've had my test machine cycling through a suspend > cycle every 10 seconds for the past hour without any difficulties > providing I unload USB first. If USB is loaded, the suspend occasionally > fails with one of the devices returning -EBUSY and causing it to be > aborted. I haven't looked into this in any detail yet, but it's > presumably sufficiently generic code that it's potentially biting people > on PPC anyway. Most probably. Still, please take what I said in the other thread into consideration: We've been using the freezer for so long that at least some drivers started to rely on it being used. Even if there are no such drivers on your system, they can be used by other systems. Greetings, Rafael
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Sun, May 27, 2007 at 07:44:02PM +0100, Matthew Garrett wrote: > Anyway. I've tested the following patch on a dual-core x86. No obvious > issues yet, but I'll try to put it through a few hundred cycles. This /mostly/ works - I've had my test machine cycling through a suspend cycle every 10 seconds for the past hour without any difficulties providing I unload USB first. If USB is loaded, the suspend occasionally fails with one of the devices returning -EBUSY and causing it to be aborted. I haven't looked into this in any detail yet, but it's presumably sufficiently generic code that it's potentially biting people on PPC anyway. -- Matthew Garrett | [EMAIL PROTECTED]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Sunday, 27 May 2007 20:44, Matthew Garrett wrote:
> On Sun, May 27, 2007 at 08:32:14PM +0200, Rafael J. Wysocki wrote:
>
> > In particular, please see this message:
> >
> > https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012301.html
>
> Yes, there's also the notifier chain for the hardware. However, very few
> drivers seem to use that - adb seems to be the only one still in the
> tree. For everything else, the device tree is used in exactly the same
> way as on x86. If it's safe on Macs but not on x86, then (as far as I
> can tell) it looks like it's only by luck.
>
> Anyway. I've tested the following patch on a dual-core x86. No obvious
> issues yet, but I'll try to put it through a few hundred cycles.

OK

I'm working on a patch that introduces hibernation/suspend notifiers. It will
conflict with this one a bit, but OTOH it might be useful here too. I'll post
it in a while in a separate thread.

Greetings,
Rafael
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Sun, May 27, 2007 at 08:32:14PM +0200, Rafael J. Wysocki wrote:

> In particular, please see this message:
>
> https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012301.html

Yes, there's also the notifier chain for the hardware. However, very few
drivers seem to use that - adb seems to be the only one still in the
tree. For everything else, the device tree is used in exactly the same
way as on x86. If it's safe on Macs but not on x86, then (as far as I
can tell) it looks like it's only by luck.

Anyway. I've tested the following patch on a dual-core x86. No obvious
issues yet, but I'll try to put it through a few hundred cycles.

diff --git a/include/linux/pm.h b/include/linux/pm.h
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 8812985..1db3012 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -19,7 +19,6 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/resume-trace.h>
-#include <linux/freezer.h>
 #include <linux/vmstat.h>
 
 #include "power.h"
@@ -66,9 +65,10 @@ static inline void pm_finish(suspend_state_t state)
  *	suspend_prepare - Do prep work before entering low-power state.
  *	@state:		State we're entering.
  *
- *	This is common code that is called for each state that we're
- *	entering. Allocate a console, stop all processes, then make sure
- *	the platform can enter the requested state.
+ *	This is common code that is called for each state that we're
+ *	entering. Allocate a console, then make sure the platform can
+ *	enter the requested state. This is not called for
+ *	suspend-to-disk.
  */
 
 static int suspend_prepare(suspend_state_t state)
@@ -81,11 +81,6 @@ static int suspend_prepare(suspend_state_t state)
 
 	pm_prepare_console();
 
-	if (freeze_processes()) {
-		error = -EAGAIN;
-		goto Thaw;
-	}
-
 	if ((free_pages = global_page_state(NR_FREE_PAGES))
 			< FREE_PAGE_NUMBER) {
 		pr_debug("PM: free some memory\n");
@@ -93,7 +88,7 @@ static int suspend_prepare(suspend_state_t state)
 		if (nr_free_pages() < FREE_PAGE_NUMBER) {
 			error = -ENOMEM;
 			printk(KERN_ERR "PM: No enough memory\n");
-			goto Thaw;
+			goto Exit;
 		}
 	}
 
@@ -118,8 +113,7 @@ static int suspend_prepare(suspend_state_t state)
 	device_resume();
  Resume_console:
 	resume_console();
- Thaw:
-	thaw_processes();
+ Exit:
 	pm_restore_console();
 	return error;
 }
@@ -160,8 +154,8 @@ int suspend_enter(suspend_state_t state)
  *	suspend_finish - Do final work before exiting suspend sequence.
  *	@state:		State we're coming out of.
  *
- *	Call platform code to clean up, restart processes, and free the
- *	console that we've allocated. This is not called for suspend-to-disk.
+ *	Call platform code to clean up and free the console that we've
+ *	allocated. This is not called for suspend-to-disk.
  */
 
 static void suspend_finish(suspend_state_t state)
@@ -170,7 +164,6 @@ static void suspend_finish(suspend_state_t state)
 	pm_finish(state);
 	device_resume();
 	resume_console();
-	thaw_processes();
 	pm_restore_console();
 }
 

--
Matthew Garrett | [EMAIL PROTECTED]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Sunday, 27 May 2007 18:43, Matthew Garrett wrote:
> On Sun, May 27, 2007 at 09:26:00AM -0700, Linus Torvalds wrote:
>
> > And by "disabling the freezer", I think we should just not call it at all.
> > However, sadly, right now it's called from common code. I'll happily take
> > a tested patch to split that code sequence up, and try to do it in 2.6.23,
> > if somebody has the energy (I'm getting to the point where I may just do
> > it myself, but my lazy nature still hopes for a STR person to step
> > forward).
>
> I'll take a look at this. It probably makes sense to build on Rafael's
> work on splitting the codepaths up.

Actually, removing the freezer from the suspend code path is simple. You only
need to remove calls to freeze_processes() and thaw_processes() from
kernel/power/main.c.

That said, I don't think that PPC does only what you say. We've discussed this
a bit on linux-pm, in this thread:
https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012242.html

In particular, please see this message:
https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012301.html

Greetings,
Rafael
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Sun, May 27, 2007 at 09:26:00AM -0700, Linus Torvalds wrote:

> And by "disabling the freezer", I think we should just not call it at all.
> However, sadly, right now it's called from common code. I'll happily take
> a tested patch to split that code sequence up, and try to do it in 2.6.23,
> if somebody has the energy (I'm getting to the point where I may just do
> it myself, but my lazy nature still hopes for a STR person to step
> forward).

I'll take a look at this. It probably makes sense to build on Rafael's
work on splitting the codepaths up.

--
Matthew Garrett | [EMAIL PROTECTED]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Sun, 27 May 2007, Matthew Garrett wrote:
>
> As far as I can tell the PPC code simply shuts down the devices without
> worrying about userspace at all. If this was reliable, what prevents us
> from simply disabling the freezer for STR?

Personally, I think that's the right thing to do.

And by "disabling the freezer", I think we should just not call it at all.
However, sadly, right now it's called from common code. I'll happily take
a tested patch to split that code sequence up, and try to do it in 2.6.23,
if somebody has the energy (I'm getting to the point where I may just do
it myself, but my lazy nature still hopes for a STR person to step
forward).

		Linus
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Thu, May 24, 2007 at 03:53:28PM -0700, Linus Torvalds wrote:

> And I repeat: PowerPC had working and stable suspend five _years_ ago,
> without any of that freezing crud. We should rip it out.

As far as I can tell the PPC code simply shuts down the devices without
worrying about userspace at all. If this was reliable, what prevents us
from simply disabling the freezer for STR?

--
Matthew Garrett | [EMAIL PROTECTED]
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Friday, 25 May 2007 01:19, Pavel Machek wrote:
> On Thu 2007-05-24 20:16:38, Henrique de Moraes Holschuh wrote:
> > On Fri, 25 May 2007, Pavel Machek wrote:
> > > My proposed solution is "fix pcmcia to load firmware before suspend
> > > even starts"
> >
> > s/pcmcia/all drivers that load firmware/ if you are going to go that way.
>
> I'm not "going that way". It always was that way. If driver tries to
> load firmware during suspend, it will deadlock.

Exactly. And the freezing of user land has _nothing_ to do with that.

The fact is the user land is not reliable while device drivers are being
suspended, regardless of whether it's frozen at that point or not.

BTW, we are going (or at least I'm going) to untangle the hibernation and
suspend code paths, but I have limited time for that and I just _can't_ do this
any faster. In the meantime, we have bugs like this one that need to be fixed
_within_ the current limitations, because we just _can't_ remove these
limitations overnight.

Greetings,
Rafael
Need suspend-to-ram maintainer Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi!

> > To answer the question, I guess the answer is that although they're
> > different creatures, they have similarities. This is one of them, which
> > is why I could make the mistake I did. Nothing in the issue being
> > discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
> > too much on the similarities, but that doesn't mean they're not there.
>
> I agree that the current bug is not unique to STR. In fact, I think Romano
> tested both STD and STR, and both had the same bug with the 60s timeout.
>
> But what irritates me is that STR really shouldn't have _had_ that bug at
> all. The only reason STR had the same bug as STD was exactly the fact that
> the two features are too closely inter-twined in the kernel.

And what do you expect? We have three people working on hibernation, and
suspend-to-ram was created as "oh, if we do this, this, and this, we
get suspend-to-ram with existing code".

> I agree that disk snapshotting is much harder. If we had a bug just in
> that part, I wouldn't mind it so much. Getting hard problems wrong isn't
> something you should be ashamed of. What I mind is that the _easier_
> problem got infected by all the bugs from the _harder_ issue. That just
> makes me really really angry and frustrated.
>
> Look at it this way: if you designed a CPU, and you made the integer
> code-path share everything with the floating point side, because "addition
> is addition", and as a result the latency for the simple arithmetic and
> logical ops in integer ALU was four cycles, what would you be?

You'd be seriously overstaffed in FPU side, and seriously understaffed
on ALU side.

This is basically what happened here. I tell people to get hibernation to
work _first_ because it is usually easier. And what does that mean?

We need three people to work on suspend-to-RAM. Heck, we need at least
_one_ person to work on suspend-to-RAM, but he needs to be listed in
MAINTAINERS. With hibernation people trying to maintain suspend in their
spare cycles, how do you expect suspend to work?

Similar to hibernation, that's how it looks today.

								Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi!

> > 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> > PEOPLE FOR FIVE YEARS NOW.
>
> And people aren't listening. Have you thought about _why_?
>
> The thing is, it should just work. Even without pre-loading.

But it does not work, and as you demonstrated, getting it to work w/o
preloading is an awful lot of work. Feel free to send a patch. Unless you
are ready to do that, stop confusing driver authors.

> > Imagine we killed freezer. Also imagine Romano has IDE card in his
> > PCMCIA slot. Kaboom, we solved nothing.
>
> Don't be silly. We solved it. The firmware has to be loadable from
> somewhere else, since otherwise his IDE card wouldn't have been accessible
> in the first place!

Firmware loader is complex userspace process. That's not silly. It is
userland, and I'd hate to explain to its authors detailed rules. It
could do 'find / -name "pcmcia-card-firmware"' for example. It could
do dbus message to tell gnome-graphical-crap to display window to say
that it is loading firmware. Maybe it also writes to syslog when
syslogd is available. It is userland process, so it is allowed to do
stupid stuff.

[If you do not agree, please try to write up
"Doc*/what-firmware-loader-must-do.txt" -- at that point you should
realize how ugly the solution you are suggesting is.]

								Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
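The "preload before suspend" position argued above can be illustrated with a small user-space model in C. It is a sketch only, under loudly stated assumptions: model_card, model_fw_loader, model_card_cache_fw, model_card_resume and model_example_loader are all hypothetical names, the loader callback stands in for whatever userspace helper would normally fetch the image, and the four-byte image is made up. The point it tries to show is that the resume path replays a copy kept in memory and never calls back into the loader.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_FW_MAX 64

/* Hypothetical loader callback standing in for a userspace helper. */
typedef int (*model_fw_loader)(unsigned char *buf, size_t cap, size_t *len);

struct model_card {
	unsigned char fw[MODEL_FW_MAX];	/* in-memory copy of the image */
	size_t fw_len;
	int fw_cached;
	unsigned char hw[MODEL_FW_MAX];	/* pretend device memory */
	size_t hw_len;
};

/* Called while userland is known to be usable (probe, or before suspend). */
static int model_card_cache_fw(struct model_card *c, model_fw_loader load)
{
	int error = load(c->fw, sizeof(c->fw), &c->fw_len);

	c->fw_cached = !error;
	return error;
}

/* Resume path: never calls the loader, only replays the cached image. */
static int model_card_resume(struct model_card *c)
{
	if (!c->fw_cached)
		return -1;	/* nothing cached: fail instead of asking userland */
	memcpy(c->hw, c->fw, c->fw_len);
	c->hw_len = c->fw_len;
	return 0;
}

/* Example loader used for illustration; counts how often it is invoked. */
static int model_loader_calls;

static int model_example_loader(unsigned char *buf, size_t cap, size_t *len)
{
	static const unsigned char image[] = { 0xde, 0xad, 0xbe, 0xef };

	model_loader_calls++;
	if (cap < sizeof(image))
		return -1;
	memcpy(buf, image, sizeof(image));
	*len = sizeof(image);
	return 0;
}
```

In this model, calling model_card_cache_fw() once and model_card_resume() any number of times leaves model_loader_calls at 1, so resume has no dependency on the loader being runnable. The cost is the memory held by the cached image, which is the trade-off this side of the thread accepts.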
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi.

On Thu, 2007-05-24 at 21:49 -0700, Linus Torvalds wrote:
>
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> >
> > Does that mean you never ever power off your laptop (assuming you have
> > one), and the battery never runs out? Surely you must power it off
> > completely sometimes?
>
> So? The bootup isn't that much worse than a disk suspend/resume, and it's
> reliable.
>
> And actually, I don't use laptops much. I use mostly desktops, and STR
> works fine on at least some of them. In contrast, doing some
> suspend-to-disk thing would just be insane and idiotic. If I have to wait
> for half a minute and have a slow system even after that because my git
> trees aren't in the cache, I really might as well just shut them off.
>
> In contrast, STR means they are quiet and don't waste energy when I don't
> use them, but they're instantly available when I care. HUGE difference.
>
> I really think suspend-to-disk is just a total waste of my time.

Ah. That's because you're using [u]swsusp. If you used Suspend2, your
git trees would be in the cache, your system wouldn't be slow and you'd
still be back up in that half a minute or so - probably less time.

Give it a try for a week, and then go back to rebooting. After that,
tell me rebooting is better and I've wasted the last 5 or 6 years
improving the code.

Regards,

Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Fri, 25 May 2007, Nigel Cunningham wrote:
>
> Does that mean you never ever power off your laptop (assuming you have
> one), and the battery never runs out? Surely you must power it off
> completely sometimes?

So? The bootup isn't that much worse than a disk suspend/resume, and it's
reliable.

And actually, I don't use laptops much. I use mostly desktops, and STR
works fine on at least some of them. In contrast, doing some
suspend-to-disk thing would just be insane and idiotic. If I have to wait
for half a minute and have a slow system even after that because my git
trees aren't in the cache, I really might as well just shut them off.

In contrast, STR means they are quiet and don't waste energy when I don't
use them, but they're instantly available when I care. HUGE difference.

I really think suspend-to-disk is just a total waste of my time.

		Linus
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Howdy.

On Thu, 2007-05-24 at 20:31 -0700, Linus Torvalds wrote:
>
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> > >
> > > That said, I think freezing is crap even for
> > > snapshotting/suspend-to-disk,
> > > but the point of the above rant is to show how insane it is to think that
> > > problems and complexity in one area should translate into problems and
> > > complexity in another area.
> >
> > Does that imply that you'd prefer to see filesystem checkpointing code,
> > that you think freezing can be done better, or do you have some other
> > solution that hasn't occurred to me?
>
> I actually don't think that processes should be frozen really at all.
>
> I agree that filesystems have to be frozen (and I think that checkpointing
> of the filesystem or block device is "too clever"), but I just don't think
> that has anything to do with freezing processes.
>
> So I'd actually much prefer to freeze at the VFS (and socket layers, etc),
> and make sure that anybody who tries to write or do something else that we
> cannot do until resuming, will just be blocked (or perhaps just buffered)!
>
> See? I actually think that this process-based thing is barking up the
> wrong tree. After all, it's really not the case that we need to stop
> processes, and stopping processes really does have some problems. It's
> simpler in some ways, but I think a more directed solution would actually
> be better.

That does sound doable. I'm sorry to say it, but process freezing still
seems to me like the better way though. I prefer it because of the
reliability aspect. With the current code, having frozen processes, I
can look at the state of memory, calculate how much I'll need for this
or that and know that I'll have sufficient memory for the atomic copy
and for doing the I/O (making assumptions about how much memory drivers
will allocate) before I start to do either. If we stop freezing
processes, that predictability will go away. There'll always be a
possibility that some process will get memory hungry and stop me from
being able to get the image on disk, and I'll have to either abort or
give up and try again and again until I can complete writing the image,
the battery runs out or whatever...

> >bviously we _do_ want to actually try to quiesce normal user processes.
> >But by "normal user", I'd be willing to limit it to non-uid-zero things,
> >for example. Exactly because it does turn out that the kernel kind of
> >depends on user-land things for stuff like network filesystem connection
> >setup etc (ie we tend to do things like the mount encryption stuff in
> >userland!).

Not sure who you're quoting here, but it's not me. Pavel maybe? I was
unsub'd for a couple of weeks, so guess it's from during that period.

> But I really don't care that deeply per se, exactly because I don't use it
> myself. I think people are going down the wrong rabbit-hole, but it
> wouldn't _irritate_ me that much except for the fact that it now also
> impacts suspend-to-RAM.

Does that mean you never ever power off your laptop (assuming you have
one), and the battery never runs out? Surely you must power it off
completely sometimes? If you do, doesn't that ever happen at a time when
you're part way through something and you'd like to be able to pick up
your merge or whatever later without having to say "Now, where was I up
to?"

*shrug* Maybe you're just exceptional :) (Yeah, we know you are in other
ways, but this way?...)

Regards,

Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Fri, 25 May 2007, Nigel Cunningham wrote:
> >
> > That said, I think freezing is crap even for snapshotting/suspend-to-disk,
> > but the point of the above rant is to show how insane it is to think that
> > problems and complexity in one area should translate into problems and
> > complexity in another area.
>
> Does that imply that you'd prefer to see filesystem checkpointing code,
> that you think freezing can be done better, or do you have some other
> solution that hasn't occurred to me?

I actually don't think that processes should be frozen really at all.

I agree that filesystems have to be frozen (and I think that checkpointing of the filesystem or block device is "too clever"), but I just don't think that has anything to do with freezing processes.

So I'd actually much prefer to freeze at the VFS (and socket layers, etc), and make sure that anybody who tries to write or do something else that we cannot do until resuming, will just be blocked (or perhaps just buffered)!

See? I actually think that this process-based thing is barking up the wrong tree. After all, it's really not the case that we need to stop processes, and stopping processes really does have some problems. It's simpler in some ways, but I think a more directed solution would actually be better.

>bviously we _do_ want to actually try to quiesce normal user processes.
>But by "normal user", I'd be willing to limit it to non-uid-zero things,
>for example. Exactly because it does turn out that the kernel kind of
>depends on user-land things for stuff like network filesystem connection
>setup etc (ie we tend to do things like the mount encryption stuff in
>userland!).

But I really don't care that deeply per se, exactly because I don't use it myself. I think people are going down the wrong rabbit-hole, but it wouldn't _irritate_ me that much except for the fact that it now also impacts suspend-to-RAM.

Linus
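The "freeze at the VFS, block the writers" idea in the message above can be pictured with a toy model. This is only a sketch in userspace Python, not kernel code: WriteGate, freeze(), thaw() and demo() are hypothetical names invented for illustration, and the gate ignores the race where freeze() lands between the wait and the append.

```python
import threading


class WriteGate:
    """Hypothetical gate placed in front of every write path."""

    def __init__(self):
        self._open = threading.Event()
        self._open.set()          # writes are allowed by default
        self.log = []             # stands in for the real backing store

    def freeze(self):
        """Close the gate: later writers block until thaw()."""
        self._open.clear()

    def thaw(self):
        """Reopen the gate and release every blocked writer."""
        self._open.set()

    def write(self, data):
        self._open.wait()         # blocks only while the gate is closed
        self.log.append(data)


def demo():
    gate = WriteGate()
    gate.write("before")
    gate.freeze()
    writer = threading.Thread(target=gate.write, args=("during",))
    writer.start()
    writer.join(timeout=0.2)      # the writer is still parked at the gate
    blocked = writer.is_alive()
    gate.thaw()
    writer.join()
    return blocked, gate.log
```

The point of the sketch is that the writer itself is never stopped as a process; only the one operation that cannot be allowed until resume is held back.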
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi.

On Thu, 2007-05-24 at 19:41 -0700, Linus Torvalds wrote:
>
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> >
> > To answer the question, I guess the answer is that although they're
> > different creatures, they have similarities. This is one of them, which
> > is why I could make the mistake I did. Nothing in the issue being
> > discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
> > too much on the similarities, but that doesn't mean they're not there.
>
> I agree that the current bug is not unique to STR. In fact, I think Romano
> tested both STD and STR, and both had the same bug with the 60s timeout.
>
> But what irritates me is that STR really shouldn't have _had_ that bug at
> all. The only reason STR had the same bug as STD was exactly the fact that
> the two features are too closely inter-twined in the kernel.
>
> That irritates me hugely. We had a bug we should never have had! We had a
> bug because people are sharing code that shouldn't be shared! We had a bug
> because of code that makes no sense in the first place!
>
> I agree that disk snapshotting is much harder. If we had a bug just in
> that part, I wouldn't mind it so much. Getting hard problems wrong isn't
> something you should be ashamed of. What I mind is that the _easier_
> problem got infected by all the bugs from the _harder_ issue. That just
> makes me really really angry and frustrated.
>
> Look at it this way: if you designed a CPU, and you made the integer
> code-path share everything with the floating point side, because "addition
> is addition", and as a result the latency for the simple arithmetic and
> logical ops in integer ALU was four cycles, what would you be?
>
> You'd be a moron, that's what.
>
> And that is _exactly_ what the current STD/STR code does. It says "suspend
> is suspend" and tries to share the same pipeline, even though the two
> operations are totally different, and share nothing but the name people
> use for it (and even the name is really pretty weak, and I've tried to
> get people to use some other name for STD).

I think I get what you're trying to say, but I also think you're either overstating your case ("...totally different and share nothing but the name...") or underestimating the similarity - they both need (albeit for different reasons) to do the cpu hotplugging, driver suspending (yeah, there are similarities and differences there) and irq disabling. That's _some_ similarity. Apart from that, yeah - they are totally different.

Re the name, we discussed changing the name of Suspend2 on IRC a night or two ago. We came to the conclusion that, for better or for worse, it's too well known now. I can see your logic in wanting to differentiate them, but I seem to be a bit stuck :\. Push some more. Maybe we'll get there anyway :) Maybe you can get rid of that horrible, unpronounceable 'swsusp' name while you're at it? :)

> So yes, the two things _do_ share the problem, but they really really
> shouldn't. There's no reason to think that they should. And it drives me
> absolutely bonkers that people seem to have such a hard time seeing that.
>
> That said, I think freezing is crap even for snapshotting/suspend-to-disk,
> but the point of the above rant is to show how insane it is to think that
> problems and complexity in one area should translate into problems and
> complexity in another area.

Does that imply that you'd prefer to see filesystem checkpointing code, that you think freezing can be done better, or do you have some other solution that hasn't occurred to me?

> And if the snapshot people want to screw up their snapshots with freezing,
> I don't actually care that much. I'd much rather have working STR. As it
> is now, they're now _both_ broken.

Fair enough :).

Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Fri, 25 May 2007, Nigel Cunningham wrote:
>
> To answer the question, I guess the answer is that although they're
> different creatures, they have similarities. This is one of them, which
> is why I could make the mistake I did. Nothing in the issue being
> discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
> too much on the similarities, but that doesn't mean they're not there.

I agree that the current bug is not unique to STR. In fact, I think Romano tested both STD and STR, and both had the same bug with the 60s timeout.

But what irritates me is that STR really shouldn't have _had_ that bug at all. The only reason STR had the same bug as STD was exactly the fact that the two features are too closely inter-twined in the kernel.

That irritates me hugely. We had a bug we should never have had! We had a bug because people are sharing code that shouldn't be shared! We had a bug because of code that makes no sense in the first place!

I agree that disk snapshotting is much harder. If we had a bug just in that part, I wouldn't mind it so much. Getting hard problems wrong isn't something you should be ashamed of. What I mind is that the _easier_ problem got infected by all the bugs from the _harder_ issue. That just makes me really really angry and frustrated.

Look at it this way: if you designed a CPU, and you made the integer code-path share everything with the floating point side, because "addition is addition", and as a result the latency for the simple arithmetic and logical ops in integer ALU was four cycles, what would you be?

You'd be a moron, that's what.

And that is _exactly_ what the current STD/STR code does. It says "suspend is suspend" and tries to share the same pipeline, even though the two operations are totally different, and share nothing but the name people use for it (and even the name is really pretty weak, and I've tried to get people to use some other name for STD).

So yes, the two things _do_ share the problem, but they really really shouldn't. There's no reason to think that they should. And it drives me absolutely bonkers that people seem to have such a hard time seeing that.

That said, I think freezing is crap even for snapshotting/suspend-to-disk, but the point of the above rant is to show how insane it is to think that problems and complexity in one area should translate into problems and complexity in another area.

And if the snapshot people want to screw up their snapshots with freezing, I don't actually care that much. I'd much rather have working STR. As it is now, they're now _both_ broken.

Linus
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi Linus.

On Thu, 2007-05-24 at 19:10 -0700, Linus Torvalds wrote:
>
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> >
> > First, let me agree with you that for the atomic copy itself, the
> > freezer is unnecessary. Disabling irqs and so on is enough to ensure the
> > atomic copy is atomic. I don't think any of us are arguing with you
> > there.
>
> First off, realize that the problem actually happens during
> suspend-to-ram.
>
> Think about that for a second.
>
> In fact, think about it for a _loong_ time. Because dammit, people seem to
> have a really hard time even realizing this.
>
> There is no "atomic copy".
>
> There is no "checkpointing".
>
> There is no "spoon".
>
> > Hope this helps.
>
> Hope _the_above_ helps. Why is it so hard for people to accept that
> suspend-to-ram shouldn't break because of some IDIOTIC issues with disk
> snapshots?
>
> And why do you people _always_ keep mixing the two up?

It does. Sorry. I didn't read enough of the context.

To answer the question, I guess the answer is that although they're different creatures, they have similarities. This is one of them, which is why I could make the mistake I did. Nothing in the issue being discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus too much on the similarities, but that doesn't mean they're not there.

Regards,

Nigel
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Fri, 25 May 2007, Nigel Cunningham wrote:
>
> First, let me agree with you that for the atomic copy itself, the
> freezer is unnecessary. Disabling irqs and so on is enough to ensure the
> atomic copy is atomic. I don't think any of us are arguing with you
> there.

First off, realize that the problem actually happens during suspend-to-ram.

Think about that for a second.

In fact, think about it for a _loong_ time. Because dammit, people seem to have a really hard time even realizing this.

There is no "atomic copy".

There is no "checkpointing".

There is no "spoon".

> Hope this helps.

Hope _the_above_ helps. Why is it so hard for people to accept that suspend-to-ram shouldn't break because of some IDIOTIC issues with disk snapshots?

And why do you people _always_ keep mixing the two up?

Linus
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi Linus.

On Thu, 2007-05-24 at 17:37 -0700, Linus Torvalds wrote:
>
> On Fri, 25 May 2007, Pavel Machek wrote:
> >
> > 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> > PEOPLE FOR FIVE YEARS NOW.
>
> And people aren't listening. Have you thought about _why_?
>
> The thing is, it should just work. Even without pre-loading.
>
> > Imagine we killed freezer. Also imagine Romano has an IDE card in his
> > PCMCIA slot. Kaboom, we solved nothing.
>
> Don't be silly. We solved it. The firmware has to be loadable from
> somewhere else, since otherwise his IDE card wouldn't have been accessible
> in the first place!
>
> So all your arguments are just bogus crap.

Let me see if I can help. I'll probably fail miserably, but I can only try :)

First, let me agree with you that for the atomic copy itself, the freezer is unnecessary. Disabling irqs and so on is enough to ensure the atomic copy is atomic. I don't think any of us are arguing with you there.

Where we see the problem is with what happens after the atomic copy is made. The problem is that the atomic copy includes struct inodes, dnodes and such like - an in memory representation of the state of mounted filesystems. Imagine that, post atomic copy, we don't have the freezer. Processes can then make on-disk changes to these mounted filesystems in the time before we complete saving the image and powering down. If, at resume time, we then restore the atomic copy, we have a mismatch between what the in-memory data structures say and what the on-disk data says. This leads to corruption.

How to avoid? Well, there are only two options as far as I can see. We either stop those changes occurring in the first place, or we make them undoable. Freezing processes, and/or filesystems would be the first path, checkpointing the second.

So, as far as I can see, we're stuck with freezing processes at least until checkpointing is implemented.

I have to admit though, that even if checkpointing was implemented, I'd like to see freezing processes remain. The image gets written faster if we don't have to compete for cpu and i/o. It also allows us to do a fuller image of memory than is otherwise possible (Yes, I know some people don't care for full images, but others of us have usage patterns that make the system far more useable if a full image is kept, or simply prefer to have our machines as if the power had never gone away). Without processes freezing, I'd have to work a lot harder to find a way to do that full image. The simplest way would probably be to carry the freezer code myself. (Yeah, I'm reconciled to the idea of never getting Suspend2 merged. I'd like it to happen, but won't hold my breath. Someone needs to break your suspend-to-ram or battery so you see the use for hibernation :>).

Hope this helps.

Nigel
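The mismatch described in the message above (memory image taken, disk changed afterwards, image restored at resume) can be shown with a toy model. This is a sketch under loud assumptions: ToyFS, its block allocator and demo() are hypothetical and bear no relation to real filesystem code; they only mimic "in-memory metadata" versus "on-disk blocks".

```python
import copy


class ToyFS:
    """Toy filesystem: 'disk' holds blocks, 'meta' is the in-memory view."""

    def __init__(self, nblocks):
        self.disk = [None] * nblocks                 # persistent state
        self.meta = {"free": list(range(nblocks)),   # in-memory allocator
                     "files": {}}                    # name -> block number

    def write_file(self, name, data):
        block = self.meta["free"].pop(0)
        self.meta["files"][name] = block
        self.disk[block] = data

    def read_file(self, name):
        return self.disk[self.meta["files"][name]]


def demo():
    fs = ToyFS(4)
    fs.write_file("a", "alpha")
    image = copy.deepcopy(fs.meta)       # the "atomic copy" of memory
    fs.write_file("b", "bravo")          # an unfrozen process writes afterwards
    fs.meta = image                      # resume: memory rolled back, disk not
    lost = "b" not in fs.meta["files"]   # restored memory has forgotten "b"
    fs.write_file("c", "charlie")        # allocator reuses the block "b" owns
    return lost, fs.disk
```

In the toy run the restored allocator still believes block 1 is free, so the next write lands on top of data that was written after the image was taken. Either preventing the late write (freezing) or being able to undo it (checkpointing) avoids that outcome.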
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Fri, 25 May 2007, Pavel Machek wrote:
>
> 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> PEOPLE FOR FIVE YEARS NOW.

And people aren't listening. Have you thought about _why_?

The thing is, it should just work. Even without pre-loading.

> Imagine we killed freezer. Also imagine Romano has an IDE card in his
> PCMCIA slot. Kaboom, we solved nothing.

Don't be silly. We solved it. The firmware has to be loadable from somewhere else, since otherwise his IDE card wouldn't have been accessible in the first place!

So all your arguments are just bogus crap.

Linus
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
On Thu 2007-05-24 20:16:38, Henrique de Moraes Holschuh wrote:
> On Fri, 25 May 2007, Pavel Machek wrote:
> > My proposed solution is "fix pcmcia to load firmware before suspend
> > even starts"
>
> s/pcmcia/all drivers that load firmware/ if you are going to go that way.

I'm not "going that way". It always was that way. If a driver tries to load firmware during suspend, it will deadlock.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
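The "load firmware before suspend even starts" proposal above amounts to caching the blob while the userland helper can still answer. A minimal sketch, assuming a made-up ToyDriver class and a plain callable standing in for the userland helper; none of these names are kernel APIs.

```python
class ToyDriver:
    """Hypothetical driver that caches its firmware before suspend."""

    def __init__(self, loader):
        self._loader = loader     # callable standing in for the userland helper
        self._cached = None
        self.loaded = None

    def suspend(self):
        # Userland is still running here, so asking the helper is safe.
        self._cached = self._loader()

    def resume(self, userland_running):
        if self._cached is not None:
            self.loaded = self._cached        # no helper needed
        elif userland_running:
            self.loaded = self._loader()
        else:
            # In the real report this was a 60 second hang, not an exception.
            raise RuntimeError("firmware needed but userland is frozen")
```

A driver that went through suspend() can resume with userland frozen; one that did not has nothing to fall back on, which is the situation the thread is arguing about.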
Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review
Hi!

> > > Why the HELL cannot you realize that kernel threads are different?
> >
> > Ugh? We are talking about request_firmware() here, right? That's
> > calling userland helper to load the firmware...? Looks like USER
> > thread to me.
>
> Right. And if we had had the nice old /sbin/hotplug thing, it would all
> have worked fine - because it would just have done an execve(), and things
> would be happy.
>
> But people screwed that up too, and now udevd is an undebuggable user
> thread. Shit happens. See my other email about why even user threads can
> probably not be frozen, and the whole freezer thing is misdesigned.

I'm not ready to redesign udevd :-(. Your other mail proves that either

1) we can stop freezing udevd, and pray udevd does not become confused by "half hardware not available" while system is being suspended

_or_

2) we need to preload firmware during _suspend_. I AM TELLING THAT TO PEOPLE FOR FIVE YEARS NOW.

> And I repeat: PowerPC had working and stable suspend five _years_ ago,
> without any of that freezing crud. We should rip it out.

Imagine we killed freezer. Also imagine Romano has an IDE card in his PCMCIA slot. Kaboom, we solved nothing. We'll either deadlock or do something very nasty to the filesystem on the IDE card ... because we'll have udevd running, but fs on IDE card not available. That does not work. "Not freezing udevd" only makes problems hard to trigger, see?

Now... "should we rip freezer out of suspend" is a different story. It does not help _here_. We still need to load the firmware during _suspend_. [Can you ack this point and we can have nice flamewar about ripping out freezer?]

But I'd actually like to get rid of freezer for suspend (I believe it is needed for hibernation) -- we'll need to do something similar for runtime suspending of devices, anyway. But "just rip it out" will cause some hard to debug breakage, we need to somehow audit the drivers, or ask driver writers to audit them or something.

... and yes, ripping freezer out _will_ make drivers more complex. Your video capture card will now have to deal with "ouch, I was already told to suspend, and now someone is calling my ioctls() ?!".

Pavel
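The extra driver complexity mentioned at the end of the message above ("already told to suspend, and now someone is calling my ioctls") can be pictured as one more state check on every entry point. A rough sketch, assuming a hypothetical ToyCaptureDriver that defers requests arriving while suspended; a real driver might block the caller or return an error instead.

```python
class ToyCaptureDriver:
    """Hypothetical driver that must tolerate ioctls after suspend()."""

    def __init__(self):
        self.suspended = False
        self.pending = []     # requests deferred while suspended
        self.handled = []     # requests that actually reached the hardware

    def suspend(self):
        self.suspended = True

    def ioctl(self, request):
        if self.suspended:
            # Without a freezer, callers can still get here after suspend().
            self.pending.append(request)
            return "deferred"
        self.handled.append(request)
        return "done"

    def resume(self):
        self.suspended = False
        while self.pending:
            self.handled.append(self.pending.pop(0))
```

With processes frozen, the suspended branch is never reached; without the freezer, every driver entry point needs some equivalent of it, which is the audit burden the message describes.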