Re: [linux-pm] Re: Hibernation considerations
On Wed, 1 Aug 2007, Pavel Machek wrote: Hi! Do we have to block module loading? No. Registering new drivers is okay, registering new devices is bad. Of course, some modules do want to register a new device in their init method. I don't know what we should do about them. Force the registration to fail, I suppose. How often will people suspend while a module is loading? Well... plug this pcmcia card into the slot so that I do not have to carry it separately, close the lid and go? ...not that impossible to imagine... I usually leave my broadband card in the slot, but not seated. I wouldn't bet against it getting pushed in enough to be detected while putting the laptop in the bag. David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
On Wednesday, 1 August 2007 11:22, Pavel Machek wrote: > Hi! > > > > Hmm, wonder why this isn't affecting people with VPNs? Probably > > > network mounts over VPN are rare, and even rarer to have fs activity > > > on them during suspend. > > > > > > Anyway, I think it's long overdue to stop thinking about how to "fix" > > > fuse, and concentrate on fixing the underlying problem instead ;) > > > > To conclude this branch of the thread, I have a patch in the works that may > > help a bit with unfreezable FUSE filesystems and it only affects the > > freezer. > > I'll post it when 2.6.23-rc1 is out, because it's on top of some other > > patches > > that need to go first. > > I'm interested... which one is that? Appended, on top of this: https://lists.linux-foundation.org/pipermail/linux-pm/2007-July/014521.html Greetings, Rafael

---
 kernel/power/process.c |   49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc1/kernel/power/process.c
===================================================================
--- linux-2.6.23-rc1.orig/kernel/power/process.c	2007-07-24 00:14:07.0 +0200
+++ linux-2.6.23-rc1/kernel/power/process.c	2007-07-24 00:14:17.0 +0200
@@ -30,6 +30,14 @@
  */
 #define MAX_WAITS	5
 
+/*
+ * If the freezing of tasks fails, we attempt to thaw tasks that have already
+ * been frozen to give a chance the other tasks to freeze, in case one or more
+ * of them are blocked by the frozen ones.  If this fails MAX_ATTEMPTS times
+ * in a row, we give up.
+ */
+#define MAX_ATTEMPTS	10
+
 #define FREEZER_KERNEL_THREADS 0
 #define FREEZER_USER_SPACE 1
 
@@ -192,14 +200,21 @@ static void cancel_freezing(struct task_
 static int try_to_freeze_tasks(int freeze_user_space)
 {
 	struct task_struct *g, *p;
-	unsigned int todo, waits;
+	unsigned int todo, waits, attempts;
 	unsigned long ret;
 	struct timeval start, end;
 	s64 elapsed_csecs64;
 	unsigned int elapsed_csecs;
+	char *tick = "-\\|/";
+
+	printk(" ");
 
+	attempts = 0;
 	do_gettimeofday(&start);
 
+ Repeat:
+	printk("\b%c", tick[attempts++ % 4]);
+
 	refrigerator_called = 0;
 	waits = 0;
 	do {
@@ -235,11 +250,43 @@ static int try_to_freeze_tasks(int freez
 		}
 	} while (todo);
 
+	if (todo && attempts <= MAX_ATTEMPTS) {
+		/*
+		 * Some tasks have not been able to freeze.  They might be stuck
+		 * in TASK_UNINTERRUPTIBLE waiting for the frozen tasks.  Try to
+		 * thaw the tasks that have frozen without clearing the freeze
+		 * requests of the remaining tasks and repeat.
+		 */
+		read_lock(&tasklist_lock);
+		do_each_thread(g, p) {
+			if (frozen(p)) {
+				p->flags &= ~PF_FROZEN;
+				wake_up_process(p);
+			}
+		} while_each_thread(g, p);
+		read_unlock(&tasklist_lock);
+
+		ret = wait_event_timeout(refrigerator_waitq,
+					refrigerator_called, TIMEOUT);
+		if (!ret) {
+			/*
+			 * There is a little hope that we will succeed, but at
+			 * least we want to know which tasks have not been
+			 * frozen.  Thus, we are going to repeat once.
+			 */
+			attempts = MAX_ATTEMPTS;
+		}
+
+		goto Repeat;
+	}
+
 	do_gettimeofday(&end);
 	elapsed_csecs64 = timeval_to_ns(&end) - timeval_to_ns(&start);
 	do_div(elapsed_csecs64, NSEC_PER_SEC / 100);
 	elapsed_csecs = elapsed_csecs64;
 
+	printk("\b");
+
 	if (todo) {
 		/* This does not unfreeze processes that are already frozen
 		 * (we have slightly ugly calling convention in that respect,
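The retry strategy of this patch can be modelled in user space. The C sketch below is a toy simulation, not kernel code. `struct sim_task`, `freeze_round()`, `thaw_and_service()` and `try_to_freeze_all()` are hypothetical names. It assumes two things: a task can only freeze when it is not waiting for another task, and a thawed task runs long enough to complete the requests of tasks blocked on it before it refreezes. That second assumption is the outcome the patch hopes for with FUSE servers. The patch's `wait_event_timeout()` shortcut is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATTEMPTS 10          /* same retry limit as in the patch */
#define NOT_BLOCKED (-1)
#define BLOCKED_EXTERNAL (-2)    /* waits for an event no simulated task delivers */

/* Toy task model for this simulation only. */
struct sim_task {
	int blocked_on;  /* index of the task it waits for, or one of the values above */
	bool frozen;
};

/* One freezing pass: a task that is not waiting for anything freezes.
 * Returns the number of tasks that could not be frozen. */
static size_t freeze_round(struct sim_task *t, size_t n)
{
	size_t todo = 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].frozen)
			continue;
		if (t[i].blocked_on == NOT_BLOCKED)
			t[i].frozen = true;
		else
			todo++;
	}
	return todo;
}

/* Thaw all frozen tasks. Assumption: a thawed task runs long enough to
 * complete the requests of the tasks that were waiting on it. */
static void thaw_and_service(struct sim_task *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int s = t[i].blocked_on;

		if (s >= 0) {
			assert((size_t)s < n);
			if (t[s].frozen)
				t[i].blocked_on = NOT_BLOCKED;
		}
	}
	for (size_t i = 0; i < n; i++)
		t[i].frozen = false;
}

/* Retry like the patch: repeat while tasks remain and attempts <= MAX_ATTEMPTS.
 * Returns the number of unfrozen tasks; *rounds gets the number of passes. */
static size_t try_to_freeze_all(struct sim_task *t, size_t n, unsigned int *rounds)
{
	unsigned int attempts = 0;
	size_t todo;

	for (;;) {
		attempts++;
		todo = freeze_round(t, n);
		if (todo == 0 || attempts > MAX_ATTEMPTS)
			break;
		thaw_and_service(t, n);
	}
	*rounds = attempts;
	return todo;
}
```

In a chain of three tasks where each waits for the next, every link costs one extra thaw-and-retry pass, so the chain freezes on the third pass. A task waiting for an event that no task will ever deliver is still unfrozen after `MAX_ATTEMPTS + 1` passes.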
Re: [linux-pm] Re: Hibernation considerations
Hi! > > Hmm, wonder why this isn't affecting people with VPNs? Probably > > network mounts over VPN are rare, and even rarer to have fs activity > > on them during suspend. > > > > Anyway, I think it's long overdue to stop thinking about how to "fix" > > fuse, and concentrate on fixing the underlying problem instead ;) > > To conclude this branch of the thread, I have a patch in the works that may > help a bit with unfreezable FUSE filesystems and it only affects the freezer. > I'll post it when 2.6.23-rc1 is out, because it's on top of some other patches > that need to go first. I'm interested... which one is that? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [linux-pm] Re: Hibernation considerations
Hi! > > Do we have to block module loading? > > No. Registering new drivers is okay, registering new devices is bad. > > Of course, some modules do want to register a new device in their init > method. I don't know what we should do about them. Force the > registration to fail, I suppose. How often will people suspend while a > module is loading? Well... plug this pcmcia card into the slot so that I do not have to carry it separately, close the lid and go? ...not that impossible to imagine... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
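The idea of forcing device registration to fail during suspend, while still allowing driver registration, can be sketched as a flag check in the registration path. All names here (`suspend_in_progress`, `sim_register_driver()`, `sim_register_device()`) are hypothetical. Returning `-EBUSY` is an illustrative choice, not what the driver core does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVICES 8

/* Hypothetical global state; the real driver core tracks devices differently. */
static bool suspend_in_progress;
static const char *devices[MAX_DEVICES];
static size_t num_devices;
static size_t num_drivers;

/* Registering a driver only makes code available and touches no hardware,
 * so it is allowed at any time. */
static int sim_register_driver(void)
{
	num_drivers++;
	return 0;
}

/* Registering a device adds something the suspend code would have to
 * quiesce, so refuse it while a suspend is under way. */
static int sim_register_device(const char *name)
{
	assert(name != NULL);
	if (suspend_in_progress)
		return -EBUSY;
	if (num_devices == MAX_DEVICES)
		return -ENOSPC;
	devices[num_devices++] = name;
	return 0;
}
```

In Pavel's PCMCIA scenario, the card's module would load fine, but its device registration would be refused for the duration of the suspend.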
Re: [linux-pm] Re: Hibernation considerations
Hi! > > > The problem with FUSE is related to the fact that the freezer can't > > > freeze uninterruptible tasks and we said that perhaps we might avoid > > > it if FUSE was made freezing-aware. Still, no one has gone in this > > > direction and I don't know of any plans to do that. > > > > I thought we have fully explored this direction. Lots of emails, and > > an IRC session with Pavel. Conclusion: > > What am I missing in the following suggested solution? > > 1) In the freezer code, we implement a new TIF_LATEFREEZE process flag, > which, > when set, causes a userspace process to be frozen with kernel threads > instead of with userspace ones. When freezing, we freeze !TIF_LATEFREEZE, > sync and then freeze TIF_LATEFREEZE and freezable kernel threads. > > 2) In the fuse code, the PID of the process that will do the work gets passed The list of necessary PIDs is not known to the kernel. FUSE servers may depend on other parts of userland. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
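Point 1 of the proposal amounts to a two-phase freeze. In the sketch below, `latefreeze` is a hypothetical per-task flag standing in for the proposed TIF_LATEFREEZE, and `sync_filesystems()` is a placeholder for the sync step. All kernel threads are assumed to be freezable. Pavel's objection concerns the input, not the ordering: the code assumes somebody has already flagged every task the FUSE server depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_kind { SIM_USER, SIM_KTHREAD };

struct sim_task {
	enum sim_kind kind;
	bool latefreeze;   /* hypothetical stand-in for TIF_LATEFREEZE */
	int frozen_phase;  /* 0 = running, 1 or 2 = phase in which it froze */
};

static int sync_calls; /* counts calls to the placeholder sync step */

static void sync_filesystems(void)
{
	sync_calls++; /* placeholder: real code would flush dirty data here */
}

/* Phase 1 freezes ordinary user tasks; phase 2 freezes late-freeze user
 * tasks together with kernel threads. */
static void freeze_phase(struct sim_task *t, size_t n, int phase)
{
	assert(phase == 1 || phase == 2);
	for (size_t i = 0; i < n; i++) {
		bool early = t[i].kind == SIM_USER && !t[i].latefreeze;

		if (t[i].frozen_phase == 0 && (phase == 1) == early)
			t[i].frozen_phase = phase;
	}
}

/* Sync runs between the phases, while late-freeze tasks (e.g. a FUSE
 * server) can still service the resulting I/O. */
static void two_phase_freeze(struct sim_task *t, size_t n)
{
	freeze_phase(t, n, 1);
	sync_filesystems();
	freeze_phase(t, n, 2);
}
```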
RE: [linux-pm] Re: Hibernation considerations
On Tue, 24 Jul 2007, Huang, Ying wrote: > >From: Alan Stern [mailto:[EMAIL PROTECTED] > >It can't. Indeed, in the absence of a freezer, user threads will need > >devices (more accurately, will submit I/O requests for devices) that > >have to be kept quiescent or low-power. Drivers will need to delay > >those requests until the devices are returned to full operation. > > > >That's exactly what I've been saying all along: Drivers will need to > >be changed to delay I/O requests, if there is no freezer. > > If it is too much work to implement "delaying I/O requests" for every > driver, is it possible to implement it as follows: > > 1. It is triggered to suspend to RAM/DISK. > 2. Replace the driver related syscall entries (such as sys_read, > sys_write, sys_ioctl, etc) in sys_call_table with special wrapper > entries provided by "suspend to RAM/DISK" subsystem, which will delay > I/O requests if appropriate. > 3. When devices are quiesced, they are put into "low power" state and > system is put into suspend state; or the image is written to disk > (through snapshot/uswsusp or kexeced kernel). > 4. After resuming from RAM/DISK, devices are put into "normal" state and > the syscall entries replaced in step 2 are restored. Ha! I made exactly this same suggestion (URL lost in the mists of time), except that I proposed changing the syscall entries for every system call, not just the driver-related ones. Nobody seemed to think it would work very well. It leaves a few loose ends. For example, suppose a user thread is already in the middle of a system call and is about to start doing some I/O (maybe it's waiting for a timer to expire). In the end, this doesn't seem to be very different from freezing all user threads. Alan Stern
RE: [linux-pm] Re: Hibernation considerations
>From: Alan Stern [mailto:[EMAIL PROTECTED] >It can't. Indeed, in the absence of a freezer, user threads will need >devices (more accurately, will submit I/O requests for devices) that >have to be kept quiescent or low-power. Drivers will need to delay >those requests until the devices are returned to full operation. > >That's exactly what I've been saying all along: Drivers will need to >be changed to delay I/O requests, if there is no freezer. If it is too much work to implement "delaying I/O requests" for every driver, is it possible to implement it as follows: 1. It is triggered to suspend to RAM/DISK. 2. Replace the driver related syscall entries (such as sys_read, sys_write, sys_ioctl, etc) in sys_call_table with special wrapper entries provided by "suspend to RAM/DISK" subsystem, which will delay I/O requests if appropriate. 3. When devices are quiesced, they are put into "low power" state and system is put into suspend state; or the image is written to disk (through snapshot/uswsusp or kexeced kernel). 4. After resuming from RAM/DISK, devices are put into "normal" state and the syscall entries replaced in step 2 are restored. Best Regards, Huang Ying
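Step 2 of this proposal is essentially a gate in front of the I/O entry points. The following user-space analogue uses POSIX threads: callers of `io_gate_enter()` wait while a suspend is in progress and proceed after resume. It does not touch `sys_call_table`, and `struct io_gate` and its functions are hypothetical. `io_gate_suspend()` reports how many callers are already past the gate. That count is the loose end Alan Stern points out: a thread that entered a system call before the switch is not stopped by it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gate placed in front of I/O entry points. */
struct io_gate {
	pthread_mutex_t lock;
	pthread_cond_t resumed;
	bool suspended;
	int in_flight;   /* callers that passed the gate and have not left yet */
};

static void io_gate_init(struct io_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->resumed, NULL);
	g->suspended = false;
	g->in_flight = 0;
}

/* Called on entry to a wrapped I/O call: waits while suspended. */
static void io_gate_enter(struct io_gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->suspended)
		pthread_cond_wait(&g->resumed, &g->lock);
	g->in_flight++;
	pthread_mutex_unlock(&g->lock);
}

static void io_gate_exit(struct io_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->in_flight > 0);
	g->in_flight--;
	pthread_mutex_unlock(&g->lock);
}

/* Start delaying new callers; returns how many are still inside. */
static int io_gate_suspend(struct io_gate *g)
{
	int busy;

	pthread_mutex_lock(&g->lock);
	g->suspended = true;
	busy = g->in_flight;
	pthread_mutex_unlock(&g->lock);
	return busy;
}

/* Let delayed callers through again. */
static void io_gate_resume(struct io_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->suspended = false;
	pthread_cond_broadcast(&g->resumed);
	pthread_mutex_unlock(&g->lock);
}
```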
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007 23:55, Nigel Cunningham wrote: > Hi. > > On Tuesday 24 July 2007 01:23:15 Alan Stern wrote: > > On Mon, 23 Jul 2007, Nigel Cunningham wrote: > > > > > Take a step back for a second. > > > > > > The problem we're facing now is that we're getting some userspace > > > threads, > > > used in processing I/O, that are functioning as exceptions to the "freeze > > > userspace, then freezeable kernel threads" rule. They are only exceptions > > > because of that role in processing I/O - because they're de facto kernel > > > threads. So, if we orient our thinking more in terms of I/O processing > > > and > > > less in terms of the userspace/kernelspace distinction, we'll have a > > > solution: > > > > > > 1) Freeze processes that aren't fs related (ie stop them generating I/O). > > > > The problem here is that with things like FUSE, _every_ process is > > potentially fs related. Nothing prevents a FUSE thread from doing IPC > > with any other thread. > > Yes, but the fuse thread is going to know what other thread it's doing IPC > with, so it can get that thread flagged too. Yes, but that thread may do IPC with yet another one and so on. > > > 2) Flush pending I/O. > > > 3) Freeze filesystems in reverse order of dependency, the primary purpose > > > being to stop them generating further I/O on their metadata. > > > > > > Locks that are being held are only being held because work is being done. > If > > > we progressively focus on threads in terms of their create/process work > > > dependencies, we'll see that the problem isn't at all intractable. > > > > As has been mentioned before, keeping track of all that dependency > > information would be very fragile and time-consuming. > > I disagree. It's at least going to be less fragile and time-consuming than > maintaining new/extra code for kexec. Well, I think the issue is real, so we need to find a solution (the simpler, the better) and that need not be related to kexec.
;-) Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth
Re: [linux-pm] Re: Hibernation considerations
Hi. On Tuesday 24 July 2007 01:23:15 Alan Stern wrote: > On Mon, 23 Jul 2007, Nigel Cunningham wrote: > > > Take a step back for a second. > > > > The problem we're facing now is that we're getting some userspace threads, > > used in processing I/O, that are functioning as exceptions to the "freeze > > userspace, then freezeable kernel threads" rule. They are only exceptions > > because of that role in processing I/O - because they're de facto kernel > > threads. So, if we orient our thinking more in terms of I/O processing and > > less in terms of the userspace/kernelspace distinction, we'll have a > > solution: > > > > 1) Freeze processes that aren't fs related (ie stop them generating I/O). > > The problem here is that with things like FUSE, _every_ process is > potentially fs related. Nothing prevents a FUSE thread from doing IPC > with any other thread. Yes, but the fuse thread is going to know what other thread it's doing IPC with, so it can get that thread flagged too. > > 2) Flush pending I/O. > > 3) Freeze filesystems in reverse order of dependency, the primary purpose > > being to stop them generating further I/O on their metadata. > > > > Locks that are being held are only being held because work is being done. If > > we progressively focus on threads in terms of their create/process work > > dependencies, we'll see that the problem isn't at all intractable. > > As has been mentioned before, keeping track of all that dependency > information would be very fragile and time-consuming. I disagree. It's at least going to be less fragile and time-consuming than maintaining new/extra code for kexec. Nigel
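Step 3, freezing filesystems in reverse order of dependency, is a topological sort. The sketch assumes the dependency graph between filesystems is already known and acyclic; obtaining and maintaining that information is exactly what is disputed above. `struct fs_graph` and `freeze_order()` are hypothetical names. Each filesystem is frozen before any filesystem it stores data on, for example a FUSE filesystem before the ext3 volume its server writes to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FS 8

/* depends[i][j] is true when filesystem i stores its data on filesystem j. */
struct fs_graph {
	size_t n;
	bool depends[MAX_FS][MAX_FS];
};

/* Depth-first post-order: a filesystem is emitted after everything it depends on. */
static void visit(const struct fs_graph *g, size_t i, bool *seen,
		  size_t *post, size_t *len)
{
	seen[i] = true;
	for (size_t j = 0; j < g->n; j++)
		if (g->depends[i][j] && !seen[j])
			visit(g, j, seen, post, len);
	post[(*len)++] = i;
}

/* Fill order[] so that every filesystem comes before the filesystems it
 * depends on (reverse dependency order). Assumes the graph is acyclic. */
static void freeze_order(const struct fs_graph *g, size_t *order)
{
	bool seen[MAX_FS] = { false };
	size_t post[MAX_FS];
	size_t len = 0;

	assert(g->n <= MAX_FS);
	for (size_t i = 0; i < g->n; i++)
		if (!seen[i])
			visit(g, i, seen, post, &len);
	for (size_t k = 0; k < len; k++)
		order[k] = post[len - 1 - k];
}
```

Take three filesystems: ext3 (0), a FUSE filesystem backed by it (1), and an image loop-mounted from the FUSE filesystem (2). The resulting order is 2, 1, 0.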
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007 [EMAIL PROTECTED] wrote: > > For one thing, checking for a suspend-in-progress at the beginning of > > each and every system call would add overhead to a hot path in the > > kernel, one which is already very heavily optimized. People wouldn't > > stand for it. > > I thought that the suspend stuff did this easily, It does not do it at all. Do you know how the freezer works? > but the freezer really > starts running into trouble when it wants to freeze some things, but not > other things. this seems to be the biggest area of churn and problems. No. The freezer starts running into trouble when it wants to freeze a thread but can't, because that thread is waiting for some event to occur and the only thread which can cause the event is already frozen. Or is itself waiting for a third thread which is already frozen... > > You get similar problems from system calls that wait in kernel mode > > until something has happened. For example, a read() call for the > > console device will wait until somebody types on the keyboard. At any > > point in time, many (or even most) user threads are blocked in a system > > call. > > but are locks held while they are blocked like this? Sometimes they are, sometimes they aren't. > > Let's let kernel K1 be the original kernel, the one which is going into > > hibernation. Kernel K2 is the one started by kexec to write out the > > memory image. > > > > Your question becomes: Why should K2 jumping back to K1 cause K1 > > immediately to start running user tasks? Answer: Because K1 has been > > running user tasks all along (except while K2 was active) and nothing > > has told it to stop. In fact, about the only things which _can_ cause > > K1 to stop running user threads are the freezer (which you want to > > eliminate) and disabling interrupts (not possible since some drivers > > require interrupts to be enabled when putting devices in low-power > > mode). 
> > when you jump to a body of code you jump to a specific point in the code, > not to some nebulous 'everything running' state. How is that relevant? When K2 jumps back to K1, it jumps to some designated location in K1. It might be just after the place where K1 called K2; I'm not familiar with the details of kexec. In any event, K1 will still be in the same state as it was when it called K2. > > So when K2 starts up, it will have a phase in which user threads don't > > run. That doesn't affect K1. When K2 returns to K1, K1 does not go > > through this sort of phase. It simply picks up from where it left off. > > then how can it restart drivers before the user threads need them? It can't. Indeed, in the absence of a freezer, user threads will need devices (more accurately, will submit I/O requests for devices) that have to be kept quiescent or low-power. Drivers will need to delay those requests until the devices are returned to full operation. That's exactly what I've been saying all along: Drivers will need to be changed to delay I/O requests, if there is no freezer. > > However there still remains the problem of user tasks running after > > devices are supposed to be quiescent and before K1 starts. There's > > currently nothing to stop such tasks from making I/O requests and > > thereby causing a quiescent device to become active again. > > but if the devices are in low power mode then K1 needs to get them out of > low power mode before user tasks try to access them. No -- which is good because it can't. If a user task is running there's no way to stop it from submitting I/O requests. K1 needs to delay these requests until after the device has returned to full operation. > > We aren't talking about drivers initializing devices. We are talking > > about what happens during the time when drivers are trying to quiesce > > devices (i.e., before K1 has started up K2) or power them down (after > > K2 has returned to K1).
> > or if you are doing a resume instead of a suspend to ram the drivers need > to initialize or otherwise move to full power on K1 before user tasks hit > them. Correct. User tasks are allowed to submit requests, but the requests can't be carried out until the device returns to full operation. Alan Stern
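This behaviour, where requests are accepted while the device is quiesced but only carried out once it is back at full operation, can be sketched as a per-device deferral queue. `struct sim_dev`, `sim_submit()`, `sim_suspend()` and `sim_resume()` are hypothetical. A real driver would also need locking against concurrent submitters, and might block the caller rather than fail when the queue is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LEN 16

struct sim_dev {
	bool quiesced;            /* true between suspend and resume */
	int pending[QUEUE_LEN];   /* requests accepted while quiesced */
	size_t npending;
	int done[QUEUE_LEN];      /* requests actually carried out, in order */
	size_t ndone;
};

static void sim_do_io(struct sim_dev *d, int req)
{
	assert(!d->quiesced);     /* hardware must be at full power here */
	assert(d->ndone < QUEUE_LEN);
	d->done[d->ndone++] = req;
}

/* User tasks may submit at any time; while quiesced the request is only queued. */
static int sim_submit(struct sim_dev *d, int req)
{
	if (!d->quiesced) {
		sim_do_io(d, req);
		return 0;
	}
	if (d->npending == QUEUE_LEN)
		return -1;
	d->pending[d->npending++] = req;
	return 0;
}

static void sim_suspend(struct sim_dev *d)
{
	d->quiesced = true;
}

/* Return to full operation first, then carry out the delayed requests in order. */
static void sim_resume(struct sim_dev *d)
{
	d->quiesced = false;
	for (size_t i = 0; i < d->npending; i++)
		sim_do_io(d, d->pending[i]);
	d->npending = 0;
}
```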
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007, Oliver Neukum wrote: Am Montag 23 Juli 2007 schrieb Miklos Szeredi: The reason is that we want them to "park" in safe places, ie. where there are no locks held etc. Thus, these safe places need to be chosen somehow and since they are not marked throughout the code, we choose the obvious one. :-) Why shouldn't locks be held? No locks which are required for suspend must be held, sure. But otherwise holding locks doesn't matter at all. If you can provide a way to tell them apart, this would work. can you just tell the driver to try and suspend and if it reports back that it fails back out of the suspend? or will the driver deadlock instead of reporting a failure if a lock is held. David Lang
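The "try to suspend, and back out if a driver reports failure" approach can be sketched as an all-or-nothing loop. When one device refuses, the devices already suspended are resumed in reverse order. The callback table and all names are hypothetical. The sketch only helps when a driver returns an error. A driver that blocks on a lock held by a frozen task would never return, which is the deadlock case raised in the question.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sim_device {
	const char *name;
	int (*suspend)(struct sim_device *dev); /* 0 on success, negative errno on failure */
	void (*resume)(struct sim_device *dev);
	int suspended;
};

static const char *resume_log[8]; /* order in which devices were resumed */
static size_t nresumed;

static int ok_suspend(struct sim_device *dev)
{
	dev->suspended = 1;
	return 0;
}

/* A driver that cannot quiesce right now reports it instead of blocking. */
static int busy_suspend(struct sim_device *dev)
{
	(void)dev;
	return -EBUSY;
}

static void sim_resume(struct sim_device *dev)
{
	assert(dev->suspended);
	assert(nresumed < 8);
	dev->suspended = 0;
	resume_log[nresumed++] = dev->name;
}

/* Suspend devs[0..n-1] in order; on the first failure, resume the devices
 * already suspended, last-suspended first, and return the error. */
static int suspend_all(struct sim_device *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = devs[i].suspend(&devs[i]);

		if (err) {
			while (i-- > 0)
				devs[i].resume(&devs[i]);
			return err;
		}
	}
	return 0;
}
```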
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007, Alan Stern wrote: On Sun, 22 Jul 2007 [EMAIL PROTECTED] wrote: Ok, I did misunderstand you. it sounds like all you need to do to make sure that locks are not held is to allow system calls to return before trying to do the suspend/kexec/etc. that sounds like not only a trivial thing to do, but something that would probably be done anyway. If you could actually do it, it would work. But you can't do it. If it were feasible, the freezer would have used that approach in the first place. For one thing, checking for a suspend-in-progress at the beginning of each and every system call would add overhead to a hot path in the kernel, one which is already very heavily optimized. People wouldn't stand for it. I thought that the suspend stuff did this easily, but the freezer really starts running into trouble when it wants to freeze some things, but not other things. this seems to be the biggest area of churn and problems. although syscalls that then call out to userspace tasks before they can complete cause potential deadlocks (without that issue you can just wait until all syscalls have returned, and not allow anything to issue new syscalls) is this the issue that's killing FUSE+suspend? You get similar problems from system calls that wait in kernel mode until something has happened. For example, a read() call for the console device will wait until somebody types on the keyboard. At any point in time, many (or even most) user threads are blocked in a system call. but are locks held while they are blocked like this? But it also means that tasks which otherwise would have been frozen are actually free to run before the kexec call is made (and after the call returns, if the kexec'd kernel returns back to the original kernel). Any driver which was written with the assumption that tasks would be frozen at those times will need to be changed. here is where you lose me. why should jumping back to the original kernel immediately start running these processes?
> Let's let kernel K1 be the original kernel, the one which is going into hibernation. Kernel K2 is the one started by kexec to write out the memory image. Your question becomes: Why should K2 jumping back to K1 cause K1 immediately to start running user tasks? Answer: Because K1 has been running user tasks all along (except while K2 was active) and nothing has told it to stop. > In fact, about the only things which _can_ cause K1 to stop running user threads are the freezer (which you want to eliminate) and disabling interrupts (not possible since some drivers require interrupts to be enabled when putting devices in low-power mode). when you jump to a body of code you jump to a specific point in the code, not to some nebulous 'everything running' state. > > the process of doing a kexec requires things to happen in the drivers before normal activity can happen, so there is a phase in there where the kernel being jumped to has drivers initializing, but still does not allow anything else to run. > So when K2 starts up, it will have a phase in which user threads don't run. That doesn't affect K1. When K2 returns to K1, K1 does not go through this sort of phase. It simply picks up from where it left off. then how can it restart drivers before the user threads need them? > > why can't this phase be extended to allow for the possibility of transitioning these drivers to a sleep mode instead of to full operation? > Indeed, Rafael has suggested that K2 be responsible for putting devices in low-power mode. This has the disadvantage of requiring K2 to include drivers for every device used by K1, but otherwise it would work. > However there still remains the problem of user tasks running after devices are supposed to be quiescent and before K1 starts. There's currently nothing to stop such tasks from making I/O requests and thereby causing a quiescent device to become active again.
but if the devices are in low power mode then K1 needs to get them out of low power mode before user tasks try to access them. > > > The situation as regards locking is harder to discuss since I don't know of any code examples to use as a guide. The fact remains that if user tasks aren't frozen then they can make system calls, and while running in kernel mode they can acquire locks, which might cause problems -- even though I can't identify any definite examples. > > yes, if userspace is running jobs and submitting I/O and system calls while drivers are trying to initialize there is a big problem, but I am missing the reason this must be the case. > We aren't talking about drivers initializing devices. We are talking about what happens during the time when drivers are trying to quiesce devices (i.e., before K1 has started up K2) or power them down (after K2 has returned to K1). or if you are doing a resume instead of a suspend to ram the drivers need to initialize or otherwise move to full power on K1 before user tasks hit them. the
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007, Nigel Cunningham wrote: > Take a step back for a second. > > The problem we're facing now is that we're getting some userspace threads, > used in processing I/O, that are functioning as exceptions to the "freeze > userspace, then freezeable kernel threads" rule. They are only exceptions > because of that role in processing I/O - because they're de facto kernel > threads. So, if we orient our thinking more in terms of I/O processing and > less in terms of the userspace/kernelspace distinction, we'll have a > solution: > > 1) Freeze processes that aren't fs related (ie stop them generating I/O). The problem here is that with things like FUSE, _every_ process is potentially fs related. Nothing prevents a FUSE thread from doing IPC with any other thread. > 2) Flush pending I/O. > 3) Freeze filesystems in reverse order of dependency, the primary purpose > being to stop them generating further I/O on their metadata. > > Locks that are being held are only being held because work is being done. If > we progressively focus on threads in terms of their create/process work > dependencies, we'll see that the problem isn't at all intractable. As has been mentioned before, keeping track of all that dependency information would be very fragile and time-consuming. Alan Stern
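Step 3 of Nigel's list, freezing filesystems in reverse order of dependency, amounts to a topological sort of the "is backed by" relation. The following is a minimal userspace sketch, assuming the dependencies are already known and stored in a matrix; the `depends_on` table and all function names are hypothetical, not kernel APIs. A depth-first post-order emits each backing filesystem before its dependents, and reversing that order freezes dependents first. The cycle check hints at Alan's objection: if the dependency information is wrong or circular, no safe order exists.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FS 8

/* depends_on[a][b] is true when filesystem a is backed by filesystem b,
 * e.g. a FUSE mount whose daemon keeps its data on an ext3 volume. */
static bool depends_on[MAX_FS][MAX_FS];

enum mark { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first post-order: each backing filesystem is appended to order[]
 * before any filesystem that depends on it. Returns false on a cycle. */
static bool visit(int fs, int n, enum mark *marks, int *order, int *count)
{
    if (marks[fs] == DONE)
        return true;
    if (marks[fs] == IN_PROGRESS)
        return false;               /* circular dependency */
    marks[fs] = IN_PROGRESS;
    for (int dep = 0; dep < n; dep++)
        if (depends_on[fs][dep] && !visit(dep, n, marks, order, count))
            return false;
    marks[fs] = DONE;
    order[(*count)++] = fs;
    return true;
}

/* Fill freeze_order[0..n-1] so that every filesystem is frozen before the
 * filesystems it depends on. Returns false if no such order exists. */
static bool compute_freeze_order(int n, int *freeze_order)
{
    enum mark marks[MAX_FS] = { UNVISITED };
    int order[MAX_FS];
    int count = 0;

    assert(n > 0 && n <= MAX_FS);
    for (int fs = 0; fs < n; fs++)
        if (!visit(fs, n, marks, order, &count))
            return false;
    /* Reverse the post-order: dependents first, backing stores last. */
    for (int i = 0; i < n; i++)
        freeze_order[i] = order[n - 1 - i];
    return true;
}
```

For example, with a FUSE mount (1) on top of the root filesystem (0) and a loop-mounted image (2) stored on the FUSE mount, the computed freeze order is 2, 1, 0.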
Re: [linux-pm] Re: Hibernation considerations
On Sun, 22 Jul 2007 [EMAIL PROTECTED] wrote: > > You are confusing "userspace" with "user tasks". And not only that, > > you often use the term "userspace" when you should say "user mode". > > > > If you want I can explain the differences. > > please do, I have been treating all three as the same category. Very briefly then: "User mode" and "kernel mode" refer to the CPU's hardware privilege level. A process makes the transition from user mode to kernel mode by executing a system call. Interrupt and exception handlers also run in kernel mode, but they generally are not considered to be part of any process. The reverse transition occurs when a process returns from a system call, or when an interrupt which occurred while the CPU was in user mode completes. (It's interesting to note that system calls are somewhat similar to interrupts; in fact sometimes they are implemented by a "software interrupt".) "Kernel threads" are processes that run entirely in kernel mode. They usually don't have a memory mapping for any user-owned memory and they never go into user mode. All other processes are "user threads". "Userspace" is a rather general term referring to things not in the kernel. It comprises both user tasks (while running in user mode) and user memory. > Ok, I did misunderstand you. it sounds like all you need to do to make > sure that locks are not held is to allow system calls to return before > trying to do the suspend/kexec/etc. that sounds like not only a trivial > thing to do, but something that would probably be done anyway. If you could actually do it, it would work. But you can't do it. If it were feasible, the freezer would have used that approach in the first place. For one thing, checking for a suspend-in-progress at the beginning of each and every system call would add overhead to a hot path in the kernel, one which is already very heavily optimized. People wouldn't stand for it.
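To make the hot-path objection concrete, here is a minimal sketch of the "let system calls return, then refuse new ones" idea, written as portable C11 rather than kernel code; `syscall_enter()`, `syscall_exit()` and `suspend_may_proceed()` are hypothetical names. Every system call pays for a flag check plus two atomic read-modify-write operations, and a task blocked in a console read() keeps the in-flight count above zero indefinitely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate: suspend waits until no system call is in flight. */
static atomic_bool suspend_pending;
static atomic_int syscalls_in_flight;

/* Called on every system call entry; returns false if the call is refused.
 * Incrementing before checking the flag avoids missing a racing suspend. */
static bool syscall_enter(void)
{
    atomic_fetch_add(&syscalls_in_flight, 1);
    if (atomic_load(&suspend_pending)) {
        atomic_fetch_sub(&syscalls_in_flight, 1);
        return false;
    }
    return true;
}

/* Called on every system call exit. */
static void syscall_exit(void)
{
    int prev = atomic_fetch_sub(&syscalls_in_flight, 1);
    assert(prev > 0);
    (void)prev;
}

/* Suspend side: stop new calls, then check whether all old ones returned.
 * A call blocked waiting for keyboard input never reaches syscall_exit(),
 * so this can keep returning false. */
static bool suspend_may_proceed(void)
{
    atomic_store(&suspend_pending, true);
    return atomic_load(&syscalls_in_flight) == 0;
}
```

The default sequentially consistent atomics are used deliberately; weaker orderings would need more careful reasoning about the entry/suspend race.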
> although syscalls that then call out to userspace tasks before they can > complete cause potential deadlocks (without that issue you can just wait > until all syscalls have returned, and not allow anything to issue new > syscalls) is this the issue that's killing FUSE+suspend? You get similar problems from system calls that wait in kernel mode until something has happened. For example, a read() call for the console device will wait until somebody types on the keyboard. At any point in time, many (or even most) user threads are blocked in a system call. > > Here's what you are missing: > > > > The new kexec approach eliminates the freezer and relies instead on the > > fact that none of the tasks in the original kernel can execute while > > the new kexec'd kernel is running. This means the new kernel can write > > out a memory image with no fear of interference or corruption. > > correct > > > But it also means that tasks which otherwise would have been frozen are > > actually free to run before the kexec call is made (and after the call > > returns, if the kexec'd kernel returns back to the original kernel). > > Any driver which was written with the assumption that tasks would be > > frozen at those times will need to be changed. > > here is where you lose me. > > why should jumping back to the original kernel immediately start running > these processes? Let's let kernel K1 be the original kernel, the one which is going into hibernation. Kernel K2 is the one started by kexec to write out the memory image. Your question becomes: Why should K2 jumping back to K1 cause K1 immediately to start running user tasks? Answer: Because K1 has been running user tasks all along (except while K2 was active) and nothing has told it to stop.
In fact, about the only things which _can_ cause K1 to stop running user threads are the freezer (which you want to eliminate) and disabling interrupts (not possible since some drivers require interrupts to be enabled when putting devices in low-power mode). > the process of doing a kexec requires things to happen in > the drivers before normal activity can happen, so there is a phase in > there where the kernel being jumped to has drivers initializing, but still > does not allow anything else to run. So when K2 starts up, it will have a phase in which user threads don't run. That doesn't affect K1. When K2 returns to K1, K1 does not go through this sort of phase. It simply picks up from where it left off. > why can't this phase be extended to > allow for the possibility of transitioning these drivers to a sleep mode > instead of to full operation? Indeed, Rafael has suggested that K2 be responsible for putting devices in low-power mode. This has the disadvantage of requiring K2 to include drivers for every device used by K1, but otherwise it would work. However there still remains the problem of user tasks running after devices are supposed to be quiescent and before K1 starts. There's currently nothing to stop such tasks from
Re: [linux-pm] Re: Hibernation considerations
On Saturday, 21 July 2007, Alan Stern wrote: > On Fri, 20 Jul 2007, Oliver Neukum wrote: > > > > We already have a pre-suspend notification available for drivers that > > > need to allocate large amounts of memory. > > > > Is that facility fine grained enough? > > It's a notifier chain that gets called at several points during the > suspend transition. One of those points is right at the start, while > userspace is still running and reasonably large amounts of memory can > be allocated. > > Is it fine-grained enough? I don't know -- hard to tell, since nothing > much is using it yet. > > > > You are correct about the need to delay/stop device addition. I don't > > > know how this can be done in general; each code path calling > > > device_add() may have to be treated individually. > > > > What about the old API? > > What old API do you mean? The find_device() stuff. > > Do we have to block module loading? > > No. Registering new drivers is okay, registering new devices is bad. What if it is a driver for virtual devices that don't need probe() for actual hardware? > Of course, some modules do want to register a new device in their init > method. I don't know what we should do about them. Force the > registration to fail, I suppose. How often will people suspend while a > module is loading? > > > What happens if a scsi error handler is woken? If it cannot be woken, > > how are errors handled? > > Why should the error handler wake up? There isn't supposed to be any > I/O going on, hence no errors to handle. What about shared busses? Firewire, FibreChannel? They can get external resets, etc ... Regards Oliver
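The notifier chain Alan describes can be pictured as an ordered list of callbacks invoked at each step of the suspend transition. The sketch below is a simplified userspace model, not the kernel's notifier API: the event names, `pm_register()` and `pm_notify()` are hypothetical. A driver allocates its large buffer at the PREPARE step, while userspace is still running, and releases it afterwards; a non-zero return at PREPARE vetoes the transition and rolls back the callbacks already run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical events, modelled loosely on a suspend notifier chain. */
enum pm_event { PM_EV_PREPARE, PM_EV_POST };

typedef int (*pm_callback)(enum pm_event ev);

#define MAX_NOTIFIERS 8
static pm_callback chain[MAX_NOTIFIERS];
static size_t chain_len;

static void pm_register(pm_callback cb)
{
    assert(chain_len < MAX_NOTIFIERS);
    chain[chain_len++] = cb;
}

/* Call each notifier in registration order. A non-zero return at PREPARE
 * vetoes the transition; notifiers already called are sent POST to undo. */
static int pm_notify(enum pm_event ev)
{
    for (size_t i = 0; i < chain_len; i++) {
        int err = chain[i](ev);
        if (err != 0 && ev == PM_EV_PREPARE) {
            for (size_t j = 0; j < i; j++)
                chain[j](PM_EV_POST);
            return err;
        }
    }
    return 0;
}

/* Example driver: grabs a 1 MiB buffer while allocation is still easy. */
static void *driver_buffer;

static int driver_notifier(enum pm_event ev)
{
    if (ev == PM_EV_PREPARE) {
        driver_buffer = malloc(1 << 20);
        return driver_buffer != NULL ? 0 : -1;
    }
    free(driver_buffer);
    driver_buffer = NULL;
    return 0;
}

/* Example of a notifier that refuses the transition. */
static int vetoing_notifier(enum pm_event ev)
{
    return ev == PM_EV_PREPARE ? -1 : 0;
}
```

Whether a chain like this is "fine-grained enough" depends on how many distinct events it exposes; this sketch has only two.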
Re: [linux-pm] Re: Hibernation considerations
> Alan has recently proposed to introduce "suspend locks" to be acquired during > a suspend/hibernation and such that we can leave uninterruptible tasks that > don't hold any of them. Sounds sane. A global rwsem could be acquired for read by drivers, and for write by suspend/hibernate. Just need to add it to all drivers that have PM, but that shouldn't need a heroic effort. Miklos
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007 15:08, Miklos Szeredi wrote: > > > > The reason is that we want them to "park" in safe places, ie. where there > > > > are no locks held etc. Thus, these safe places need to be chosen somehow > > > > and since they are not marked throughout the code, we choose the obvious > > > > one. :-) > > > > > > Why shouldn't locks be held? > > > > > > No locks which are required for suspend must be held, sure. But > > > otherwise holding locks doesn't matter at all. > > > > If you can provide a way to tell them apart, this would work. > > Without some marking we can't tell obviously. > > Are there many such locks? We can easily check by adding some > debugging code to the lock primitives, to make them yell if they are > used during suspend. This way we can only obtain information from systems that use hibernation quite often. Alan has recently proposed to introduce "suspend locks" to be acquired during a suspend/hibernation and such that we can leave uninterruptible tasks that don't hold any of them. Unfortunately, I have no link to his original message at hand. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth
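Alan's proposal, as Rafael summarizes it, would let the freezer leave alone uninterruptible tasks that hold no suspend lock. A minimal sketch of the bookkeeping, assuming a per-task counter maintained by the suspend-lock helpers; the struct, states and function names are all hypothetical and do not correspond to kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task bookkeeping for the "suspend locks" idea. */
enum task_state { RUNNING, SLEEPING_INTERRUPTIBLE, SLEEPING_UNINTERRUPTIBLE };

struct task {
    enum task_state state;
    int suspend_locks_held;   /* maintained by the helpers below */
};

static void suspend_lock(struct task *t)
{
    t->suspend_locks_held++;
}

static void suspend_unlock(struct task *t)
{
    assert(t->suspend_locks_held > 0);
    t->suspend_locks_held--;
}

/* An uninterruptible sleeper that holds no suspend lock cannot get in the
 * way of the suspend, so the freezer may skip it instead of waiting.
 * Running and interruptible tasks are frozen as usual. */
static bool freezer_must_wait_for(const struct task *t)
{
    if (t->state == SLEEPING_UNINTERRUPTIBLE)
        return t->suspend_locks_held > 0;
    return true;
}
```

The hard part, which this sketch does not address, is deciding which kernel locks must become suspend locks.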
Re: [linux-pm] Re: Hibernation considerations
> > > The reason is that we want them to "park" in safe places, ie. where there > > > are no locks held etc. Thus, these safe places need to be chosen somehow > > > and since they are not marked throughout the code, we choose the obvious > > > one. :-) > > > > Why shouldn't locks be held? > > > > No locks which are required for suspend must be held, sure. But > > otherwise holding locks doesn't matter at all. > > If you can provide a way to tell them apart, this would work. Without some marking we can't tell obviously. Are there many such locks? We can easily check by adding some debugging code to the lock primitives, to make them yell if they are used during suspend. Miklos
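The debugging aid Miklos suggests could be a thin wrapper around the lock primitive that reports any acquisition made while a suspend is in progress. A minimal userspace sketch using a pthread mutex; `suspend_in_progress`, `debug_mutex_lock()` and `DEBUG_LOCK()` are hypothetical names. As Rafael notes in his reply, such a check only produces data on machines that actually suspend often.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical debug instrumentation around a lock primitive. */
static bool suspend_in_progress;
static unsigned long locks_taken_during_suspend;

/* Wrapper around pthread_mutex_lock() that complains when a lock is taken
 * while a suspend is in progress. */
static int debug_mutex_lock(pthread_mutex_t *m, const char *name)
{
    if (suspend_in_progress) {
        locks_taken_during_suspend++;
        fprintf(stderr, "suspend debug: lock '%s' taken during suspend\n", name);
    }
    return pthread_mutex_lock(m);
}

/* Record the lock's source-level name automatically. */
#define DEBUG_LOCK(m) debug_mutex_lock(&(m), #m)
```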
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007, Miklos Szeredi wrote: > > The reason is that we want them to "park" in safe places, ie. where there > > are no locks held etc. Thus, these safe places need to be chosen somehow > > and since they are not marked throughout the code, we choose the obvious > > one. :-) > > Why shouldn't locks be held? > > No locks which are required for suspend must be held, sure. But > otherwise holding locks doesn't matter at all. If you can provide a way to tell them apart, this would work. Regards Oliver
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007 14:14, Miklos Szeredi wrote: > > On Monday, 23 July 2007 12:24, Miklos Szeredi wrote: > > > > > The only thing to do is what Rafael has been working on: unfreeze > > > > > things, hope the tasks sort themselves out, and try again. > > > > > > > > That's what I'm questioning. Is there a more reliable way and we've > > > > just given up too quickly? > > > > > > There obviously _are_ more reliable ways. A trivial one seems to be > > > to just not require user tasks to finish syscalls. > > > > > > Yeah, stopping user processes outside the kernel is convenient, but > > > there's no fundamental reason why it is the only place where those > > > tasks can be stopped. > > > > The reason is that we want them to "park" in safe places, ie. where there > > are no locks held etc. Thus, these safe places need to be chosen somehow > > and since they are not marked throughout the code, we choose the obvious > > one. :-) > > Why shouldn't locks be held? > > No locks which are required for suspend must be held, sure. But > otherwise holding locks doesn't matter at all. > > And I'm not saying that is trivial to do, but it might not be too hard > either. > > Rafael, can you please tell, what happened to that patch, that did not > wait for tasks in uninterruptible sleep to be frozen? > > That seemed like a magnificent approach compared to anything that has > been proposed since. Well, the freezer has failed to freeze tasks a couple of times in my test setup and I've had a couple of hangs. I have an idea how to improve it, but that still requires some pending freezer patches to go first. Greetings, Rafael
Re: [linux-pm] Re: Hibernation considerations
> On Monday, 23 July 2007 12:24, Miklos Szeredi wrote: > > > > The only thing to do is what Rafael has been working on: unfreeze > > > > things, hope the tasks sort themselves out, and try again. > > > > > > That's what I'm questioning. Is there a more reliable way and we've > > > just given up too quickly? > > > > There obviously _are_ more reliable ways. A trivial one seems to be > > to just not require user tasks to finish syscalls. > > > > Yeah, stopping user processes outside the kernel is convenient, but > > there's no fundamental reason why it is the only place where those > > tasks can be stopped. > > The reason is that we want them to "park" in safe places, ie. where there > are no locks held etc. Thus, these safe places need to be chosen somehow > and since they are not marked throughout the code, we choose the obvious > one. :-) Why shouldn't locks be held? No locks which are required for suspend must be held, sure. But otherwise holding locks doesn't matter at all. And I'm not saying that is trivial to do, but it might not be too hard either. Rafael, can you please tell, what happened to that patch, that did not wait for tasks in uninterruptible sleep to be frozen? That seemed like a magnificent approach compared to anything that has been proposed since. Miklos
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007 12:24, Miklos Szeredi wrote: > > > The only thing to do is what Rafael has been working on: unfreeze > > > things, hope the tasks sort themselves out, and try again. > > > > That's what I'm questioning. Is there a more reliable way and we've > > just given up too quickly? > > There obviously _are_ more reliable ways. A trivial one seems to be > to just not require user tasks to finish syscalls. > > Yeah, stopping user processes outside the kernel is convenient, but > there's no fundamental reason why it is the only place where those > tasks can be stopped. The reason is that we want them to "park" in safe places, ie. where there are no locks held etc. Thus, these safe places need to be chosen somehow and since they are not marked throughout the code, we choose the obvious one. :-) > And there are very fundamental reasons to _not_ require this. Not > just in the fuse case, but in any case where a syscall requires > another user task to run before it can be finished (e.g. NFS over > OpenVPN). Yeah. Mark the safe places for us and we'll use them. Greetings, Rafael
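Rafael's "mark the safe places" can be modelled as explicit checkpoints, similar in spirit to the kernel's `try_to_freeze()` calls in kernel threads, where a task honors a pending freeze request only while it holds no locks. A minimal sketch; the struct and function names are hypothetical, and a real task would sleep at the checkpoint rather than just set a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a task that may only park at marked safe places. */
struct task {
    bool freeze_requested;
    bool frozen;
    int locks_held;
};

static void task_lock(struct task *t)
{
    t->locks_held++;
}

static void task_unlock(struct task *t)
{
    assert(t->locks_held > 0);
    t->locks_held--;
}

/* A marked safe place: the programmer promises no locks are held here,
 * and the assertion checks that promise. Returns true if the task parked. */
static bool safe_point(struct task *t)
{
    assert(t->locks_held == 0);
    if (t->freeze_requested) {
        t->frozen = true;   /* a real task would sleep here until thawed */
        return true;
    }
    return false;
}
```

A freeze request that arrives while the task is inside a critical section simply waits until the task reaches the next safe point.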
Re: [linux-pm] Re: Hibernation considerations
> > The only thing to do is what Rafael has been working on: unfreeze > > things, hope the tasks sort themselves out, and try again. > > That's what I'm questioning. Is there a more reliable way and we've > just given up too quickly? There obviously _are_ more reliable ways. A trivial one seems to be to just not require user tasks to finish syscalls. Yeah, stopping user processes outside the kernel is convenient, but there's no fundamental reason why it is the only place where those tasks can be stopped. And there are very fundamental reasons to _not_ require this. Not just in the fuse case, but in any case where a syscall requires another user task to run before it can be finished (e.g. NFS over OpenVPN). Miklos
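The "unfreeze things, hope the tasks sort themselves out, and try again" strategy, which Rafael's MAX_ATTEMPTS patch in this thread implements, can be simulated with the NFS-over-OpenVPN case Miklos mentions. In this sketch all names are hypothetical and MAX_ATTEMPTS is set to 5 as an assumption, since the patch excerpt does not show its value. The VPN daemon is frozen first, so the NFS client stays stuck in its system call; thawing lets the daemon answer, and the second pass succeeds. The thaw step optimistically assumes every pending request completes, which is exactly the hope the strategy relies on.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ATTEMPTS 5   /* assumed value for this sketch */
#define NO_TASK (-1)

struct task {
    bool frozen;
    int waits_on;   /* task whose help this task's syscall needs, or NO_TASK */
};

/* One freezing pass in array order. A task blocked on a helper that is
 * still running gets its request served and then freezes; a task blocked
 * on an already-frozen helper cannot finish its syscall. */
static bool freeze_pass(struct task *tasks, int n)
{
    bool all_frozen = true;

    for (int i = 0; i < n; i++) {
        int helper = tasks[i].waits_on;
        if (helper != NO_TASK) {
            if (tasks[helper].frozen) {
                all_frozen = false;
                continue;
            }
            tasks[i].waits_on = NO_TASK;
        }
        tasks[i].frozen = true;
    }
    return all_frozen;
}

/* Thaw everything and optimistically assume the helpers now run and finish
 * every pending request ("hope the tasks sort themselves out"). */
static void thaw_all(struct task *tasks, int n)
{
    for (int i = 0; i < n; i++) {
        tasks[i].frozen = false;
        tasks[i].waits_on = NO_TASK;
    }
}

/* Returns the attempt that succeeded, or 0 after MAX_ATTEMPTS failures. */
static int freeze_with_retries(struct task *tasks, int n)
{
    assert(n > 0);
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
        if (freeze_pass(tasks, n))
            return attempt;
        thaw_all(tasks, n);
    }
    return 0;
}
```

With task 0 as the VPN daemon and task 1 as an NFS client blocked on it, the first pass fails and the second succeeds. If the daemon kept generating new requests, every pass could fail, which is why the retries are bounded.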
Re: [linux-pm] Re: Hibernation considerations
The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? There obviously _are_ more reliable ways. A trivial one seems to be to just not require user tasks to finish syscalls. Yeah, stopping user processes outside the kernel is convenient, but there's no fundamental reason why it is the only place where those tasks can be stopped. And there are very fundamental reasons to _not_ require this. Not just in the fuse case, but in any case where a syscall requires another user task to run before it can be finished (e.g. NFS over OpenVPN). Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007 12:24, Miklos Szeredi wrote: The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? There obviously _are_ more reliable ways. A trivial one seems to be to just not require user tasks to finish syscalls. Yeah, stopping user processes outside the kernel is convenient, but there's no fundamental reason why it is the only place where those tasks can be stopped. The reason is that we want them to park in safe places, ie. where there are no locks held etc. Thus, these safe places need to be chosen somehow and since they are not marked throughout the code, we choose the obvious one. :-) And there are very fundamental reasons to _not_ require this. Not just in the fuse case, but in any case where a syscall requires another user task to run before it can be finished (e.g. NFS over OpenVPN). Yeah. Mark the safe places for us and we'll use them. Greetings, Rafael -- Premature optimization is the root of all evil. - Donald Knuth - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007 12:24, Miklos Szeredi wrote: The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? There obviously _are_ more reliable ways. A trivial one seems to be to just not require user tasks to finish syscalls. Yeah, stopping user processes outside the kernel is convenient, but there's no fundamental reason why it is the only place where those tasks can be stopped. The reason is that we want them to park in safe places, ie. where there are no locks held etc. Thus, these safe places need to be chosen somehow and since they are not marked throughout the code, we choose the obvious one. :-) Why shouldn't locks be held? No locks which are required for suspend must be held, sure. But otherwise holding locks doesn't matter at all. And I'm not saying that is trivial to do, but it might not be too hard either. Rafael, can you please tell, what happened to that patch, that did not wait for tasks in uninterruptible sleep to be frozen? That seemed like a magnificent approach compared to anything that has been proposed since. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007 14:14, Miklos Szeredi wrote: On Monday, 23 July 2007 12:24, Miklos Szeredi wrote: The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? There obviously _are_ more reliable ways. A trivial one seems to be to just not require user tasks to finish syscalls. Yeah, stopping user processes outside the kernel is convenient, but there's no fundamental reason why it is the only place where those tasks can be stopped. The reason is that we want them to park in safe places, ie. where there are no locks held etc. Thus, these safe places need to be chosen somehow and since they are not marked throughout the code, we choose the obvious one. :-) Why shouldn't locks be held? No locks which are required for suspend must be held, sure. But otherwise holding locks doesn't matter at all. And I'm not saying that is trivial to do, but it might not be too hard either. Rafael, can you please tell, what happened to that patch, that did not wait for tasks in uninterruptible sleep to be frozen? That seemed like a magnificent approach compared to anything that has been proposed since. Well, the freezer have failed to freeze tasks for a couple of times in my test setup and I've had a couple of hangs. I have an idea how to improve it, but that still requires some pending freezer patches to go first. Greetings, Rafael -- Premature optimization is the root of all evil. - Donald Knuth - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
Am Montag 23 Juli 2007 schrieb Miklos Szeredi: The reason is that we want them to park in safe places, ie. where there are no locks held etc. Thus, these safe places need to be chosen somehow and since they are not marked throughout the code, we choose the obvious one. :-) Why shouldn't locks be held? No locks which are required for suspend must be held, sure. But otherwise holding locks doesn't matter at all. If you can provide a way to tell them apart, this would work. Regards Oliver - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
The reason is that we want them to park in safe places, ie. where there are no locks held etc. Thus, these safe places need to be chosen somehow and since they are not marked throughout the code, we choose the obvious one. :-) Why shouldn't locks be held? No locks which are required for suspend must be held, sure. But otherwise holding locks doesn't matter at all. If you can provide a way to tell them apart, this would work. Without some marking we can't tell obviously. Are there many such locks? We can easily check by adding some debugging code to the lock primitives, to make them yell if they are used during suspend. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007 15:08, Miklos Szeredi wrote: The reason is that we want them to park in safe places, ie. where there are no locks held etc. Thus, these safe places need to be chosen somehow and since they are not marked throughout the code, we choose the obvious one. :-) Why shouldn't locks be held? No locks which are required for suspend must be held, sure. But otherwise holding locks doesn't matter at all. If you can provide a way to tell them apart, this would work. Without some marking we can't tell obviously. Are there many such locks? We can easily check by adding some debugging code to the lock primitives, to make them yell if they are used during suspend. This way we can only obtain information from systems that use hibernation quite often. Alan has recently proposed to introduce suspend locks to be acquired during a suspend/hibernation and such that we can leave uninterruptible tasks that don't hold any of them. Unfortunately, I have no link to his original message at hand. Greetings, Rafael -- Premature optimization is the root of all evil. - Donald Knuth - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
Alan has recently proposed to introduce suspend locks to be acquired during a suspend/hibernation and such that we can leave uninterruptible tasks that don't hold any of them. Sounds sane. A global rwsem could be acquired for read by drivers, and for write by suspend/hibernate. Just need to add it to all drivers that have PM, but that shouldn't need a heroic effort. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
Am Samstag 21 Juli 2007 schrieb Alan Stern: On Fri, 20 Jul 2007, Oliver Neukum wrote: We already have a pre-suspend notification available for drivers that need to allocate large amounts of memory. Is that facility fine grained enough? It's a notifier chain that gets called at several points during the suspend transition. One of those points is right at the start, while userspace is still running and reasonably large amounts of memory can be allocated. Is it fine-grained enough? I don't know -- hard to tell, since nothing much is using it yet. You are correct about the need to delay/stop device addition. I don't know how this can be done in general; each code path calling device_add() may have to be treated individually. What about the old API? What old API do you mean? The find_device() stuff. Do we have to block module loading? No. Registering new drivers is okay, registering new devices is bad. What if it is a driver for virtual devices that don't need probe() for actual hardware? Of course, some modules do want to register a new device in their init method. I don't know what we should do about them. Force the registration to fail, I suppose. How often will people suspend while a module is loading? What happens if a scsi error handler is woken? If it cannot be woken, how are errors handled? Why should the error handler wake up? There isn't supposed to be any I/O going on, hence no errors to handle. What about shared busses? Firewire, FibreChannel? They can get external resets, etc ... Regards Oliver - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
On Sun, 22 Jul 2007 [EMAIL PROTECTED] wrote: You are confusing userspace with user tasks. And not only that, you often use the term userspace when you should say user mode. If you want I can explain the differences. please do, I have been treating all three as the same category. Very briefly then: User mode and kernel mode refer to the CPU's hardware privilege level. A process makes the transition from user mode to kernel mode by executing a system call. Interrupt and exception handlers also run in kernel mode, but they generally are not considered to be part of any process. The reverse transition occurs when a process returns from a system call, or when an interrupt which occurred while the CPU was in user mode completes. (It's interesting to note that system calls are somewhat similar to interrupts; in fact sometimes they are implemented by a software interrupt.) Kernel threads are processes that run entirely in kernel mode. They usually don't have a memory mapping for any user-owned memory and they never go into user mode. All other processes are user threads. Userspace is a rather general term referring to things not in the kernel. It comprises both user tasks (while running in user mode) and user memory. Ok, I did misunderstand you. it sounds like all you need to do to make sure that locks are not held is to allow system calls to return before trying to do the suspend/kexec/etc. that sounds like not only a trivial thing to do, but something that would probably be done anyway. If you could actually do it, it would work. But you can't do it. If it were feasible, the freezer would have used that approach in the first place. For one thing, checking for a suspend-in-progress at the beginning of each and every system call would add overhead to a hot path in the kernel, one which is already very heavily optimized. People wouldn't stand for it.
although syscalls that then call out to userspace tasks before they can complete cause potential deadlocks (without that issue you can just wait until all syscalls have returned, and not allow anything to issue new syscalls) is this the issue that's killing FUSE+suspend? You get similar problems from system calls that wait in kernel mode until something has happened. For example, a read() call for the console device will wait until somebody types on the keyboard. At any point in time, many (or even most) user threads are blocked in a system call. Here's what you are missing: The new kexec approach eliminates the freezer and relies instead on the fact that none of the tasks in the original kernel can execute while the new kexec'd kernel is running. This means the new kernel can write out a memory image with no fear of interference or corruption. correct But it also means that tasks which otherwise would have been frozen are actually free to run before the kexec call is made (and after the call returns, if the kexec'd kernel returns back to the original kernel). Any driver which was written with the assumption that tasks would be frozen at those times will need to be changed. here is where you lose me. why should jumping back to the original kernel immediately start running these processes? Let's let kernel K1 be the original kernel, the one which is going into hibernation. Kernel K2 is the one started by kexec to write out the memory image. Your question becomes: Why should K2 jumping back to K1 cause K1 immediately to start running user tasks? Answer: Because K1 has been running user tasks all along (except while K2 was active) and nothing has told it to stop. In fact, about the only things which _can_ cause K1 to stop running user threads are the freezer (which you want to eliminate) and disabling interrupts (not possible since some drivers require interrupts to be enabled when putting devices in low-power mode).
the process of doing a kexec requires things to happen in the drivers before normal activity can happen, so there is a phase in there where the kernel being jumped to has drivers initializing, but still does not allow anything else to run. So when K2 starts up, it will have a phase in which user threads don't run. That doesn't affect K1. When K2 returns to K1, K1 does not go through this sort of phase. It simply picks up from where it left off. why can't this phase be extended to allow for the possibility of transitioning these drivers to a sleep mode instead of to full operation? Indeed, Rafael has suggested that K2 be responsible for putting devices in low-power mode. This has the disadvantage of requiring K2 to include drivers for every device used by K1, but otherwise it would work. However there still remains the problem of user tasks running after devices are supposed to be quiescent and before K1 starts. There's currently nothing to stop such tasks from making I/O requests and thereby causing a quiescent device to become active
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007, Nigel Cunningham wrote: Take a step back for a second. The problem we're facing now is that we're getting some userspace threads, used in processing I/O, that are functioning as exceptions to the freeze userspace, then freezeable kernel threads rule. They are only exceptions because of that role in processing I/O - because they're de facto kernel threads. So, if we orient our thinking more in terms of I/O processing and less in terms of the userspace/kernelspace distinction, we'll have a solution: 1) Freeze processes that aren't fs related (ie stop them generating I/O). The problem here is that with things like FUSE, _every_ process is potentially fs related. Nothing prevents a FUSE thread from doing IPC with any other thread. 2) Flush pending I/O. 3) Freeze filesystems in reverse order of dependency, the primary purpose being to stop them generating further I/O on their metadata. Locks that are being held are only being held because work is being done. If we progressively focus on threads in terms of their create/process work dependencies, we'll see that the problem isn't at all intractable. As has been mentioned before, keeping track of all that dependency information would be very fragile and time-consuming. Alan Stern
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007, Alan Stern wrote: On Sun, 22 Jul 2007 [EMAIL PROTECTED] wrote: Ok, I did misunderstand you. it sounds like all you need to do to make sure that locks are not held is to allow system calls to return before trying to do the suspend/kexec/etc. that sounds like not only a trivial thing to do, but something that would probably be done anyway. If you could actually do it, it would work. But you can't do it. If it were feasible, the freezer would have used that approach in the first place. For one thing, checking for a suspend-in-progress at the beginning of each and every system call would add overhead to a hot path in the kernel, one which is already very heavily optimized. People wouldn't stand for it. I thought that the suspend stuff did this easily, but the freezer really starts running into trouble when it wants to freeze some things, but not other things. this seems to be the biggest area of churn and problems. although syscalls that then call out to userspace tasks before they can complete cause potential deadlocks (without that issue you can just wait until all syscalls have returned, and not allow anything to issue new syscalls) is this the issue that's killing FUSE+suspend? You get similar problems from system calls that wait in kernel mode until something has happened. For example, a read() call for the console device will wait until somebody types on the keyboard. At any point in time, many (or even most) user threads are blocked in a system call. but are locks held while they are blocked like this? But it also means that tasks which otherwise would have been frozen are actually free to run before the kexec call is made (and after the call returns, if the kexec'd kernel returns back to the original kernel). Any driver which was written with the assumption that tasks would be frozen at those times will need to be changed. here is where you lose me. why should jumping back to the original kernel immediately start running these processes?
Let's let kernel K1 be the original kernel, the one which is going into hibernation. Kernel K2 is the one started by kexec to write out the memory image. Your question becomes: Why should K2 jumping back to K1 cause K1 immediately to start running user tasks? Answer: Because K1 has been running user tasks all along (except while K2 was active) and nothing has told it to stop. In fact, about the only things which _can_ cause K1 to stop running user threads are the freezer (which you want to eliminate) and disabling interrupts (not possible since some drivers require interrupts to be enabled when putting devices in low-power mode). when you jump to a body of code you jump to a specific point in the code, not to some nebulous 'everything running' state. the process of doing a kexec requires things to happen in the drivers before normal activity can happen, so there is a phase in there where the kernel being jumped to has drivers initializing, but still does not allow anything else to run. So when K2 starts up, it will have a phase in which user threads don't run. That doesn't affect K1. When K2 returns to K1, K1 does not go through this sort of phase. It simply picks up from where it left off. then how can it restart drivers before the user threads need them? why can't this phase be extended to allow for the possibility of transitioning these drivers to a sleep mode instead of to full operation? Indeed, Rafael has suggested that K2 be responsible for putting devices in low-power mode. This has the disadvantage of requiring K2 to include drivers for every device used by K1, but otherwise it would work. However there still remains the problem of user tasks running after devices are supposed to be quiescent and before K1 starts. There's currently nothing to stop such tasks from making I/O requests and thereby causing a quiescent device to become active again. 
but if the devices are in low power mode then K1 needs to get them out of low power mode before user tasks try to access them. The situation as regards locking is harder to discuss since I don't know of any code examples to use as a guide. The fact remains that if user tasks aren't frozen then they can make system calls, and while running in kernel mode they can acquire locks, which might cause problems -- even though I can't identify any definite examples. yes, if userspace is running jobs and submitting I/O and system calls while drivers are trying to initialize there is a big problem, but I am missing the reason this must be the case. We aren't talking about drivers initializing devices. We are talking about what happens during the time when drivers are trying to quiesce devices (i.e., before K1 has started up K2) or power them down (after K2 has returned to K1). or if you are doing a resume instead of a suspend to ram the drivers need to initialize or otherwise move to full power on K1 before user tasks hit them. the
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007, Oliver Neukum wrote: On Monday, 23 July 2007, Miklos Szeredi wrote: The reason is that we want them to park in safe places, ie. where there are no locks held etc. Thus, these safe places need to be chosen somehow and since they are not marked throughout the code, we choose the obvious one. :-) Why shouldn't locks be held? No locks which are required for suspend must be held, sure. But otherwise holding locks doesn't matter at all. If you can provide a way to tell them apart, this would work. can you just tell the driver to try and suspend, and if it reports back that it failed, back out of the suspend? or will the driver deadlock instead of reporting a failure if a lock is held? David Lang
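David's "try, and back out on failure" question assumes a driver can detect a held lock without blocking. Assuming it can, the back-out logic itself is simple: suspend devices in order and, on the first `-EBUSY`, resume the ones already suspended in reverse order. The device structure and functions below are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: 'busy' stands for "a lock the suspend path
 * needs is currently held", which the driver reports instead of blocking. */
struct sketch_dev {
    const char *name;
    bool busy;
    bool suspended;
};

static int sketch_dev_suspend(struct sketch_dev *d)
{
    if (d->busy)
        return -EBUSY;          /* report failure rather than wait */
    d->suspended = true;
    return 0;
}

static void sketch_dev_resume(struct sketch_dev *d)
{
    d->suspended = false;
}

/* Suspend devices in order; on the first failure, resume the ones already
 * suspended (in reverse order) and return that device's error code. */
int sketch_suspend_all(struct sketch_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int err = sketch_dev_suspend(&devs[i]);
        if (err) {
            while (i-- > 0)
                sketch_dev_resume(&devs[i]);
            return err;
        }
    }
    return 0;
}
```

The hard part David asks about, whether a driver can notice the held lock instead of deadlocking on it, is exactly what the `busy` flag assumes away.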
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007 [EMAIL PROTECTED] wrote: For one thing, checking for a suspend-in-progress at the beginning of each and every system call would add overhead to a hot path in the kernel, one which is already very heavily optimized. People wouldn't stand for it. I thought that the suspend stuff did this easily, It does not do it at all. Do you know how the freezer works? but the freezer really starts running into trouble when it wants to freeze some things, but not other things. this seems to be the biggest area of churn and problems. No. The freezer starts running into trouble when it wants to freeze a thread but can't, because that thread is waiting for some event to occur and the only thread which can cause the event is already frozen. Or is itself waiting for a third thread which is already frozen... You get similar problems from system calls that wait in kernel mode until something has happened. For example, a read() call for the console device will wait until somebody types on the keyboard. At any point in time, many (or even most) user threads are blocked in a system call. but are locks held while they are blocked like this? Sometimes they are, sometimes they aren't. Let's let kernel K1 be the original kernel, the one which is going into hibernation. Kernel K2 is the one started by kexec to write out the memory image. Your question becomes: Why should K2 jumping back to K1 cause K1 immediately to start running user tasks? Answer: Because K1 has been running user tasks all along (except while K2 was active) and nothing has told it to stop. In fact, about the only things which _can_ cause K1 to stop running user threads are the freezer (which you want to eliminate) and disabling interrupts (not possible since some drivers require interrupts to be enabled when putting devices in low-power mode). when you jump to a body of code you jump to a specific point in the code, not to some nebulous 'everything running' state. How is that relevant? 
When K2 jumps back to K1, it jumps to some designated location in K1. It might be just after the place where K1 called K2; I'm not familiar with the details of kexec. In any event, K1 will still be in the same state as it was when it called K2. So when K2 starts up, it will have a phase in which user threads don't run. That doesn't affect K1. When K2 returns to K1, K1 does not go through this sort of phase. It simply picks up from where it left off. then how can it restart drivers before the user threads need them? It can't. Indeed, in the absence of a freezer, user threads will need devices (more accurately, will submit I/O requests for devices) that have to be kept quiescent or low-power. Drivers will need to delay those requests until the devices are returned to full operation. That's exactly what I've been saying all along: Drivers will need to be changed to delay I/O requests, if there is no freezer. However there still remains the problem of user tasks running after devices are supposed to be quiescent and before K1 starts. There's currently nothing to stop such tasks from making I/O requests and thereby causing a quiescent device to become active again. but if the devices are in low power mode then K1 needs to get them out of low power mode before user tasks try to access them. No -- which is good because it can't. If a user task is running there's no way to stop it from submitting I/O requests. K1 needs to delay these requests until after the device has returned to full operation. We aren't talking about drivers initializing devices. We are talking about what happens during the time when drivers are trying to quiesce devices (i.e., before K1 has started up K2) or power them down (after K2 has returned to K1). or if you are doing a resume instead of a suspend to ram the drivers need to initialize or otherwise move to full power on K1 before user tasks hit them. Correct.
User tasks are allowed to submit requests, but the requests can't be carried out until the device returns to full operation. Alan Stern
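Alan's point, that without a freezer drivers must accept I/O requests while quiescent and carry them out only after returning to full operation, can be sketched as a driver that parks requests and replays them on resume. This is a single-threaded model with hypothetical names; a real driver would need locking and would more likely put the submitter to sleep or hold requests in its request queue:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_QUEUE_MAX 16

/* Hypothetical driver state. */
struct sketch_driver {
    bool full_power;                 /* false while quiescent or low-power */
    int pending[SKETCH_QUEUE_MAX];   /* requests parked until resume */
    size_t npending;
    size_t ncompleted;               /* requests that reached the hardware */
    int last_completed;
};

static void sketch_do_io(struct sketch_driver *drv, int req)
{
    assert(drv->full_power);         /* hardware is only touched at full power */
    drv->ncompleted++;
    drv->last_completed = req;
}

/* Called from user tasks, which keep running because nothing freezes them. */
bool sketch_submit(struct sketch_driver *drv, int req)
{
    if (drv->full_power) {
        sketch_do_io(drv, req);
        return true;
    }
    if (drv->npending == SKETCH_QUEUE_MAX)
        return false;                /* queue full: caller has to retry */
    drv->pending[drv->npending++] = req;
    return true;
}

void sketch_quiesce(struct sketch_driver *drv)
{
    drv->full_power = false;
}

/* Back at full operation: replay deferred requests in submission order. */
void sketch_resume(struct sketch_driver *drv)
{
    drv->full_power = true;
    for (size_t i = 0; i < drv->npending; i++)
        sketch_do_io(drv, drv->pending[i]);
    drv->npending = 0;
}
```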
Re: [linux-pm] Re: Hibernation considerations
Hi. On Tuesday 24 July 2007 01:23:15 Alan Stern wrote: On Mon, 23 Jul 2007, Nigel Cunningham wrote: Take a step back for a second. The problem we're facing now is that we're getting some userspace threads, used in processing I/O, that are functioning as exceptions to the freeze userspace, then freezeable kernel threads rule. They are only exceptions because of that role in processing I/O - because they're de facto kernel threads. So, if we orient our thinking more in terms of I/O processing and less in terms of the userspace/kernelspace distinction, we'll have a solution: 1) Freeze processes that aren't fs related (ie stop them generating I/O). The problem here is that with things like FUSE, _every_ process is potentially fs related. Nothing prevents a FUSE thread from doing IPC with any other thread. Yes, but the fuse thread is going to know what other thread it's doing IPC with, so it can get that thread flagged too. 2) Flush pending I/O. 3) Freeze filesystems in reverse order of dependency, the primary purpose being to stop them generating further I/O on their metadata. Locks that are being held are only being held because work is being done. If we progressively focus on threads in terms of their create/process work dependencies, we'll see that the problem isn't at all intractable. As has been mentioned before, keeping track of all that dependency information would be very fragile and time-consuming. I disagree. It's at least going to be less fragile and time-consuming than maintaining new/extra code for kexec. Nigel
Re: [linux-pm] Re: Hibernation considerations
On Monday, 23 July 2007 23:55, Nigel Cunningham wrote: Hi. On Tuesday 24 July 2007 01:23:15 Alan Stern wrote: On Mon, 23 Jul 2007, Nigel Cunningham wrote: Take a step back for a second. The problem we're facing now is that we're getting some userspace threads, used in processing I/O, that are functioning as exceptions to the freeze userspace, then freezeable kernel threads rule. They are only exceptions because of that role in processing I/O - because they're de facto kernel threads. So, if we orient our thinking more in terms of I/O processing and less in terms of the userspace/kernelspace distinction, we'll have a solution: 1) Freeze processes that aren't fs related (ie stop them generating I/O). The problem here is that with things like FUSE, _every_ process is potentially fs related. Nothing prevents a FUSE thread from doing IPC with any other thread. Yes, but the fuse thread is going to know what other thread it's doing IPC with, so it can get that thread flagged too. Yes, but that thread may do IPC with yet another one and so on. 2) Flush pending I/O. 3) Freeze filesystems in reverse order of dependency, the primary purpose being to stop them generating further I/O on their metadata. Locks that are being held are only being held because work is being done. If we progressively focus on threads in terms of their create/process work dependencies, we'll see that the problem isn't at all intractable. As has been mentioned before, keeping track of all that dependency information would be very fragile and time-consuming. I disagree. It's at least going to be less fragile and time-consuming than maintaining new/extra code for kexec. Well, I think the issue is real, so we need to find a solution (the simpler, the better) and that need not be related to kexec. ;-) Greetings, Rafael -- Premature optimization is the root of all evil.
- Donald Knuth
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007, Nigel Cunningham wrote: Hi Alan. On Monday 23 July 2007 01:26:23 Alan Stern wrote: On Sun, 22 Jul 2007, Nigel Cunningham wrote: Hi. On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote: It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable. A larger concern is that it seems that freezing FUSE processes at all _will_ generate deadlocks if a non-synchronous or memory-map-supporting filesystem is loopback mounted from a FUSE filesystem. In that case, if you attempt to sync or free memory once FUSE is frozen, you are sure to get a deadlock. Ok. So then (in response to Alan too), how about keeping a tree of mounts, akin to the device tree, and working from the deepest nodes up? (In conjunction with what I already suggested)? Face it, Nigel, this is a losing battle. You can try to come up with ever-more complex schemes to try and force FUSE into the freezer's framework, but it just won't fit. Or if it does, the next filesystem to come along will require an even more baroque type of special-case handling. It does seem to be a losing battle, but I'm wondering whether that's really because it's an intractable problem, or because people have given up on it before its time. We are talking about a computer system, so things should be predictable. The general problem is that task A may be in an unfreezable state, waiting for task B to do something, while task B is already frozen. Since there's no reasonable way to determine that A really is waiting for B, you're just stuck. (To make matters worse, A may not even realize which task it is waiting for; it may know only that it's waiting for somebody to do something!) A and B could be user tasks, kernel threads, or one of each. 
I guess I want to persist because all of these issues aren't utterly unsolvable. It's just that we don't have the infrastructure yet to figure out the solutions to these issues trivially. Take, for example, the locking issue. If we could call some function to say "What process holds this lock?", then task A could know that it's waiting on task B and put that information somewhere. We could then use the information to freeze task B before task A. this sounds like the standard priority inversion problem taken to extremes. Ingo has been working on this issue, but IIRC the problem is that tracking what owns the lock so that you can get that thing to run ends up being enough overhead that it's not acceptable in the general case. David Lang The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? Regards, Nigel
Re: [linux-pm] Re: Hibernation considerations
Hi. On Monday 23 July 2007 10:04:43 Paul Mackerras wrote: > Nigel Cunningham writes: > > > I guess I want to persist because all of these issues aren't utterly > > unsolvable. It's just that we don't have the infrastructure yet to > > figure out the solutions to these issues trivially. Take, for example, > > Ever heard of the halting problem? :) It's not just a matter of > infrastructure. You very quickly get into questions that are > mathematically undecidable. Is this the halting problem, though? > > the locking issue. If we could call some function to say "What process > > holds this lock?", then task A could know that it's waiting on task B > > and put that information somewhere. We could then use the information > > to freeze task B before task A. > > But how would that help? If task B holds the lock, then we can't > freeze it until it's released the lock. Then the question is, what > does task B need in order to get to the point where it releases the > lock? And so on. It rapidly gets not just extremely messy, but > actually impossible to compute in general. Take a step back for a second. The problem we're facing now is that we're getting some userspace threads, used in processing I/O, that are functioning as exceptions to the "freeze userspace, then freezeable kernel threads" rule. They are only exceptions because of that role in processing I/O - because they're de facto kernel threads. So, if we orient our thinking more in terms of I/O processing and less in terms of the userspace/kernelspace distinction, we'll have a solution: 1) Freeze processes that aren't fs related (ie stop them generating I/O). 2) Flush pending I/O. 3) Freeze filesystems in reverse order of dependency, the primary purpose being to stop them generating further I/O on their metadata. Locks that are being held are only being held because work is being done.
If we progressively focus on threads in terms of their create/process work dependencies, we'll see that the problem isn't at all intractable. Regards, Nigel -- See http://www.tuxonice.net for Howtos, FAQs, mailing lists, wiki and bugzilla info.
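Step 3, together with the "tree of mounts, working from the deepest nodes up" suggestion made elsewhere in the thread, amounts to a post-order walk: anything stacked on a filesystem (for example a loopback mount on top of FUSE) is frozen before the filesystem underneath it. The `sketch_mount` tree is a hypothetical model, not the kernel's mount structures, and it does nothing about the FUSE/IPC dependencies Alan raises:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CHILDREN 4

/* Hypothetical mount-tree node: a filesystem stacked on (e.g. loopback-mounted
 * from) another one appears as that filesystem's child. */
struct sketch_mount {
    const char *name;
    struct sketch_mount *children[SKETCH_MAX_CHILDREN];
    size_t nchildren;
};

/* Post-order walk: every filesystem that depends on 'm' is placed before 'm'
 * itself, so nothing stacked on it can still generate I/O for it once it is
 * frozen. Appends names to 'order' starting at 'pos'; returns the new length. */
size_t sketch_freeze_order(const struct sketch_mount *m,
                           const char **order, size_t pos)
{
    for (size_t i = 0; i < m->nchildren; i++)
        pos = sketch_freeze_order(m->children[i], order, pos);
    order[pos++] = m->name;
    return pos;
}
```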
Re: [linux-pm] Re: Hibernation considerations
Nigel Cunningham writes: > I guess I want to persist because all of these issues aren't utterly > unsolvable. It's just that we don't have the infrastructure yet to > figure out the solutions to these issues trivially. Take, for example, Ever heard of the halting problem? :) It's not just a matter of infrastructure. You very quickly get into questions that are mathematically undecidable. > the locking issue. If we could call some function to say "What process > holds this lock?", then task A could know that it's waiting on task B > and put that information somewhere. We could then use the information > to freeze task B before task A. But how would that help? If task B holds the lock, then we can't freeze it until it's released the lock. Then the question is, what does task B need in order to get to the point where it releases the lock? And so on. It rapidly gets not just extremely messy, but actually impossible to compute in general. Paul.
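The chain discussed here ("that thread may do IPC with yet another one and so on") can at least be followed mechanically if every blocked task records which task it waits for, which, as Alan notes, a task often cannot know. This hypothetical sketch walks such a wait-for table and reports a cycle; it says nothing about whether the task at the end of the chain will ever release what it holds, which is where Paul's undecidability objection applies:

```c
#include <assert.h>
#include <stddef.h>

#define NO_WAIT (-1)

/* Hypothetical wait-for table: waits_for[t] is the task that task t is
 * blocked on, or NO_WAIT. Follows the chain from 'start' and returns the
 * task at its end (the one that must make progress for the rest to unblock),
 * or -1 if the chain loops back on itself. */
int sketch_chain_end(const int *waits_for, size_t ntasks, int start)
{
    int t = start;
    for (size_t steps = 0; steps < ntasks; steps++) {
        if (waits_for[t] == NO_WAIT)
            return t;
        t = waits_for[t];
    }
    return -1;  /* more hops than tasks: the chain must contain a cycle */
}
```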
Re: [linux-pm] Re: Hibernation considerations
On Monday 23 July 2007 09:09:21 Rafael J. Wysocki wrote: > Hi, > > On Monday, 23 July 2007 00:42, Nigel Cunningham wrote: > > Hi Alan. > > > > On Monday 23 July 2007 01:26:23 Alan Stern wrote: > > > On Sun, 22 Jul 2007, Nigel Cunningham wrote: > > > > > > > Hi. > > > > > > > > On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote: > > > > > It seems that you could still potentially get a failure to freeze if one > > > > > FUSE process depends on another, and the one that is frozen second just > > > > > happens to be waiting on the one that is frozen first when it is frozen. > > > > > I admit that this situation is unlikely, and perhaps acceptable. > > > > > > > > > > A larger concern is that it seems that freezing FUSE processes at all > > > > > _will_ generate deadlocks if a non-synchronous or memory-map-supporting > > > > > filesystem is loopback mounted from a FUSE filesystem. In that case, if > > > > > you attempt to sync or free memory once FUSE is frozen, you are sure to > > > > > get a deadlock. > > > > > > > > Ok. So then (in response to Alan too), how about keeping a tree of mounts, > > > > akin to the device tree, and working from the deepest nodes up? (In > > > > conjunction with what I already suggested)? > > > > > > Face it, Nigel, this is a losing battle. You can try to come up with > > > ever-more complex schemes to try and force FUSE into the freezer's > > > framework, but it just won't fit. Or if it does, the next filesystem > > > to come along will require an even more baroque type of special-case > > > handling. > > > > It does seem to be a losing battle, but I'm wondering whether that's really > > because it's an intractable problem, or because people have given up on it > > before its time. We are talking about a computer system, so things should be > > predictable. > > > > > The general problem is that task A may be in an unfreezable state, > > > waiting for task B to do something, while task B is already frozen. 
> > > Since there's no reasonable way to determine that A really is waiting > > > for B, you're just stuck. (To make matters worse, A may not even > > > realize which task it is waiting for; it may know only that it's > > > waiting for somebody to do something!) A and B could be user tasks, > > > kernel threads, or one of each. > > > > I guess I want to persist because all of these issues aren't utterly > > unsolvable. It's just that we don't have the infrastructure yet to figure out > > the solutions to these issues trivially. Take, for example, the locking > > issue. If we could call some function to say "What process holds this lock?", > > then task A could know that it's waiting on task B and put that information > > somewhere. We could then use the information to freeze task B before task A. > > > > > > > The only thing to do is what Rafael has been working on: unfreeze > > > things, hope the tasks sort themselves out, and try again. > > > > That's what I'm questioning. Is there a more reliable way and we've just given > > up too quickly? > > Well, there probably is one, but it likely would require us to make changes > that wouldn't be accepted by some people and thus would never be merged. Well, doesn't that imply that we should at least look into what changes would be needed? If they wouldn't be accepted by some people, then either the objections would be reasonable or they wouldn't (and would hopefully be overridden). But we can't know if we don't try. Regards, Nigel -- See http://www.tuxonice.net for Howtos, FAQs, mailing lists, wiki and bugzilla info.
Re: [linux-pm] Re: Hibernation considerations
Hi, On Monday, 23 July 2007 00:42, Nigel Cunningham wrote: > Hi Alan. > > On Monday 23 July 2007 01:26:23 Alan Stern wrote: > > On Sun, 22 Jul 2007, Nigel Cunningham wrote: > > > > > Hi. > > > > > > On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote: > > > > It seems that you could still potentially get a failure to freeze if one > > > > FUSE process depends on another, and the one that is frozen second just > > > > happens to be waiting on the one that is frozen first when it is frozen. > > > > I admit that this situation is unlikely, and perhaps acceptable. > > > > > > > > A larger concern is that it seems that freezing FUSE processes at all > > > > _will_ generate deadlocks if a non-synchronous or memory-map-supporting > > > > filesystem is loopback mounted from a FUSE filesystem. In that case, if > > > > you attempt to sync or free memory once FUSE is frozen, you are sure to > > > > get a deadlock. > > > > > > Ok. So then (in response to Alan too), how about keeping a tree of > > > mounts, > > > akin to the device tree, and working from the deepest nodes up? (In > > > conjunction with what I already suggested)? > > > > Face it, Nigel, this is a losing battle. You can try to come up with > > ever-more complex schemes to try and force FUSE into the freezer's > > framework, but it just won't fit. Or if it does, the next filesystem > > to come along will require an even more baroque type of special-case > > handling. > > It does seem to be a losing battle, but I'm wondering whether that's really > because it's an intractable problem, or because people have given up on it > before its time. We are talking about a computer system, so things should be > predictable. > > > The general problem is that task A may be in an unfreezable state, > > waiting for task B to do something, while task B is already frozen. > > Since there's no reasonable way to determine that A really is waiting > > for B, you're just stuck. 
(To make matters worse, A may not even > > realize which task it is waiting for; it may know only that it's > > waiting for somebody to do something!) A and B could be user tasks, > > kernel threads, or one of each. > > I guess I want to persist because all of these issues aren't utterly > unsolvable. It's just that we don't have the infrastructure yet to figure out > the solutions to these issues trivially. Take, for example, the locking > issue. If we could call some function to say "What process holds this lock?", > then task A could know that it's waiting on task B and put that information > somewhere. We could then use the information to freeze task B before task A. > > > > The only thing to do is what Rafael has been working on: unfreeze > > things, hope the tasks sort themselves out, and try again. > > That's what I'm questioning. Is there a more reliable way and we've just > given > up too quickly? Well, there probably is one, but it likely would require us to make changes that wouldn't be accepted by some people and thus would never be merged. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth
Re: [linux-pm] Re: Hibernation considerations
Hi Alan. On Monday 23 July 2007 01:26:23 Alan Stern wrote: > On Sun, 22 Jul 2007, Nigel Cunningham wrote: > > > Hi. > > > > On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote: > > > It seems that you could still potentially get a failure to freeze if one > > > FUSE process depends on another, and the one that is frozen second just > > > happens to be waiting on the one that is frozen first when it is frozen. > > > I admit that this situation is unlikely, and perhaps acceptable. > > > > > > A larger concern is that it seems that freezing FUSE processes at all > > > _will_ generate deadlocks if a non-synchronous or memory-map-supporting > > > filesystem is loopback mounted from a FUSE filesystem. In that case, if > > > you attempt to sync or free memory once FUSE is frozen, you are sure to > > > get a deadlock. > > > > Ok. So then (in response to Alan too), how about keeping a tree of mounts, > > akin to the device tree, and working from the deepest nodes up? (In > > conjunction with what I already suggested)? > > Face it, Nigel, this is a losing battle. You can try to come up with > ever-more complex schemes to try and force FUSE into the freezer's > framework, but it just won't fit. Or if it does, the next filesystem > to come along will require an even more baroque type of special-case > handling. It does seem to be a losing battle, but I'm wondering whether that's really because it's an intractable problem, or because people have given up on it before its time. We are talking about a computer system, so things should be predictable. > The general problem is that task A may be in an unfreezable state, > waiting for task B to do something, while task B is already frozen. > Since there's no reasonable way to determine that A really is waiting > for B, you're just stuck. (To make matters worse, A may not even > realize which task it is waiting for; it may know only that it's > waiting for somebody to do something!) 
A and B could be user tasks, > kernel threads, or one of each. I guess I want to persist because all of these issues aren't utterly unsolvable. It's just that we don't have the infrastructure yet to figure out the solutions to these issues trivially. Take, for example, the locking issue. If we could call some function to say "What process holds this lock?", then task A could know that it's waiting on task B and put that information somewhere. We could then use the information to freeze task B before task A. > The only thing to do is what Rafael has been working on: unfreeze > things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? Regards, Nigel -- Nigel Cunningham Christian Reformed Church of Cobden 103 Curdie Street, Cobden 3266, Victoria, Australia Ph. +61 3 5595 1185 / +61 417 100 574 Communal Worship: 11 am Sunday.
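For illustration only: if the kernel could answer "What process holds this lock?", the resulting wait-for information could be turned into a freeze order. The sketch below is a userspace C model, not kernel code, and `struct wait_graph` and `freeze_order()` are hypothetical names. It follows the rule implied by Alan's description of the failure (a task must not be frozen while an unfrozen task is still waiting on it), so every waiter is placed before the task it waits on, and a cycle is reported as a genuine deadlock.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TASKS 16

/* Hypothetical wait-for graph: waits_on[a][b] is true when task a is
 * blocked waiting for task b to do something (e.g. release a lock). */
struct wait_graph {
	int ntasks;
	bool waits_on[MAX_TASKS][MAX_TASKS];
};

/*
 * Compute a freeze order in which every waiter comes before the task it
 * waits on (Kahn's topological sort). Returns the number of tasks
 * placed in order[], or -1 if the graph has a cycle, i.e. a real
 * deadlock for which no freeze order exists.
 */
int freeze_order(const struct wait_graph *g, int order[])
{
	int waiters[MAX_TASKS] = {0};	/* how many tasks wait on each task */
	int queue[MAX_TASKS];
	int head = 0, tail = 0, n = 0;

	assert(g->ntasks <= MAX_TASKS);

	for (int a = 0; a < g->ntasks; a++)
		for (int b = 0; b < g->ntasks; b++)
			if (g->waits_on[a][b])
				waiters[b]++;

	/* Tasks that nobody waits on can be frozen first. */
	for (int t = 0; t < g->ntasks; t++)
		if (waiters[t] == 0)
			queue[tail++] = t;

	while (head < tail) {
		int a = queue[head++];

		order[n++] = a;
		/* Once a is frozen it no longer waits on anything. */
		for (int b = 0; b < g->ntasks; b++)
			if (g->waits_on[a][b] && --waiters[b] == 0)
				queue[tail++] = b;
	}
	return n == g->ntasks ? n : -1;
}
```

With task 0 waiting on task 1 and task 1 waiting on task 2, the order comes out as 0, 1, 2; adding an edge from task 2 back to task 0 makes `freeze_order()` return -1.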
Re: [linux-pm] Re: Hibernation considerations
On Sun, 22 Jul 2007, Alan Stern wrote: On Sun, 22 Jul 2007, Miklos Szeredi wrote: The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. Have we some proof, that this will untangle the freezing tasks in a limited time? Or will it just make the problem harder to trigger? Of course there's no proof. Just the opposite -- if things get hung up the first time, they might get hung up the second time. And the third... But it ought to make the problem harder to trigger. For the present that's a worthwhile improvement. it gives the system more tries to find a spot in time where the deadlock doesn't happen; if you find one, you can continue. but even if things keep getting hung up, at least you are backing out of each try safely and can eventually tell the user "I give up, try shutting some things down and suspending again" David Lang
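A minimal sketch of the unfreeze-and-retry loop being discussed, assuming a simulated freezer: `struct sim_freezer`, `try_freeze_all()`, `thaw_all()` and `freeze_with_retries()` are hypothetical stand-ins, not the kernel's freezer API, and the attempt limit is a caller-chosen parameter. As Alan notes, nothing guarantees convergence; the loop only bounds the number of tries and backs out cleanly each time so the caller can report failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated freezer state for this sketch (hypothetical). */
struct sim_freezer {
	int stuck_rounds;	/* freezing fails while this is > 0 */
	int thaw_calls;		/* how many times we backed out */
};

/* Stand-in for "freeze every task"; fails while some task is stuck. */
static bool try_freeze_all(struct sim_freezer *s)
{
	return s->stuck_rounds == 0;
}

/* Stand-in for "thaw what was frozen"; in this model each thaw lets the
 * stuck tasks make some progress. */
static void thaw_all(struct sim_freezer *s)
{
	s->thaw_calls++;
	if (s->stuck_rounds > 0)
		s->stuck_rounds--;
}

/*
 * Try to freeze; on failure, back out and try again, at most
 * max_attempts times. Returns the 1-based attempt that succeeded,
 * or 0 if the caller should give up and report the failure.
 */
int freeze_with_retries(struct sim_freezer *s, int max_attempts)
{
	assert(s && max_attempts > 0);

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (try_freeze_all(s))
			return attempt;
		thaw_all(s);	/* back out safely before the next try */
	}
	return 0;
}
```

With `stuck_rounds = 2` and three allowed attempts, the third attempt succeeds after two thaws; with `stuck_rounds = 5`, the function returns 0 and the caller would tell the user to shut some things down and try again.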
Re: [linux-pm] Re: Hibernation considerations
On Sun, 22 Jul 2007, Alan Stern wrote: On Sat, 21 Jul 2007 [EMAIL PROTECTED] wrote: wait a min here, it's possible we are misunderstanding each other. I'd describe it as: You are misunderstanding me. :-) very possibly :-) as I see it. if userspace can acquire locks that prevent the kernel from shutting off (or doing anything else in particular) then it's possible for misbehaving userspace code to stop the kernel by simply choosing to never release the lock. this would be a trivial DOS from userspace. You are confusing "userspace" with "user tasks". And not only that, you often use the term "userspace" when you should say "user mode". If you want I can explain the differences. please do, I have been treating all three as the same category. now, if you are talking instead about the fact that when userspace makes a system call, the execution of that system call involves acquiring locks that are released before the system call completes you have a very different situation. That is exactly what I have been talking about. It may be different from what you _thought_, but it's not different from what I actually _said_. Ok, I did misunderstand you. it sounds like all you need to do to make sure that locks are not held is to allow system calls to return before trying to do the suspend/kexec/etc. that sounds like not only a trivial thing to do, but something that would probably be done anyway. although syscalls that then call out to userspace tasks before they can complete cause potential deadlocks (without that issue you can just wait until all syscalls have returned, and not allow anything to issue new syscalls) is this the issue that's killing FUSE+suspend? if you have locks that are held across system calls then you should already have problems. because you can't count on userspace ever taking whatever action is appropriate to release the lock. what am I missing that concerns you so much?
Here's what you are missing: The new kexec approach eliminates the freezer and relies instead on the fact that none of the tasks in the original kernel can execute while the new kexec'd kernel is running. This means the new kernel can write out a memory image with no fear of interference or corruption. correct But it also means that tasks which otherwise would have been frozen are actually free to run before the kexec call is made (and after the call returns, if the kexec'd kernel returns back to the original kernel). Any driver which was written with the assumption that tasks would be frozen at those times will need to be changed. here is where you lose me. why should jumping back to the original kernel immediately start running these processes? the process of doing a kexec requires things to happen in the drivers before normal activity can happen, so there is a phase in there where the kernel being jumped to has drivers initializing, but still does not allow anything else to run. why can't this phase be extended to allow for the possibility of transitioning these drivers to a sleep mode instead of to full operation? For example, drivers know that they have to quiesce their device in preparation for creating the memory snapshot. But they assume that no I/O requests will be made while the device is quiesced (because no user task is capable of generating an I/O request if they are all frozen), so the driver doesn't try to prevent such requests from reactivating the device. The situation as regards locking is harder to discuss since I don't know of any code examples to use as a guide. The fact remains that if user tasks aren't frozen then they can make system calls, and while running in kernel mode they can acquire locks, which might cause problems -- even though I can't identify any definite examples.
yes, if userspace is running jobs and submitting I/O and system calls while drivers are trying to initialize there is a big problem, but I am missing the reason this must be the case. Because of these problems, it's too early to start trying to use kexec to avoid the need for the freezer. Of course, exactly the same possible problems exist when one tries to remove the freezer from suspend-to-RAM. It has nothing to do with kexec in particular (and certainly nothing to do with ACPI). the part of the freezer that everyone is trying to eliminate is the exceptions (freeze everything except X,Y,Z because we will need to use those later for A) having read through Documentation/power/devices.txt I remain convinced that you are making a fundamental mistake. you are designing a system I'm not designing anything! _You_ are. I'm merely pointing out problems in your design which you haven't considered. a better way of phrasing what I meant goes more along the lines of 'the current design of the system...' that will only work if everything (every driver, every state transition) participates fully in the process at all times. You started with the facts 'this is
Re: [linux-pm] Re: Hibernation considerations
On Sun, 22 Jul 2007, Miklos Szeredi wrote: > > The only thing to do is what Rafael has been working on: unfreeze > > things, hope the tasks sort themselves out, and try again. > > Have we some proof, that this will untangle the freezing tasks in a > limited time? Or will it just make the problem harder to trigger? Of course there's no proof. Just the opposite -- if things get hung up the first time, they might get hung up the second time. And the third... But it ought to make the problem harder to trigger. For the present that's a worthwhile improvement. Alan Stern
Re: [linux-pm] Re: Hibernation considerations
> The only thing to do is what Rafael has been working on: unfreeze > things, hope the tasks sort themselves out, and try again. Have we some proof, that this will untangle the freezing tasks in a limited time? Or will it just make the problem harder to trigger? Miklos
Re: [linux-pm] Re: Hibernation considerations
On Sat, 21 Jul 2007 [EMAIL PROTECTED] wrote: > wait a min here, it's possible we are misunderstanding each other. I'd describe it as: You are misunderstanding me. :-) > as I see it. > > if userspace can acquire locks that prevent the kernel from shutting off > (or doing anything else in particular) then it's possible for misbehaving > userspace code to stop the kernel by simply choosing to never release the > lock. > > this would be a trivial DOS from userspace. You are confusing "userspace" with "user tasks". And not only that, you often use the term "userspace" when you should say "user mode". If you want I can explain the differences. > now, if you are talking instead about the fact that when userspace makes a > system call, the execution of that system call involves acquiring locks > that are released before the system call completes you have a very > different situation. That is exactly what I have been talking about. It may be different from what you _thought_, but it's not different from what I actually _said_. > if you have locks that are held across system calls then you should > already have problems. because you can't count on userspace ever taking > whatever action is appropriate to release the lock. > > what am I missing that concerns you so much? Here's what you are missing: The new kexec approach eliminates the freezer and relies instead on the fact that none of the tasks in the original kernel can execute while the new kexec'd kernel is running. This means the new kernel can write out a memory image with no fear of interference or corruption. But it also means that tasks which otherwise would have been frozen are actually free to run before the kexec call is made (and after the call returns, if the kexec'd kernel returns back to the original kernel). Any driver which was written with the assumption that tasks would be frozen at those times will need to be changed.
For example, drivers know that they have to quiesce their device in preparation for creating the memory snapshot. But they assume that no I/O requests will be made while the device is quiesced (because no user task is capable of generating an I/O request if they are all frozen), so the driver doesn't try to prevent such requests from reactivating the device. The situation as regards locking is harder to discuss since I don't know of any code examples to use as a guide. The fact remains that if user tasks aren't frozen then they can make system calls, and while running in kernel mode they can acquire locks, which might cause problems -- even though I can't identify any definite examples. Because of these problems, it's too early to start trying to use kexec to avoid the need for the freezer. Of course, exactly the same possible problems exist when one tries to remove the freezer from suspend-to-RAM. It has nothing to do with kexec in particular (and certainly nothing to do with ACPI). > having read through Documentation/power/devices.txt I remain convinced > that you are making a fundamental mistake. > > you are designing a system I'm not designing anything! _You_ are. I'm merely pointing out problems in your design which you haven't considered. > that will only work if everything (every > driver, every state transition) participates fully in the process at all > times. You started with the facts 'this is the info that ACPI provides Look again; I wasn't talking about ACPI. You have mixed up the issues in this email thread. (Not hard to do, since it has been a very long and complicated thread.) > and > this is how it is designed to be used' and worked from there instead of > looking to see what the kernel really needed and figuring how to provide a > good interface for that that happens to be implemented (today) with ACPI. 
> (a proper power management framework shouldn't care if you have ACPI, APM, > or some other method of controlling the devices) This and the rest of your email have no bearing on what I was talking about, so I have snipped out the remainder. Alan Stern
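Alan's example is a driver that quiesces its device but assumes no new I/O can arrive because user tasks are frozen. Without the freezer, the submit path itself would have to refuse (or defer) requests while the device is quiesced. Below is a minimal single-threaded sketch of that idea; `struct fake_dev` and the `dev_*()` functions are hypothetical, and returning `-EBUSY` is only one possible policy (a real driver might queue the request instead, and would need a lock protecting the flag).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state for this sketch. */
struct fake_dev {
	bool quiesced;		/* set while the snapshot is being taken */
	int completed;		/* requests that actually reached the device */
	int rejected;		/* requests refused while quiesced */
};

void dev_quiesce(struct fake_dev *d) { d->quiesced = true; }
void dev_resume(struct fake_dev *d)  { d->quiesced = false; }

/*
 * Submit path that does not rely on user tasks being frozen: a request
 * arriving while the device is quiesced is refused instead of silently
 * reactivating the device. (Single-threaded model: real code would
 * guard 'quiesced' with a lock shared with the submit path.)
 */
int dev_submit_io(struct fake_dev *d)
{
	assert(d);
	if (d->quiesced) {
		d->rejected++;
		return -EBUSY;
	}
	d->completed++;
	return 0;
}
```

A request made between `dev_quiesce()` and `dev_resume()` gets `-EBUSY` and never touches the device; requests before and after go through normally.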
Re: [linux-pm] Re: Hibernation considerations
On Sun, 22 Jul 2007, Nigel Cunningham wrote: > Hi. > > On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote: > > It seems that you could still potentially get a failure to freeze if one > > FUSE process depends on another, and the one that is frozen second just > > happens to be waiting on the one that is frozen first when it is frozen. > > I admit that this situation is unlikely, and perhaps acceptable. > > > > A larger concern is that it seems that freezing FUSE processes at all > > _will_ generate deadlocks if a non-synchronous or memory-map-supporting > > filesystem is loopback mounted from a FUSE filesystem. In that case, if > > you attempt to sync or free memory once FUSE is frozen, you are sure to > > get a deadlock. > > Ok. So then (in response to Alan too), how about keeping a tree of mounts, > akin to the device tree, and working from the deepest nodes up? (In > conjunction with what I already suggested)? Face it, Nigel, this is a losing battle. You can try to come up with ever-more complex schemes to try and force FUSE into the freezer's framework, but it just won't fit. Or if it does, the next filesystem to come along will require an even more baroque type of special-case handling. The general problem is that task A may be in an unfreezable state, waiting for task B to do something, while task B is already frozen. Since there's no reasonable way to determine that A really is waiting for B, you're just stuck. (To make matters worse, A may not even realize which task it is waiting for; it may know only that it's waiting for somebody to do something!) A and B could be user tasks, kernel threads, or one of each. The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. 
Alan Stern
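For concreteness, the mount-ordering half of Nigel's suggestion (a tree of mounts processed from the deepest nodes up) might look like the sketch below. The mount table layout, `struct mount_tree` and `deepest_first()` are hypothetical; the sketch assumes the parent links form a tree and only computes an order, so it does not address Alan's objection that a waiting task may not know whom it is waiting for.

```c
#include <assert.h>

#define MAX_MOUNTS 16

/* Hypothetical mount table: parent[i] is the index of the mount that
 * mount i lives on, or -1 for the root filesystem. Assumed acyclic. */
struct mount_tree {
	int nmounts;
	int parent[MAX_MOUNTS];
};

/* Number of parent links between mount m and the root. */
static int mount_depth(const struct mount_tree *t, int m)
{
	int depth = 0;

	while (t->parent[m] >= 0) {
		m = t->parent[m];
		depth++;
	}
	return depth;
}

/* Fill order[] so that deeper mounts come first (a mount before the
 * filesystem it is mounted on); ties keep table order. */
void deepest_first(const struct mount_tree *t, int order[])
{
	int depth[MAX_MOUNTS];

	assert(t->nmounts <= MAX_MOUNTS);
	for (int i = 0; i < t->nmounts; i++) {
		depth[i] = mount_depth(t, i);
		order[i] = i;
	}
	/* Stable insertion sort by decreasing depth. */
	for (int i = 1; i < t->nmounts; i++) {
		int m = order[i], j = i - 1;

		while (j >= 0 && depth[order[j]] < depth[m]) {
			order[j + 1] = order[j];
			j--;
		}
		order[j + 1] = m;
	}
}
```

For a root filesystem (0) with a FUSE mount on it (1), a loopback-mounted image on the FUSE mount (2) and another mount on the root (3), the order is 2, 1, 3, 0, so the loopback filesystem from Jeremy's example is handled before the FUSE filesystem backing it.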
It's just that we don't have the infrastructure yet to figure out the solutions to these issues trivially. Take, for example, the locking issue. If we could call some function to say What process holds this lock?, then task A could know that it's waiting on task B and put that information somewhere. We could then use the information to freeze task B before task A. The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? Regards, Nigel -- Nigel Cunningham Christian Reformed Church of Cobden 103 Curdie Street, Cobden 3266, Victoria, Australia Ph. +61 3 5595 1185 / +61 417 100 574 Communal Worship: 11 am Sunday.
Re: [linux-pm] Re: Hibernation considerations
Hi, On Monday, 23 July 2007 00:42, Nigel Cunningham wrote: Hi Alan. On Monday 23 July 2007 01:26:23 Alan Stern wrote: On Sun, 22 Jul 2007, Nigel Cunningham wrote: Hi. On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote: It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable. A larger concern is that it seems that freezing FUSE processes at all _will_ generate deadlocks if a non-synchronous or memory-map-supporting filesystem is loopback mounted from a FUSE filesystem. In that case, if you attempt to sync or free memory once FUSE is frozen, you are sure to get a deadlock. Ok. So then (in response to Alan too), how about keeping a tree of mounts, akin to the device tree, and working from the deepest nodes up? (In conjunction with what I already suggested)? Face it, Nigel, this is a losing battle. You can try to come up with ever-more complex schemes to try and force FUSE into the freezer's framework, but it just won't fit. Or if it does, the next filesystem to come along will require an even more baroque type of special-case handling. It does seem to be a losing battle, but I'm wondering whether that's really because it's an intractable problem, or because people have given up on it before its time. We are talking about a computer system, so things should be predictable. The general problem is that task A may be in an unfreezable state, waiting for task B to do something, while task B is already frozen. Since there's no reasonable way to determine that A really is waiting for B, you're just stuck. (To make matters worse, A may not even realize which task it is waiting for; it may know only that it's waiting for somebody to do something!) A and B could be user tasks, kernel threads, or one of each. 
I guess I want to persist because all of these issues aren't utterly unsolvable. It's just that we don't have the infrastructure yet to figure out the solutions to these issues trivially. Take, for example, the locking issue. If we could call some function to say What process holds this lock?, then task A could know that it's waiting on task B and put that information somewhere. We could then use the information to freeze task B before task A. The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? Well, there probably is one, but it likely would require us to make changes that wouldn't be accepted by some people and thus would never be merged. Greetings, Rafael -- Premature optimization is the root of all evil. - Donald Knuth
Re: [linux-pm] Re: Hibernation considerations
On Monday 23 July 2007 09:09:21 Rafael J. Wysocki wrote: Hi, On Monday, 23 July 2007 00:42, Nigel Cunningham wrote: Hi Alan. On Monday 23 July 2007 01:26:23 Alan Stern wrote: On Sun, 22 Jul 2007, Nigel Cunningham wrote: Hi. On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote: It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable. A larger concern is that it seems that freezing FUSE processes at all _will_ generate deadlocks if a non-synchronous or memory-map-supporting filesystem is loopback mounted from a FUSE filesystem. In that case, if you attempt to sync or free memory once FUSE is frozen, you are sure to get a deadlock. Ok. So then (in response to Alan too), how about keeping a tree of mounts, akin to the device tree, and working from the deepest nodes up? (In conjunction with what I already suggested)? Face it, Nigel, this is a losing battle. You can try to come up with ever-more complex schemes to try and force FUSE into the freezer's framework, but it just won't fit. Or if it does, the next filesystem to come along will require an even more baroque type of special-case handling. It does seem to be a losing battle, but I'm wondering whether that's really because it's an intractable problem, or because people have given up on it before its time. We are talking about a computer system, so things should be predictable. The general problem is that task A may be in an unfreezable state, waiting for task B to do something, while task B is already frozen. Since there's no reasonable way to determine that A really is waiting for B, you're just stuck. (To make matters worse, A may not even realize which task it is waiting for; it may know only that it's waiting for somebody to do something!) 
A and B could be user tasks, kernel threads, or one of each. I guess I want to persist because all of these issues aren't utterly unsolvable. It's just that we don't have the infrastructure yet to figure out the solutions to these issues trivially. Take, for example, the locking issue. If we could call some function to say What process holds this lock?, then task A could know that it's waiting on task B and put that information somewhere. We could then use the information to freeze task B before task A. The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? Well, there probably is one, but it likely would require us to make changes that wouldn't be accepted by some people and thus would never be merged. Well, doesn't that imply that we should at least look into what changes would be needed? If they wouldn't be accepted by some people, then either the objections would be reasonable or they wouldn't (and would hopefully be overridden). But we can't know if we don't try. Regards, Nigel -- See http://www.tuxonice.net for Howtos, FAQs, mailing lists, wiki and bugzilla info.
Re: [linux-pm] Re: Hibernation considerations
Nigel Cunningham writes: I guess I want to persist because all of these issues aren't utterly unsolvable. It's just that we don't have the infrastructure yet to figure out the solutions to these issues trivially. Take, for example, Ever heard of the halting problem? :) It's not just a matter of infrastructure. You very quickly get into questions that are mathematically undecidable. the locking issue. If we could call some function to say What process holds this lock?, then task A could know that it's waiting on task B and put that information somewhere. We could then use the information to freeze task B before task A. But how would that help? If task B holds the lock, then we can't freeze it until it's released the lock. Then the question is, what does task B need in order to get to the point where it releases the lock? And so on. It rapidly gets not just extremely messy, but actually impossible to compute in general. Paul.
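To make the chain Paul describes concrete, here is a sketch under a deliberately simplified assumption: every blocked task waits on exactly one other task (for example the owner of a single lock), recorded in a hypothetical waits_on[] array. Under that assumption, following the chain is easy and a cycle is detectable; the general case, where a task may not even know whom it is waiting for, is exactly what this model leaves out.

```c
#define NO_WAIT (-1)

/* waits_on[t] is the task that task t is blocked on, or NO_WAIT if t can
 * run. Indices are assumed valid (0 <= waits_on[t] < ntasks).
 *
 * Returns the runnable task at the end of t's chain, which must make
 * progress before anything behind it can, or -1 if the chain loops, i.e.
 * a deadlock that no freezing order can resolve. */
int chain_head(const int *waits_on, int ntasks, int t)
{
    /* An acyclic chain reaches a runnable task in at most ntasks - 1 hops,
     * so a task still waiting after ntasks hops means we are in a cycle. */
    for (int hops = 0; hops <= ntasks; hops++) {
        if (waits_on[t] == NO_WAIT)
            return t;
        t = waits_on[t];
    }
    return -1;
}
```

The sketch only identifies which task has to run first; it says nothing about whether that task will ever release what it holds, which is the part Paul argues cannot be computed in general.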
Re: [linux-pm] Re: Hibernation considerations
Hi. On Monday 23 July 2007 10:04:43 Paul Mackerras wrote: Nigel Cunningham writes: I guess I want to persist because all of these issues aren't utterly unsolvable. It's just that we don't have the infrastructure yet to figure out the solutions to these issues trivially. Take, for example, Ever heard of the halting problem? :) It's not just a matter of infrastructure. You very quickly get into questions that are mathematically undecidable. Is this the halting problem, though? the locking issue. If we could call some function to say What process holds this lock?, then task A could know that it's waiting on task B and put that information somewhere. We could then use the information to freeze task B before task A. But how would that help? If task B holds the lock, then we can't freeze it until it's released the lock. Then the question is, what does task B need in order to get to the point where it releases the lock? And so on. It rapidly gets not just extremely messy, but actually impossible to compute in general. Take a step back for a second. The problem we're facing now is that we're getting some userspace threads, used in processing I/O, that are functioning as exceptions to the freeze userspace, then freezeable kernel threads rule. They are only exceptions because of that role in processing I/O - because they're de facto kernel threads. So, if we orient our thinking more in terms of I/O processing and less in terms of the userspace/kernelspace distinction, we'll have a solution: 1) Freeze processes that aren't fs related (ie stop them generating I/O). 2) Flush pending I/O. 3) Freeze filesystems in reverse order of dependency, the primary purpose being to stop them generating further I/O on their metadata. Locks that are being held are only being held because work is being done. If we progressively focus on threads in terms of their create/process work dependencies, we'll see that the problem isn't at all intractable.
Regards, Nigel -- See http://www.tuxonice.net for Howtos, FAQs, mailing lists, wiki and bugzilla info.
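Nigel's three steps can be sketched as a plan builder. Everything here is hypothetical modelling: serves_fs[] marks userspace tasks that do filesystem work (a FUSE daemon, say), the caller supplies those helpers already ordered from the uppermost filesystem down, and PLAN_FLUSH is only a marker for the point where pending I/O is flushed.

```c
#include <stdbool.h>
#include <stddef.h>

#define PLAN_FLUSH (-1)  /* marker: flush pending I/O at this point */

/* serves_fs[t]: true for tasks that do filesystem work (hypothetical flag).
 * fs_helpers_topdown: exactly those tasks, uppermost filesystem first.
 * plan must have room for ntasks + 1 entries. Returns the plan length. */
size_t build_freeze_plan(const bool *serves_fs, size_t ntasks,
                         const int *fs_helpers_topdown, size_t nhelpers,
                         int *plan)
{
    size_t n = 0;

    /* Step 1: freeze tasks that only generate I/O. */
    for (size_t t = 0; t < ntasks; t++)
        if (!serves_fs[t])
            plan[n++] = (int)t;

    /* Step 2: flush while every filesystem helper can still run. */
    plan[n++] = PLAN_FLUSH;

    /* Step 3: freeze filesystem helpers in reverse order of dependency,
     * so a lower filesystem is never frozen while an upper one that
     * depends on it might still need to write. */
    for (size_t k = 0; k < nhelpers; k++)
        plan[n++] = fs_helpers_topdown[k];

    return n;
}
```

Producing a plan rather than acting directly keeps the ordering logic separate from the mechanics of freezing, which in a real kernel would also have to cope with tasks that refuse to freeze at any step.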
Re: [linux-pm] Re: Hibernation considerations
On Mon, 23 Jul 2007, Nigel Cunningham wrote: Hi Alan. On Monday 23 July 2007 01:26:23 Alan Stern wrote: On Sun, 22 Jul 2007, Nigel Cunningham wrote: Hi. On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote: It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable. A larger concern is that it seems that freezing FUSE processes at all _will_ generate deadlocks if a non-synchronous or memory-map-supporting filesystem is loopback mounted from a FUSE filesystem. In that case, if you attempt to sync or free memory once FUSE is frozen, you are sure to get a deadlock. Ok. So then (in response to Alan too), how about keeping a tree of mounts, akin to the device tree, and working from the deepest nodes up? (In conjunction with what I already suggested)? Face it, Nigel, this is a losing battle. You can try to come up with ever-more complex schemes to try and force FUSE into the freezer's framework, but it just won't fit. Or if it does, the next filesystem to come along will require an even more baroque type of special-case handling. It does seem to be a losing battle, but I'm wondering whether that's really because it's an intractable problem, or because people have given up on it before its time. We are talking about a computer system, so things should be predictable. The general problem is that task A may be in an unfreezable state, waiting for task B to do something, while task B is already frozen. Since there's no reasonable way to determine that A really is waiting for B, you're just stuck. (To make matters worse, A may not even realize which task it is waiting for; it may know only that it's waiting for somebody to do something!) A and B could be user tasks, kernel threads, or one of each. 
I guess I want to persist because all of these issues aren't utterly unsolvable. It's just that we don't have the infrastructure yet to figure out the solutions to these issues trivially. Take, for example, the locking issue. If we could call some function to say What process holds this lock?, then task A could know that it's waiting on task B and put that information somewhere. We could then use the information to freeze task B before task A. this sounds like the standard priority inversion problem taken to extremes. Ingo has been working on this issue, but IIRC the problem is that tracking what owns the lock so that you can get that thing to run ends up being enough overhead that it's not acceptable in the general case. David Lang The only thing to do is what Rafael has been working on: unfreeze things, hope the tasks sort themselves out, and try again. That's what I'm questioning. Is there a more reliable way and we've just given up too quickly? Regards, Nigel
Re: [linux-pm] Re: Hibernation considerations
On Sat, 21 Jul 2007, Alan Stern wrote: On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote: How would you prevent tasks from being scheduled? How would you prevent drivers from deadlocking because in order to put their device in a low-power state they need to acquire a lock which is held by a user task? you give up on the suspend because you have no way of getting the user task to give up the lock. Once the deadlock has occurred it's too late. You can't give up; in fact you can't do anything at all. The system has hung. however, kernel locks should not be held by user tasks, user tasks are not expected to behave in rational ways, allowing them to compete with kernel tasks for locks is a sure way to get a deadlock or indefinite stall. What on Earth are you talking about? "Kernel locks should not be held by user tasks"? Then who _should_ hold them? You are aware, I hope, that down() and mutex_lock() can be called only in process context? what locks are accessed this way? Lots of them. For example, most drivers won't want a suspend to occur right in the middle of an I/O transfer. To prevent this, the driver might use a mutex. The task doing the I/O (which will be a user task) acquires the mutex during a transfer and the suspend routine acquires the mutex while quiescing the device. wait a minute here, it's possible we are misunderstanding each other. as I see it, if userspace can acquire locks that prevent the kernel from shutting off (or doing anything else in particular) then it's possible for misbehaving userspace code to stop the kernel by simply choosing to never release the lock. this would be a trivial DOS from userspace. now, if you are talking instead about the fact that when userspace makes a system call, the execution of that system call involves acquiring locks that are released before the system call completes you have a very different situation. if you have locks that are held across system calls then you should already have problems.
because you can't count on userspace ever taking whatever action is appropriate to release the lock. what am I missing that concerns you so much? Does it really (fundamentally) require scheduling tasks, particularly in the case that the devices have already been put in the "quiesced" state? I can't say for sure. That's the way we have been doing it. It wouldn't be easy to change, because the driver would have to busy-wait during delays -- which would mean it would need to use different code for system-wide suspend and runtime suspend. please define terms so that we are all on the same page Please read Documentation/power/devices.txt. I have done so. what do you mean by system-wide suspend That's what you would call standby, suspend-to-RAM, or hibernate. The entire system goes to sleep. runtime suspend That's when an individual device is placed in a low-power state to save energy while it isn't being used. The system as a whole remains awake and the device will be resumed the next time it is needed for anything. thanks for the definitions. having read through Documentation/power/devices.txt I remain convinced that you are making a fundamental mistake. you are designing a system that will only work if everything (every driver, every state transition) participates fully in the process at all times. You started with the facts 'this is the info that ACPI provides and this is how it is designed to be used' and worked from there instead of looking to see what the kernel really needed and figuring how to provide a good interface for that that happens to be implemented (today) with ACPI. (a proper power management framework shouldn't care if you have ACPI, APM, or some other method of controlling the devices) this leads to resume functions that can only work if the proper suspend function was called rather than making 'resume' just mean 'go to full operation', which is the same thing that gets called when the device is first initialized.
internally it can examine the hardware and follow different paths depending on what it finds the current state of the hardware to be, but the outside world (including the rest of the kernel) should not care. the fact that the rest of the kernel needs to know if it should call 'resume' or 'initialize' is a failure in the abstraction. in fact, a better abstraction would be something like report_power_modes which would return a series of modes (sorted only by modeID) modeID, %power_used_in_this_mode, %capability_in_this_mode (I would make mode 0 always be complete power off, and mode 1 always be full capacity) report_power_mode_speed which would return a matrix giving how long it takes to transition from any mode to any other mode. this should be a relative number, not an absolute number since it will be different at different clock speeds. set_operational_mode(modeID) which would take you from whatever mode you are in now to the requested mode.
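David's proposed interface might look something like the following in C. report_power_modes, report_power_mode_speed and set_operational_mode are his names; the structures, the flat cost matrix and the pick_mode() policy helper are assumptions added for illustration, and the transition costs are relative units as he suggests.

```c
#include <stddef.h>

/* One entry of report_power_modes(). Per the proposal, mode 0 is always
 * "completely off" and mode 1 is always "full capacity". */
struct power_mode {
    int id;
    int power_pct;       /* power used, % of full power */
    int capability_pct;  /* capability available, % of full */
};

struct pm_device {
    const struct power_mode *modes;  /* what report_power_modes() returns */
    size_t nmodes;
    const unsigned *cost;  /* report_power_mode_speed(): nmodes x nmodes,
                              relative units, cost[from * nmodes + to] */
    size_t current;        /* index into modes[] */
};

/* set_operational_mode(): go from whatever mode the device is in to the
 * requested one; the caller never chooses between "resume" and
 * "initialize". Returns the relative cost, or -1 for an unknown mode id. */
long set_operational_mode(struct pm_device *dev, int mode_id)
{
    for (size_t i = 0; i < dev->nmodes; i++) {
        if (dev->modes[i].id == mode_id) {
            long c = (long)dev->cost[dev->current * dev->nmodes + i];
            dev->current = i;
            return c;
        }
    }
    return -1;
}

/* Example policy (an assumption, not part of the proposal): the
 * lowest-power mode offering at least min_capability percent. */
int pick_mode(const struct pm_device *dev, int min_capability)
{
    size_t best = dev->nmodes;  /* sentinel: nothing found yet */

    for (size_t i = 0; i < dev->nmodes; i++) {
        const struct power_mode *m = &dev->modes[i];
        if (m->capability_pct < min_capability)
            continue;
        if (best == dev->nmodes || m->power_pct < dev->modes[best].power_pct)
            best = i;
    }
    return best == dev->nmodes ? -1 : dev->modes[best].id;
}
```

Keeping the policy (pick_mode) outside the device interface is the point of the design: the device only describes its modes and how to reach them, and the same set_operational_mode() call covers both boot-time initialization and resume.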
Re: [linux-pm] Re: Hibernation considerations
On Sun, 22 Jul 2007, Huang, Ying wrote: On Fri, 2007-07-20 at 08:48 -0700, [EMAIL PROTECTED] wrote: Backing up target memory before kexec and restoring it after kexec is a planned feature for kexec jump. But I will work on image writing/reading first. if we can get a list of what memory is safe to backup/restore then the reading/writing of the image should be able to be done in userspace. The backup/restore here has nothing to do with the read/write of the image. It means instead of preserving memory for a new kernel like that of crash-dump, the memory for a new kernel is backed up before kexec and restored after kexec by the kexec kernel. Ok, I see the miscommunication here. you are talking about freeing up memory for the second kernel instead of reserving it from boot time. I'm talking about getting the second kernel a list of what memory pages it should write to the image if we can get the info for the list I'm looking for we should be able to demonstrate the kexec based hibernate. the change you are talking about is an enhancement that is useful after that point to save some memory. If the "scatter copy" is replaced by "scatter swap", we don't need the inverse list, and the state of the kexeced kernel can be backed up too. There is "scatter copy" support in the normal kexec implementation in "relocate_kernel". what do you mean by "scatter swap" copy: dest=src swap: tmp=dest; dest=src; src=tmp If memory is swapped, no information is lost, both that of kexec kernel and kexeced kernel. I'm missing why you need to preserve this memory if you are talking about memory that will be used by the second kernel when you kexec to it then you don't need to preserve it (since it will be overwritten by the second kernel). if you aren't talking about memory that will be used by the second kernel why do you need to move it?
David Lang
Re: [linux-pm] Re: Hibernation considerations
On Fri, 2007-07-20 at 08:48 -0700, [EMAIL PROTECTED] wrote: > > Backing up target memory before kexec and restoring it after kexec is > > a planned feature for kexec jump. But I will work on image writing/reading > > first. > > if we can get a list of what memory is safe to backup/restore then the > reading/writing of the image should be able to be done in userspace. The backup/restore here has nothing to do with the read/write of the image. It means instead of preserving memory for a new kernel like that of crash-dump, the memory for a new kernel is backed up before kexec and restored after kexec by the kexec kernel. > > If the "scatter copy" is replaced by "scatter swap", we don't need the > > inverse list, and the state of the kexeced kernel can be backed up too. There > > is "scatter copy" support in the normal kexec implementation in > > "relocate_kernel". > > what do you mean by "scatter swap" copy: dest=src swap: tmp=dest; dest=src; src=tmp If memory is swapped, no information is lost, for both the kexec kernel and the kexeced kernel. Best Regards, Huang, Ying
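The difference between the two operations Huang describes, modelled on ordinary buffers. This is not the kernel's relocate_kernel code; the page size and the page_pair structure are assumptions made for the sketch.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096  /* assumed page size */

struct page_pair {
    unsigned char *dest;
    unsigned char *src;
};

/* Scatter copy: dest = src. The old contents of dest are lost. */
void scatter_copy(const struct page_pair *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        memcpy(p[i].dest, p[i].src, PAGE_SZ);
}

/* Scatter swap: tmp = dest; dest = src; src = tmp. Nothing is lost, and
 * running the same list a second time undoes the first run. Uses one
 * static scratch page, so it is not reentrant. */
void scatter_swap(const struct page_pair *p, size_t n)
{
    static unsigned char tmp[PAGE_SZ];

    for (size_t i = 0; i < n; i++) {
        memcpy(tmp, p[i].dest, PAGE_SZ);
        memcpy(p[i].dest, p[i].src, PAGE_SZ);
        memcpy(p[i].src, tmp, PAGE_SZ);
    }
}
```

Because a swap is its own inverse, applying the same list again restores both sides, which is why neither kernel's memory has to be lost.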
Re: [linux-pm] Re: Hibernation considerations
Hi. On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote: > It seems that you could still potentially get a failure to freeze if one > FUSE process depends on another, and the one that is frozen second just > happens to be waiting on the one that is frozen first when it is frozen. > I admit that this situation is unlikely, and perhaps acceptable. > > A larger concern is that it seems that freezing FUSE processes at all > _will_ generate deadlocks if a non-synchronous or memory-map-supporting > filesystem is loopback mounted from a FUSE filesystem. In that case, if > you attempt to sync or free memory once FUSE is frozen, you are sure to > get a deadlock. Ok. So then (in response to Alan too), how about keeping a tree of mounts, akin to the device tree, and working from the deepest nodes up? (In conjunction with what I already suggested)? Regards, Nigel -- See http://www.tuxonice.net for Howtos, FAQs, mailing lists, wiki and bugzilla info.
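One way to read the "deepest nodes up" suggestion, as a sketch: each mount records the mount it depends on in a hypothetical parent[] array (in Jeremy's case, a filesystem loop-mounted from an image on a FUSE mount has the FUSE mount as its parent), and mounts are visited in decreasing depth so that each is handled before the mount it depends on.

```c
/* parent[m] is the mount that mount m depends on, or -1 for a mount that
 * depends on nothing else. The structure must be a tree (no cycles). */
static int mount_depth(const int *parent, int m)
{
    int depth = 0;
    while (parent[m] >= 0) {
        m = parent[m];
        depth++;
    }
    return depth;
}

/* Fill order[] (room for n entries) with mount indices, deepest first,
 * so each mount comes before the mount it depends on. The repeated depth
 * computation is quadratic, which is acceptable for a sketch. */
void deepest_first(const int *parent, int n, int *order)
{
    int max_depth = 0;
    for (int m = 0; m < n; m++) {
        int d = mount_depth(parent, m);
        if (d > max_depth)
            max_depth = d;
    }

    int k = 0;
    for (int d = max_depth; d >= 0; d--)
        for (int m = 0; m < n; m++)
            if (mount_depth(parent, m) == d)
                order[k++] = m;
}
```

With this order, the loop-mounted filesystem is synced and frozen while the FUSE daemon beneath it is still running, which avoids the deadlock Jeremy describes for that particular stacking.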
Re: [linux-pm] Re: Hibernation considerations
Hi. On Sunday 22 July 2007 04:12:22 Miklos Szeredi wrote: > > It seems that you could still potentially get a failure to freeze if one > > FUSE process depends on another, and the one that is frozen second just > > happens to be waiting on the one that is frozen first when it is frozen. > > I admit that this situation is unlikely, and perhaps acceptable. > > It isn't all that unlikely. There's sshfs for example, that depends > on a separate ssh process for transport. > > Oh, there are also userspace network transports, like tun/tap, > nfqueue, etc. They could block any network filesystem (not just fuse) > if frozen first, making the freezer fail. > > Hmm, wonder why this isn't affecting people with VPNs? Probably > network mounts over VPN are rare, and even rarer to have fs activity > on them during suspend. > > Anyway, I think it's long overdue to stop thinking about how to "fix" > fuse, and concentrate on fixing the underlying problem instead ;) That's what I'm seeking to do :) > > A larger concern is that it seems that freezing FUSE processes at all > > _will_ generate deadlocks if a non-synchronous or memory-map-supporting > > filesystem is loopback mounted from a FUSE filesystem. In that case, if > > you attempt to sync or free memory once FUSE is frozen, you are sure to > > get a deadlock. > > Well, it would deadlock, if > > a) memory reclaim was synchronous, or > b) large part of the memory was used for dirty file data These are problems in normal operation, aren't they? > I can't remember if (a) was ever true. And now the dirty ratio is 10% > by default, so if we go OOM because that 10% can't be reclaimed, there > is a more serious problem. > > Swap over loop over fuse would be problematic, but that won't work for > some time yet ;) Hopefully people will wake up to the problems with Fuse and get rid of it before then :|. Of course I don't really expect that to happen.
Nigel -- See http://www.tuxonice.net for Howtos, FAQs, mailing lists, wiki and bugzilla info.
Re: [linux-pm] Re: Hibernation considerations
On Saturday, 21 July 2007 20:12, Miklos Szeredi wrote: > > It seems that you could still potentially get a failure to freeze if one > > FUSE process depends on another, and the one that is frozen second just > > happens to be waiting on the one that is frozen first when it is frozen. > > I admit that this situation is unlikely, and perhaps acceptable. > > It isn't all that unlikely. There's sshfs for example, that depends > on a separate ssh process for transport. > > Oh, there are also userspace network transports, like tun/tap, > nfqueue, etc. They could block any network filesystem (not just fuse) > if frozen first, making the freezer fail. > > Hmm, wonder why this isn't affecting people with VPNs? Probably > network mounts over VPN are rare, and even rarer to have fs activity > on them during suspend. > > Anyway, I think it's long overdue to stop thinking about how to "fix" > fuse, and concentrate on fixing the underlying problem instead ;) To conclude this branch of the thread, I have a patch in the works that may help a bit with unfreezable FUSE filesystems and it only affects the freezer. I'll post it when 2.6.23-rc1 is out, because it's on top of some other patches that need to go first. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth
Re: [linux-pm] Re: Hibernation considerations
> It seems that you could still potentially get a failure to freeze if one > FUSE process depends on another, and the one that is frozen second just > happens to be waiting on the one that is frozen first when it is frozen. > I admit that this situation is unlikely, and perhaps acceptable. It isn't all that unlikely. There's sshfs for example, that depends on a separate ssh process for transport. Oh, there are also userspace network transports, like tun/tap, nfqueue, etc. They could block any network filesystem (not just fuse) if frozen first, making the freezer fail. Hmm, wonder why this isn't affecting people with VPNs? Probably network mounts over VPN are rare, and even rarer to have fs activity on them during suspend. Anyway, I think it's long overdue to stop thinking about how to "fix" fuse, and concentrate on fixing the underlying problem instead ;) > A larger concern is that it seems that freezing FUSE processes at all > _will_ generate deadlocks if a non-synchronous or memory-map-supporting > filesystem is loopback mounted from a FUSE filesystem. In that case, if > you attempt to sync or free memory once FUSE is frozen, you are sure to > get a deadlock. Well, it would deadlock, if a) memory reclaim was synchronous, or b) large part of the memory was used for dirty file data I can't remember if (a) was ever true. And now the dirty ratio is 10% by default, so if we go OOM because that 10% can't be reclaimed, there is a more serious problem. Swap over loop over fuse would be problematic, but that won't work for some time yet ;) Miklos
Re: [linux-pm] Re: Hibernation considerations
It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable. A larger concern is that it seems that freezing FUSE processes at all _will_ generate deadlocks if a non-synchronous or memory-map-supporting filesystem is loopback mounted from a FUSE filesystem. In that case, if you attempt to sync or free memory once FUSE is frozen, you are sure to get a deadlock. -- Jeremy Maitin-Shepard
Re: [linux-pm] Re: Hibernation considerations
On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote: > > How would you prevent tasks from being scheduled? How would you > > prevent drivers from deadlocking because in order to put their device > > in a low-power state they need to acquire a lock which is held by a > > user task? > > you give up on the suspend because you have no way of getting the user > task to give up the lock. Once the deadlock has occurred it's too late. You can't give up; in fact you can't do anything at all. The system has hung. > however, kernel locks should not be held by user tasks, user tasks are not > expected to behave in rational ways, allowing them to compete with kernel > tasks for locks is a sure way to get a deadlock or indefinite stall. What on Earth are you talking about? "Kernel locks should not be held by user tasks"? Then who _should_ hold them? You are aware, I hope, that down() and mutex_lock() can be called only in process context? > what locks are accessed this way? Lots of them. For example, most drivers won't want a suspend to occur right in the middle of an I/O transfer. To prevent this, the driver might use a mutex. The task doing the I/O (which will be a user task) acquires the mutex during a transfer and the suspend routine acquires the mutex while quiescing the device. > >> Does it really (fundamentally) require scheduling tasks, particularly in > >> the case that the devices have already been put in the "quiesced" state? > > > > I can't say for sure. That's the way we have been doing it. It > > wouldn't be easy to change, because the driver would have to busy-wait > > during delays -- which would mean it would need to use different code > > for system-wide suspend and runtime suspend. > > please define terms so that we are all on the same page Please read Documentation/power/devices.txt. > what do you mean by > system-wide suspend That's what you would call standby, suspend-to-RAM, or hibernate. The entire system goes to sleep.
> runtime suspend That's when an individual device is placed in a low-power state to save energy while it isn't being used. The system as a whole remains awake and the device will be resumed the next time it is needed for anything. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
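Alan's example of a driver that serializes I/O against suspend can be modelled in userspace. The sketch below is a hypothetical single-threaded model, not kernel code: a POSIX mutex stands in for the driver's kernel mutex, the `fake_dev_*` names are invented for illustration, and `pthread_mutex_trylock()` replaces a blocking `mutex_lock()` so that the conflict shows up as `-EBUSY` instead of a wait.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device model; the names are illustrative, not a kernel API. */
struct fake_dev {
	pthread_mutex_t io_lock;	/* stands in for the driver's mutex */
	bool suspended;
};

void fake_dev_init(struct fake_dev *d)
{
	int rc = pthread_mutex_init(&d->io_lock, NULL);

	assert(rc == 0);
	(void)rc;
	d->suspended = false;
}

/*
 * I/O path, entered from a system call made by a user task.  The lock is
 * held for the whole transfer, so a suspend cannot land in the middle of it.
 * Returns false (with the lock released) if the device is already suspended.
 */
bool fake_dev_begin_transfer(struct fake_dev *d)
{
	pthread_mutex_lock(&d->io_lock);
	if (d->suspended) {
		pthread_mutex_unlock(&d->io_lock);
		return false;
	}
	return true;	/* caller must call fake_dev_end_transfer() */
}

void fake_dev_end_transfer(struct fake_dev *d)
{
	pthread_mutex_unlock(&d->io_lock);
}

/*
 * Suspend path.  A real driver would block in mutex_lock() until the user
 * task finishes its transfer, which is why that task must still be able to
 * run; this model only reports that the lock is currently taken.
 */
int fake_dev_try_suspend(struct fake_dev *d)
{
	if (pthread_mutex_trylock(&d->io_lock) != 0)
		return -EBUSY;
	d->suspended = true;
	pthread_mutex_unlock(&d->io_lock);
	return 0;
}
```

Calling `fake_dev_try_suspend()` between `fake_dev_begin_transfer()` and `fake_dev_end_transfer()` returns `-EBUSY`; once the transfer ends it succeeds and later transfers are refused. If the user task holding the lock could never be scheduled again, a blocking suspend would wait forever, which is the deadlock Alan describes.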
Re: [linux-pm] Re: Hibernation considerations
On Sat, 21 Jul 2007, Nigel Cunningham wrote:
> What am I missing in the following suggested solution?
>
> 1) In the freezer code, we implement a new TIF_LATEFREEZE process flag, which, when set, causes a userspace process to be frozen with kernel threads instead of with userspace ones. When freezing, we freeze !TIF_LATEFREEZE tasks, sync, and then freeze TIF_LATEFREEZE tasks and freezable kernel threads.
>
> 2) In the fuse code, the PID of the process that will do the work gets passed to the fuse kernel code when the mount is done. The kernel code sets the TIF_LATEFREEZE flag, and resets it on umount.

What happens when one FUSE filesystem makes use of another? You'll still end up with unfreezable processes, except that now you won't detect them until the LATEFREEZE stage.

Alan Stern
Re: [linux-pm] Re: Hibernation considerations
Hi.

On Saturday 21 July 2007 21:44:32 Miklos Szeredi wrote:
> > The problem with FUSE is related to the fact that the freezer can't freeze uninterruptible tasks and we said that perhaps we might avoid it if FUSE was made freezing-aware. Still, no one has gone in this direction and I don't know of any plans to do that.
>
> I thought we have fully explored this direction. Lots of emails, and an IRC session with Pavel. Conclusion:

What am I missing in the following suggested solution?

1) In the freezer code, we implement a new TIF_LATEFREEZE process flag, which, when set, causes a userspace process to be frozen with kernel threads instead of with userspace ones. When freezing, we freeze !TIF_LATEFREEZE tasks, sync, and then freeze TIF_LATEFREEZE tasks and freezable kernel threads.

2) In the fuse code, the PID of the process that will do the work gets passed to the fuse kernel code when the mount is done. The kernel code sets the TIF_LATEFREEZE flag, and resets it on umount.

Sorry, but this is a hit-and-run email - I'm off to bed now.

Regards,

Nigel
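The ordering Nigel proposes can be sketched as a toy simulation. Everything here is hypothetical: TIF_LATEFREEZE was only a proposal in this thread, and the task table, the `waits_on` field and the rule that a task can freeze only if whatever it is blocked on can still answer are simplifying assumptions, not how kernel/power/process.c works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum task_kind { USER_TASK, LATE_USER_TASK, KERNEL_THREAD };

/* Toy task table entry; not struct task_struct. */
struct task {
	const char *name;
	enum task_kind kind;	/* LATE_USER_TASK models TIF_LATEFREEZE */
	int waits_on;		/* index of the task this one is blocked on, or -1 */
	bool frozen;
};

/*
 * A task blocked on another one can only reach a freezing point once that
 * other task has answered, which it cannot do if it is already frozen.
 */
static bool try_freeze(struct task *t, size_t n, size_t i)
{
	int dep = t[i].waits_on;

	assert(dep < (int)n);
	if (dep >= 0 && t[dep].frozen)
		return false;	/* stuck in uninterruptible sleep */
	t[i].waits_on = -1;	/* the pending request completed */
	t[i].frozen = true;
	return true;
}

/* Phase 1 handles ordinary user tasks; phase 2 handles everything else. */
static bool freeze_phase(struct task *t, size_t n, bool late)
{
	for (size_t i = 0; i < n; i++) {
		bool in_phase = late ? t[i].kind != USER_TASK
				     : t[i].kind == USER_TASK;

		if (in_phase && !t[i].frozen && !try_freeze(t, n, i))
			return false;
	}
	return true;
}

/* Returns 0 on success, 1 if phase 1 fails, 2 if phase 2 fails. */
int two_phase_freeze(struct task *t, size_t n, void (*sync_fn)(void))
{
	if (!freeze_phase(t, n, false))
		return 1;
	if (sync_fn)
		sync_fn();	/* FUSE daemons are still running at this point */
	if (!freeze_phase(t, n, true))
		return 2;
	return 0;
}
```

With an editor blocked on a late-frozen FUSE daemon, phase 1 freezes the editor after the daemon answers, and phase 2 then freezes the daemon. With two late-frozen tasks where sshfs waits on an ssh process that sits earlier in the table, phase 2 freezes ssh first and sshfs can never freeze, which is the failure Alan points out for one FUSE filesystem using another.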
Re: [linux-pm] Re: Hibernation considerations
> The problem with FUSE is related to the fact that the freezer can't freeze uninterruptible tasks and we said that perhaps we might avoid it if FUSE was made freezing-aware. Still, no one has gone in this direction and I don't know of any plans to do that.

I thought we have fully explored this direction. Lots of emails, and an IRC session with Pavel. Conclusion:

 - It can't be done without VFS surgery + adding various hacks to fuse
 - VFS surgery for the sake of a working suspend is not realistic

Although removing the freezer seems the cleanest solution, I'm not saying the freezer can't be fixed up in the meantime.

Allowing tasks to remain in uninterruptible sleep seemed a nice way to get around the fuse issues. What was the problem with that patch? It was something that was supposed to have been tested in suspend2, wasn't it?

The other one (trying to wake up tasks, so that they may make other tasks freezable) didn't seem such a good approach to me.

The theory is quite simple: while and after suspending devices, no tasks must be touching said devices. The very cleanest way to do this is in the drivers. The very simplest way is the current freezer. But maybe there are possibilities between these two extremes.

But I can almost guarantee you that any attempt at fixing the issues through fuse will just result in an even bigger mess than what we currently have.

Miklos
Re: [linux-pm] Re: Hibernation considerations
It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable.

A larger concern is that it seems that freezing FUSE processes at all _will_ generate deadlocks if a non-synchronous or memory-map-supporting filesystem is loopback mounted from a FUSE filesystem. In that case, if you attempt to sync or free memory once FUSE is frozen, you are sure to get a deadlock.

--
Jeremy Maitin-Shepard
Re: [linux-pm] Re: Hibernation considerations
> It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable.

It isn't all that unlikely. There's sshfs for example, that depends on a separate ssh process for transport.

Oh, there are also userspace network transports, like tun/tap, nfqueue, etc. They could block any network filesystem (not just fuse) if frozen first, making the freezer fail.

Hmm, wonder why this isn't affecting people with VPNs? Probably network mounts over VPN are rare, and even rarer to have fs activity on them during suspend.

Anyway, I think it's long overdue to stop thinking about how to "fix" fuse, and concentrate on fixing the underlying problem instead ;)

> A larger concern is that it seems that freezing FUSE processes at all _will_ generate deadlocks if a non-synchronous or memory-map-supporting filesystem is loopback mounted from a FUSE filesystem. In that case, if you attempt to sync or free memory once FUSE is frozen, you are sure to get a deadlock.

Well, it would deadlock, if

 a) memory reclaim was synchronous, or
 b) a large part of the memory was used for dirty file data

I can't remember if (a) was ever true. And now the dirty ratio is 10% by default, so if we go OOM because that 10% can't be reclaimed, there is a more serious problem.

Swap over loop over fuse would be problematic, but that won't work for some time yet ;)

Miklos
Re: [linux-pm] Re: Hibernation considerations
On Saturday, 21 July 2007 20:12, Miklos Szeredi wrote:
> > It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable.
>
> It isn't all that unlikely. There's sshfs for example, that depends on a separate ssh process for transport.
>
> Oh, there are also userspace network transports, like tun/tap, nfqueue, etc. They could block any network filesystem (not just fuse) if frozen first, making the freezer fail.
>
> Hmm, wonder why this isn't affecting people with VPNs? Probably network mounts over VPN are rare, and even rarer to have fs activity on them during suspend.
>
> Anyway, I think it's long overdue to stop thinking about how to "fix" fuse, and concentrate on fixing the underlying problem instead ;)

To conclude this branch of the thread, I have a patch in the works that may help a bit with unfreezable FUSE filesystems and it only affects the freezer. I'll post it when 2.6.23-rc1 is out, because it's on top of some other patches that need to go first.

Greetings,
Rafael

--
Premature optimization is the root of all evil. - Donald Knuth
Re: [linux-pm] Re: Hibernation considerations
Hi.

On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
> It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable.
>
> A larger concern is that it seems that freezing FUSE processes at all _will_ generate deadlocks if a non-synchronous or memory-map-supporting filesystem is loopback mounted from a FUSE filesystem. In that case, if you attempt to sync or free memory once FUSE is frozen, you are sure to get a deadlock.

Ok. So then (in response to Alan too), how about keeping a tree of mounts, akin to the device tree, and working from the deepest nodes up? (In conjunction with what I already suggested)?

Regards,

Nigel
--
See http://www.tuxonice.net for Howtos, FAQs, mailing lists, wiki and bugzilla info.
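Working "from the deepest nodes up" is a post-order traversal. The sketch below is hypothetical: `struct mount_node` is not the kernel's `struct vfsmount`, and a child here means a filesystem whose backing store lives on its parent (for example a loop mount of an image file stored on sshfs), so it has to be synced and its helpers frozen before the parent's.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical dependency tree of mounts; not the kernel's struct vfsmount. */
struct mount_node {
	const char *path;
	struct mount_node *children[MAX_CHILDREN];	/* mounts backed by this one */
	size_t nr_children;
};

/*
 * Post-order walk: every mount that depends on @m is appended to @out before
 * @m itself.  @n is the number of entries already in @out; the new count is
 * returned.  The caller must make @out large enough for the whole tree.
 */
size_t freeze_order(const struct mount_node *m, const char **out, size_t n)
{
	assert(m->nr_children <= MAX_CHILDREN);
	for (size_t i = 0; i < m->nr_children; i++)
		n = freeze_order(m->children[i], out, n);
	out[n++] = m->path;
	return n;
}
```

For the stack `/` holding `/home/remote` (sshfs) holding `/mnt/img` (loop), the order comes out as `/mnt/img`, `/home/remote`, `/`. A tree like this only captures dependencies that are visible as mounts; a helper process such as the ssh transport behind sshfs would not appear in it.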
Re: [linux-pm] Re: Hibernation considerations
Hi.

On Sunday 22 July 2007 04:12:22 Miklos Szeredi wrote:
> > It seems that you could still potentially get a failure to freeze if one FUSE process depends on another, and the one that is frozen second just happens to be waiting on the one that is frozen first when it is frozen. I admit that this situation is unlikely, and perhaps acceptable.
>
> It isn't all that unlikely. There's sshfs for example, that depends on a separate ssh process for transport.
>
> Oh, there are also userspace network transports, like tun/tap, nfqueue, etc. They could block any network filesystem (not just fuse) if frozen first, making the freezer fail.
>
> Hmm, wonder why this isn't affecting people with VPNs? Probably network mounts over VPN are rare, and even rarer to have fs activity on them during suspend.
>
> Anyway, I think it's long overdue to stop thinking about how to "fix" fuse, and concentrate on fixing the underlying problem instead ;)

That's what I'm seeking to do :)

> > A larger concern is that it seems that freezing FUSE processes at all _will_ generate deadlocks if a non-synchronous or memory-map-supporting filesystem is loopback mounted from a FUSE filesystem. In that case, if you attempt to sync or free memory once FUSE is frozen, you are sure to get a deadlock.
>
> Well, it would deadlock, if
>
>  a) memory reclaim was synchronous, or
>  b) a large part of the memory was used for dirty file data

These are problems in normal operation, aren't they?

> I can't remember if (a) was ever true. And now the dirty ratio is 10% by default, so if we go OOM because that 10% can't be reclaimed, there is a more serious problem.
>
> Swap over loop over fuse would be problematic, but that won't work for some time yet ;)

Hopefully people will wake up to the problems with Fuse and get rid of it before then :|. Of course I don't really expect that to happen.

Nigel
Re: [linux-pm] Re: Hibernation considerations
On Fri, 2007-07-20 at 08:48 -0700, [EMAIL PROTECTED] wrote:
> > Backing up target memory before kexec and restoring it after kexec is a planned feature for kexec jump. But I will work on image writing/reading first.
>
> if we can get a list of what memory is safe to backup/restore then the reading/writing of the image should be able to be done in userspace.

The backup/restore here has nothing to do with the read/write of the image. It means that instead of preserving memory for a new kernel, as is done for crash dumps, the memory for the new kernel is backed up before kexec and restored after kexec by the kexec kernel.

> > If the scatter copy is replaced by scatter swap, we don't need the inverse list, and the state of the kexeced kernel can be backed up too. There is scatter copy support in the normal kexec implementation in relocate_kernel.
>
> what do you mean by scatter swap

copy: dest=src
swap: tmp=dest; dest=src; src=tmp

If memory is swapped, no information is lost, neither that of the kexec kernel nor that of the kexeced kernel.

Best Regards,
Huang, Ying
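The difference Huang describes between scatter copy and scatter swap can be shown with a plain C sketch over a list of (dest, src, len) segments. This is illustrative only: `struct seg`, `scatter_copy()` and `scatter_swap()` are hypothetical names, and this is not the kexec relocate_kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* One hypothetical scatter-list entry: len bytes between src and dest. */
struct seg {
	unsigned char *dest;
	unsigned char *src;
	size_t len;
};

/* copy: dest = src.  The previous contents of dest are lost. */
void scatter_copy(const struct seg *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(s[i].dest != NULL && s[i].src != NULL);
		for (size_t j = 0; j < s[i].len; j++)
			s[i].dest[j] = s[i].src[j];
	}
}

/* swap: tmp = dest; dest = src; src = tmp.  Nothing is lost. */
void scatter_swap(const struct seg *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(s[i].dest != NULL && s[i].src != NULL);
		for (size_t j = 0; j < s[i].len; j++) {
			unsigned char tmp = s[i].dest[j];

			s[i].dest[j] = s[i].src[j];
			s[i].src[j] = tmp;
		}
	}
}
```

Applying `scatter_swap()` twice with the same list restores both buffers, so the memory of the kernel that was running before the jump can be given back; `scatter_copy()` overwrites `dest` for good.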
Re: [linux-pm] Re: Hibernation considerations
On Sun, 22 Jul 2007, Huang, Ying wrote:
> On Fri, 2007-07-20 at 08:48 -0700, [EMAIL PROTECTED] wrote:
> > > Backing up target memory before kexec and restoring it after kexec is a planned feature for kexec jump. But I will work on image writing/reading first.
> >
> > if we can get a list of what memory is safe to backup/restore then the reading/writing of the image should be able to be done in userspace.
>
> The backup/restore here has nothing to do with the read/write of the image. It means that instead of preserving memory for a new kernel, as is done for crash dumps, the memory for the new kernel is backed up before kexec and restored after kexec by the kexec kernel.

Ok, I see the miscommunication here. you are talking about freeing up memory for the second kernel instead of reserving it from boot time. I'm talking about getting the second kernel a list of what memory pages it should write to the image.

if we can get the info for the list I'm looking for, we should be able to demonstrate the kexec-based hibernate. the change you are talking about is an enhancement that is useful after that point to save some memory.

> > > If the scatter copy is replaced by scatter swap, we don't need the inverse list, and the state of the kexeced kernel can be backed up too. There is scatter copy support in the normal kexec implementation in relocate_kernel.
> >
> > what do you mean by scatter swap
>
> copy: dest=src
> swap: tmp=dest; dest=src; src=tmp
>
> If memory is swapped, no information is lost, neither that of the kexec kernel nor that of the kexeced kernel.

I'm missing why you need to preserve this memory. if you are talking about memory that will be used by the second kernel when you kexec to it, then you don't need to preserve it (since it will be overwritten by the second kernel). if you aren't talking about memory that will be used by the second kernel, why do you need to move it?

David Lang
Re: [linux-pm] Re: Hibernation considerations
On Sat, 21 Jul 2007, Alan Stern wrote:
> On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote:
> > > How would you prevent tasks from being scheduled? How would you prevent drivers from deadlocking because in order to put their device in a low-power state they need to acquire a lock which is held by a user task?
> >
> > you give up on the suspend because you have no way of getting the user task to give up the lock.
>
> Once the deadlock has occurred it's too late. You can't give up; in fact you can't do anything at all. The system has hung.
>
> > however, kernel locks should not be held by user tasks, user tasks are not expected to behave in rational ways, allowing them to compete with kernel tasks for locks is a sure way to get a deadlock or indefinite stall.
>
> What on Earth are you talking about? "Kernel locks should not be held by user tasks"? Then who _should_ hold them? You are aware, I hope, that down() and mutex_lock() can be called only in process context?
>
> > what locks are accessed this way?
>
> Lots of them. For example, most drivers won't want a suspend to occur right in the middle of an I/O transfer. To prevent this, the driver might use a mutex. The task doing the I/O (which will be a user task) acquires the mutex during a transfer and the suspend routine acquires the mutex while quiescing the device.

wait a minute here, it's possible we are misunderstanding each other.

as I see it, if userspace can acquire locks that prevent the kernel from shutting off (or doing anything else in particular) then it's possible for misbehaving userspace code to stop the kernel by simply choosing to never release the lock. this would be a trivial DoS from userspace.

now, if you are talking instead about the fact that when userspace makes a system call, the execution of that system call involves acquiring locks that are released before the system call completes, you have a very different situation.

if you have locks that are held across system calls then you should already have problems, because you can't count on userspace ever taking whatever action is appropriate to release the lock.

what am I missing that concerns you so much?

> > > > Does it really (fundamentally) require scheduling tasks, particularly in the case that the devices have already been put in the "quiesced" state?
> > >
> > > I can't say for sure. That's the way we have been doing it. It wouldn't be easy to change, because the driver would have to busy-wait during delays -- which would mean it would need to use different code for system-wide suspend and runtime suspend.
> >
> > please define terms so that we are all on the same page
>
> Please read Documentation/power/devices.txt.

I have done so.

> > what do you mean by
> > system-wide suspend
>
> That's what you would call standby, suspend-to-RAM, or hibernate. The entire system goes to sleep.
>
> > runtime suspend
>
> That's when an individual device is placed in a low-power state to save energy while it isn't being used. The system as a whole remains awake and the device will be resumed the next time it is needed for anything.

thanks for the definitions.

having read through Documentation/power/devices.txt I remain convinced that you are making a fundamental mistake.

you are designing a system that will only work if everything (every driver, every state transition) participates fully in the process at all times. You started with the facts 'this is the info that ACPI provides and this is how it is designed to be used' and worked from there instead of looking to see what the kernel really needed and figuring how to provide a good interface for that that happens to be implemented (today) with ACPI. (a proper power management framework shouldn't care if you have ACPI, APM, or some other method of controlling the devices)

this leads to resume functions that can only work if the proper suspend function was called, rather than making 'resume' just mean 'go to full operation', which is the same thing that gets called when the device is first initialized.
internally it can examine the hardware and follow different paths depending on what it finds the current state of the hardware to be, but the outside world (including the rest of the kernel) should not care. the fact that the rest of the kernel needs to know if it should call 'resume' or 'initialize' is a failure in the abstraction.

in fact, a better abstraction would be something like

report_power_modes
  which would return a series of modes (sorted only by modeID):
  modeID, %power_used_in_this_mode, %capability_in_this_mode
  (I would make mode 0 always be complete power off, and mode 1 always be full capacity)

report_power_mode_speed
  which would return a matrix giving how long it takes to transition from any mode to any other mode. this should be a relative number, not an absolute number, since it will be different at different clock speeds.

set_operational_mode(modeID)
  which would take you from whatever mode you are in now to the requested mode.

most
Re: [linux-pm] Re: Hibernation considerations
Hi.

On Saturday 21 July 2007 08:43:20 [EMAIL PROTECTED] wrote:
> On Fri, 20 Jul 2007, Alan Stern wrote:
> > On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
> > > > > when doing a suspend-to-ram you get to a point where you just don't use any userspace.
> > > >
> > > > What do you mean? How can you prevent user tasks from running? That's basically what the freezer does, and the whole point of this approach is to eliminate the freezer. Right?
> > >
> > > Presumably no tasks at all would be scheduled.
> >
> > How would you prevent tasks from being scheduled? How would you prevent drivers from deadlocking because in order to put their device in a low-power state they need to acquire a lock which is held by a user task?
>
> you give up on the suspend because you have no way of getting the user task to give up the lock.
>
> however, kernel locks should not be held by user tasks, user tasks are not expected to behave in rational ways, allowing them to compete with kernel tasks for locks is a sure way to get a deadlock or indefinite stall.
>
> what locks are accessed this way?

Any userspace process can do a syscall. In the process of the syscall, it can take kernel locks, and it can schedule (e.g., while seeking to take a second lock).

Regards,

Nigel
Re: [linux-pm] Re: Hibernation considerations
On Fri, 20 Jul 2007, Alan Stern wrote:
> On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
> > > > when doing a suspend-to-ram you get to a point where you just don't use any userspace.
> > >
> > > What do you mean? How can you prevent user tasks from running? That's basically what the freezer does, and the whole point of this approach is to eliminate the freezer. Right?
> >
> > Presumably no tasks at all would be scheduled.
>
> How would you prevent tasks from being scheduled? How would you prevent drivers from deadlocking because in order to put their device in a low-power state they need to acquire a lock which is held by a user task?

you give up on the suspend because you have no way of getting the user task to give up the lock.

however, kernel locks should not be held by user tasks, user tasks are not expected to behave in rational ways, allowing them to compete with kernel tasks for locks is a sure way to get a deadlock or indefinite stall.

what locks are accessed this way?

> > > > from that point on you are just walking the device tree putting things into low-power mode. This is the point where we are talking about jumping to.
> > >
> > > Yes. And putting things into low-power mode requires the ability to run the scheduler, which means that user tasks can be scheduled, which means that they can run.
> >
> > Does it really (fundamentally) require scheduling tasks, particularly in the case that the devices have already been put in the "quiesced" state?
>
> I can't say for sure. That's the way we have been doing it. It wouldn't be easy to change, because the driver would have to busy-wait during delays -- which would mean it would need to use different code for system-wide suspend and runtime suspend.

please define terms so that we are all on the same page

what do you mean by
system-wide suspend
runtime suspend

David Lang
Re: [linux-pm] Re: Hibernation considerations
Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
>> >> when doing a suspend-to-ram you get to a point where you just don't use any userspace.
>>
>> > What do you mean? How can you prevent user tasks from running? That's basically what the freezer does, and the whole point of this approach is to eliminate the freezer. Right?
>>
>> Presumably no tasks at all would be scheduled.

> How would you prevent tasks from being scheduled? How would you prevent drivers from deadlocking because in order to put their device in a low-power state they need to acquire a lock which is held by a user task?

Perhaps this isn't an issue once the device is already quiesced. I'm just conjecturing.

[snip]

--
Jeremy Maitin-Shepard
Re: [linux-pm] Re: Hibernation considerations
On Sat, 21 Jul 2007, Rafael J. Wysocki wrote: On Friday, 20 July 2007 23:39, [EMAIL PROTECTED] wrote: On Fri, 20 Jul 2007, Rafael J. Wysocki wrote: On Friday, 20 July 2007 17:36, [EMAIL PROTECTED] wrote: On Fri, 20 Jul 2007, Jim Crilly wrote: has requested the image to be not greater than 50% of RAM. In that case you have to free some memory _before_ identifying memory to save and you must not race with applications that attempt to allocate memory while you're doing it. I disagree a little bit. first off, only the suspending kernel can know what can be freed and what is needed to do so (remember this is kernel internals, it can change from patch to patch, let alone version to version) second, if you have a lot of memory to free, and you can't just throw away caches to do so, you don't know what is going to be involved in freeing the memory, it's very possilbe that it is going to involve userspace, so you can't freeze any significant portion of the system, so you can't eliminate all chance of races what you can do is 1. try to free stuff 2. stop the system and account for memory, is enough free if not goto 1 if userspace is dirtying memory fast enough, or is just useing enough memory that you can't meet your limit you just won't be able to suspend. but under any other conditions you will eventually get enough memory free. so try several times and if you still fail tell the user they have too much stuff running and they need to kill something. Which would be a pretty big regression from what we have now. With the current implementation I can hibernate under virtually any workload because the freezer stops everything and there's no competition for resources. as long as what you are trying to save is <=50% of ram (at least with some implementations). if you are trying to save more then 50% of ram with some current implmenetations you just can't With some, you can't, with the others, you can. :-) The argument given was about the freezer and IMO it was valid. 
Why didn't you address it directly? I thought it had been covered in other messages (with this thread as big as it is, I'm trying to avoid repeating the same thing more than a couple of times a day :-) There was another message talking about ways that you could reduce the image size without it being racy (allocate pinned memory until the remainder is small enough, then don't back up the pinned memory); that's a much cleaner answer than what I was thinking, so I'll go with it instead ;-) Wouldn't that cause the OOM killer to act, in some cases? Only in the case where the image absolutely cannot be made small enough, and this should be detectable by the process that's pinning memory (this can be a kernel process), so that it stops before the OOM killer is triggered, even if that means that it returns 'unable to fit'. David Lang
Re: [linux-pm] Re: Hibernation considerations
On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote: > >> when doing a suspend-to-ram you get to a point where you just don't use > >> any userspace. > > > What do you mean? How can you prevent user tasks from running? That's > > basically what the freezer does, and the whole point of this approach > > is to eliminate the freezer. Right? > > Presumably no tasks at all would be scheduled. How would you prevent tasks from being scheduled? How would you prevent drivers from deadlocking because in order to put their device in a low-power state they need to acquire a lock which is held by a user task? > >> from that point on you are just walking the device tree > >> putting things into low-power mode. This is the point where we are talking > >> about jumping to. > > > Yes. And putting things into low-power mode requires the ability to > > run the scheduler, which means that user tasks can be scheduled, which > > means that they can run. > > Does it really (fundamentally) require scheduling tasks, particularly in > the case that the devices have already been put in the "quiesced" state? I can't say for sure. That's the way we have been doing it. It wouldn't be easy to change, because the driver would have to busy-wait during delays -- which would mean it would need to use different code for system-wide suspend and runtime suspend. Alan Stern
Re: [linux-pm] Re: Hibernation considerations
On Fri, 20 Jul 2007, Oliver Neukum wrote: > > We already have a pre-suspend notification available for drivers that > > need to allocate large amounts of memory. > > Is that facility fine grained enough? It's a notifier chain that gets called at several points during the suspend transition. One of those points is right at the start, while userspace is still running and reasonably large amounts of memory can be allocated. Is it fine-grained enough? I don't know -- hard to tell, since nothing much is using it yet. > > You are correct about the need to delay/stop device addition. I don't > > know how this can be done in general; each code path calling > > device_add() may have to be treated individually. > > What about the old API? What old API do you mean? > Do we have to block module loading? No. Registering new drivers is okay, registering new devices is bad. Of course, some modules do want to register a new device in their init method. I don't know what we should do about them. Force the registration to fail, I suppose. How often will people suspend while a module is loading? > What happens if a scsi error handler is woken? If it cannot be woken, > how are errors handled? Why should the error handler wake up? There isn't supposed to be any I/O going on, hence no errors to handle. Alan Stern