Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
> > Ok, I'll just blame fuse here. 'You have to write to /sys > > files for SIGKILL to work' is not funny. > > SIGKILL won't work on a stopped task. Neither on a traced task. > Neither on a zombie (how many newbies are thoroughly confused about > that ;) > > And it won't work on a task that is hanging on an NFS mount, when the > network broke. > > Oh, and it won't work on a deadlocked fuse filesystem. How many of > these have you met? How many of the others? And maybe someday, when I'm feeling _really_ bored, I may start hacking the VFS to make it possible for tasks to be killed while waiting on a mutex. Can a task doing a page fault be interrupted? I don't see why not. But, it's _not_ going to happen now, to fix the problems with suspend. Hacking fuse (however broken it may be) or the VFS is not the right way to fix those problems. It may be easier than fixing the drivers, though I doubt it. But even if it's the easier way it's not the _right_ way. We want the various subsystems in the kernel to be less dependent on each other not more. And the freezer is just adding unnecessary dependence on totally unrelated code. And that does not only impact fuse, but a lot of other code in the kernel, in the form of try_to_freeze() calls all over the place. Is that so hard to understand? You can bash fuse all you want, I really don't mind. But please stop trying to prove, that this situation is best handled in fuse. It just won't happen, not because I'm lazy to do the work but because I don't have the guts to do something so obviously wrong. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
> > Actually there's also a non-malicious case in which waiting for > > requests to finish won't work: when one fuse filesystem is accessing > > another. > > > > Since we are blocking new fuse requests, that might block a fuse > > daemon, which in turn makes it impossible to finish the pending > > request. > > Hmm, yes, this is ugly. But will it hurt? We'll just wait for number > of active fuse requests to go down to 0 before proceeding. I'm thinking of the following situation, immediately after we begin to wait for fuse requests to subside: - task A is performing file operation O1 - O1 is being served by task B - task B needs to perform O2 before finishing O1 Since we don't allow new requests, O2 will never start. So O1 will never finish. So the number of fuse requests will never go to 0. > It may still livelock on system with busy fused-s, but we'll detect > that with timeout, and I believe it is 'don't do that' situation. There are no busy fused's. If we allowed O2 to go through, everything would be nice and dandy. But we cannot know that O2 needs to be let through, and other fuse requests must not. > > And this is not at all theoretical, I know that encfs is used over > > various other fuse filesystems like sshfs or ntfs-3g. > > > > Yeah, stacking userspace filesystems could be done entirely in > > userspace, and I'm actually working on that (with fuse-2.7.0 already > > supporting some basic stacking). > > Great. > > > But the point is, that the "wait for fuse to quiescence" hack would > > not work in todays environment. > > Ok, I'll just blame fuse here. 'You have to write to /sys > files for SIGKILL to work' is not funny. SIGKILL won't work on a stopped task. Neither on a traced task. Neither on a zombie (how many newbies are thoroughly confused about that ;) And it won't work on a task that is hanging on an NFS mount, when the network broke. Oh, and it won't work on a deadlocked fuse filesystem. How many of these have you met? How many of the others? Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sysrq-t dumps of s2ram/fuse deadlock (was Re: [PATCH] Remove process freezer from suspend to RAM pathway)
On Wed 2007-07-11 14:28:20, Matthew Garrett wrote: > On Mon, Jul 09, 2007 at 01:29:05PM +, Pavel Machek wrote: > > Hi! > > > > Can we get them? They are neccessary for debugging 'what in suspend > > calls fuse' problem. And yes, that problem is there even when you > > remove freezer. > > I can produce them, but haven't managed to do that in any way that lets > me get them off the system yet. > > > Matthew, you seem to be the only one able to produce them... > > sys_sync may not do anything on a fuse filesystem, but if you've > loopback-mounted an ext3 filesystem from that fuse filesystem then it's > going to result in writes to it and deadlock. Aha, yes, that explains the problem. Thanks! Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
Hi! > > So you want me to handle _malicious_ filesystems now? > > > > That should be easy... :-). You already have nasty deadlocks in FUSE, > > and you solve them by "root can echo 1 > abort"... so allow me the > > same possibility. > > > > We can tell fused we are freezing, and if all the requests are not > > serviced within, say, 30 seconds, we call the filesystem malicious and > > do echo 1 > abort. > > > > Not ideal, but neither is allowing malicious filesystems in the first > > place... > > Actually there's also a non-malicious case in which waiting for > requests to finish won't work: when one fuse filesystem is accessing > another. > > Since we are blocking new fuse requests, that might block a fuse > daemon, which in turn makes it impossible to finish the pending > request. Hmm, yes, this is ugly. But will it hurt? We'll just wait for number of active fuse requests to go down to 0 before proceeding. It may still livelock on system with busy fused-s, but we'll detect that with timeout, and I believe it is 'don't do that' situation. > And this is not at all theoretical, I know that encfs is used over > various other fuse filesystems like sshfs or ntfs-3g. > > Yeah, stacking userspace filesystems could be done entirely in > userspace, and I'm actually working on that (with fuse-2.7.0 already > supporting some basic stacking). Great. > But the point is, that the "wait for fuse to quiescence" hack would > not work in todays environment. Ok, I'll just blame fuse here. 'You have to write to /sys files for SIGKILL to work' is not funny. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] [PATCH] Remove process freezer from suspend to RAM pathway
Hi! > > They will not trigger 100% of the time, but sporadically and generally at > > random. > > > > At least the freezer problems are reproducible. ;-) > > Our experience with powermacs has been that it isn't actually all that > hard to get it right for the drivers you care about. So... will you fix input and usb? Their maintainers seem to prefer simplicity of frozen system. Also, can you write down rules for drivers to follow? Like... 'if you get a request after system suspend started, you may not kmalloc(GFP_KERNEL)' - because disk driver may be already suspended - hmm, what happens if disk driver _is_ suspended? Let's say I'm v4l driver on cpu1. Someone may talk to me _while_ disk driver is being suspended... and if I'm after disk driver in pci order, I may only learn about suspend _after_ I do the wrong kmalloc. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sysrq-t dumps of s2ram/fuse deadlock (was Re: [PATCH] Remove process freezer from suspend to RAM pathway)
On Mon, Jul 09, 2007 at 01:29:05PM +, Pavel Machek wrote: > Hi! > > Can we get them? They are neccessary for debugging 'what in suspend > calls fuse' problem. And yes, that problem is there even when you > remove freezer. I can produce them, but haven't managed to do that in any way that lets me get them off the system yet. > Matthew, you seem to be the only one able to produce them... sys_sync may not do anything on a fuse filesystem, but if you've loopback-mounted an ext3 filesystem from that fuse filesystem then it's going to result in writes to it and deadlock. -- Matthew Garrett | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
sysrq-t dumps of s2ram/fuse deadlock (was Re: [PATCH] Remove process freezer from suspend to RAM pathway)
Hi! Can we get them? They are neccessary for debugging 'what in suspend calls fuse' problem. And yes, that problem is there even when you remove freezer. Matthew, you seem to be the only one able to produce them... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Thanks for the detailed reply! On Jul 09, 2007, at 22:07:15, Nigel Cunningham wrote: On Friday 06 July 2007 15:01:48 Kyle Moffett wrote: Suppose hibernate is implemented like this: (1) Userspace program calls sys_freeze_processes() (a) Pokes all CPUs with IPMIs and tells them to finish the currently running timeslot then stop (b) Atomically sends SIGSTOP to all userspace processes in a non-trappable way, except the calling process and any process which is ptracing it. (c) Returns to the calling process. Ok. First, I'll ignore the specification that userspace does this - I don't think it matters whether it's userspace or kernel that does the suspending and I'm yet to see a good reason for it to be [required to be] done from userspace. The reason it's _required_ to be done from userspace is that userspace is the only one which can figure out "These processes need to run for suspend to work", and then let those processes continue running after the freeze. The *ONLY* reason this even stops processes at all is so we can do the post-device-mapper-snapshot code with very little usably-free RAM (IE: only about 1MB for a standard desktop system). In this first step, you've reinvented the first part of the current freezer implementation. The reason we don't use a real signal is precisely so we can have an untrappable SIGSTOP. In this regard, I particularly remember Win4Lin from a few years ago. It would die if you sent it a real signal, so we had to do it this way. No doubt there are other instances I'm not aware of. Well, you *do* want it to have semi-signal semantics, processes which receive it must not get back to userspace code so that they don't start allocating more memory when we're trying to do the freeze. You also don't want a process to be able to trap it (IE: like SIGSTOP or SIGKILL). On the other hand, it should be delivered asynchronously (IE: It doesn't break an interruptable sleep or respond to most is-a-signal- present checks). You don't actually care if its sleeping in the kernel somewhere, just as long as it doesn't allocate much memory. You would probably need a new signal "SIGFREEZE" which causes the process to be ignored as runnable the next time they schedule but never actually gets delivered, and a "SIGUNFREEZE" which does the reverse. That way userspace could selectively resume processes based on its policy of "this needs to run for hibernation". (2) Userspace process sends SIGCONT to only those processes which are necessary for sync and a device-mapper snapshot. How do you determine which ones are needed? It's userspace's job to know which ones are needed. For example, if you are hibernating over NFS then you need to resume the various NFS/ RPC daemons and threads. Why stop them in the first place? So they aren't allocating memory when we are doing the device-mapper snapshot. (3) Userspace calls sys_snapshot_kernel(snapshot_overhead_pages) (a) Kernel starts freeing memory and swapping stuff out to make room for a copy of *kernel* memory (not pagecache, not process RAM). It does the same for at least snapshot_overhead_pages extra (used by userspace later). It then allocates this memory to keep it from going away. Since most processes are stopped we won't have much else competing with us for the RAM. Ok. So now you also need processes running that are needed for swapping, because freeing that memory might involve swapping. Fully agree with the logic though (not really surprising - this is what I do in Suspend2^wTuxOnIce). (a) Kernel uses the device-mapper up-call-into-filesystem machinery to get all mounted filesystems synced and ready for a DM snapshot. This may include sending data via the userspace processes resumed in (2). Any deadlocks here are userspace's fault (see (2)). Will need some modification to handle doing multiple blockdevs at a time. Anything using FUSE is basically perma-synced anyways (no dep-handling needed), and anything using loop should already be handled by DM. This includes allocating memory for the basic snapshot datastructures. (b) At this point all blockdev operations should be halted and disk caches flushed; that's all we care about. (c) Go through the device tree and quiesce DMA and shut off interrupts. Since all the disks are synced this is easy. (d) Use IPMIs again to get all the CPUs together, which should be easy as most processes are sleeping in IO or SIGSTOPed, and we're getting no interrupts. (e) One CPU turns off all interrupts on itself and takes an atomic snapshot of kernel memory into the previously allocated storage. Once again, does not include pagecache. The kernel also records a list of what pages *are* included in the pagecache. It then marks all userspace pages as copy-on-write. Hotplugging cpus (when all those locking issues are taken care of) is simp
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Hi. Sorry for the long delay. Busy weekend and my motivation for working on programming is almost zero at the moment... On Friday 06 July 2007 15:01:48 Kyle Moffett wrote: > On Jul 06, 2007, at 00:03:15, Nigel Cunningham wrote: > > On Friday 06 July 2007 13:54:15 Benjamin Herrenschmidt wrote: > >> On Fri, 2007-07-06 at 09:35 +1000, Nigel Cunningham wrote: > >>> > >>> Nice try :) Okay then, you remove the freezer, try hibernating, > >>> then get back to me after you've fixed your filesystem because > >>> some process that wasn't frozen started writing things after the > >>> atomic copy (making the on disk filesystem inconsistent with the > >>> snapshot). > >>> > >>> As Pavel rightly said, you can get rid of the freezer, but you're > >>> only going to have to implement another one that does the > >>> essentially the same thing, even if it is at some other level. > >> > >> I was mostly talking about STR... Regarding STD, we have a > >> different problem and we all know it. The freezer is one somewhat > >> horrible way to get it working for now, I would prefer something > >> more along the way that blocks the page cache from writing out new > >> dirty pages though, except those specifically flagged by the > >> snapshot. > >> > >> That is, some kind of proper snapshotting facility, as linus was > >> describing some time ago. > > > > The kind of thing Linus was talking about would limit you (as > > swsusp and uswsusp do now) to only half the amount of memory. > > How so? Suppose hibernate is implemented like this: > > (1) Userspace program calls sys_freeze_processes() >(a) Pokes all CPUs with IPMIs and tells them to finish the > currently running timeslot then stop >(b) Atomically sends SIGSTOP to all userspace processes in a non- > trappable way, except the calling process and any process which is > ptracing it. >(c) Returns to the calling process. Ok. First, I'll ignore the specification that userspace does this - I don't think it matters whether it's userspace or kernel that does the suspending and I'm yet to see a good reason for it to be [required to be] done from userspace. In this first step, you've reinvented the first part of the current freezer implementation. The reason we don't use a real signal is precisely so we can have an untrappable SIGSTOP. In this regard, I particularly remember Win4Lin from a few years ago. It would die if you sent it a real signal, so we had to do it this way. No doubt there are other instances I'm not aware of. > (2) Userspace process sends SIGCONT to only those processes which are > necessary for sync and a device-mapper snapshot. How do you determine which ones are needed? Why stop them in the first place? > (3) Userspace calls sys_snapshot_kernel(snapshot_overhead_pages) >(a) Kernel starts freeing memory and swapping stuff out to make > room for a copy of *kernel* memory (not pagecache, not process RAM). > It does the same for at least snapshot_overhead_pages extra (used by > userspace later). It then allocates this memory to keep it from > going away. Since most processes are stopped we won't have much else > competing with us for the RAM. Ok. So now you also need processes running that are needed for swapping, because freeing that memory might involve swapping. Fully agree with the logic though (not really surprising - this is what I do in Suspend2^wTuxOnIce). >(a) Kernel uses the device-mapper up-call-into-filesystem > machinery to get all mounted filesystems synced and ready for a DM > snapshot. This may include sending data via the userspace processes > resumed in (2). Any deadlocks here are userspace's fault (see (2)). > Will need some modification to handle doing multiple blockdevs at a > time. Anything using FUSE is basically perma-synced anyways (no dep- > handling needed), and anything using loop should already be handled > by DM. This includes allocating memory for the basic snapshot > datastructures. >(b) At this point all blockdev operations should be halted and > disk caches flushed; that's all we care about. >(c) Go through the device tree and quiesce DMA and shut off > interrupts. Since all the disks are synced this is easy. >(d) Use IPMIs again to get all the CPUs together, which should be > easy as most processes are sleeping in IO or SIGSTOPed, and we're > getting no interrupts. >(e) One CPU turns off all interrupts on itself and takes an atomic > snapshot of kernel memory into the previously allocated storage. > Once again, does not include pagecache. The kernel also records a > list of what pages *are* included in the pagecache. It then marks > all userspace pages as copy-on-write. Hotplugging cpus (when all those locking issues are taken care of) is simpler. Prior to cpu hotplugging, I used IMPIs to put secondary cpus into a tight loop, so I know it's possible to do it this way too. That way, though, y
Re: [linux-pm] Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
Nigel Cunningham writes: > (Finally catching up on emails since Saturday!) Yes, I know what you mean; I was occupied with other work stuff on Friday and not well on Saturday, and by that stage there was a backlog of about 500 emails for me to wade through... :/ > The freezer is also essential for doing a full image of memory - you > have to be able to guarantee that some of the image won't change in > order to be able to write it out before the atomic copy. Of course the > freezer alone isn't enough, but it solves 95% of the issue. Sure. As I pointed out in another email, it's unrealistic to expect the freezer to be able to stop everything in a state where any arbitrary mutex is free. But as long as you don't need to do anything that depends on getting a mutex while the system is frozen, there should be no problem. Paul. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
Hi. On Monday 09 July 2007 09:01:27 Rafael J. Wysocki wrote: > Hi, > > On Monday, 9 July 2007 00:13, Pavel Machek wrote: > > Hi! > > > > > > Actaully, I'm perfectly fine with that, as long as each task blocked by the > > > > driver due to suspend has PF_FROZEN (or something similar) set. Then, at > > > > least theoretically, we'll be able to drop the freezer from the suspend code > > > > path and move it after device_suspend() (or the hibernation-specific > > > > equivalent) for hibernation (in that case there shouldn't be a problem with > > > > any task waiting on I/O while the freezer is running ;-)). > > > > > > I don't see the need for a freezer for snapshot but that's a different > > > issue. (stop_machine looks good enough to me). > > > > Freezer is not needed for snapshot -- it is needed so that we can > > write out the snapshot to disk without the need for special > > drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then > > write to disk from userland code in uswsusp). > > Yes. > > BTW, this patch: > > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch > > that's queued up in -mm contains a freezer documentation update, in which the > reasons of using it, as well as its limitations, are described. > > To summarize what was previously said in this thread: > > * Apparently, we agree that the freezer is _generally_ not needed for suspend > (ie. any transition to a system sleep state other than hibernation), but some > of us (eg. me) think that it wouldn't be reasonable to drop the freezer from > the suspend code path _right_ _now_ . > > * Some of us, including you, Nigel and me, think that the freezer is needed > for hibernation (please see the document in the patch above for details). > In the (very) long run this might be avoided too, but (IMO) certainly not at > this point. > > * We seem to agree that in order to remove the freezer from the suspend code > path some work needs to be done on device drivers, driver midlayers and the > PM core. We also need to do some work on the PM core in order to introduce > a separate hibernation framework and IMO it would be reasonable to > synchronize these efforts. > > * We are now to decide what to do so that the freezer can be safely removed > from the suspend code path and how to integrate that change with the > hibernation code path (if possible and reasonable). > > * The freezer vs FUSE issue that started this thread remains unresolved, so > it would be desirable to provide a short-term fix (need not be very nice). (Finally catching up on emails since Saturday!) The freezer is also essential for doing a full image of memory - you have to be able to guarantee that some of the image won't change in order to be able to write it out before the atomic copy. Of course the freezer alone isn't enough, but it solves 95% of the issue. Regards, Nigel -- See http://www.tuxonice.net for Howtos, FAQs, mailing lists, wiki and bugzilla info. pgpyEufTeC6X5.pgp Description: PGP signature
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
> > > We can just wait for all fuse requests to be serviced before > > > proceeding further with freeze, right? > > > > Right. Nice way to slow down or stop the suspend with an unprivileged > > process. Avoiding that sort of DoS is one of the design goals of > > fuse. > > So you want me to handle _malicious_ filesystems now? > > That should be easy... :-). You already have nasty deadlocks in FUSE, > and you solve them by "root can echo 1 > abort"... so allow me the > same possibility. > > We can tell fused we are freezing, and if all the requests are not > serviced within, say, 30 seconds, we call the filesystem malicious and > do echo 1 > abort. > > Not ideal, but neither is allowing malicious filesystems in the first > place... Actually there's also a non-malicious case in which waiting for requests to finish won't work: when one fuse filesystem is accessing another. Since we are blocking new fuse requests, that might block a fuse daemon, which in turn makes it impossible to finish the pending request. And this is not at all theoretical, I know that encfs is used over various other fuse filesystems like sshfs or ntfs-3g. Yeah, stacking userspace filesystems could be done entirely in userspace, and I'm actually working on that (with fuse-2.7.0 already supporting some basic stacking). But the point is, that the "wait for fuse to quiescence" hack would not work in todays environment. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Mon, 9 Jul 2007, Benjamin Herrenschmidt wrote: > > This will be up to the people responsible for the subsystems. I can > > take care of USB. > > USB is not that much of a problem in the sense that for most "leaf" > drivers, USB is a provider (ie, the bus they sit on), not the client > (like the network stack is to network drivers). That's why I thought I would be able to handle it! :-) > There is still the race of: > > drivers_sysfs_write() > try_to_icebox() > < > hit hardware > > Those are akin, in some ways, to the freezer races. Some kind of RCU > might take care of them if we enable the icebox, then wait for all tasks > to hit an explicit schedule point once (or return to userland). That > would mean that drivers need to try_to_icebox() again if they do > something that may schedule (such as __get_user). So it's not a magic > solution, it has issues, but it can handle a lot of the simplest cases. It depends on what kind of race you're worried about. In general a sysfs access that causes I/O isn't much different from a char device's read or write. For such cases the driver would have to do something like this: drivers_sysfs_write() { restart: mutex_lock(private_io_mutex); if (device is suspended) { mutex_unlock(private_io_mutex); icebox(); goto restart; } ... hit hardware ... mutex_unlock(private_io_mutex); } And of course the suspend method would have to acquire the private_io_mutex. Some drivers might be able to use the device semaphore instead of adding a new mutex. I'm more concerned about races involving device registration. The best solution I can come up with is to use an rwsem: drivers_sysfs_write() { down_read(private_rwsem); ... register/unregister devices ... up_read(private_rwsem); } where the suspend notifer callout would do down_write() and the resume callout would do up_write(). Can you suggest anything better? > > Drivers are already prepared for device registration to fail (or they > > ought to be), so this change shouldn't knock the bottom out of things. > > device_add() isn't on a hot path, so adding an extra check and > > srcu_read_lock() won't hurt. > > True. Also, bus drivers could just flag the port with something saying > "try registering again later". Don't underestimate the power of "try > later" constructs :-) -ESUSPENDING would naturally imply that the caller should retry after the resume. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Mon, Jul 09, 2007 at 07:33:56PM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2007-07-09 at 11:23 +0200, Oliver Neukum wrote: > > Suspension is to be transparent. Apart from a jump in the system clock > > user space must not notice, thus returning errors due to suspension is > > not an option. > > Who says ? > > If I'm talking to a USB printer, it will notice the USB bus being > suspended, believe me. It's actually likely to screw up whatever job is > in progress. It make perfect sense to error out attempts to talk to it > when suspended. > > I don't think "transparent" is an absolute requirement. In some cases, > it doesn't make sense and the printer is a good example of that. The > printing daemon is typically something that will ideally need to grow > knowledge about suspend/resume to interrupt things at the right time. > > That's one of the thing... we don't even have the kernel right, so we > are light-years from having the rest of userland right. > > For example, from a user point of view, what should happen if you > suspend while printing ? There might be other existing things with a semantics that might be worth looking at, e.g.: I don't have an USB printer, but what happens if you unplug the printer while printing, wait 30 seconds, and plug it in again? Wait 3 hours? Could this work theoretically? Does it work in practice? This are non-theoretical use cases kernel and userspace already somehow handle today, and for userspace an USB printer disconnected for 3 hours and a 3 hour suspend might simply be the same with the latter not requiring any extra userspace handling. Another example might be mounted NFS file systems: The case that a client's ethernet cable gets unplugged for 24 hours and plugged in again is a real life use case NFS and all applications already have to handle. > Well, it's a matter of policy (thus should be configurable), but I would > exect (as a former Mac user) something around the lines of this as a > default: > > - If the laptop's clamshell has been closed, it's likely that the > user's just picking up the laptop for a ride, makes sense to abort jobs > in progress. > > - If this is a manual action via the menus, displaying a dialog asking > if you really want to suspend before your page is finished makes sense > (with that dialog disappearing and the machine going to sleep if you > just wait and let it print). > > That sort of thing requires much more than just kernel collaboration of > course, but a whole infrastructure in userland as well. > > So no, suspend is -not- transparent, doesn't have to be, it's something > that is to be decided on a case-by-case basis (or rather, > driver-per-driver and possibly via policy settings). But keep in mind that every piece that does not work transparently might result in less use cases for suspend. There are cases like CD burning while suspending that might not be solvable, but as much as possible should work as transparently as possible. > Ben. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Monday, 9 July 2007 11:14, Benjamin Herrenschmidt wrote: > On Mon, 2007-07-09 at 08:52 +0200, Oliver Neukum wrote: > > Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt: > > > Another issue that's been a problem forever with suspend is the > > > synchronous request_firmware interface. Lots of drivers do that in > > > resume() which will generally not work. > > > > Unfortunately yes they do. We now have the notifier chains, which > > can be used. Firmware, especially if it is large, must be requested > > while the system is still fully functional and paging will work. > > > > Asynchronocity will not help much because the devices must work > > after resumingand delaying that will not help with respect to needing > > devices to get at the firmware. > > True, pre-loading is the best approach. Agreed. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Mon, 2007-07-09 at 12:02 +0200, Pavel Machek wrote: > Hi! > > > > > But I'm not sure it's a good idea in the long run. Think of a printer > > > > daemon, for example. It shouldn't have to experience unexpected I/O > > > > problems merely because someone has decided to put the system to sleep. > > > > > > Why not ? Printer is offline when machine is asleep... trying to print > > ...filesystems are offline, too, when the machine is asleep. Yet, > unmounting everything on suspend would not result in useful suspend > support. > > Yes, I believe we should be transparent. You just compared apple and oranges... Try printing and half way through the page, suspend your USB bus, and see how the printer reacts. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Hi! > > > But I'm not sure it's a good idea in the long run. Think of a printer > > > daemon, for example. It shouldn't have to experience unexpected I/O > > > problems merely because someone has decided to put the system to sleep. > > > > Why not ? Printer is offline when machine is asleep... trying to print ...filesystems are offline, too, when the machine is asleep. Yet, unmounting everything on suspend would not result in useful suspend support. Yes, I believe we should be transparent. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
> > In that case the "we need suspend to be invisible to userspace" as a > > reason to use the freezer would also be moot, since if you don't > > schedule userspace after offlining the CPUs, it can't notice this. > > After? Can you do the offlining atomically? Don't know. Wait for all CPUs to reach a scheduling point and then take them offline? Doesn't sound too difficult. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
Am Montag, 9. Juli 2007 schrieb Miklos Szeredi: > In that case the "we need suspend to be invisible to userspace" as a > reason to use the freezer would also be moot, since if you don't > schedule userspace after offlining the CPUs, it can't notice this. After? Can you do the offlining atomically? Regards Oliver - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Mon, 2007-07-09 at 11:23 +0200, Oliver Neukum wrote: > Suspension is to be transparent. Apart from a jump in the system clock > user space must not notice, thus returning errors due to suspension is > not an option. Who says ? If I'm talking to a USB printer, it will notice the USB bus being suspended, believe me. It's actually likely to screw up whatever job is in progress. It make perfect sense to error out attempts to talk to it when suspended. I don't think "transparent" is an absolute requirement. In some cases, it doesn't make sense and the printer is a good example of that. The printing daemon is typically something that will ideally need to grow knowledge about suspend/resume to interrupt things at the right time. That's one of the thing... we don't even have the kernel right, so we are light-years from having the rest of userland right. For example, from a user point of view, what should happen if you suspend while printing ? Well, it's a matter of policy (thus should be configurable), but I would exect (as a former Mac user) something around the lines of this as a default: - If the laptop's clamshell has been closed, it's likely that the user's just picking up the laptop for a ride, makes sense to abort jobs in progress. - If this is a manual action via the menus, displaying a dialog asking if you really want to suspend before your page is finished makes sense (with that dialog disappearing and the machine going to sleep if you just wait and let it print). That sort of thing requires much more than just kernel collaboration of course, but a whole infrastructure in userland as well. So no, suspend is -not- transparent, doesn't have to be, it's something that is to be decided on a case-by-case basis (or rather, driver-per-driver and possibly via policy settings). Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
> Please have a look at the documentation update at the bottom of this patch: > > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch > > It says what the freezer is for in the first place. :-) Thanks, good description. > > Yes, we need to make sure, that nothing is scheduled during (and > > possibly after) taking the snapshot. But AFAICS that could be > > achieved by unplugging all but one CPU. > > Actually, we want things to get scheduled, because we need some of them to > save the image (to make things more difficult, we don't know what will be > needed to save the image in advance ;-)). Well, kexecing a new kernel (as being pursued in one of the subthreads) to do the saving should take care of that. In that case the "we need suspend to be invisible to userspace" as a reason to use the freezer would also be moot, since if you don't schedule userspace after offlining the CPUs, it can't notice this. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Am Montag, 9. Juli 2007 schrieb Benjamin Herrenschmidt: > On Mon, 2007-07-09 at 08:47 +0200, Oliver Neukum wrote: > > Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt: > > > > But I'm not sure it's a good idea in the long run. Think of a printer > > > > daemon, for example. It shouldn't have to experience unexpected I/O > > > > problems merely because someone has decided to put the system to sleep. > > > > > > Why not ? Printer is offline when machine is asleep... trying to print > > > > Not necessarily. The machine must survive going to sleep while you are > > printing. Any other error return than -ERESTARTSYS is not an option. > > We can't simply change the ABI. > > Ugh ? Why returning an error from the printer driver to the userland > print server/daemon would prevent the machine from "surviving" ? I would > be happy with -EIO personally :-) Surviving is a bit strongly worded. Suspension is to be transparent. Apart from a jump in the system clock user space must not notice, thus returning errors due to suspension is not an option. Regards Oliver - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Mon, 2007-07-09 at 08:47 +0200, Oliver Neukum wrote: > Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt: > > > But I'm not sure it's a good idea in the long run. Think of a printer > > > daemon, for example. It shouldn't have to experience unexpected I/O > > > problems merely because someone has decided to put the system to sleep. > > > > Why not ? Printer is offline when machine is asleep... trying to print > > Not necessarily. The machine must survive going to sleep while you are > printing. Any other error return than -ERESTARTSYS is not an option. > We can't simply change the ABI. Ugh ? Why returning an error from the printer driver to the userland print server/daemon would prevent the machine from "surviving" ? I would be happy with -EIO personally :-) Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Mon, 2007-07-09 at 08:52 +0200, Oliver Neukum wrote: > Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt: > > Another issue that's been a problem forever with suspend is the > > synchronous request_firmware interface. Lots of drivers do that in > > resume() which will generally not work. > > Unfortunately yes they do. We now have the notifier chains, which > can be used. Firmware, especially if it is large, must be requested > while the system is still fully functional and paging will work. > > Asynchronocity will not help much because the devices must work > after resumingand delaying that will not help with respect to needing > devices to get at the firmware. True, pre-loading is the best approach. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt: > Another issue that's been a problem forever with suspend is the > synchronous request_firmware interface. Lots of drivers do that in > resume() which will generally not work. Unfortunately yes they do. We now have the notifier chains, which can be used. Firmware, especially if it is large, must be requested while the system is still fully functional and paging will work. Asynchronocity will not help much because the devices must work after resumingand delaying that will not help with respect to needing devices to get at the firmware. Regards Oliver - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt: > > But I'm not sure it's a good idea in the long run. Think of a printer > > daemon, for example. It shouldn't have to experience unexpected I/O > > problems merely because someone has decided to put the system to sleep. > > Why not ? Printer is offline when machine is asleep... trying to print Not necessarily. The machine must survive going to sleep while you are printing. Any other error return than -ERESTARTSYS is not an option. We can't simply change the ABI. Regards Oliver - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
> You could do this with a straight mutex except for the > dev_suspend_trylock/unlock bit in uninterruptable contexts, but I > seem to recall somebody saying that could be made to work if there > was a real need for it. Alternatively you could just drive the > "Generic Mutex" guys up the wall by inventing your own pseudo-mutex > with a spinlock, a boolean value, and a waitqueue. Most of the time, the interrupt/atomic contexts are part of what I call the "main path" of the driver, which doesn't need most of this stuff. Depending on what is "upstream" from the driver, you typically set a condition stopping the flow of "interrupt" activity and wait for it to drain in suspend. For example, a block driver would instruct the driver to stop processing the request queue and wait for pending in flight requests to drain, or a network driver would stop the tx queue and wait for pending/concurrent xmit() callbacks to complete (basically sync. bh and sync. with NAPI poll). In both examples, helpers can be provided in the block layer or network stack to make that easier. In many cases, the driver may just turn off something in the chip in "suspend" (using a low level spinlock that it already has if it has an irq path to sync. HW access) and set some flag/state to instruct the irq handler to not restart, then synchronize_irq(). All those techniques are fairly simple and well known. The main source of problems are the "sideband" channels, such as sysfs accesses to driver tunables, ioctl's, etc... for which I believe mutexes can solve 90% of them. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Jul 08, 2007, at 20:33:37, Benjamin Herrenschmidt wrote: drivers_sysfs_write() while(!suspend_trylock()) try_to_icebox() --> or even try_to_freeze(), what's the difference? hit hardware unlock_suspend() where the PM core must wait for the suspend lock to get released (say with a timeout)? Somewhat. What I'd like is to have that a construct of that sort on a per-driver basis though :-) Now the question is, in what way would the above be different from a simple suspend mutex ? I've been using mutexes in the past for a couple of drivers (iirc, that's how I did it for dmasound_pmac, though that driver is long past obsolescence now). I agree completely. It's a bit trickier if you want to do work in uninterruptable contexts: driver_suspend_callback(...) { dev_suspend_lock(dev); put_hardware_to_sleep(dev); } driver_resume_callback(...) { wake_hardware_up(dev); dev_suspend_unlock(dev); } Then for sleep-capable contexts: dev_suspend_lock(dev); dev_suspend_unlock(dev); And for no-sleep contexts like interrupts etc: if (!dev_suspend_trylock(dev)) return postpone_work_for_later(dev, ...); do_stuff_with(dev); dev_suspend_unlock(dev); You could do this with a straight mutex except for the dev_suspend_trylock/unlock bit in uninterruptable contexts, but I seem to recall somebody saying that could be made to work if there was a real need for it. Alternatively you could just drive the "Generic Mutex" guys up the wall by inventing your own pseudo-mutex with a spinlock, a boolean value, and a waitqueue. Then when you put your driver to sleep, things trying to do IO will automatically "freeze" themselves exactly until the device is woken again. Assuming the driver model and subsystem get the ordering right for which devices to suspend/resume, then it's impossible to deadlock or cause hardware confusion. And even further, if you manage to make the automagic mutex-debugging code work with the noninterruptable trylock it will yell at you when the driver model does nasty deadlock- y things. On the other hand, if the driver model *doesn't* get the ordering right then it's fundamentally impossible to reliably suspend and resume. Cheers, Kyle Moffett - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
> drivers_sysfs_write() > while(!suspend_trylock()) > try_to_icebox() --> or even try_to_freeze(), what's the > difference? > > hit hardware > unlock_suspend() > > where the PM core must wait for the suspend lock to get released (say with a > timeout)? Somewhat. What I'd like is to have that a construct of that sort on a per-driver basis though :-) Now the question is, in what way would the above be different from a simple suspend mutex ? I've been using mutexes in the past for a couple of drivers (iirc, that's how I did it for dmasound_pmac, though that driver is long past obsolescence now). But yes, that's the idea, something under control of drivers, instead of a 3rd party freezer that tries to hit at processes. > Hmm, perhaps we can use a special workqueue that's created before the suspend > (say by the PM core) and starts processing jobs after the resume? Yeah, whatever the implementation is, though, the interface should be transparent. But then, I see little use of that. Mostly things like this sysfs write to unbind Alan was talking about. Driver workqueues probably need to be functioning all the way until the driver itself is suspended, in which case, the driver may just flush_worqueues in it's suspend path at the right time to make sure it doesn't exit with something still pending. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
On Mon, 9 Jul 2007, Pavel Machek wrote: On Sun 2007-07-08 16:20:46, [EMAIL PROTECTED] wrote: On Mon, 9 Jul 2007, Pavel Machek wrote: Actaully, I'm perfectly fine with that, as long as each task blocked by the driver due to suspend has PF_FROZEN (or something similar) set. Then, at least theoretically, we'll be able to drop the freezer from the suspend code path and move it after device_suspend() (or the hibernation-specific equivalent) for hibernation (in that case there shouldn't be a problem with any task waiting on I/O while the freezer is running ;-)). I don't see the need for a freezer for snapshot but that's a different issue. (stop_machine looks good enough to me). Freezer is not needed for snapshot -- it is needed so that we can write out the snapshot to disk without the need for special drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then write to disk from userland code in uswsusp). instead of trying to freeze most of the system, could you do something like start a virtual machine sandbox to write the data out, and not let any userspace other then the sandbox operate? you would need to throw away disk buffers so that you don't mix current pending I/O with I/O from the sandbox, and this would be a visable change for how suspend is setup, but wouldn't this work? It feels kind of expensive, but yes, we could use another kernel for doing the dump. Kdump people are using that. We could use hypervisor for doing the dump. Xen people are doing that. (But I do not think any of those solutions is suitable for "lets hibernate my notebook" case). expensive and reliable beats efficiant and unrelaible. why do you say that neither would work for the "lets hibernate my notebook" case? Both would work. One would eat 8-64MB of your RAM, permanently; second would eat 5-15% of your cpu, permanently. Not very suitable. how much overlap is there between the two approaches? are they close enough to be able to give the user the choice of which to use depending on their machine (new machines with the hardware virtualization support may want the hypervisor, other hardware may want to sacrafice 8M of ram) Who says current solution is unreliable? users report problems, suspend* developers repeatedly state that the problems are that the rest of the kernel needs to be fixed to work properly with the existing approach. I think it's safe to say that it doesn't work in the general case, even though it does work in some specific cases. David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
On Sun 2007-07-08 16:20:46, [EMAIL PROTECTED] wrote: > On Mon, 9 Jul 2007, Pavel Machek wrote: > > >Actaully, I'm perfectly fine with that, as long as each task blocked by > >the > >driver due to suspend has PF_FROZEN (or something similar) set. Then, > >at > >least theoretically, we'll be able to drop the freezer from the suspend > >code > >path and move it after device_suspend() (or the hibernation-specific > >equivalent) for hibernation (in that case there shouldn't be a problem > >with > >any task waiting on I/O while the freezer is running ;-)). > > I don't see the need for a freezer for snapshot but that's a different > issue. (stop_machine looks good enough to me). > >>> > >>>Freezer is not needed for snapshot -- it is needed so that we can > >>>write out the snapshot to disk without the need for special > >>>drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then > >>>write to disk from userland code in uswsusp). > >> > >>instead of trying to freeze most of the system, could you do something > >>like start a virtual machine sandbox to write the data out, and not let > >>any userspace other then the sandbox operate? > >> > >>you would need to throw away disk buffers so that you don't mix current > >>pending I/O with I/O from the sandbox, and this would be a visable change > >>for how suspend is setup, but wouldn't this work? > > > >It feels kind of expensive, but yes, we could use another kernel for > >doing the dump. Kdump people are using that. We could use hypervisor > >for doing the dump. Xen people are doing that. (But I do not think any > >of those solutions is suitable for "lets hibernate my notebook" case). > > expensive and reliable beats efficiant and unrelaible. > > why do you say that neither would work for the "lets hibernate my > notebook" case? Both would work. One would eat 8-64MB of your RAM, permanently; second would eat 5-15% of your cpu, permanently. Not very suitable. Who says current solution is unreliable? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
On Mon, 9 Jul 2007, Pavel Machek wrote: Actaully, I'm perfectly fine with that, as long as each task blocked by the driver due to suspend has PF_FROZEN (or something similar) set. Then, at least theoretically, we'll be able to drop the freezer from the suspend code path and move it after device_suspend() (or the hibernation-specific equivalent) for hibernation (in that case there shouldn't be a problem with any task waiting on I/O while the freezer is running ;-)). I don't see the need for a freezer for snapshot but that's a different issue. (stop_machine looks good enough to me). Freezer is not needed for snapshot -- it is needed so that we can write out the snapshot to disk without the need for special drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then write to disk from userland code in uswsusp). instead of trying to freeze most of the system, could you do something like start a virtual machine sandbox to write the data out, and not let any userspace other then the sandbox operate? you would need to throw away disk buffers so that you don't mix current pending I/O with I/O from the sandbox, and this would be a visable change for how suspend is setup, but wouldn't this work? It feels kind of expensive, but yes, we could use another kernel for doing the dump. Kdump people are using that. We could use hypervisor for doing the dump. Xen people are doing that. (But I do not think any of those solutions is suitable for "lets hibernate my notebook" case). expensive and reliable beats efficiant and unrelaible. why do you say that neither would work for the "lets hibernate my notebook" case? David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
Hi! > > Freezer is not needed for snapshot -- it is needed so that we can > > write out the snapshot to disk without the need for special > > drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then > > write to disk from userland code in uswsusp). > > Yes. > > BTW, this patch: > > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch > > that's queued up in -mm contains a freezer documentation update, in which the > reasons of using it, as well as its limitations, are described. > > To summarize what was previously said in this thread: > > * Apparently, we agree that the freezer is _generally_ not needed for suspend > (ie. any transition to a system sleep state other than hibernation), but > some > of us (eg. me) think that it wouldn't be reasonable to drop the freezer from > the suspend code path _right_ _now_ . > > * Some of us, including you, Nigel and me, think that the freezer is needed > for hibernation (please see the document in the patch above for details). > In the (very) long run this might be avoided too, but (IMO) certainly not at > this point. > > * We seem to agree that in order to remove the freezer from the suspend code > path some work needs to be done on device drivers, driver midlayers and the > PM core. We also need to do some work on the PM core in order to introduce > a separate hibernation framework and IMO it would be reasonable to > synchronize these efforts. > > * We are now to decide what to do so that the freezer can be safely removed > from the suspend code path and how to integrate that change with the > hibernation code path (if possible and reasonable). Nice summary, thanks. > * The freezer vs FUSE issue that started this thread remains unresolved, so > it would be desirable to provide a short-term fix (need not be very nice). Actually there are _2_ freezer vs FUSE issues, and one of them should be simple to solve, once we have sysrq-t of the deadlock. (Or did I miss it somewhere with discussion going on 10 lists in parallel?) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
Hi! > >>>Actaully, I'm perfectly fine with that, as long as each task blocked by > >>>the > >>>driver due to suspend has PF_FROZEN (or something similar) set. Then, at > >>>least theoretically, we'll be able to drop the freezer from the suspend > >>>code > >>>path and move it after device_suspend() (or the hibernation-specific > >>>equivalent) for hibernation (in that case there shouldn't be a problem > >>>with > >>>any task waiting on I/O while the freezer is running ;-)). > >> > >>I don't see the need for a freezer for snapshot but that's a different > >>issue. (stop_machine looks good enough to me). > > > >Freezer is not needed for snapshot -- it is needed so that we can > >write out the snapshot to disk without the need for special > >drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then > >write to disk from userland code in uswsusp). > > instead of trying to freeze most of the system, could you do something > like start a virtual machine sandbox to write the data out, and not let > any userspace other then the sandbox operate? > > you would need to throw away disk buffers so that you don't mix current > pending I/O with I/O from the sandbox, and this would be a visable change > for how suspend is setup, but wouldn't this work? It feels kind of expensive, but yes, we could use another kernel for doing the dump. Kdump people are using that. We could use hypervisor for doing the dump. Xen people are doing that. (But I do not think any of those solutions is suitable for "lets hibernate my notebook" case). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
Hi, On Monday, 9 July 2007 00:13, Pavel Machek wrote: > Hi! > > > > Actaully, I'm perfectly fine with that, as long as each task blocked by > > > the > > > driver due to suspend has PF_FROZEN (or something similar) set. Then, at > > > least theoretically, we'll be able to drop the freezer from the suspend > > > code > > > path and move it after device_suspend() (or the hibernation-specific > > > equivalent) for hibernation (in that case there shouldn't be a problem > > > with > > > any task waiting on I/O while the freezer is running ;-)). > > > > I don't see the need for a freezer for snapshot but that's a different > > issue. (stop_machine looks good enough to me). > > Freezer is not needed for snapshot -- it is needed so that we can > write out the snapshot to disk without the need for special > drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then > write to disk from userland code in uswsusp). Yes. BTW, this patch: http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch that's queued up in -mm contains a freezer documentation update, in which the reasons of using it, as well as its limitations, are described. To summarize what was previously said in this thread: * Apparently, we agree that the freezer is _generally_ not needed for suspend (ie. any transition to a system sleep state other than hibernation), but some of us (eg. me) think that it wouldn't be reasonable to drop the freezer from the suspend code path _right_ _now_ . * Some of us, including you, Nigel and me, think that the freezer is needed for hibernation (please see the document in the patch above for details). In the (very) long run this might be avoided too, but (IMO) certainly not at this point. * We seem to agree that in order to remove the freezer from the suspend code path some work needs to be done on device drivers, driver midlayers and the PM core. We also need to do some work on the PM core in order to introduce a separate hibernation framework and IMO it would be reasonable to synchronize these efforts. * We are now to decide what to do so that the freezer can be safely removed from the suspend code path and how to integrate that change with the hibernation code path (if possible and reasonable). * The freezer vs FUSE issue that started this thread remains unresolved, so it would be desirable to provide a short-term fix (need not be very nice). Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
On Mon, 9 Jul 2007, Pavel Machek wrote: Actaully, I'm perfectly fine with that, as long as each task blocked by the driver due to suspend has PF_FROZEN (or something similar) set. Then, at least theoretically, we'll be able to drop the freezer from the suspend code path and move it after device_suspend() (or the hibernation-specific equivalent) for hibernation (in that case there shouldn't be a problem with any task waiting on I/O while the freezer is running ;-)). I don't see the need for a freezer for snapshot but that's a different issue. (stop_machine looks good enough to me). Freezer is not needed for snapshot -- it is needed so that we can write out the snapshot to disk without the need for special drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then write to disk from userland code in uswsusp). instead of trying to freeze most of the system, could you do something like start a virtual machine sandbox to write the data out, and not let any userspace other then the sandbox operate? you would need to throw away disk buffers so that you don't mix current pending I/O with I/O from the sandbox, and this would be a visable change for how suspend is setup, but wouldn't this work? David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]
Hi! > > Actaully, I'm perfectly fine with that, as long as each task blocked by the > > driver due to suspend has PF_FROZEN (or something similar) set. Then, at > > least theoretically, we'll be able to drop the freezer from the suspend code > > path and move it after device_suspend() (or the hibernation-specific > > equivalent) for hibernation (in that case there shouldn't be a problem with > > any task waiting on I/O while the freezer is running ;-)). > > I don't see the need for a freezer for snapshot but that's a different > issue. (stop_machine looks good enough to me). Freezer is not needed for snapshot -- it is needed so that we can write out the snapshot to disk without the need for special drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then write to disk from userland code in uswsusp). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sunday, 8 July 2007 23:20, Benjamin Herrenschmidt wrote: > > > But I'm not sure it's a good idea in the long run. Think of a printer > > daemon, for example. It shouldn't have to experience unexpected I/O > > problems merely because someone has decided to put the system to sleep. > > Why not ? Printer is offline when machine is asleep... trying to print > errors out, I don't see the problem there. At one point, we'll need a > cleaner way to also notify userland in which case our daemon could > become more intelligent and stop servicings things before sleep and > resume afterward :-) > > > This will be up to the people responsible for the subsystems. I can > > take care of USB. > > USB is not that much of a problem in the sense that for most "leaf" > drivers, USB is a provider (ie, the bus they sit on), not the client > (like the network stack is to network drivers). > > In most cases, that "helper" thing would sit on the client subsystem, > since it's the one feeding drivers with requests. The main ones I see at > hand are block, alsa, net, fb/drm... Some of them already have > infrastructure to do it, some my need some more work. > > > > I think it's a fairly significant change from the current freezer and I > > > also think it's a very good idea. The more I think about it, the more I > > > like it, in the sense that it's a simple drop-in that you could put in a > > > lot of the ioctl path of drivers to just block tasks that are trying to > > > call in while suspending, and could be used selectively by things like > > > the USB hub threads. > > > > That's what I had in mind. Rafael, can we add an "icebox" routine? > > Like Ben says, it doesn't need to be much more than a waitqueue > > that the current task puts itself on if a suspend is in progress. > > Callers arriving at a time when the icebox isn't activated should > > simply return without blocking. Basically the icebox should be active > > at the same times as the existing freezer. > > There is still the race of: > > drivers_sysfs_write() > try_to_icebox() > < > hit hardware > > Those are akin, in some ways, to the freezer races. I'm not sure what races you're referring to, but never mind. :-) This is the reason why the freezer waits for tasks to freeze, actually. > Some kind of RCU might take care of them if we enable the icebox, > then wait for all tasks to hit an explicit schedule point once (or return to > userland). This is what the freezer does, isn't it? ;-) > That would mean that drivers need to try_to_icebox() again if they do > something that may schedule (such as __get_user). So it's not a magic > solution, it has issues, but it can handle a lot of the simplest cases. Well, can't we do: drivers_sysfs_write() while(!suspend_trylock()) try_to_icebox() --> or even try_to_freeze(), what's the difference? hit hardware unlock_suspend() where the PM core must wait for the suspend lock to get released (say with a timeout)? > > Here's a wacky idea which just might work: > > > > In order to prevent binding and unbinding, while suspending devices all > > the PM core has to do is avoid dropping the device semaphores! It can > > release the semaphores as it resumes the devices. > > > > Of course, for this to work it's necessary to avoid changes to the > > device list during the suspend. However I believe the iteration can be > > made safe against unregistration, so we only have to prevent device > > registration. (And anyway, it won't be possible to unregister a device > > while the PM core is holding its semaphore.) > > > > If we are willing to be somewhat non-transparent, this is easy to > > accomplish. After the notifier chain has been alerted about the > > upcoming suspend, we tell the driver core to disallow adding new > > devices. Maybe use SRCU to synchronize with registration calls that > > are in progress. Thus, until the suspend is over device_add() will > > immediately return an error. We could even add a new ESUSPENDING code > > to errno.h; it would come in handy in a few places. > > > > Drivers are already prepared for device registration to fail (or they > > ought to be), so this change shouldn't knock the bottom out of things. > > device_add() isn't on a hot path, so adding an extra check and > > srcu_read_lock() won't hurt. > > True. Also, bus drivers could just flag the port with something saying > "try registering again later". Don't underestimate the power of "try > later" constructs :-) > > > I have had the same thought, that unbinding and unregistration would be > > easier to handle than binding and registration. As it happens, holding > > the device semaphore will block both all three -- which makes life > > simpler. > > Yup. Fair and simple. > > > There are other possibilities too. For example, instead of using > > keventd these attri
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sun, 2007-07-08 at 23:45 +0200, Rafael J. Wysocki wrote: > > Workqueues are kernel threads and the creator decides if they are going to > freeze. There are only two freezable worqueues in the entire tree right now. That and keventd workqueues... my point is you may well end up with something in a workqueue doing things that should have been blocked after the freeze. > Actaully, I'm perfectly fine with that, as long as each task blocked by the > driver due to suspend has PF_FROZEN (or something similar) set. Then, at > least theoretically, we'll be able to drop the freezer from the suspend code > path and move it after device_suspend() (or the hibernation-specific > equivalent) for hibernation (in that case there shouldn't be a problem with > any task waiting on I/O while the freezer is running ;-)). I don't see the need for a freezer for snapshot but that's a different issue. (stop_machine looks good enough to me). Ben - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sunday, 8 July 2007 23:03, Benjamin Herrenschmidt wrote: > On Sun, 2007-07-08 at 21:15 +0200, Rafael J. Wysocki wrote: > > On Sunday, 8 July 2007 07:14, Benjamin Herrenschmidt wrote: > > [--snip--] > > > > > > I just think that the freezer approach, as it is, is backward. We can't > > > have a 3rd party try to discriminate what to freeze and what not, it > > > will always get something wrong, and in some cases with the wrong timing > > > or ordering. > > > > Nice discussion, except for one thing: the freezer doesn't decide what to > > freeze. For example, even right now kernel threads decide if they want to > > be > > frozen. > > Somewhat... userspace doesn't and workqueues are a gray area. Workqueues are kernel threads and the creator decides if they are going to freeze. There are only two freezable worqueues in the entire tree right now. > Also, I've been thinking this "icebox" idea a bit more and it seems in > fact a bit racy in some areas, at least for use by things like drivers, > unless we end up doing something aking to an RCU on suspend, waiting for > all tasks to reach userland once, but that has the same annoyances as > the current freezer. > > Thus I'm tempted to go back to saying that driver can handle things > locally :-) Actaully, I'm perfectly fine with that, as long as each task blocked by the driver due to suspend has PF_FROZEN (or something similar) set. Then, at least theoretically, we'll be able to drop the freezer from the suspend code path and move it after device_suspend() (or the hibernation-specific equivalent) for hibernation (in that case there shouldn't be a problem with any task waiting on I/O while the freezer is running ;-)). Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sunday, 8 July 2007 22:16, Alan Stern wrote: > I'll make this reply short by agreeing up front with most of what you > say. > > On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote: > > > But that's only the "main" path. Aside for that, almost all drivers also > > have sideband "request" input and some driver don't actually live behind > > a subsystem. That ranges from ioctl, to direct read/write on a char dev > > from userland. > > Yes, these are the problem cases. > > > I think many of those cases can fairly well deal with just taking a PM > > semaphore, that's how I did for a couple of things in the past, provided > > that the request path isn't deadlocking with the semaphore held because > > of the system suspending of course. > > That's what USB does as well (for the drivers which have runtime PM > support -- at the moment only a few of them). > > > But in a whole lot of cases, it's, I beleive, perfectly kosher to just > > return an error. You're trying to capture frame from your camera while > > the machine is suspended ? error. At worst, your capture app will be > > unhappy when you wakeup, nothing terrible and totally fixable in > > userland if it's a problem. > > We can try falling back on this approach for now. If the drivers are > smart enough to fail cleanly when the device is already suspended, it > should work. And are they? > But I'm not sure it's a good idea in the long run. Think of a printer > daemon, for example. It shouldn't have to experience unexpected I/O > problems merely because someone has decided to put the system to sleep. Agreed. > > In some cases, we could use a little bit more help from the subsystem. > > Network for example, could have some explicit knowledge of the suspend > > state, and in addition to stopping the queue would also stop calling > > into things like change_mtu or set_multicast, provided it's agreed that > > the driver will account for those changes on resume (the actual MTU > > values or multicast lists are still updated in the netdev). > > This will be up to the people responsible for the subsystems. I can > take care of USB. > > > There are two things I believe. There's a generic issue with usermode > > helpers that make no sense to call between pre-suspend and > > post-resume, and there's the specific issue of adding/removing > > devices. > > > > I believe that "bus" drivers such as USB should indeed get a first > > round of notifications to tell them to stop performing bus > > plug/unplug operations (it's debatable whether we want to keep unplug > > going provided we can stack up the usermode events and re-send them > > later though, but let's say no for the sake of simplicity). > > Yes. Rafael, how close is your new notifier chain to mainline? Can it > at least be added to Greg KH's development tree so that I can start > using it? It's in -mm. The patches queued up in -mm are also in my patchset at http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/ > > > So instead, why not have the PM core take care of all this? There > > > could be a block_task_until_suspend_is_over() routine available for all > > > drivers to use. Its effect would be exactly the same as sending the > > > current task into the freezer, but it wouldn't be the freezer that > > > exists now. It would just be some routine that blocks until the system > > > suspend is over. We could call it "the icebox" instead of "the > > > freezer". :-) > > > > I'm not totally sure about that. I like some of it, but I think it's > > fairly different conceptually from the freezer (and the implementation > > could be as trivial as a single system wide wait queue). > > Exactly. > > > Basically it has a very big difference to the current freezer, and I > > like that, which is that we don't have some 3rd party trying to find out > > what to freeze and what not (the freezer), but instead, we have > > explicitely drivers or kernel threads sending -themselves- to the > > "icebox" when they think it's a good idea. Think of it as lazy freezing > > -> you only freeze lazy tasks that are trying to do something that > > cannot be done because of suspend. > > > > > Does that make you happier? > > > > I think it's a fairly significant change from the current freezer and I > > also think it's a very good idea. The more I think about it, the more I > > like it, in the sense that it's a simple drop-in that you could put in a > > lot of the ioctl path of drivers to just block tasks that are trying to > > call in while suspending, and could be used selectively by things like > > the USB hub threads. > > That's what I had in mind. Rafael, can we add an "icebox" routine? Yes, I think so. > Like Ben says, it doesn't need to be much more than a waitqueue > that the current task puts itself on if a suspend is in progress. > Callers arriving at a time when the icebox isn't activated should > simply return without blocking. Basically the icebox should be activ
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sun, 2007-07-08 at 16:22 -0400, Alan Stern wrote: > On Sun, 8 Jul 2007, Rafael J. Wysocki wrote: > > > I'm all for changing this infrastructure, but in an organized way (ie. we > > discuss what to do next, we do that and then we go to the next step) and in > > the order that everyone will be comfortable with. > > > > So, let's finish this thread and start over from discussing what needs to be > > done, how (ie. in what order etc.) we are going to do that and who is going > > to > > do what. Shall we? > > IMO we should start by using the new notifier chain and by implementing > a central "icebox" routine. Then we can forbid device registration > during suspend. > > It might also be a good idea to add a freezable keventd-like workqueue > specifically intended for things that need to block during a suspend. > Although maybe this will end up being unnecessary; it's too soon to > tell. I think looking at the kmallo/gfp issue I mentioned earlier should be done asap. I can try to spare some time for it this week but I can't promise, I'm actually swamped by something totally unrelated at the moment. Another issue that's been a problem forever with suspend is the synchronous request_firmware interface. Lots of drivers do that in resume() which will generally not work. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
> But I'm not sure it's a good idea in the long run. Think of a printer > daemon, for example. It shouldn't have to experience unexpected I/O > problems merely because someone has decided to put the system to sleep. Why not ? Printer is offline when machine is asleep... trying to print errors out, I don't see the problem there. At one point, we'll need a cleaner way to also notify userland in which case our daemon could become more intelligent and stop servicings things before sleep and resume afterward :-) > This will be up to the people responsible for the subsystems. I can > take care of USB. USB is not that much of a problem in the sense that for most "leaf" drivers, USB is a provider (ie, the bus they sit on), not the client (like the network stack is to network drivers). In most cases, that "helper" thing would sit on the client subsystem, since it's the one feeding drivers with requests. The main ones I see at hand are block, alsa, net, fb/drm... Some of them already have infrastructure to do it, some my need some more work. > > I think it's a fairly significant change from the current freezer and I > > also think it's a very good idea. The more I think about it, the more I > > like it, in the sense that it's a simple drop-in that you could put in a > > lot of the ioctl path of drivers to just block tasks that are trying to > > call in while suspending, and could be used selectively by things like > > the USB hub threads. > > That's what I had in mind. Rafael, can we add an "icebox" routine? > Like Ben says, it doesn't need to be much more than a waitqueue > that the current task puts itself on if a suspend is in progress. > Callers arriving at a time when the icebox isn't activated should > simply return without blocking. Basically the icebox should be active > at the same times as the existing freezer. There is still the race of: drivers_sysfs_write() try_to_icebox() < hit hardware Those are akin, in some ways, to the freezer races. Some kind of RCU might take care of them if we enable the icebox, then wait for all tasks to hit an explicit schedule point once (or return to userland). That would mean that drivers need to try_to_icebox() again if they do something that may schedule (such as __get_user). So it's not a magic solution, it has issues, but it can handle a lot of the simplest cases. > Here's a wacky idea which just might work: > > In order to prevent binding and unbinding, while suspending devices all > the PM core has to do is avoid dropping the device semaphores! It can > release the semaphores as it resumes the devices. > > Of course, for this to work it's necessary to avoid changes to the > device list during the suspend. However I believe the iteration can be > made safe against unregistration, so we only have to prevent device > registration. (And anyway, it won't be possible to unregister a device > while the PM core is holding its semaphore.) > > If we are willing to be somewhat non-transparent, this is easy to > accomplish. After the notifier chain has been alerted about the > upcoming suspend, we tell the driver core to disallow adding new > devices. Maybe use SRCU to synchronize with registration calls that > are in progress. Thus, until the suspend is over device_add() will > immediately return an error. We could even add a new ESUSPENDING code > to errno.h; it would come in handy in a few places. > > Drivers are already prepared for device registration to fail (or they > ought to be), so this change shouldn't knock the bottom out of things. > device_add() isn't on a hot path, so adding an extra check and > srcu_read_lock() won't hurt. True. Also, bus drivers could just flag the port with something saying "try registering again later". Don't underestimate the power of "try later" constructs :-) > I have had the same thought, that unbinding and unregistration would be > easier to handle than binding and registration. As it happens, holding > the device semaphore will block both all three -- which makes life > simpler. Yup. Fair and simple. > There are other possibilities too. For example, instead of using > keventd these attributes could use a separate workqueue which would put > itself in the icebox during a suspend. Or maybe sysfs can be reworked > so that they don't need to use a workqueue at all. I think having a facility for a given workqueue entry to requeue itself for after resume might be of use for drivers too. > > Don't get me wrong, I never said we don't need generic infrastructure > > and utilities, such as your proposed icebox scheme, or some of those > > workqueue bits, helpers in subsystems, etc... > > I hope that everyone will agree that now is a good time to get started > on them. There shouldn't be any problem about having them present > along with the freezer, and then it will be all that much easier to > remove the freezer later on if tha
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Hi! > > I'm all for changing this infrastructure, but in an organized way (ie. we > > discuss what to do next, we do that and then we go to the next step) and in > > the order that everyone will be comfortable with. > > > > So, let's finish this thread and start over from discussing what needs to be > > done, how (ie. in what order etc.) we are going to do that and who is going > > to > > do what. Shall we? > > IMO we should start by using the new notifier chain and by implementing > a central "icebox" routine. Then we can forbid device registration > during suspend. > > It might also be a good idea to add a freezable keventd-like workqueue > specifically intended for things that need to block during a suspend. > Although maybe this will end up being unnecessary; it's too soon to > tell. We actually had patches for freezeable workqueues at one point. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Hi! > > > I just think that the freezer approach, as it is, is backward. We can't > > > have a 3rd party try to discriminate what to freeze and what not, it > > > will always get something wrong, and in some cases with the wrong timing > > > or ordering. > > > > Nice discussion, except for one thing: the freezer doesn't decide what to > > freeze. For example, even right now kernel threads decide if they want to > > be > > frozen. > > Somewhat... userspace doesn't and workqueues are a gray area. But userspace must not be neccessary for kernel functioning, so that's quite okay. And we do need to solve the workqueues. > Also, I've been thinking this "icebox" idea a bit more and it seems in > fact a bit racy in some areas, at least for use by things like drivers, > unless we end up doing something aking to an RCU on suspend, waiting for > all tasks to reach userland once, but that has the same annoyances as > the current freezer. > > Thus I'm tempted to go back to saying that driver can handle things > locally :-) :-). Or perhaps freezer is not _that_ evil after all? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sun, 2007-07-08 at 21:15 +0200, Rafael J. Wysocki wrote: > On Sunday, 8 July 2007 07:14, Benjamin Herrenschmidt wrote: > [--snip--] > > > > I just think that the freezer approach, as it is, is backward. We can't > > have a 3rd party try to discriminate what to freeze and what not, it > > will always get something wrong, and in some cases with the wrong timing > > or ordering. > > Nice discussion, except for one thing: the freezer doesn't decide what to > freeze. For example, even right now kernel threads decide if they want to be > frozen. Somewhat... userspace doesn't and workqueues are a gray area. Also, I've been thinking this "icebox" idea a bit more and it seems in fact a bit racy in some areas, at least for use by things like drivers, unless we end up doing something aking to an RCU on suspend, waiting for all tasks to reach userland once, but that has the same annoyances as the current freezer. Thus I'm tempted to go back to saying that driver can handle things locally :-) Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Hi! > > There are two things I believe. There's a generic issue with usermode > > helpers that make no sense to call between pre-suspend and > > post-resume, and there's the specific issue of adding/removing > > devices. > > > > I believe that "bus" drivers such as USB should indeed get a first > > round of notifications to tell them to stop performing bus > > plug/unplug operations (it's debatable whether we want to keep unplug > > going provided we can stack up the usermode events and re-send them > > later though, but let's say no for the sake of simplicity). > > Yes. Rafael, how close is your new notifier chain to mainline? Can it > at least be added to Greg KH's development tree so that I can start > using it? It should be in -mm, IIRC. > > I think it's a fairly significant change from the current freezer and I > > also think it's a very good idea. The more I think about it, the more I > > like it, in the sense that it's a simple drop-in that you could put in a > > lot of the ioctl path of drivers to just block tasks that are trying to > > call in while suspending, and could be used selectively by things like > > the USB hub threads. > > That's what I had in mind. Rafael, can we add an "icebox" routine? > Like Ben says, it doesn't need to be much more than a waitqueue > that the current task puts itself on if a suspend is in progress. > Callers arriving at a time when the icebox isn't activated should > simply return without blocking. Basically the icebox should be active > at the same times as the existing freezer. You could use try_to_freeze(), but you want it to be separate routine so it can be grepped for... Can you #define icebox_me try_to_freeze for testing? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
On Sunday, 8 July 2007 21:50, Miklos Szeredi wrote: > > > > Well, fix userspace filesystems and maybe NFS. If they react to > > > > sigstop in timely manner, they will work with suspend properly, too. > > > > > > Which is pretty much impossible, given the unix filesystem API. To be > > > able to react to sigstop, the operations in question need to be > > > restartable. Which they are not, so they can't react to sigstop. End > > > of story. > > > > Or not. That depends on your willingness to cooperate, I'd say. :-) > > Do you actually understand what I'm talking about? Because it sure > doesn't depend on my cooperation. > > Maybe I'm stupid, and I'm missing something obvious. In that case > please explain how you propose to make filesystem operations, like > rename() restartable. I'm not proposing that, sorry. I'm not a filesystem expert, so don't think I can propose a working solution here, right now. Still, maybe something like in fork.c, line 1409-1411 might work (I know that it won't solve the "other tasks may be waiting on VFS mutexes" issue, but at least it may decrease the probability of freezer failure). > > > You may not like the fact that one process can cause another to go > > > into uninterruptible sleep, but in fact there's nothing wrong with > > > that. > > > > Well, this introduces interdependencies between processes that do not exist > > otherwise. Even if that isn't wrong per se, it's something that needs > > consideration in any case. > > > > IMO, FUSE breaks one of the assumptions that the freezer is based on and > > saying that the freezer is broken because of that is unfair. > > The freezer is not broken because of that, it's broken anyway. What > we are seeing is a _symptom_ of it being broken. > > And by broken, I don't mean it's buggy or that it was badly designed. > I just mean, that it's simply not what suspend should depend on, to > protect drivers. Please have a look at the documentation update at the bottom of this patch: http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch It says what the freezer is for in the first place. :-) > > > So the fact that the freezer can't handle this is unfortunate, but > > > it's just a symptom of the brokenness of it, not something that fuse > > > introduced. Not being able to suspend with NFS (or other network > > > filesystems) when the network is lost shows that this is a deeper > > > problem. > > > > Well, the system that cannot access its filesystems is not in a consistent > > state, so it generally is not reasonable to suspend or hibernate it. > > Saying the system must be in a "consistent" state, and defining > consistent as "every process is stopped", is just an arbitrary > limitation that fits what the freezer does now. Yes the _hardware_ > state must be consistent, but that has nothing to do with either fuse > or NFS. Can't you see that? Well, I can agree as far as FUSE is concerned. Still, imagine that you have an NFS share mounted, which is unavailable at the moment and one of the tasks waits for an I/O on it to complete (it might hold some locks needed for suspending, but say it doesn't). Then you suspend, take the box somewhere else and attach to a different network with a different NFS server. Now, you resume and you have a mess to recover from. Yes, it _should_ be recoverable, but refusing to suspend in such a case is not okerkill, IMHO. > > > As stated otherwise in the thread, suspend2 in fact allowed processes > > > to be in uninterruptible sleep instead, without negative side effects. > > > > And yet, Nigel thinks that the freezer is necessary for the hibernation. > > Strange, no? > > I'm totally ignorant about why the freezer is necessary for hibernate. > Please explain. See above. :-) > Yes, we need to make sure, that nothing is scheduled during (and > possibly after) taking the snapshot. But AFAICS that could be > achieved by unplugging all but one CPU. Actually, we want things to get scheduled, because we need some of them to save the image (to make things more difficult, we don't know what will be needed to save the image in advance ;-)). Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Hi! > But in a whole lot of cases, it's, I beleive, perfectly kosher to just > return an error. You're trying to capture frame from your camera while > the machine is suspended ? error. At worst, your capture app will be > unhappy when you wakeup, nothing terrible and totally fixable in > userland if it's a problem. Well, that way you'd have to teach applications about suspend... Which is quite bad. You mentioned it -- returning random errors will be very bad for machines like OLPC that want to suspend automatically. Plus it is a step back from current implementation, and ABI change, too... > > So instead, why not have the PM core take care of all this? There > > could be a block_task_until_suspend_is_over() routine available for all > > drivers to use. Its effect would be exactly the same as sending the > > current task into the freezer, but it wouldn't be the freezer that > > exists now. It would just be some routine that blocks until the system > > suspend is over. We could call it "the icebox" instead of "the > > freezer". :-) > > I'm not totally sure about that. I like some of it, but I think it's > fairly different conceptually from the freezer (and the implementation > could be as trivial as a single system wide wait queue). > > Basically it has a very big difference to the current freezer, and I > like that, which is that we don't have some 3rd party trying to find out > what to freeze and what not (the freezer), but instead, we have > explicitely drivers or kernel threads sending -themselves- to the > "icebox" when they think it's a good idea. Think of it as lazy > freezing Kernel threads already send _themselves_ to the refrigerator. [Plus we put all the userland there, which is what you don't like, but kernel can not rely on userland after suspend starts, anyway, so it should not hurt]. Anyway.. PPC currently suspends without freezer, which puts rules on drivers. ("Must handle i/o requests after .suspend() method is ran, must not use GFP_KERNEL to do so, must not try to synchronously communicate with userspace before _all_ devices are unfrozen") I am not certain what the exact rules are, but you seem to know them. Could we get Doc*/power/suspend_wo_freezer.txt describing them for driver authors? That way we can make sure drivers work on ppc, too, and maybe get rid of freezer in the long run. > > You also agree that kernel threads and workqueues must be allowed to > > operate during suspend. > > Yes, unless kernel threads explicitely decide to stop themselves (for > example, khubd is a good candidate for that). Again, not a 3rd party > trying to decide what to freeze and what not, but the drivers or kernel > threads themselves deciding it. This is how it works currently in -mm. (Plus, the rule is that threads that decide _not to_ stop themselves should not do any I/O.) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sun, 8 Jul 2007, Rafael J. Wysocki wrote: > I'm all for changing this infrastructure, but in an organized way (ie. we > discuss what to do next, we do that and then we go to the next step) and in > the order that everyone will be comfortable with. > > So, let's finish this thread and start over from discussing what needs to be > done, how (ie. in what order etc.) we are going to do that and who is going to > do what. Shall we? IMO we should start by using the new notifier chain and by implementing a central "icebox" routine. Then we can forbid device registration during suspend. It might also be a good idea to add a freezable keventd-like workqueue specifically intended for things that need to block during a suspend. Although maybe this will end up being unnecessary; it's too soon to tell. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote: > Note that we could have also a per-device "icebox" (just a waitqueue). > Might be nice for a device to resume whatever it froze just after it > resumed itself, might even be necessary in case whatever thread got > frozen is necessary for handling child devices. This wouldn't need any special infrastructure. Just a regular waitqueue defined in the device's private data structure. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
I'll make this reply short by agreeing up front with most of what you say. On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote: > But that's only the "main" path. Aside for that, almost all drivers also > have sideband "request" input and some driver don't actually live behind > a subsystem. That ranges from ioctl, to direct read/write on a char dev > from userland. Yes, these are the problem cases. > I think many of those cases can fairly well deal with just taking a PM > semaphore, that's how I did for a couple of things in the past, provided > that the request path isn't deadlocking with the semaphore held because > of the system suspending of course. That's what USB does as well (for the drivers which have runtime PM support -- at the moment only a few of them). > But in a whole lot of cases, it's, I beleive, perfectly kosher to just > return an error. You're trying to capture frame from your camera while > the machine is suspended ? error. At worst, your capture app will be > unhappy when you wakeup, nothing terrible and totally fixable in > userland if it's a problem. We can try falling back on this approach for now. If the drivers are smart enough to fail cleanly when the device is already suspended, it should work. But I'm not sure it's a good idea in the long run. Think of a printer daemon, for example. It shouldn't have to experience unexpected I/O problems merely because someone has decided to put the system to sleep. > In some cases, we could use a little bit more help from the subsystem. > Network for example, could have some explicit knowledge of the suspend > state, and in addition to stopping the queue would also stop calling > into things like change_mtu or set_multicast, provided it's agreed that > the driver will account for those changes on resume (the actual MTU > values or multicast lists are still updated in the netdev). This will be up to the people responsible for the subsystems. I can take care of USB. > There are two things I believe. There's a generic issue with usermode > helpers that make no sense to call between pre-suspend and > post-resume, and there's the specific issue of adding/removing > devices. > > I believe that "bus" drivers such as USB should indeed get a first > round of notifications to tell them to stop performing bus > plug/unplug operations (it's debatable whether we want to keep unplug > going provided we can stack up the usermode events and re-send them > later though, but let's say no for the sake of simplicity). Yes. Rafael, how close is your new notifier chain to mainline? Can it at least be added to Greg KH's development tree so that I can start using it? > > So instead, why not have the PM core take care of all this? There > > could be a block_task_until_suspend_is_over() routine available for all > > drivers to use. Its effect would be exactly the same as sending the > > current task into the freezer, but it wouldn't be the freezer that > > exists now. It would just be some routine that blocks until the system > > suspend is over. We could call it "the icebox" instead of "the > > freezer". :-) > > I'm not totally sure about that. I like some of it, but I think it's > fairly different conceptually from the freezer (and the implementation > could be as trivial as a single system wide wait queue). Exactly. > Basically it has a very big difference to the current freezer, and I > like that, which is that we don't have some 3rd party trying to find out > what to freeze and what not (the freezer), but instead, we have > explicitely drivers or kernel threads sending -themselves- to the > "icebox" when they think it's a good idea. Think of it as lazy freezing > -> you only freeze lazy tasks that are trying to do something that > cannot be done because of suspend. > > > Does that make you happier? > > I think it's a fairly significant change from the current freezer and I > also think it's a very good idea. The more I think about it, the more I > like it, in the sense that it's a simple drop-in that you could put in a > lot of the ioctl path of drivers to just block tasks that are trying to > call in while suspending, and could be used selectively by things like > the USB hub threads. That's what I had in mind. Rafael, can we add an "icebox" routine? Like Ben says, it doesn't need to be much more than a waitqueue that the current task puts itself on if a suspend is in progress. Callers arriving at a time when the icebox isn't activated should simply return without blocking. Basically the icebox should be active at the same times as the existing freezer. > > Of course, another possibility is simply to fail the bind. But that's > > not very satisfying, since suspends should be transparent. > > I don't think suspend has to be -that- transparent (though there is some > debate on whether it should be if we're gonna do some kind of fast > "light" suspend for things like OLPC) but overall, I agree that a bind > operatio
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
> > > Well, fix userspace filesystems and maybe NFS. If they react to > > > sigstop in timely manner, they will work with suspend properly, too. > > > > Which is pretty much impossible, given the unix filesystem API. To be > > able to react to sigstop, the operations in question need to be > > restartable. Which they are not, so they can't react to sigstop. End > > of story. > > Or not. That depends on your willingness to cooperate, I'd say. :-) Do you actually understand what I'm talking about? Because it sure doesn't depend on my cooperation. Maybe I'm stupid, and I'm missing something obvious. In that case please explain how you propose to make filesystem operations, like rename() restartable. > > You may not like the fact that one process can cause another to go > > into uninterruptible sleep, but in fact there's nothing wrong with > > that. > > Well, this introduces interdependencies between processes that do not exist > otherwise. Even if that isn't wrong per se, it's something that needs > consideration in any case. > > IMO, FUSE breaks one of the assumptions that the freezer is based on and > saying that the freezer is broken because of that is unfair. The freezer is not broken because of that, it's broken anyway. What we are seeing is a _symptom_ of it being broken. And by broken, I don't mean it's buggy or that it was badly designed. I just mean, that it's simply not what suspend should depend on, to protect drivers. > > So the fact that the freezer can't handle this is unfortunate, but > > it's just a symptom of the brokenness of it, not something that fuse > > introduced. Not being able to suspend with NFS (or other network > > filesystems) when the network is lost shows that this is a deeper > > problem. > > Well, the system that cannot access its filesystems is not in a consistent > state, so it generally is not reasonable to suspend or hibernate it. Saying the system must be in a "consistent" state, and defining consistent as "every process is stopped", is just an arbitrary limitation that fits what the freezer does now. Yes the _hardware_ state must be consistent, but that has nothing to do with either fuse or NFS. Can't you see that? > > As stated otherwise in the thread, suspend2 in fact allowed processes > > to be in uninterruptible sleep instead, without negative side effects. > > And yet, Nigel thinks that the freezer is necessary for the hibernation. > Strange, no? I'm totally ignorant about why the freezer is necessary for hibernate. Please explain. Yes, we need to make sure, that nothing is scheduled during (and possibly after) taking the snapshot. But AFAICS that could be achieved by unplugging all but one CPU. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sunday, 8 July 2007 07:14, Benjamin Herrenschmidt wrote: [--snip--] > > I just think that the freezer approach, as it is, is backward. We can't > have a 3rd party try to discriminate what to freeze and what not, it > will always get something wrong, and in some cases with the wrong timing > or ordering. Nice discussion, except for one thing: the freezer doesn't decide what to freeze. For example, even right now kernel threads decide if they want to be frozen. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sunday, 8 July 2007 06:39, Benjamin Herrenschmidt wrote: > > > In my defense, you should realize that until Rafael's notifier chain > > was added (just a few weeks ago, still not in mainline I believe) there > > was no other way to do it. Plug activity needs to be stopped before > > the child devices are suspended, and the PM core does not send any > > notification to drivers at that time. All it does is activate the > > freezer. > > That's true. That was one of the reason I've always wanted the > pre-suspend and post-resume hooks. (I prefer keeping the ordering there > too, rather than a notifier, but a notifier is fine I suppose). > > Among the clients we want here the firmware stuff, the allocators, > etc... to get themselves in conditions that won't deadlock during > suspend cycle. > > I think that's a much bigger issue overall than freezer vs. no > freezer :-) Yes. The freezer is only one part of the PM infrastructure that we have currently and that is not very sophisticated, so to speak. I'm all for changing this infrastructure, but in an organized way (ie. we discuss what to do next, we do that and then we go to the next step) and in the order that everyone will be comfortable with. So, let's finish this thread and start over from discussing what needs to be done, how (ie. in what order etc.) we are going to do that and who is going to do what. Shall we? Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
On Sun, Jul 08, 2007 at 08:08:56PM +0200, Rafael J. Wysocki wrote: > Well, the system that cannot access its filesystems is not in a consistent > state, so it generally is not reasonable to suspend or hibernate it. > > In fact, NFS and similar filesystems should always be unmounted before the > suspend/hibernation to avoid problems that may arise after the resume, if > they become unavailable at that time. Ah, thanks for clarifying that. [mentally putting the entire thing into "unusable due to idiotic design imposing idiotic limitations" bin and stepping out of this thread] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sunday, 8 July 2007 04:24, Alan Stern wrote: > On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote: > > > > Which is exactly my point. It _doesn't_ work fine without a freezer, > > > because the USB stack currently relies on the freezer to prevent plug > > > activity. > > > > Putting on hold plug activity has nothing, NOTHING, to do with the half > > assed piece of deadlocking crap we have now we call a freezer. > > You're wrong; it _does_ have something to do with the freezer. The > connection is that the code uses the freezer to put plug activity on > hold. > > You probably meant to say that it _should_ have nothing to do with the > freezer. That's a different matter. But in any case you should write > what you actually mean, rather than just putting down something as > inflammatory as possible. > > > As long as you guys keep mixing up all the issues and coming up with > > totally bogus solutions that cannot work, we won't have a useful suspend > > (either to RAM or to disk) in linux. > > In my defense, you should realize that until Rafael's notifier chain > was added (just a few weeks ago, still not in mainline I believe) there > was no other way to do it. Plug activity needs to be stopped before > the child devices are suspended, and the PM core does not send any > notification to drivers at that time. All it does is activate the > freezer. Exactly. And if we really want to get rid of the freezer, we need to _prepare_ for that. At least, we need to have an idea of how we will do thing when the freezer is not there. For example, how will we check if the system is not in a state in which the resume is likely to fail? There are more questions like that, IMO, and we should at least have a list of them _before_ we remove the freezer. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sunday, 8 July 2007 02:42, Benjamin Herrenschmidt wrote: > On Sat, 2007-07-07 at 12:17 -0400, Alan Stern wrote: > > On Sat, 7 Jul 2007, Benjamin Herrenschmidt wrote: > > > > > > > And guess what ? It's what we do on powerbooks, and it works fine, > > > > > without a freezer :-) > > > > > If you remember, one of the things I've been advocating has always been > > > that we should put on hold all plug activity (unplug might be alright as > > > long as the user events are just delayed) when we start suspending. No > > > new devices, no new bindings. "hub" type devices are respondible for > > > bringing in the new stuff after resume. > > > > Which is exactly my point. It _doesn't_ work fine without a freezer, > > because the USB stack currently relies on the freezer to prevent plug > > activity. > > Putting on hold plug activity has nothing, NOTHING, to do with the half > assed piece of deadlocking crap we have now we call a freezer. Well, no matter how much bad names you will call it, that's what in use right now and it's not _that_ easy to get rid of. I'm perfectly fine with readying drivers etc. for dropping the freezer, but in the meantime bugs in the freezer need to be fixed, if possible. And if it deadlocks, there is a bug in it. > As long as you guys keep mixing up all the issues and coming up with > totally bogus solutions that cannot work, we won't have a useful suspend > (either to RAM or to disk) in linux. Discussion at this level would be totally counterproductive and wouldn't lead to anything constructive, so let me avoid commenting your abusive remarks. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
On Sunday, 8 July 2007 16:23, Miklos Szeredi wrote: > > > > > > We can just wait for all fuse requests to be serviced before > > > > > > proceeding further with freeze, right? > > > > > > > > > > Right. Nice way to slow down or stop the suspend with an unprivileged > > > > > process. Avoiding that sort of DoS is one of the design goals of > > > > > fuse. > > > > > > > > So you want me to handle _malicious_ filesystems now? > > > > > > What I'd like, is a suspend, that works reliably, regardless of the > > > state of any userspace filesystem, network servers and such. > > > > Well, fix userspace filesystems and maybe NFS. If they react to > > sigstop in timely manner, they will work with suspend properly, too. > > Which is pretty much impossible, given the unix filesystem API. To be > able to react to sigstop, the operations in question need to be > restartable. Which they are not, so they can't react to sigstop. End > of story. Or not. That depends on your willingness to cooperate, I'd say. :-) > > > > That should be easy... :-). You already have nasty deadlocks in FUSE, > > > > and you solve them by "root can echo 1 > abort"... so allow me the > > > > same possibility. > > > > > > > > We can tell fused we are freezing, and if all the requests are not > > > > serviced within, say, 30 seconds, we call the filesystem malicious and > > > > do echo 1 > abort. > > > > > > Arbitrary time limits, nice. Not. > > > > > > This freezer is like an old house that's close to collapsing, and you > > > > Nice way to have useful discussion. Not. > > Sorry, didn't mean to offend you. > > > Look, Linux was not designed with malicious filesystems in mind. In > > particular, suspend was not designed with malicious filesystems in > > mind, and VFS was not designed with malicious filesystems in mind. > > That's why you have the nastiness with deadlocks... which you are > > unable/unwilling to solve because you'd have to redesign VFS and meet > > Al Viro. > > No, I don't think we need to redesign the VFS, and also I think the > malicious filesystem thing has been adequately taken care of in fuse. > > You may not like the fact that one process can cause another to go > into uninterruptible sleep, but in fact there's nothing wrong with > that. Well, this introduces interdependencies between processes that do not exist otherwise. Even if that isn't wrong per se, it's something that needs consideration in any case. IMO, FUSE breaks one of the assumptions that the freezer is based on and saying that the freezer is broken because of that is unfair. > The effects are localized to the mount owner, it won't cause > any system-wide ill, it's in fact no worse than a malicious process > doing ptrace(). > > And as explained above it's unavoidable due to the well established > userspace API. The _point_ of fuse is to let this API be used by > unmodified programs and the filesystem be provided by a userspace > process. After showing me the right way to do this with Podfuk, this > should not come as a surprise to you. > > So the fact that the freezer can't handle this is unfortunate, but > it's just a symptom of the brokenness of it, not something that fuse > introduced. Not being able to suspend with NFS (or other network > filesystems) when the network is lost shows that this is a deeper > problem. Well, the system that cannot access its filesystems is not in a consistent state, so it generally is not reasonable to suspend or hibernate it. In fact, NFS and similar filesystems should always be unmounted before the suspend/hibernation to avoid problems that may arise after the resume, if they become unavailable at that time. > And for some reason you seem not to accept that. You think that the > problem is with fuse, NFS, CIFS, whatever, and not the freezer, when > in fact it's quite clear, that neither of the above should have > anything to do with power management. IMHO, it's not _that_ clear. > Even if it was possible to fix them, it would still be just fixing the > symptoms. I don't think we need to fix the network filesystems. If there's a mounted filesystem and we have at least one process waiting for it to become available in the D state (ie. uninterruptible), then the system is not in a consistent state and should not be suspended. > > Now.. freezer already includes timeout; if some part of kernel knows > > nothing about suspend or had crashed, it will abort after some time, > > telling you which part to blame. Another timeout for detecting > > malicious userspace filesystems will not hurt much in this old house. > > (Maybe you don't want to auto abort them, just tell user what happened. > > > > We are stuck with refrigerator for now, and at least for hibernation, > > I don't see any feasible alternative. > > Even for hibernation, I don't see, why we would need all processes > being effectively in a stopped state. > > As stated otherwise in the thread, suspend2 in fact allowed processes > to b
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
On Sunday 08 July 2007, Al Viro wrote: > On Sun, Jul 08, 2007 at 12:37:48PM +, Pavel Machek wrote: > > I'm talking malicious _filesystems_ here, and yes, fuse is first of > > this kind. We want to handle unresponding NFS, but I believe handling > > malicious NFS server nicely is slightly out of scope. > > If your variant doesn't handle compromised NFS server, your variant is > needs to be fixed... That would depend on the type of compromise, right? Remember that the fundamental contract between a client and a server includes the client extending some trust to that server. Whether it's appropriate to extend that trust (at any given moment) is an out-of-band security issue. Trust is not a protocol issue ... more like an operational issue, and only slightly an implementation issue. A malicious filesystem could do many things. It could send private data off to someone who wasn't intended to receive that data. It could return falsified data. It could do any variety of things outside the protocol specification... And *MOST* of those would be impractical to defend against in code. Which is why I have such a hard time agreeing with your comment about "fixing" a client that doesn't try to defend itself. (Unless by "client" you also include the whole operational side of the client, including regular re-validation of the trust extended to that server... which would at best minimize damage caused by compromises.) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
> > > > > We can just wait for all fuse requests to be serviced before > > > > > proceeding further with freeze, right? > > > > > > > > Right. Nice way to slow down or stop the suspend with an unprivileged > > > > process. Avoiding that sort of DoS is one of the design goals of > > > > fuse. > > > > > > So you want me to handle _malicious_ filesystems now? > > > > What I'd like, is a suspend, that works reliably, regardless of the > > state of any userspace filesystem, network servers and such. > > Well, fix userspace filesystems and maybe NFS. If they react to > sigstop in timely manner, they will work with suspend properly, too. Which is pretty much impossible, given the unix filesystem API. To be able to react to sigstop, the operations in question need to be restartable. Which they are not, so they can't react to sigstop. End of story. > > > That should be easy... :-). You already have nasty deadlocks in FUSE, > > > and you solve them by "root can echo 1 > abort"... so allow me the > > > same possibility. > > > > > > We can tell fused we are freezing, and if all the requests are not > > > serviced within, say, 30 seconds, we call the filesystem malicious and > > > do echo 1 > abort. > > > > Arbitrary time limits, nice. Not. > > > > This freezer is like an old house that's close to collapsing, and you > > Nice way to have useful discussion. Not. Sorry, didn't mean to offend you. > Look, Linux was not designed with malicious filesystems in mind. In > particular, suspend was not designed with malicious filesystems in > mind, and VFS was not designed with malicious filesystems in mind. > That's why you have the nastiness with deadlocks... which you are > unable/unwilling to solve because you'd have to redesign VFS and meet > Al Viro. No, I don't think we need to redesign the VFS, and also I think the malicious filesystem thing has been adequately taken care of in fuse. You may not like the fact that one process can cause another to go into uninterruptible sleep, but in fact there's nothing wrong with that. The effects are localized to the mount owner, it won't cause any system-wide ill, it's in fact no worse than a malicious process doing ptrace(). And as explained above it's unavoidable due to the well established userspace API. The _point_ of fuse is to let this API be used by unmodified programs and the filesystem be provided by a userspace process. After showing me the right way to do this with Podfuk, this should not come as a surprise to you. So the fact that the freezer can't handle this is unfortunate, but it's just a symptom of the brokenness of it, not something that fuse introduced. Not being able to suspend with NFS (or other network filesystems) when the network is lost shows that this is a deeper problem. And for some reason you seem not to accept that. You think that the problem is with fuse, NFS, CIFS, whatever, and not the freezer, when in fact it's quite clear, that neither of the above should have anything to do with power management. Even if it was possible to fix them, it would still be just fixing the symptoms. > Now.. freezer already includes timeout; if some part of kernel knows > nothing about suspend or had crashed, it will abort after some time, > telling you which part to blame. Another timeout for detecting > malicious userspace filesystems will not hurt much in this old house. > (Maybe you don't want to auto abort them, just tell user what happened. > > We are stuck with refrigerator for now, and at least for hibernation, > I don't see any feasible alternative. Even for hibernation, I don't see, why we would need all processes being effectively in a stopped state. As stated otherwise in the thread, suspend2 in fact allowed processes to be in uninterruptible sleep instead, without negative side effects. The current freezer does too much for suspend and for hibernate too. > > Malicious programs are not something specific to fuse. A lot of the > > multiuser/multitasking OS design is about isolating things, so such a > > program is limited in the damage it can do. > > I'm talking malicious _filesystems_ here, and yes, fuse is first of > this kind. We want to handle unresponding NFS, but I believe handling > malicious NFS server nicely is slightly out of scope. It doesn't have to be malicious, it's enough if the server crashes, or the network connection is lost. Do we want to maintain the status quo, just because we can live with it? > > > Not nice, but we don't know any better for now. "Just fix all the > > > drivers" basically means "just fix 90% of kernel". > > > > And how much of that 90% currently has any power management? > > Anything that's used in today's notebooks, I'd say... Which is what, 10% of all the drivers? Then it really not as bad as you try to make it sound. And with the late suspend call (whatever it does) that can take care of most of those, it really becomes just a few drivers and subsystems to fix. Miklos - To uns
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
On Sun, Jul 08, 2007 at 12:37:48PM +, Pavel Machek wrote: > I'm talking malicious _filesystems_ here, and yes, fuse is first of > this kind. We want to handle unresponding NFS, but I believe handling > malicious NFS server nicely is slightly out of scope. If your variant doesn't handle compromised NFS server, your variant is needs to be fixed... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
On Sunday, 8 July 2007 09:21, Miklos Szeredi wrote: > > > > We can just wait for all fuse requests to be serviced before > > > > proceeding further with freeze, right? > > > > > > Right. Nice way to slow down or stop the suspend with an unprivileged > > > process. Avoiding that sort of DoS is one of the design goals of > > > fuse. > > > > So you want me to handle _malicious_ filesystems now? > > What I'd like, is a suspend, that works reliably, regardless of the > state of any userspace filesystem, network servers and such. And then you also would like to have a consistent state of the system after the resume, but if the state were inconsistent before the suspend that would be difficult to get. > > That should be easy... :-). You already have nasty deadlocks in FUSE, > > and you solve them by "root can echo 1 > abort"... so allow me the > > same possibility. > > > > We can tell fused we are freezing, and if all the requests are not > > serviced within, say, 30 seconds, we call the filesystem malicious and > > do echo 1 > abort. > > Arbitrary time limits, nice. Not. > > This freezer is like an old house that's close to collapsing, and you > are basically just thinking of where to prop it up further. To > continute this brilliant analogy, Rafael's patch at least demolishes > the worst part of the house, where bricks are already falling on our > head ;) Actually, this isn't correct and I have some testing problems with this patch (as I expected, BTW), so it's not going anywhere. Alternative ideas will be appreciated. :-) > > Not ideal, but neither is allowing malicious filesystems in the first > > place... > > Malicious programs are not something specific to fuse. A lot of the > multiuser/multitasking OS design is about isolating things, so such a > program is limited in the damage it can do. > > > > Look at it this way: the task of the freezer is to stop new I/O > > > hitting the hardware. But it is totally indiscriminate about what it > > > stops, it tries to stop _everything_ even things which have nothing to > > > do with hardware. > > > > > > Not nice. > > > > Not nice, but we don't know any better for now. "Just fix all the > > drivers" basically means "just fix 90% of kernel". > > And how much of that 90% currently has any power management? Well, you know why that actually started to be a real problem? Because sufficiently many people have started to use suspend/hibernation and it actually works for quite a lot of them. This makes people to try exotic combinations like suspend+FUSE and they find that doesn't work. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
Hi! > > > > We can just wait for all fuse requests to be serviced before > > > > proceeding further with freeze, right? > > > > > > Right. Nice way to slow down or stop the suspend with an unprivileged > > > process. Avoiding that sort of DoS is one of the design goals of > > > fuse. > > > > So you want me to handle _malicious_ filesystems now? > > What I'd like, is a suspend, that works reliably, regardless of the > state of any userspace filesystem, network servers and such. Well, fix userspace filesystems and maybe NFS. If they react to sigstop in timely manner, they will work with suspend properly, too. > > That should be easy... :-). You already have nasty deadlocks in FUSE, > > and you solve them by "root can echo 1 > abort"... so allow me the > > same possibility. > > > > We can tell fused we are freezing, and if all the requests are not > > serviced within, say, 30 seconds, we call the filesystem malicious and > > do echo 1 > abort. > > Arbitrary time limits, nice. Not. > > This freezer is like an old house that's close to collapsing, and you Nice way to have useful discussion. Not. Look, Linux was not designed with malicious filesystems in mind. In particular, suspend was not designed with malicious filesystems in mind, and VFS was not designed with malicious filesystems in mind. That's why you have the nastiness with deadlocks... which you are unable/unwilling to solve because you'd have to redesign VFS and meet Al Viro. Now.. freezer already includes timeout; if some part of kernel knows nothing about suspend or had crashed, it will abort after some time, telling you which part to blame. Another timeout for detecting malicious userspace filesystems will not hurt much in this old house. (Maybe you don't want to auto abort them, just tell user what happened. We are stuck with refrigerator for now, and at least for hibernation, I don't see any feasible alternative. And now, can we get that sysrq-t trace? Dealing with 'unable to freeze' may be hard, but the deadlock Matthew saw should be simple bug for us to fix. > > Not ideal, but neither is allowing malicious filesystems in the first > > place... > > Malicious programs are not something specific to fuse. A lot of the > multiuser/multitasking OS design is about isolating things, so such a > program is limited in the damage it can do. I'm talking malicious _filesystems_ here, and yes, fuse is first of this kind. We want to handle unresponding NFS, but I believe handling malicious NFS server nicely is slightly out of scope. > > Not nice, but we don't know any better for now. "Just fix all the > > drivers" basically means "just fix 90% of kernel". > > And how much of that 90% currently has any power management? Anything that's used in today's notebooks, I'd say... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Off topically about Re: malicious filesystems (was Re: Re: [PATCH] Remove process freezer from suspend to RAM pathway)
>> > > We can just wait for all fuse requests to be serviced before >> > > proceeding further with freeze, right? >> > >> > Right. Nice way to slow down or stop the suspend with an unprivileged >> > process. Avoiding that sort of DoS is one of the design goals of >> > fuse. >> >> So you want me to handle _malicious_ filesystems now? > > What I'd like, is a suspend, that works reliably, regardless of the > state of any userspace filesystem, network servers and such. And this must be called sleep, deep sleep or something like that. Because preparing for sleeping takes place, sleeping itself is a process, life requires. Waking up, staying all day fresh and active is the needed goal. Thus, whole freeze/ice idea was dead-born, IMHO :) WRT this, busy-loop _waiting_ that was called sleep, usleep or nanosleep is another wrong-named concept. I imagine somebody after one minute sleep :) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway (philosophical)
* Alan Stern >> Why are you guys working so hard and spending so much energy to try to >> avoid doing the right thing is beyond my understanding... >> >> > It _does_ apply to kernel threads. That's exactly why I wrote above >> > that kernel threads which try to do I/O during a suspend will need >> > extra attention. >> >> Ok none at all if you don't have a freezer. > > In answer to both questions: We need the freezer in order to Non of *complex* living creatures survive motionless freezing (with ice). That's why stupid me just don't care about all that stuff in the Linux. It's nice to see somebody approaches this from technical, but not off-topical POV :). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
> > > We can just wait for all fuse requests to be serviced before > > > proceeding further with freeze, right? > > > > Right. Nice way to slow down or stop the suspend with an unprivileged > > process. Avoiding that sort of DoS is one of the design goals of > > fuse. > > So you want me to handle _malicious_ filesystems now? What I'd like, is a suspend, that works reliably, regardless of the state of any userspace filesystem, network servers and such. > That should be easy... :-). You already have nasty deadlocks in FUSE, > and you solve them by "root can echo 1 > abort"... so allow me the > same possibility. > > We can tell fused we are freezing, and if all the requests are not > serviced within, say, 30 seconds, we call the filesystem malicious and > do echo 1 > abort. Arbitrary time limits, nice. Not. This freezer is like an old house that's close to collapsing, and you are basically just thinking of where to prop it up further. To continute this brilliant analogy, Rafael's patch at least demolishes the worst part of the house, where bricks are already falling on our head ;) > Not ideal, but neither is allowing malicious filesystems in the first > place... Malicious programs are not something specific to fuse. A lot of the multiuser/multitasking OS design is about isolating things, so such a program is limited in the damage it can do. > > Look at it this way: the task of the freezer is to stop new I/O > > hitting the hardware. But it is totally indiscriminate about what it > > stops, it tries to stop _everything_ even things which have nothing to > > do with hardware. > > > > Not nice. > > Not nice, but we don't know any better for now. "Just fix all the > drivers" basically means "just fix 90% of kernel". And how much of that 90% currently has any power management? Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
Alan Stern writes: > In answer to both questions: We need the freezer in order to implement > hibernate. Even if we take your advice and stop using the freezer > during suspend, these issues would still remain and would need to be > solved. Stepping back for a minute, let's think about what the freezer is trying to achieve. I think that currently there are three basic design goals: A. Ensure that device drivers don't get I/O requests after being suspended. B. Ensure that driver suspend routines don't end up blocking forever on mutexes or semaphores held by frozen tasks. C. Provide a way to get an atomic snapshot of memory for hibernation. Now, it's easy enough to freeze all processes (or all except one), if you don't have goal B. Just offline non-boot cpus and disable interrupts, or use stop_machine(). But goal B implies that you can't necessarily just stop all tasks wherever they are. In fact it means there are points where it is safe to stop a given task, and there may be points where it isn't safe to stop it - and there is no practical way to determine those points reliably.[1] That implies to me that we can have a freezer as long as we do nothing that can sleep, while tasks are frozen. In other words, I think the freezer is a viable option for hibernation as long as we restrict driver hibernate routines to doing only things which don't sleep. I _think_ that should be doable since hibernate routines only need to wait for outstanding DMAs to complete, as I understand it, which can be done by polling. Paul. [1] For a start, there's no way to determine which mutexes a task holds. Even if there were, we would then also have to know which mutexes it is going to try to acquire before it releases the one we want, which is pretty much unknowable. One can say "we'll only freeze kernel tasks that ask to be frozen" but then one has no way to guarantee A, and we still don't reliably guarantee B if we have user-level filesystems or device drivers. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
> I think it's a fairly significant change from the current freezer and I > also think it's a very good idea. The more I think about it, the more I > like it, in the sense that it's a simple drop-in that you could put in a > lot of the ioctl path of drivers to just block tasks that are trying to > call in while suspending, and could be used selectively by things like > the USB hub threads. Note that we could have also a per-device "icebox" (just a waitqueue). Might be nice for a device to resume whatever it froze just after it resumed itself, might even be necessary in case whatever thread got frozen is necessary for handling child devices. Cheers, Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
> You agree that drivers need to block various activities during suspend. > Principally I/O requests, but other things as well. So when one of > these requests arrives, the driver has to make it wait somehow and then > has to allow it to proceed at the appropriate time. Yes. The main thing I see here is that here is nothing common among drivers in what a "request" is and how it's processed. For example, for block drivers, it's actually fairly simple to just stop processing the request queue and wait for pending ones to complete. For network, in a similar vein, we can mostly just tell the network stack to stop sending us packets. That's what I call the "main path". This is often the trivial part to deal with, mostly because for a whole lot of drivers, it can be done via a couple of helpers in the subsystem that the driver provides a service too, via a helper, asking that subsystem to stop calling into the said driver (the asking should be done by the driver itself of course, for ordering reasons). We have some helpers, but I think not enough, and that's where we should focus imho. For example, I added fb_set_suspend() so that fbdev's can request fbcon to stop accessing them (it doesn't solve the problem of userland mmap's, that will have to be done, if we want to do it, in a more sneaky way, using VM tricks, but the DRM nowadays has the infrastructure to do it). But that's only the "main" path. Aside for that, almost all drivers also have sideband "request" input and some driver don't actually live behind a subsystem. That ranges from ioctl, to direct read/write on a char dev from userland. I think many of those cases can fairly well deal with just taking a PM semaphore, that's how I did for a couple of things in the past, provided that the request path isn't deadlocking with the semaphore held because of the system suspending of course. But in a whole lot of cases, it's, I beleive, perfectly kosher to just return an error. You're trying to capture frame from your camera while the machine is suspended ? error. At worst, your capture app will be unhappy when you wakeup, nothing terrible and totally fixable in userland if it's a problem. In some cases, we could use a little bit more help from the subsystem. Network for example, could have some explicit knowledge of the suspend state, and in addition to stopping the queue would also stop calling into things like change_mtu or set_multicast, provided it's agreed that the driver will account for those changes on resume (the actual MTU values or multicast lists are still updated in the netdev). > Normally a waitqueue or a struct completion would be used for this > purpose. I think there's no "normal" scenario, each driver or family of drivers will do things very differently. > But either one puts the burden on the driver of defining a > data structure and signalling it at the right time. Yes. > That time is generally when the device is resumed, but there's nothing > wrong with delaying it slightly, to after all the devices have been resumed > (i.e., > the time when the current PM code takes everything out of the freezer). In fact, the best is to have parallel suspend/resume of drivers and asynchronous resume but that's out of topic :-) (For the record, I did some bits like ADB resume like that, that is asynchronous, to speed up wakeup time). > > In fact, we definitely don't want to unblock plug events until this > later time. There are two things I believe. There's a generic issue with usermode helpers that make no sense to call between pre-suspend and post-resume, and there's the specific issue of adding/removing devices. I believe that "bus" drivers such as USB should indeed get a first round of notifications to tell them to stop performing bus plug/unplug operations (it's debatable whether we want to keep unplug going provided we can stack up the usermode events and re-send them later though, but let's say no for the sake of simplicity). > So instead, why not have the PM core take care of all this? There > could be a block_task_until_suspend_is_over() routine available for all > drivers to use. Its effect would be exactly the same as sending the > current task into the freezer, but it wouldn't be the freezer that > exists now. It would just be some routine that blocks until the system > suspend is over. We could call it "the icebox" instead of "the > freezer". :-) I'm not totally sure about that. I like some of it, but I think it's fairly different conceptually from the freezer (and the implementation could be as trivial as a single system wide wait queue). Basically it has a very big difference to the current freezer, and I like that, which is that we don't have some 3rd party trying to find out what to freeze and what not (the freezer), but instead, we have explicitely drivers or kernel threads sending -themselves- to the "icebox" when they think it's a good idea. Think of it as lazy freezing -> you only free
Re: [PATCH] Remove process freezer from suspend to RAM pathway
> In my defense, you should realize that until Rafael's notifier chain > was added (just a few weeks ago, still not in mainline I believe) there > was no other way to do it. Plug activity needs to be stopped before > the child devices are suspended, and the PM core does not send any > notification to drivers at that time. All it does is activate the > freezer. That's true. That was one of the reason I've always wanted the pre-suspend and post-resume hooks. (I prefer keeping the ordering there too, rather than a notifier, but a notifier is fine I suppose). Among the clients we want here the firmware stuff, the allocators, etc... to get themselves in conditions that won't deadlock during suspend cycle. I think that's a much bigger issue overall than freezer vs. no freezer :-) Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote: > > (And rather than trying to manage a waitqueue or struct completion, it > > would be easiest to jump directly into the freezer! The driver or the > > core wouldn't have to worry about waking up all these blocked threads.) > > That's wrong. The freezer is NOT a solution for that sort of thing. Just > because you guys can't get your locking right. You misunderstood. Stop trying to incite riot, calm down, and pay attention to what I actually wrote. I'll explain it again in more explicit terms: You agree that drivers need to block various activities during suspend. Principally I/O requests, but other things as well. So when one of these requests arrives, the driver has to make it wait somehow and then has to allow it to proceed at the appropriate time. Normally a waitqueue or a struct completion would be used for this purpose. But either one puts the burden on the driver of defining a data structure and signalling it at the right time. That time is generally when the device is resumed, but there's nothing wrong with delaying it slightly, to after all the devices have been resumed (i.e., the time when the current PM code takes everything out of the freezer). In fact, we definitely don't want to unblock plug events until this later time. So instead, why not have the PM core take care of all this? There could be a block_task_until_suspend_is_over() routine available for all drivers to use. Its effect would be exactly the same as sending the current task into the freezer, but it wouldn't be the freezer that exists now. It would just be some routine that blocks until the system suspend is over. We could call it "the icebox" instead of "the freezer". :-) Does that make you happier? > > > Face it, we should seriously look into doing suspend/resume without a > > > freezer. > > > > I'm willing to try, although I think it will be a tremendous amount of > > work to verify that every driver does the right thing. There's lots of > > support missing. For example, don't you think we should block all > > sysfs I/O during suspend? And likewise for insmod/rmmod? > > sysfs is a matter of driver. If a sysfs read/write callback in a driver > is hitting the HW, it most certainly already has some kind of locking. > That locking can/should be extended to deal with blocking when the HW is > suspended. User tasks can cause driver binding by writing to sysfs. Binding _can't_ be blocked in the driver; by then it's already too late. If it is going to be blocked at all, it has to be blocked earlier. One possibility is in the sysfs attribute code; another is to block all sysfs access. Of course, another possibility is simply to fail the bind. But that's not very satisfying, since suspends should be transparent. > However, since it seems that people universally consider it very hard to > get right (I don't but heh), Linus and Paul have come up with a solution > for most simple enough directly-mapped drivers such as PCI (ok, that > doesn't include USB) which is to simply do the HW suspend in a late > callback after IRQs are off, and not bother with the rest. Ben, you haven't given enough thought to the work needed to avoid locking problems. For instance, you agree that during suspend we must not allow device or driver registration or unregistration, right? And we must not allow driver binding or unbinding. But these events generally involve acquiring a device semaphore, in the driver core and quite often in the core's caller. Since that semaphore is also needed for calling the suspend and resume methods, we have to be very careful about blocking binding/unbinding/registration/unregistration. It has to be done at a time when no device semaphores are held. You also agree that kernel threads and workqueues must be allowed to operate during suspend. But consider this: By writing the appropriate sysfs attribute, a user task can cause a workqueue item to be queued to keventd that tries to unregister a device. That really puts you on the spot: Unregistration can't be allowed to fail, it can't be allowed to succeed during a suspend, and keventd can't be blocked! So what should we do? Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote: > > Which is exactly my point. It _doesn't_ work fine without a freezer, > > because the USB stack currently relies on the freezer to prevent plug > > activity. > > Putting on hold plug activity has nothing, NOTHING, to do with the half > assed piece of deadlocking crap we have now we call a freezer. You're wrong; it _does_ have something to do with the freezer. The connection is that the code uses the freezer to put plug activity on hold. You probably meant to say that it _should_ have nothing to do with the freezer. That's a different matter. But in any case you should write what you actually mean, rather than just putting down something as inflammatory as possible. > As long as you guys keep mixing up all the issues and coming up with > totally bogus solutions that cannot work, we won't have a useful suspend > (either to RAM or to disk) in linux. In my defense, you should realize that until Rafael's notifier chain was added (just a few weeks ago, still not in mainline I believe) there was no other way to do it. Plug activity needs to be stopped before the child devices are suspended, and the PM core does not send any notification to drivers at that time. All it does is activate the freezer. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
> > Well... 2 things here. Either you have a freezer in which case the > > chances of the above scenario are increased, > > How so? :-) I meant you have a freezer that freezes uninterruptible tasks. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
> Spinning in the driver with the lock not held is impossible, since the > driver is called with the lock already acquired. > > Failing with -ERETRY is non-transparent. I would prefer to block such > requests at their source, before the lock is acquired. Perhaps in the > driver core, perhaps even earlier. > > (And rather than trying to manage a waitqueue or struct completion, it > would be easiest to jump directly into the freezer! The driver or the > core wouldn't have to worry about waking up all these blocked threads.) That's wrong. The freezer is NOT a solution for that sort of thing. Just because you guys can't get your locking right. > You're missing the point. If the driver and the freezer are both > solid, there's no reason they can't share the work. If many drivers > can pass off part of their workload to the single freezer, it's a net > win. I've explained already multiple times that the freezer will not do what you guys expect it to do. IOs can be submited at non-task time and there is no clear distinction between IO generating threads that must and those that must not be frozen. I really can't understand why you guys work so hard at trying to avoid the right solutions systematically. > So it isn't a question of how solid the drivers are; it's a question > of how solid the freezer is. And bear in mind that if you convince > people the freezer is not solid enough to be used, then you will have > to find an alternative for purposes of hibernation. Because I'm intimately convinced that the freezer is a wrong approach that cannot be made solid enough. > > The freezer is a flawed concept in the first place. > > <... Long and cogent argument which I will skip over for now ...> Too bad, that's where the interesting points that show that the freezer cannot work are.. > > Face it, we should seriously look into doing suspend/resume without a > > freezer. > > I'm willing to try, although I think it will be a tremendous amount of > work to verify that every driver does the right thing. There's lots of > support missing. For example, don't you think we should block all > sysfs I/O during suspend? And likewise for insmod/rmmod? sysfs is a matter of driver. If a sysfs read/write callback in a driver is hitting the HW, it most certainly already has some kind of locking. That locking can/should be extended to deal with blocking when the HW is suspended. However, since it seems that people universally consider it very hard to get right (I don't but heh), Linus and Paul have come up with a solution for most simple enough directly-mapped drivers such as PCI (ok, that doesn't include USB) which is to simply do the HW suspend in a late callback after IRQs are off, and not bother with the rest. > > I even tend to think that we could do STD that way too, in > > fact, while Linus is right saying it's a different problem than STR, we > > could even probably re-use some of the STR infrastructure in some > > hackish way, still without a freezer. We could have ways to block page > > cache writeout, for example, to prevent new post-snapshot dirty data > > from hitting the platter, and use direct BIOs for writeout. That's just > > an example. > > What about systems with no BIOS? I think this would be very hard or > even impossible to make work. You made the same mistake I did when reading Nigel's mail ... BIOs -> Block IO requests, not BIOS :-) Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sat, 2007-07-07 at 12:17 -0400, Alan Stern wrote: > On Sat, 7 Jul 2007, Benjamin Herrenschmidt wrote: > > > > > And guess what ? It's what we do on powerbooks, and it works fine, > > > > without a freezer :-) > > > If you remember, one of the things I've been advocating has always been > > that we should put on hold all plug activity (unplug might be alright as > > long as the user events are just delayed) when we start suspending. No > > new devices, no new bindings. "hub" type devices are respondible for > > bringing in the new stuff after resume. > > Which is exactly my point. It _doesn't_ work fine without a freezer, > because the USB stack currently relies on the freezer to prevent plug > activity. Putting on hold plug activity has nothing, NOTHING, to do with the half assed piece of deadlocking crap we have now we call a freezer. As long as you guys keep mixing up all the issues and coming up with totally bogus solutions that cannot work, we won't have a useful suspend (either to RAM or to disk) in linux. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sat, 2007-07-07 at 13:49 +0200, Pavel Machek wrote: > And anyway I believe that current issue (fuse deadlocks with s2ram) > should be present on powerbooks, too... it is just way harder to > trigger. All that is neccessary is fused (or one of its helpers) to > get frozen by accessing suspended device. I don't see any fuse specific issue without a freezer, but I see generic issues such as kmalloc blocking in a driver with a mutex held etc... but I've talked about these already. They have nothing to do with the freezer though. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
Hi! > > We can just wait for all fuse requests to be serviced before > > proceeding further with freeze, right? > > Right. Nice way to slow down or stop the suspend with an unprivileged > process. Avoiding that sort of DoS is one of the design goals of > fuse. So you want me to handle _malicious_ filesystems now? That should be easy... :-). You already have nasty deadlocks in FUSE, and you solve them by "root can echo 1 > abort"... so allow me the same possibility. We can tell fused we are freezing, and if all the requests are not serviced within, say, 30 seconds, we call the filesystem malicious and do echo 1 > abort. Not ideal, but neither is allowing malicious filesystems in the first place... > Look at it this way: the task of the freezer is to stop new I/O > hitting the hardware. But it is totally indiscriminate about what it > stops, it tries to stop _everything_ even things which have nothing to > do with hardware. > > Not nice. Not nice, but we don't know any better for now. "Just fix all the drivers" basically means "just fix 90% of kernel". Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: problem 1 (was Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway))
On Saturday, 7 July 2007 14:08, Pavel Machek wrote: > Hi! > > > > Now, if kernel needs FUSE services for some reason (that's the problem > > > we hit in s2ram case, right?), we have a deadlock. > > > > > > So main problem still seems to be "kernel should not depend on > > > userland services during suspend", refrigerator or not. > > > > And also "Userland should not depend on userland services", which is > > rather more of a problem. > > No, that's not a problem. Or rather, that's different problem, called > "problem 2" (fuse causes freezer to fail to stop processes). > > But we still have "problem 1" here: after devices are suspended, > kernel tries to use fuse's services. That is not going to work, one > way or another, because devices are suspended and userland can't work > reliably. > > (Aha, it _may_ be it is kernel tries to use fuse's services after > freezing userland but before freezing devices. I don't think it is). > > To solve "problem 1", we need to know which part of kernel asks for > fuse services. sysrq-t trace is likely to tell us. Can someone repeat > the "problem 1" scenario (freezer succeeds but then it deadlocks), and > produce sysrq-t trace? That way we can solve "problem 1". Well, such a trace would be helpful in any case. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
> We can just wait for all fuse requests to be serviced before > proceeding further with freeze, right? Right. Nice way to slow down or stop the suspend with an unprivileged process. Avoiding that sort of DoS is one of the design goals of fuse. Look at it this way: the task of the freezer is to stop new I/O hitting the hardware. But it is totally indiscriminate about what it stops, it tries to stop _everything_ even things which have nothing to do with hardware. Not nice. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Saturday, 7 July 2007 04:44, Benjamin Herrenschmidt wrote: > On Fri, 2007-07-06 at 11:31 +0200, Oliver Neukum wrote: > > Am Freitag, 6. Juli 2007 schrieb Benjamin Herrenschmidt: > > > On Fri, 2007-07-06 at 09:13 +0200, Rafael J. Wysocki wrote: > > > > > > > > The only reason (I know of) why we don't handle uninterruptible tasks > > > > in the > > > > freezer is that we're afraid of the suspend process deadlocking with an > > > > uninterruptible task holding a lock, but AFAICS the probability of such > > > > an > > > > event is extremely small. > > > > > > What would deadlock specifically ? One of the drivers trying to acquire > > > that lock ? It would be a driver bug then. > > > > Your driver's write method looks like: > > > > mutex_lock(); > > poke_some_hardware(); > > wait_event_uninterruptible(); //for result > > res = evaluate_result(); > > mutex_unlock(); > > return res; > > > > If you put a task into the refrigerator at wait_event_interruptible() > > you will deadlock if you need this lock for the driver to go to suspend. > > The suspend method then must not take the lock _and_ it must be > > aware that there may be an ongoing operation. > > Well... 2 things here. Either you have a freezer in which case the > chances of the above scenario are increased, How so? :-) > or you don't, in which case > your suspend method will just sleep on the lock until outstanding HW > accesses that have that lock are completed, and everything is fine. > > You need to be careful with one thing though, whether you have a freezer > or not. If you driver, in some code path, whatever it is (ioctl, kernel > thread, workqueue, ...) does something like: > > mutex_lock > kmalloc(...,GFP_KERNEL); > mutex_unlock > > And it's suspend callback then does: > > mutex_lock > > The problem here is that the disks might already have been suspended > prior to your driver being called. Thus, any attempt at pushing things > out to swap or dirty mmap'ings back to storage will hang, thus kmalloc > can potentially hang (afaik), and you will deadlock. > > That's what I've been talking about earlier when I said that we should > have some security in SLAB/SLUB/Buddy allocators, to silently turn > GFP_KERNEL to at least GFP_NOIO or even ATOMIC before we start > suspending drivers. > > Now, another way to deal with that would have to use > pre-suspend/post-resume notifications, and have drivers avoid doing the > above between those, but that's much harder. (Essentially, drivers would > have to either make sure they don't do things like blocking allocations, > even implicitely, or possibly fall back to a degraded synchronous mode > or that sort of thing). > > I think it's much simpler to tweak slab/slub/buddy instead :-) > > Note that the above issue is orthogonal to our freezer discussion, it's > just one of the potential deadlock cause we have with suspend that needs > to be fixed. Agreed. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
> > One task doing ptrace() can basically do whatever it wants with the > > task being traced. This is not an exact analogy to what fuse does, > > but close. > > Well, IMO userland tasks should not have power to grab VFS mutexes for > indefinite ammount of time. ("fused is allowed to deadlock kernel, in > a way only write to special file helps" is ugly). Unfortunately, I > don't think there's a way to work around that deadlock within fuse > design limits... (coda was able to get around it by working on whole > files granularity, AFAICT), so we'll have to live with that. That's just file I/O. You can easily deadlock coda with any other file operation. In fact coda is _less_ robust wrt a misbehaving userspace server than fuse by a big margin. Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sat, 7 Jul 2007, Benjamin Herrenschmidt wrote: > > > so the suspend process will wait for it. When binding > > > is done the suspend_device() code will take the device lock and tell > > > everything else to postpone further bind requests as above. > > > > My question referred to drivers trying to bind or unbind a device > > _after_ the device has been suspended. I suppose you'll say that's > > covered by the NO_BIND flag. But now we have the locking problem > > mentioned above: The thread trying to bind is holding a lock which is > > needed for resuming. > > Why would it ? Just make it fail, maybe with some kind of -ERETRY... Or > it can spin with the lock not held if it want. That's a detail really. Spinning in the driver with the lock not held is impossible, since the driver is called with the lock already acquired. Failing with -ERETRY is non-transparent. I would prefer to block such requests at their source, before the lock is acquired. Perhaps in the driver core, perhaps even earlier. (And rather than trying to manage a waitqueue or struct completion, it would be easiest to jump directly into the freezer! The driver or the core wouldn't have to worry about waking up all these blocked threads.) > > As one of the people responsible for the USB power management > > implementation, I would appreciate more details about this. For > > example, a dmesg log with CONFIG_USB_DEBUG turned on together with a > > complete description of the actions you took to provoke the bug. > > > > (I wonder how much of this "buginess" is caused by the lack of the > > freezer in PPC.) > > No. The freezer will hide some of those problems under the carpet, but > not solve the basic issue which is the driver should be solid. Period. You're missing the point. If the driver and the freezer are both solid, there's no reason they can't share the work. If many drivers can pass off part of their workload to the single freezer, it's a net win. So it isn't a question of how solid the drivers are; it's a question of how solid the freezer is. And bear in mind that if you convince people the freezer is not solid enough to be used, then you will have to find an alternative for purposes of hibernation. > The freezer is a flawed concept in the first place. <... Long and cogent argument which I will skip over for now ...> > Face it, we should seriously look into doing suspend/resume without a > freezer. I'm willing to try, although I think it will be a tremendous amount of work to verify that every driver does the right thing. There's lots of support missing. For example, don't you think we should block all sysfs I/O during suspend? And likewise for insmod/rmmod? > I even tend to think that we could do STD that way too, in > fact, while Linus is right saying it's a different problem than STR, we > could even probably re-use some of the STR infrastructure in some > hackish way, still without a freezer. We could have ways to block page > cache writeout, for example, to prevent new post-snapshot dirty data > from hitting the platter, and use direct BIOs for writeout. That's just > an example. What about systems with no BIOS? I think this would be very hard or even impossible to make work. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Sat, 7 Jul 2007, Benjamin Herrenschmidt wrote: > > > And guess what ? It's what we do on powerbooks, and it works fine, > > > without a freezer :-) > If you remember, one of the things I've been advocating has always been > that we should put on hold all plug activity (unplug might be alright as > long as the user events are just delayed) when we start suspending. No > new devices, no new bindings. "hub" type devices are respondible for > bringing in the new stuff after resume. Which is exactly my point. It _doesn't_ work fine without a freezer, because the USB stack currently relies on the freezer to prevent plug activity. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Fri 2007-07-06 09:07:38, Miklos Szeredi wrote: > > > Actually fuse allows SIGKILL, because it's always fatal, and the > > > syscall may not be restarted. > > > > Okay, and you should handle refrigerator in the same paths where you > > handle SIGKILL. Just add try_to_freeze() there... > > It's the fourth time I'm repeating this in this thread: > > Yes adding try_to_freeze() there would partially solve the probelem. > > But another task can be sleeping on a mutex held by the task waiting > for the reply. And the freezer won't be able to handle that one. > > Generally, calling try_to_freeze() with mutexes held is not a good > idea. Agreed, calling try_to_freeze() with mutex held is no-no, and it is even documented somewhere. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
Hi! > > > Actually fuse allows SIGKILL, because it's always fatal, and the > > > syscall may not be restarted. > > > > I think you want to stick try_to_freeze() at the same places where you > > do SIGKILL handling. That should solve the 'syslogd is unfreezeable' > > problem. > > I could, but it would not solve the general problem. Namely, that the > presence of fuse imposes a certain ordering in which userspace tasks > have to be frozen. And it is not possible to know this ordering. We can just wait for all fuse requests to be serviced before proceeding further with freeze, right? > And even if the ordering were solved, the freezer would still not work > if the filesystem is not responding due to external events, such as a > lost network (this affects NFS, CIFS, whatever just the same as > fuse). That's ok, you can't suspend if your hdd is dead, and in the same way you can't suspend if your NFS server is dead. I agree it is ugly, but we seem to live ok with that. We could (and should?) handle that, probably by realizing that NFS is not a disk and using interruptible sleep, but... > > Plus, it would be nice to find out where suspend/hibernation is > > triggering fuse activity. We can then decide where to fix it -- in > > fuse or in suspend parts. You said sys_sync is not implemented... so > > where is the problem? > > I cannot say without having a sysrq-t of the situation. Yes please. Can someone affected please produce sysrq-t? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Thu 2007-07-05 10:15:01, Paul Mackerras wrote: > Rafael J. Wysocki writes: > > > This is incompatible with the code in kernel/power/main.c, since we only > > disable the nonboot CPUs after devices have been suspended. Do you think > > that > > your framework can be modified to work without disabling the nonboot CPUs > > by the user space? > > Sure. It was a "if it can be done in userspace, do it in userspace" > kind of decision, but I'm not wedded to it. > > I actually do want to converge to using the generic suspend-to-ram > code on powerbooks. I just want to avoid causing regressions for > powerbook users, including myself. :) Curious, do you actually use fuse? Can you try it _with_ freezer and produce sysrq-t trace of deadlock? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
problem 1 (was Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway))
Hi! > > Now, if kernel needs FUSE services for some reason (that's the problem > > we hit in s2ram case, right?), we have a deadlock. > > > > So main problem still seems to be "kernel should not depend on > > userland services during suspend", refrigerator or not. > > And also "Userland should not depend on userland services", which is > rather more of a problem. No, that's not a problem. Or rather, that's different problem, called "problem 2" (fuse causes freezer to fail to stop processes). But we still have "problem 1" here: after devices are suspended, kernel tries to use fuse's services. That is not going to work, one way or another, because devices are suspended and userland can't work reliably. (Aha, it _may_ be it is kernel tries to use fuse's services after freezing userland but before freezing devices. I don't think it is). To solve "problem 1", we need to know which part of kernel asks for fuse services. sysrq-t trace is likely to tell us. Can someone repeat the "problem 1" scenario (freezer succeeds but then it deadlocks), and produce sysrq-t trace? That way we can solve "problem 1". Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
Hi! > > > > You have processes that don't react to signals, because some > > > > other user land task is misbehaving. I'd call that ugly at the > > > > very least. > > > > > > It already happens with, say, NFS. Don't think about it in terms of a > > > userland task misbehaving - think of it in terms of a resource becoming > > > unavailable. > > > > I think there's a difference between a userland task playing the role of a > > resource and a "real" external resource the kernel doesn't control. > > > > IMO, userland tasks should not have the power to affect each other as though > > they were parts of the kernel. > > One task doing ptrace() can basically do whatever it wants with the > task being traced. This is not an exact analogy to what fuse does, > but close. Well, IMO userland tasks should not have power to grab VFS mutexes for indefinite ammount of time. ("fused is allowed to deadlock kernel, in a way only write to special file helps" is ugly). Unfortunately, I don't think there's a way to work around that deadlock within fuse design limits... (coda was able to get around it by working on whole files granularity, AFAICT), so we'll have to live with that. I think we have two separate problems here, and both are solvable, without major changes to fuse or suspend framework. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)
Hi! > > > And also "Userland should not depend on userland services", which is > > > rather more of a problem. > > > > I think you're oversimplifying it, as far as FUSE is concerned. > > > > Namely, if there are two userland tasks, A and B, and B is uninterruptible, > > because A is blocked, then this is not a usual situation. > > Fuse is one case of it occuring, and if we end up with more userspace > drivers then the problem is only going to get worse. We'll have to solve them as they come. Face it, hardware drivers _have_ to know about suspend/resume. Even userspace drivers will have to know about suspend/resume, because they need to reinit the hw during resume. Now... most parts of kernel need to know (a bit) about suspend/resume -- at least enough to play nicely with refrigerator. In retrospect it is pretty obvious that this covers fused, too, unfortunately noone noticed that when fuse was designed. Can we try to solve the suspend vs. fuse problem now? "Just removing the refrigerator" is not the answer. First, refrigerator is impossible to remove in few months timeframe, and second, it does not solve the problem anyway. (Actually, there are two separate problems with suspend vs. fuse.) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
Hi! > > > And guess what ? It's what we do on powerbooks, and it works fine, > > > without a freezer :-) Well, issue is, you should stop claiming it works fine until issue below is fixed... please? And anyway I believe that current issue (fuse deadlocks with s2ram) should be present on powerbooks, too... it is just way harder to trigger. All that is neccessary is fused (or one of its helpers) to get frozen by accessing suspended device. Pavel > > I wish you'd stop saying that. Have you ever done any serious testing? > > > > Here's something to try: Add a time delay to the end of hub_suspend in > > drivers/usb/core/hub.c, so you can provoke a race manually. Then while > > one of your root hubs is being suspended and the system is waiting in > > that delay, either plug in a new USB device to that hub or unplug an > > existing device. > > > > Be sure that CONFIG_USB_DEBUG is on so that we can figure out what > > happened after the fact. > > If you remember, one of the things I've been advocating has always been > that we should put on hold all plug activity (unplug might be alright as > long as the user events are just delayed) when we start suspending. No > new devices, no new bindings. "hub" type devices are respondible for > bringing in the new stuff after resume. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
Hi! > > How will that help? Block the kernel thread in the freezer or block it > > in the driver -- either way it is blocked. So how do your deadlocks > > get resolved? > > Because nobody is waiting on that kernel thread anyway without a freezer > so there is no deadlock anymore. In the deadlock we are seeing, _someone_ is waiting on userspace thread, that leads to deadlock with freezer. We don't know who, because we have not seen the sysrq-t dumps. The "unknown who" will deadlock on fused frozen by driver, too. We really need to fix the "unknown who" here. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
> > so the suspend process will wait for it. When binding > > is done the suspend_device() code will take the device lock and tell > > everything else to postpone further bind requests as above. > > My question referred to drivers trying to bind or unbind a device > _after_ the device has been suspended. I suppose you'll say that's > covered by the NO_BIND flag. But now we have the locking problem > mentioned above: The thread trying to bind is holding a lock which is > needed for resuming. Why would it ? Just make it fail, maybe with some kind of -ERETRY... Or it can spin with the lock not held if it want. That's a detail really. > As one of the people responsible for the USB power management > implementation, I would appreciate more details about this. For > example, a dmesg log with CONFIG_USB_DEBUG turned on together with a > complete description of the actions you took to provoke the bug. > > (I wonder how much of this "buginess" is caused by the lack of the > freezer in PPC.) No. The freezer will hide some of those problems under the carpet, but not solve the basic issue which is the driver should be solid. Period. The freezer is a flawed concept in the first place. If you go back to square one, what is the basic idea of it ? I'll basically expose the idea and go down all of the path I have in mind where it stops working and becomes an incredibly difficult thing that in the end doesn't even solve all the problems it's supposed to. So first thing first... I want a quiescent system with no new "IO requests" (whatever that mean in the context of drivers) issued to avoid races during suspend/resume. That sounds like a nice idea. Yeah. Sounds... only. Problem is. How do you define that quiescent system ? First idea is ... let's stop userland. There are various ways of doing that, but the freezer hooking into the signal code is not necessarily a bad one. No, I'm purposefully putting aside all the cases where the above doesn't work (user process in the kernel in some uninterruptible wait, etc...), which are the first big setback imho... our simple idea is suddenly not so simple anymore, but we can bring those back later. Now, there is still a problem... kernel threads. In fact, there is no fundamental distinction between a kernel thread and a user process... one has an MM and the other doesn't but as far as we are concerned, it's the same. Kernel threads can issues IOs, or like khubd, detect devices, plug/unplug them, etc etc all over the place. Easy answer that comes to mind -> freeze them too. Heh, but kernel threads don't do signals, so we end up with all those try_to_freeze(). Then what about the fact that drivers may need those kernel threads to proceed ? Some drivers queue up their IO requests to a kernel thread to process them and suspend() might need to flush those down, issue a couple more such as "spin down disks" before that kernel thread can actually be frozen... Hrm.. maybe not all of them then. But how do you decide ? What defines that a kernel can issue an IO ? In fact, if you look closely, anything doing kmalloc(...,GFP_KERNEL) for example can trigger an IO... implicitely, via the VM pushing things out. And that's just one example. In some case, those same threads that may need to be kept non-frozen are -also- the ones that will potentially submit new IOs or bring in new devices. And then, there is keventd ... what do you do about work queues ? You have everybody pouring things at workqueues... some of these things may well hit your driver, some may not. Same goes in some cases with interrupt time stuff, such as timers or tasklets.. think about networking. In the end, the nice idea that "threads/tasks cause requests, so we just stop them" basically falls appart. Half of the kernel can cause a driver to be hit somewhere and a given time, it can be from a thread context, directly caused by userland, or from some timer due to some subsystem having a keepalive thing ticking in or whatever else. Now, we go back to the previous issue of what do we do about uninterruptible sleep... You want to abort suspend because, for example, somethign called a driver that does an msleep(200) or so ? Are you aware that 99% of laptop users close their laptops and shove it in the bag not even waiting for the disk to spin down ? And you want suspend to abort because some random "happen all the time" even such as a process being somewhere temporarily in uninterruptible state in the kernel ? So let's say we freeze them from within the scheduler even when they are uininterruptible.. ouch... you just caused the deadlocks we talked about before. While without a freezer, suspend() can at least rely on the fact that it can wait for processes that have such pending locked constructs waiting will ultimately wakeupm and wait for them (or even explicitely wake them), it can't if they've been frozen. So what was a perfectly solvable moderate driver synchronisation issue becomes a deadlock nigh
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Fri, 2007-07-06 at 10:38 -0400, Alan Stern wrote: > On Fri, 6 Jul 2007, Benjamin Herrenschmidt wrote: > > > What you propose is basically a slightly over-simplistic version of what > > I think (and Paulus think) should be done. We do need to do it via > > driver callbacks down the tree since only drivers can know how to deal > > with their DMA etc... and ordering need to be respected, but that's > > basically it. > > > > And guess what ? It's what we do on powerbooks, and it works fine, > > without a freezer :-) > > I wish you'd stop saying that. Have you ever done any serious testing? > > Here's something to try: Add a time delay to the end of hub_suspend in > drivers/usb/core/hub.c, so you can provoke a race manually. Then while > one of your root hubs is being suspended and the system is waiting in > that delay, either plug in a new USB device to that hub or unplug an > existing device. > > Be sure that CONFIG_USB_DEBUG is on so that we can figure out what > happened after the fact. If you remember, one of the things I've been advocating has always been that we should put on hold all plug activity (unplug might be alright as long as the user events are just delayed) when we start suspending. No new devices, no new bindings. "hub" type devices are respondible for bringing in the new stuff after resume. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Fri, 6 Jul 2007, Kyle Moffett wrote: > Except Linus already decreed (and I heartily agree) that hibernation > and suspend-to-RAM were fundamentally completely different operations > and therefore any attempts to share code were basically just making a > big muddy mess of things. Would a thread "Remove phase-of-the-moon > calculations from network-recv code" be relevant to lunar observation > just because the two had to do with the phases of the moon? No! For that matter, shutting down a CPU and hibernation are fundamentally different operations -- but they both use the freezer. Is that a big muddy mess? But never mind; I won't discuss hibernation. > > A) Decide what to do about remote wakeup requests. > > Why do we care? If the wakeup request arrives before we go to sleep, > we obviously aren't asleep and so can't wake up. If it arrives after > we go to sleep then it will wake us up. Anything that depends on a > wakeup arriving mid-sequence is 100% masochistic race condition. You don't understand my point. If a wakeup request arrives before the system goes to sleep, and it is serviced, then a device which ought to have been suspended will in fact be awake. This will (if the parent's driver is written correctly) cause the sleep transition to abort. Not that there's necessarily anything wrong with that. I just wanted to be sure you were aware of the potential problems. > > C) Prevent devices and drivers from being registered or > > unregistered; in particular decide what to do about hot-plug or hot- > > unplug events. > > (1b) Again, that's where the NO_BIND flag comes in. If its set then > any device probe events must sleep, otherwise they can go through. I didn't say "bind", I said "registered". Admittedly, they are rather similar. Still, there are difficulties. Let's say a driver has set the NO_BIND flag for one of its devices. A bind request comes in, and the driver puts it on a waitqueue. Note that the binding thread holds the device semaphore; this is always true when a driver's probe routine is called. Later on it comes time for the PM core to resume the device, which will start up the threads on the waitqueue. Before doing so it must acquire the device semaphore. Deadlock! > If any of those things screw up suspend-to-RAM then it is 100% the > drivers fault and no "process freezer" is going to fix it, end of > story. Why do you say that? A "process freezer" can prevent bind and registration calls from occurring, since these calls have to run in process context. Ergo a freezer _can_ fix some of these problems. > And "A" cannot be made reliable. At some point you shut off > interrupts right before going to sleep, and at that point any remote > wakeup event is just going to get dropped until you actually enter > sleep mode and the hardware takes over again. If you miss a wakeup > event then whatever sent it should just retry, just as with *every* > other kind of network packet. Who mentioned network packets? And who says a remote wakeup event will get dropped once interrupts are disabled? More likely it will set a bit somewhere that causes the system to wake up immediately after it has gone to sleep. > > So what happens if a new subdevice arrives at the wrong time? Do > > you block instead of binding it? While holding a mutex needed to > > suspend the parent device? > > That would be a driver bug. If you have asynchronous probing then > proper suspend handling includes being able to postpone driver probe > events until after resume. One of your conditions (embodied in the pseudocode you posted earlier) was that drivers should be told to prevent binding and registration before the child devices are suspended. Currently the PM core doesn't do anything like that. You can't blame the drivers for this lack. Of course it could be added. Or perhaps more easily, the drivers that support asynchronous probing could be notified when a suspend is about to start so they could begin blocking bindings/registrations then. > > What about drivers trying to bind to existing devices? > > While binding it will clearly be holding a mutex/spinlock on the > parent device, ("Parent device"? Do you mean the device being bound? If so then I agree. Or do you mean the device's parent? If so then your statement is not clear at all. There is special-case code in the driver core to make sure it is true for USB devices, and it looks ugly as can be.) > so the suspend process will wait for it. When binding > is done the suspend_device() code will take the device lock and tell > everything else to postpone further bind requests as above. My question referred to drivers trying to bind or unbind a device _after_ the device has been suspended. I suppose you'll say that's covered by the NO_BIND flag. But now we have the locking problem mentioned above: The thread trying to bind is holding a lock which is needed for re
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Fri, 2007-07-06 at 11:53 +0200, Rafael J. Wysocki wrote: > > Moreover, I claim that, in the context of your example, _if_ the task > is stuck > at the wait_event_uninterruptible(), _then_ the freezerless suspend > will > deadlock with the task. Why would the task be stuck there if it's not becasue of a freezer ? The only reason I see would be a dependency on something like kmalloc trying to push things to disk, or a lock owned by another user process, but the former is a generic issue I've discussed in a separate mail and the later is a driver bug imho. Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Fri, 2007-07-06 at 11:31 +0200, Oliver Neukum wrote: > Am Freitag, 6. Juli 2007 schrieb Benjamin Herrenschmidt: > > On Fri, 2007-07-06 at 09:13 +0200, Rafael J. Wysocki wrote: > > > > > > The only reason (I know of) why we don't handle uninterruptible tasks in > > > the > > > freezer is that we're afraid of the suspend process deadlocking with an > > > uninterruptible task holding a lock, but AFAICS the probability of such an > > > event is extremely small. > > > > What would deadlock specifically ? One of the drivers trying to acquire > > that lock ? It would be a driver bug then. > > Your driver's write method looks like: > > mutex_lock(); > poke_some_hardware(); > wait_event_uninterruptible(); //for result > res = evaluate_result(); > mutex_unlock(); > return res; > > If you put a task into the refrigerator at wait_event_interruptible() > you will deadlock if you need this lock for the driver to go to suspend. > The suspend method then must not take the lock _and_ it must be > aware that there may be an ongoing operation. Well... 2 things here. Either you have a freezer in which case the chances of the above scenario are increased, or you don't, in which case your suspend method will just sleep on the lock until outstanding HW accesses that have that lock are completed, and everything is fine. You need to be careful with one thing though, whether you have a freezer or not. If you driver, in some code path, whatever it is (ioctl, kernel thread, workqueue, ...) does something like: mutex_lock kmalloc(...,GFP_KERNEL); mutex_unlock And it's suspend callback then does: mutex_lock The problem here is that the disks might already have been suspended prior to your driver being called. Thus, any attempt at pushing things out to swap or dirty mmap'ings back to storage will hang, thus kmalloc can potentially hang (afaik), and you will deadlock. That's what I've been talking about earlier when I said that we should have some security in SLAB/SLUB/Buddy allocators, to silently turn GFP_KERNEL to at least GFP_NOIO or even ATOMIC before we start suspending drivers. Now, another way to deal with that would have to use pre-suspend/post-resume notifications, and have drivers avoid doing the above between those, but that's much harder. (Essentially, drivers would have to either make sure they don't do things like blocking allocations, even implicitely, or possibly fall back to a degraded synchronous mode or that sort of thing). I think it's much simpler to tweak slab/slub/buddy instead :-) Note that the above issue is orthogonal to our freezer discussion, it's just one of the potential deadlock cause we have with suspend that needs to be fixed. Cheers, Ben. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Jul 06, 2007, at 11:42:33, Alan Stern wrote: On Thu, 5 Jul 2007, Kyle Moffett wrote: Umm, this thread is NOT ABOUT HIBERNATING!!! Please go back and read the subject, specifically the "suspend to RAM" parts :-D. But it _is_ about the freezer; see the "Remove process freezer" part. :-) Since the freezer is used during hibernation, hibernation is a legitimate topic. Except Linus already decreed (and I heartily agree) that hibernation and suspend-to-RAM were fundamentally completely different operations and therefore any attempts to share code were basically just making a big muddy mess of things. Would a thread "Remove phase-of-the-moon calculations from network-recv code" be relevant to lunar observation just because the two had to do with the phases of the moon? No! When your hardware can put itself to sleep and atomically preserve memory as it does so, you don't need an atomic copy. For Real Suspend(TM) (IE: Suspend-to-RAM), the list of things to do is short and simple: 1) Stop DMA and put most hardware into low-power states (stops all interrupt sources) 2) Ensure that the other CPUs have finished any trailing interrupt handlers and put them to sleep 3) Put the interrupt-controllers into low-power stat 4) Go to sleep Your short and simple list omits a few crucial items: A) Decide what to do about remote wakeup requests. Why do we care? If the wakeup request arrives before we go to sleep, we obviously aren't asleep and so can't wake up. If it arrives after we go to sleep then it will wake us up. Anything that depends on a wakeup arriving mid-sequence is 100% masochistic race condition. B) Prevent I/O requests from resuming devices that have been suspended. (1a) As I describe below, step (1) includes setting NO_BIND and NO_IO flags on devices as they are processed. Anybody who wants to do IO while those flags are set should just go sleep on a waitqueue. C) Prevent devices and drivers from being registered or unregistered; in particular decide what to do about hot-plug or hot- unplug events. (1b) Again, that's where the NO_BIND flag comes in. If its set then any device probe events must sleep, otherwise they can go through. D) Block driver bind or unbind calls. See points (1a) and (1b) above. Any of these things is capable of screwing up the course of events. (In fact A _should_ be allowed to abort a suspend.) If any of those things screw up suspend-to-RAM then it is 100% the drivers fault and no "process freezer" is going to fix it, end of story. And "A" cannot be made reliable. At some point you shut off interrupts right before going to sleep, and at that point any remote wakeup event is just going to get dropped until you actually enter sleep mode and the hardware takes over again. If you miss a wakeup event then whatever sent it should just retry, just as with *every* other kind of network packet. How about a freezer whose job it is to "wait for pending hard interrupts to complete when we have already guaranteed that we won't get any more"? That part should be really *REALLY* easy. You don't need to care about either userspace processes or kernel threads at all. Specifically, Step 1 consists of: suspend_device(dev) { set_no_bind_flag(dev); for (dev->subdevices) suspend_device(dev); set_no_io_flag(dev); wait_for_in_progress_dma(dev); turn_off_interrupts(dev); go_to_low_power_state(dev); } After you've set the "no_bind" flag, you won't get any *new* subdevices trying to bind, So what happens if a new subdevice arrives at the wrong time? Do you block instead of binding it? While holding a mutex needed to suspend the parent device? That would be a driver bug. If you have asynchronous probing then proper suspend handling includes being able to postpone driver probe events until after resume. If you have synchronous probing then the problem doesn't exist because "set_no_bind_flag" is just telling the device not to raise any more device probe interrupts. What about drivers trying to bind to existing devices? While binding it will clearly be holding a mutex/spinlock on the parent device, so the suspend process will wait for it. When binding is done the suspend_device() code will take the device lock and tell everything else to postpone further bind requests as above. What happens to I/O requests submitted after the "no_io" flag is set? The driver will have to block them, effectively creating its own little "freezer". Oh, so you're calling every waitqueue in the kernel a "freezer" now? We do these things at the driver level *all* *the* *time*. For instance, you can't submit new IOs to an ATA controller while it's renegotiating the bus speed, but that's never been a problem before. When all the leaf devices are off, the parent device can be turned off because everythi
Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Thu, 5 Jul 2007, Kyle Moffett wrote: > Umm, this thread is NOT ABOUT HIBERNATING!!! Please go back and read > the subject, specifically the "suspend to RAM" parts :-D. But it _is_ about the freezer; see the "Remove process freezer" part. :-) Since the freezer is used during hibernation, hibernation is a legitimate topic. > When your > hardware can put itself to sleep and atomically preserve memory as it > does so, you don't need an atomic copy. For Real Suspend(TM) (IE: > Suspend-to-RAM), the list of things to do is short and simple: > > 1) Stop DMA and put most hardware into low-power states (stops all > interrupt sources) > 2) Ensure that the other CPUs have finished any trailing interrupt > handlers and put them to sleep > 3) Put the interrupt-controllers into low-power state > 4) Go to sleep Your short and simple list omits a few crucial items: A) Decide what to do about remote wakeup requests. B) Prevent I/O requests from resuming devices that have been suspended. C) Prevent devices and drivers from being registered or unregistered; in particular decide what to do about hot-plug or hot-unplug events. D) Block driver bind or unbind calls. Any of these things is capable of screwing up the course of events. (In fact A _should_ be allowed to abort a suspend.) > > As Pavel rightly said, you can get rid of the freezer, but you're > > only going to have to implement another one that does the > > essentially the same thing, even if it is at some other level. > > How about a freezer whose job it is to "wait for pending hard > interrupts to complete when we have already guaranteed that we won't > get any more"? That part should be really *REALLY* easy. You don't > need to care about either userspace processes or kernel threads at > all. Specifically, Step 1 consists of: > > suspend_device(dev) > { > set_no_bind_flag(dev); > for (dev->subdevices) > suspend_device(dev); > set_no_io_flag(dev); > wait_for_in_progress_dma(dev); > turn_off_interrupts(dev); > go_to_low_power_state(dev); > } > > After you've set the "no_bind" flag, you won't get any *new* > subdevices trying to bind, So what happens if a new subdevice arrives at the wrong time? Do you block instead of binding it? While holding a mutex needed to suspend the parent device? What about drivers trying to bind to existing devices? What happens to I/O requests submitted after the "no_io" flag is set? The driver will have to block them, effectively creating its own little "freezer". > therefore it's safe to iterate over the > list of present sub-devices and suspend them. Once those are > suspended and in low-power states you can set a "no_io" flag to > prevent the driver from submitting more IO. At that point you can > lazily wait for existing DMA/IO/interrupts to finish on the device, > since *NOBODY* will be submitting them anymore, and we certainly > aren't probing for new devices. Then you can just turn off the power > to the device. When all the leaf devices are off, the parent device > can be turned off because everything waiting on the leaf devices is > blocked on them and won't unblock until the parent device *AND* the > leaf device are turned on again, in that order. This is a lot like what we already do. The differences are: There is nothing corresponding to your "no-bind" flag. Most drivers don't have anything like your "no_io" flag; they assume that nobody will be around to submit an I/O request. > Scheduling and userspace are all still fully enabled in this > scenario. Once all your devices are turned off, the only remaining > running threads will be those which haven't done IO since the > beginning of the suspend. We can then disable preemption, turn off > the timer interrupts, and tell the other CPUs to park all their > remaining threads in schedule() and sleep. Then we put the IRQ > controller to sleep and go to sleep ourselves. If our driver model > locking is sufficient to handle putting a parent device to sleep > while threads are sleeping on a child device then there are exactly 0 > problems. > > Resuming is basically running the whole process in reverse. Runtime- > suspend is achieved by not setting the 'no_io' or 'no_bind' flags and > putting selective device-subtrees to sleep without doing anything to > the rest of the system. Nobody doubts that suspend can be made to work without the freezer. The point is that doing it this way dumps a bunch of extra responsibility on drivers. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
On Fri, 6 Jul 2007, Benjamin Herrenschmidt wrote: > Why are you guys working so hard and spending so much energy to try to > avoid doing the right thing is beyond my understanding... > > > It _does_ apply to kernel threads. That's exactly why I wrote above > > that kernel threads which try to do I/O during a suspend will need > > extra attention. > > Ok none at all if you don't have a freezer. In answer to both questions: We need the freezer in order to implement hibernate. Even if we take your advice and stop using the freezer during suspend, these issues would still remain and would need to be solved. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/