Re: A kexec approach to hibernation
Hi.

On Sun, 2007-06-10 at 19:02 -0700, H. Peter Anvin wrote:
> Matthew Garrett wrote:
> > No, it only supports ext2 (and reading ext3 as if it's ext2). Right now,
> > the assumption that syncing during suspend will cause data to hit
> > something grub can read isn't a safe one.
>
> I brought this issue up quite a few years ago at an OLS BOF. We pretty
> much need a "supersync" system call; you can do this by bmapping any
> file on ext3, but having something supported across filesystems would be
> good.

Sounds like a good idea to me.

Regards,

Nigel
Re: A kexec approach to hibernation
Xavier Bestel <[EMAIL PROTECTED]> writes:

[snip]

> If I were helping you coding I'd suggest to only concentrate on having
> your project work on standard filesystems, and then when it works maybe
> think about suspending on crypto-over-loop-over-fuse-over-vpn-over-wifi.
> But talk is cheap so I'm shutting up. Right now. :)

Well, the whole idea of the kexec approach is that the hibernate system
doesn't need to know anything at all about filesystems or any particular
device. So if it works at all, it will work for
crypto-over-loop-over-fuse-over-vpn-over-wifi-over-pigeon-carrier-protocol-over-printer-and-scanner.

--
Jeremy Maitin-Shepard
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: A kexec approach to hibernation
On Mon, 2007-06-11 at 11:51 -0400, Jeremy Maitin-Shepard wrote:
> Xavier Bestel <[EMAIL PROTECTED]> writes:
>
> > On Mon, 2007-06-11 at 11:01 -0400, Jeremy Maitin-Shepard wrote:
> >> >> You might claim then that the solution is to simply keep the network
> >> >> driver quiesced or stopped. But then it is impossible to write the
> >> >> image over the network. The way to get around this problem is to write
> >> >> the image over the network using a fresh network stack.
> >>
> >> > Or teach the driver stack about the difference/reset it. Remember that
> >> > even if you get a fresh network stack, you'll still be getting packets
> >> > for the old stack. Getting a new ip (assuming one is available) won't
> >> > stop other connections getting killed, either because we send resets
> >> > from the kexec'd kernel, or because they timeout looking for the old
> >> > ip.
> >>
> >> I could be mistaken, but I think that bringing up the network interface
> >> with a different IP address would prevent it from resetting existing TCP
> >> connections, because it would never receive the packets for those
> >> existing connections.
>
> > That can't work. There are networks where the client must have a fixed
> > IP, or must accept the address given by dhcp in order to talk to
> > fileservers. And you still have the same mac address, which may cause
> > problems.
>
> I wasn't suggesting that using a different IP address would be a general
> solution. It might be a solution for a few people.
>
> In general, I'd imagine that most people would not bring up the network
> interface at all, and most of the people that do would bring it up with
> the same IP address, causing some existing TCP connections to possibly
> be reset.
>
> I think that causing connections to be reset is, however, far better
> than acking packets that are then silently thrown away.

If I were helping you coding I'd suggest to only concentrate on having
your project work on standard filesystems, and then when it works maybe
think about suspending on crypto-over-loop-over-fuse-over-vpn-over-wifi.
But talk is cheap so I'm shutting up. Right now. :)

	Xav
Re: A kexec approach to hibernation
Xavier Bestel <[EMAIL PROTECTED]> writes:

> On Mon, 2007-06-11 at 11:01 -0400, Jeremy Maitin-Shepard wrote:
>> >> You might claim then that the solution is to simply keep the network
>> >> driver quiesced or stopped. But then it is impossible to write the
>> >> image over the network. The way to get around this problem is to write
>> >> the image over the network using a fresh network stack.
>>
>> > Or teach the driver stack about the difference/reset it. Remember that
>> > even if you get a fresh network stack, you'll still be getting packets
>> > for the old stack. Getting a new ip (assuming one is available) won't
>> > stop other connections getting killed, either because we send resets
>> > from the kexec'd kernel, or because they timeout looking for the old
>> > ip.
>>
>> I could be mistaken, but I think that bringing up the network interface
>> with a different IP address would prevent it from resetting existing TCP
>> connections, because it would never receive the packets for those
>> existing connections.

> That can't work. There are networks where the client must have a fixed
> IP, or must accept the address given by dhcp in order to talk to
> fileservers. And you still have the same mac address, which may cause
> problems.

I wasn't suggesting that using a different IP address would be a general
solution. It might be a solution for a few people.

In general, I'd imagine that most people would not bring up the network
interface at all, and most of the people that do would bring it up with
the same IP address, causing some existing TCP connections to possibly
be reset.

I think that causing connections to be reset is, however, far better
than acking packets that are then silently thrown away.

--
Jeremy Maitin-Shepard
Re: A kexec approach to hibernation
On Mon, 2007-06-11 at 11:01 -0400, Jeremy Maitin-Shepard wrote:
> >> You might claim then that the solution is to simply keep the network
> >> driver quiesced or stopped. But then it is impossible to write the
> >> image over the network. The way to get around this problem is to write
> >> the image over the network using a fresh network stack.
>
> > Or teach the driver stack about the difference/reset it. Remember that
> > even if you get a fresh network stack, you'll still be getting packets
> > for the old stack. Getting a new ip (assuming one is available) won't
> > stop other connections getting killed, either because we send resets
> > from the kexec'd kernel, or because they timeout looking for the old
> > ip.
>
> I could be mistaken, but I think that bringing up the network interface
> with a different IP address would prevent it from resetting existing TCP
> connections, because it would never receive the packets for those
> existing connections.

That can't work. There are networks where the client must have a fixed
IP, or must accept the address given by dhcp in order to talk to
fileservers. And you still have the same mac address, which may cause
problems.

	Xav
Re: A kexec approach to hibernation
Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

>> > If _I_ were willing to add some runtime overhead to make hibernation
>> > simpler, I'd just use some virtualization to do that... with added
>> > advantage of "hibernate here, resume on different hw".
>>
>> I don't believe there is going to be any runtime overhead.

> 64MB less memory seems like runtime overhead for me. If you know how
> to do kexec without pre-reserving memory, I believe kexec/kdump team
> will be interested.

kdump needs to reserve memory at boot for two main reasons. First, it
needs to preload the crashdump kernel into memory so that it will be
available on panic (and however much memory the crashdump kernel will
need to run must also be available at all times, since a panic can occur
at any time). Second, no attempt is made to shut down devices on panic,
so devices may clobber existing memory with ongoing DMA, and the
crashdump kernel must therefore use a reserved area of memory.

For hibernate via kexec, however, these issues do not exist. The
simplest solution would be to back up the first, say, 16MB or 64MB (or
however much is desired for the "save" kernel to have) of memory into
free pages just before copying the "save" kernel into the desired
position and jumping to it. Due to the speed of memory copying, this
should not add any significant overhead.

[snip]

--
Jeremy Maitin-Shepard
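
[Editorial sketch] The backup step proposed above can be illustrated with a small user-space simulation. This is only a model under stated assumptions: "memory" is a Python bytearray, pages are 4096 bytes, and the names `swap_in_save_kernel` and `restore_low_memory` are hypothetical; none of this is kernel code or an existing kexec interface.

```python
PAGE_SIZE = 4096  # assumed page size for the simulation


def swap_in_save_kernel(memory, free_pages, kernel_image, reserve_bytes):
    """Back up the low `reserve_bytes` of `memory` into free pages, then
    copy `kernel_image` to address 0.

    Returns a dict mapping each backed-up page number to the free page
    that now holds its contents, so the region can be restored later.
    """
    n_pages = reserve_bytes // PAGE_SIZE
    if len(kernel_image) > reserve_bytes:
        raise ValueError("save kernel does not fit in the reserved region")
    if len(free_pages) < n_pages:
        raise MemoryError("not enough free pages to hold the backup")

    backup_map = {}
    for page_no, dest in zip(range(n_pages), free_pages):
        if dest < n_pages:
            raise ValueError("backup page lies inside the region being saved")
        src_off = page_no * PAGE_SIZE
        dst_off = dest * PAGE_SIZE
        # Plain memory copy of one page into a free page.
        memory[dst_off:dst_off + PAGE_SIZE] = memory[src_off:src_off + PAGE_SIZE]
        backup_map[page_no] = dest

    # The low region is now safe to overwrite with the "save" kernel.
    memory[0:len(kernel_image)] = kernel_image
    return backup_map


def restore_low_memory(memory, backup_map):
    """Undo swap_in_save_kernel by copying the backed-up pages home."""
    for page_no, dest in backup_map.items():
        src_off = dest * PAGE_SIZE
        dst_off = page_no * PAGE_SIZE
        memory[dst_off:dst_off + PAGE_SIZE] = memory[src_off:src_off + PAGE_SIZE]
```

The returned map is what an abort or resume path would need in order to put the original contents back; in a real implementation that bookkeeping would itself have to live somewhere the "save" kernel does not overwrite.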
Re: A kexec approach to hibernation
Nigel Cunningham <[EMAIL PROTECTED]> writes:

[snip]

> Trying to image a system to a fuse filesystem is indeed fundamentally
> broken. The problem is really that we have to make choices about what we
> will and won't support.

> We can have suspending to fuse filesystems, but only if we have
> running userspace (which in turn implies either limiting the image to
> half of memory or compressing a larger image as it's copied so that it
> fits in the remaining space).

> We could have fuse from kexec, but then setting it
> up will be... interesting.

> We can have suspending to a network, but yes, we will want/need to be
> selective about how network connections are handled.

> I agree that the best solution seems to be selective resuming of devices
> for writing the atomic copy. I had a patch to do that long ago, but it
> wasn't a popular idea at the time.

I'd argue that the kexec approach does provide a fairly clean way to
selectively load device drivers --- simply leave out or keep as unloaded
modules the drivers that you don't want to load under the "save" and
"load" kernels.

>> You might claim then that the solution is to simply keep the network
>> driver quiesced or stopped. But then it is impossible to write the
>> image over the network. The way to get around this problem is to write
>> the image over the network using a fresh network stack.

> Or teach the driver stack about the difference/reset it. Remember that
> even if you get a fresh network stack, you'll still be getting packets
> for the old stack. Getting a new ip (assuming one is available) won't
> stop other connections getting killed, either because we send resets
> from the kexec'd kernel, or because they timeout looking for the old
> ip.

I could be mistaken, but I think that bringing up the network interface
with a different IP address would prevent it from resetting existing TCP
connections, because it would never receive the packets for those
existing connections.

> I can see that kexec does provide a nice, clean separation of context
> from that of the kernel being hibernated. But it also deprives us of the
> ability to easily use context in the hibernating kernel such as
> encrypted devices and network connections & configuration. Do you have
> some way in mind that could be utilised to overcome these limitations?

The reason I don't think this need to "re-setup" the context for
suspending should be a significant problem in practice is that the setup
required under the "save kernel" should be exactly the same as that
required under the "load kernel". In particular, it should likely be
possible to re-use exactly the same code (in the initrd/initramfs) to
locate the desired device, and/or perform any necessary device mapper
commands to create the necessary devices. In the more complex case, this
"setup" might require setting up a network connection and/or mounting a
fuse filesystem.

> [snip]

>> if /boot is not mounted: mount /boot
>> make change
>> umount /boot
>>
>> If you do it from the "save kernel", you need logic like:
>> mount /dev/boot-device /boot (no fstab on "save kernel", most likely)
>> make change
>> umount /boot.

> Doesn't the unmount do everything required to sync the data?

Yes it does. The issue is that some people might not have /boot as a
separate partition, and have it as part of the root filesystem instead,
for instance. In that case, grub is effectively accessing a dirty
mounted filesystem. In practice, sync basically takes care of it, but in
theory it shouldn't really be done.

> [snip]

>> I suppose you do that by using more sophisticated logic to atomically
>> copy the pages to their final location after loading them from disk. In
>> particular, I suppose you must order the page copies carefully to avoid
>> clobbering pages that have not yet been copied. Seems reasonable. In
>> that case, there is indeed probably no reason to not use that approach
>> for resuming.

> For Suspend2, I do something similar but simpler. If a page can be
> loaded directly to the final address, do so. The only pages that need
> to be loaded to another address and then restored are those that are
> used by the loading kernel. We don't have to worry about copying
> pages back in a particular order.

What about the pages that couldn't be loaded back to their final address
because their final address is used by another page that couldn't be
loaded to its final address? Maybe you have some way to avoid this from
happening, it is just something that occurred to me. (It isn't important
anyway though.)

I suppose in any case, we can see that resuming would be essentially the
same under the kexec approach as under the current approach.

> [snip]

>> To me, it seems a lot easier to get right than the current approaches.

> But you can't get what you said you wanted - a fully functional system
> with a fully functional userspace isn't possible. You're running a
> different kernel and can't safely mount filesystems that were
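
[Editorial sketch] The page-placement question raised in this message can be made concrete with a small planner. It is an illustration only, not Suspend2's actual algorithm: the function name `plan_restore` and the frame-number model are assumptions. The idea is that a page whose final frame is free gets loaded in place, while a page whose final frame is occupied by the loading kernel is staged in a frame that is neither used by the loading kernel nor the final destination of any image page. Choosing staging frames that way is one possible answer to the "what if the staging address is itself needed" concern.

```python
def plan_restore(total_frames, image_frames, boot_kernel_frames):
    """Plan where each image page is loaded during resume.

    total_frames       -- number of physical page frames (0..total_frames-1)
    image_frames       -- final frame numbers of the pages in the image
    boot_kernel_frames -- frames currently occupied by the loading kernel

    Returns (direct, staged): `direct` lists frames loaded straight to
    their final address; `staged` maps a final frame to the temporary
    frame that holds it until the loading kernel gives up its memory.
    """
    image = set(image_frames)
    boot = set(boot_kernel_frames)

    # Staging frames must not be a destination of any image page,
    # otherwise a staged page could overwrite a directly loaded one.
    staging_pool = [f for f in range(total_frames)
                    if f not in image and f not in boot]

    direct = sorted(image - boot)
    conflicts = sorted(image & boot)
    if len(conflicts) > len(staging_pool):
        raise MemoryError("not enough spare frames to stage conflicting pages")

    staged = dict(zip(conflicts, staging_pool))
    return direct, staged
```

Because no staging frame is ever a final destination, the staged pages can be copied home in any order once the loading kernel is done, which matches the claim that ordering does not matter. The cost is that the image plus the loading kernel must leave enough spare frames.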
Re: A kexec approach to hibernation
Hi.

On Fri, 2007-06-01 at 21:54 -0400, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>
> >> But kernel threads also rely on userspace, due to e.g. fuse and usermode
> >> helpers.
>
> > Yes, I know that and I think these issues are solvable within the current
> > approach.
>
> It seems like it would be very hard to get writing of an image to a
> fuse filesystem working under the current scheme.
>
> Trying to image a system while it is running seems fundamentally broken.
> As another example, I believe currently although devices are "quiesced"
> or stopped while the atomic snapshot is made, they are all then started
> again afterward while the image is written to disk. As a result, the
> network drivers will continue acking TCP packets that are received after
> the snapshot, but these packets will be lost.

Trying to image a system to a fuse filesystem is indeed fundamentally
broken. The problem is really that we have to make choices about what we
will and won't support.

We can have suspending to fuse filesystems, but only if we have
running userspace (which in turn implies either limiting the image to
half of memory or compressing a larger image as it's copied so that it
fits in the remaining space).

We could have fuse from kexec, but then setting it
up will be... interesting.

We can have suspending to a network, but yes, we will want/need to be
selective about how network connections are handled.

I agree that the best solution seems to be selective resuming of devices
for writing the atomic copy. I had a patch to do that long ago, but it
wasn't a popular idea at the time. Since then I've focused more on
minimising the Suspend2 patch, so it's been dropped.

> You might claim then that the solution is to simply keep the network
> driver quiesced or stopped. But then it is impossible to write the
> image over the network. The way to get around this problem is to write
> the image over the network using a fresh network stack.

Or teach the driver stack about the difference/reset it. Remember that
even if you get a fresh network stack, you'll still be getting packets
for the old stack. Getting a new ip (assuming one is available) won't
stop other connections getting killed, either because we send resets
from the kexec'd kernel, or because they timeout looking for the old
ip.

I can see that kexec does provide a nice, clean separation of context
from that of the kernel being hibernated. But it also deprives us of the
ability to easily use context in the hibernating kernel such as
encrypted devices and network connections & configuration. Do you have
some way in mind that could be utilised to overcome these limitations?

[..]

> Some people get away with it, but fundamentally it is broken to do so.
> (The fact that the current software suspend implementations tell the
> filesystems to sync to disk increases its chances of working.) You are
> accessing a filesystem that is in an unknown state. Consider that the
> user might make a change to grub.conf, but the kernel caches the write.
> If the filesystem containing grub.conf is left mounted, the write might
> never reach disk before the system is hibernated. As a result, when
> grub attempts to read it, it doesn't get the expected data.
>
> >> >> This shouldn't be a significant problem in practice.
> >>
> >> > I don't agree here.
> >>
> >> I think hibernate-script already includes support for modifying grub's
> >> configuration.
>
> > Yes. It does that _before_ the hibernation begins. ;-)
>
> Either way, it doesn't make much difference. Inside of
> hibernate-script, you need logic like:
>
> if /boot is not mounted: mount /boot
> make change
> umount /boot
>
> If you do it from the "save kernel", you need logic like:
> mount /dev/boot-device /boot (no fstab on "save kernel", most likely)
> make change
> umount /boot.

Doesn't the unmount do everything required to sync the data?

> [snip]
>
> >> As far as I understand it, the swsusp resume path involves the boot
> >> kernel loading the entire image from disk to available memory, then
> >> shutting down all the devices, and copying the memory into place, and
> >> then jumping to the original kernel, which reinitializes devices and
> >> starts tasks running. This isn't very different from what I was
> >> proposing as the alternative anyway, except that: memory is copied once,
> >> which is pretty fast, but means that only up to half of the total memory
> >> can be saved.
>
> > No that's not correct. Actually, during the restore we _can_ load much more
> > than 50% of RAM, everything needed for that is already in place. :-)
>
> I suppose you do that by using more sophisticated logic to atomically
> copy the pages to their final location after loading them from disk. In
> particular, I suppose you must order the page copies carefully to avoid
> clobbering pages that have not yet been copied. Seems reasonable. In
> that case, there is indeed probably no reas
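
[Editorial sketch] The two "mount /boot, make change, umount /boot" variants quoted in this message can be written as one dry-run planner. Everything here is an assumption for illustration: the function names, the `edit-grub-conf` placeholder command, and the choice to return a command list instead of executing anything. The mount check parses text in the format of /proc/mounts (device, mount point, type, options, ...).

```python
def is_mounted(proc_mounts_text, mount_point):
    """Return True if `mount_point` appears as a mount point in text
    formatted like /proc/mounts."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


def boot_update_plan(proc_mounts_text, edit_command, boot_device=None):
    """Return the list of commands (as argv lists) needed to change a
    file under /boot and get the change onto disk.

    boot_device -- pass the device explicitly when there is no fstab,
                   as would likely be the case under a "save kernel".
    """
    plan = []
    mounted_by_us = False
    if not is_mounted(proc_mounts_text, "/boot"):
        if boot_device is None:
            plan.append(["mount", "/boot"])  # relies on an fstab entry
        else:
            plan.append(["mount", boot_device, "/boot"])
        mounted_by_us = True

    plan.append(list(edit_command))

    if mounted_by_us:
        # Unmounting writes everything back to the filesystem proper.
        plan.append(["umount", "/boot"])
    else:
        # /boot stays mounted, so only a sync is possible; as the thread
        # points out, that may not be enough for a bootloader to see it.
        plan.append(["sync"])
    return plan
```

The sketch deviates from the quoted pseudocode in one respect: it unmounts only what it mounted itself, and falls back to `sync` otherwise. That fallback is exactly the case the thread identifies as unsafe in theory when /boot is part of a mounted root filesystem.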
Re: A kexec approach to hibernation
Matthew Garrett wrote:
> No, it only supports ext2 (and reading ext3 as if it's ext2). Right now,
> the assumption that syncing during suspend will cause data to hit
> something grub can read isn't a safe one.

I brought this issue up quite a few years ago at an OLS BOF. We pretty
much need a "supersync" system call; you can do this by bmapping any
file on ext3, but having something supported across filesystems would be
good.

	-hpa
Re: A kexec approach to hibernation
On Tue, 2007-06-05 at 11:34 +0200, Stefan Seyfried wrote:
> On Tue, Jun 05, 2007 at 10:15:41AM +0200, Xavier Bestel wrote:
> > FWIW, on my old laptop apm beats any kernel solution hands down in terms
> > of speed
>
> This might be true on 64MB systems. It is surely not true on multi-Gigabyte-RAM
> setups. At least not if you actually use that memory for anything
> including filesystem cache.
> And you simply cannot buy a new machine today that still supports APM suspend
> to disk.

I don't contest that. I just say that technically, an "external kernel"
can suspend/hibernate a laptop very well.

	Xav
Re: A kexec approach to hibernation
On Tue, Jun 05, 2007 at 10:15:41AM +0200, Xavier Bestel wrote:
> FWIW, on my old laptop apm beats any kernel solution hands down in terms
> of speed

This might be true on 64MB systems. It is surely not true on
multi-Gigabyte-RAM setups. At least not if you actually use that memory
for anything including filesystem cache.

And you simply cannot buy a new machine today that still supports APM
suspend to disk.

--
Stefan Seyfried
QA / R&D Team Mobile Devices        |              "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out."

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: A kexec approach to hibernation
On Tue, 2007-06-05 at 08:36 +1000, Nigel Cunningham wrote:
> I spent some time, last I think, seriously considering this approach.
> The more I thought about the details, the more I realised that it wasn't
> a viable approach. As I said before, it does indeed sound like a dream
> at first, but once you get into the details, it becomes more and more of
> a nightmare.

From very far, it looks like apm suspend (i.e. an "external" system
taking control of the computer for hibernation and resuming).
FWIW, on my old laptop apm beats any kernel solution hands down in terms
of speed and robustness. Not that this means anything for kexec-suspend.

	Xav
Re: A kexec approach to hibernation
Hi!

> > > To me, it seems a lot easier to get right than the current approaches.
> >
> > Well, you are certainly welcome to create the patch. "suspend3" name
> > is still free, AFAICT.
>
> I could be sneaky and call it "hibernate". Probably nicer though to use the
> name "kexec hibernate" to be later simplified to just "hibernate".
>
> I was hoping that everyone would like the idea so much that they would rush
> to implement it, so that I wouldn't have to try. (I haven't written

That apparently did not happen, that much should be clear by now.

> > If _I_ were willing to add some runtime overhead to make hibernation
> > simpler, I'd just use some virtualization to do that... with added
> > advantage of "hibernate here, resume on different hw".
>
> I don't believe there is going to be any runtime overhead.

64MB less memory seems like runtime overhead for me. If you know how
to do kexec without pre-reserving memory, I believe kexec/kdump team
will be interested.

> To some extent, (see some of the explanations I gave in the other e-mail I
> sent a few minutes ago in reply to Nigel) I think the kexec approach can be
> viewed as a cleaner variant of userspace hibernate.

It also can be viewed as vaporware.

	Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: A kexec approach to hibernation
Hi.

On Mon, 2007-06-04 at 18:09 -0400, Jeremy Maitin-Shepard wrote:
> I was hoping that everyone would like the idea so much that they would
> rush to implement it, so that I wouldn't have to try. (I haven't written
> much kernel code before, and I have a number of other time-requiring
> projects to work on.)
> It looks like that is not too likely to happen though ;).

I spent some time, last I think, seriously considering this approach.
The more I thought about the details, the more I realised that it wasn't
a viable approach. As I said before, it does indeed sound like a dream
at first, but once you get into the details, it becomes more and more of
a nightmare.

> Maybe I'll try implementing it though, and find that it isn't very much work.

Perhaps that would be a good idea. Then you'll get to see those issues
too.

[...]

> To some extent, (see some of the explanations I gave in the other e-mail I
> sent a few minutes ago in reply to Nigel) I think the kexec approach can be
> viewed as a cleaner variant of userspace hibernate.

I'm not going to bother saying more in response to that at the moment.
It seems clear to me that the three of us who've actually worked on
hibernation and thought about the issues actually know nothing, and
everyone who hasn't worked on it is far more expert than us.

I'm not saying that I think it's utterly impossible to use kexec for
hibernation. I am saying that I think such an implementation would be
even more of a headache than the existing issues.

Regards,

Nigel
Re: A kexec approach to hibernation
On Mon, Jun 04, 2007 at 12:46:21PM +0200, Pavel Machek wrote:
> [snip]

> > You might claim then that the solution is to simply keep the network
> > driver quiesced or stopped. But then it is impossible to write the
> > image over the network. The way to get around this problem is to write
> > the image over the network using a fresh network stack.
>
> The "fresh network stack" will RST any connections that were going,
> which is ugly, too.

It will only do this if you bring up the network device with the same IP
address in the new kernel (which you would have no reason to do if you
don't need to write the image over the network.) Maybe the ideal
behavior would be to tell the network stack to just ignore unexpected
TCP packets, rather than send RST, while saving or reading the image,
but that is probably not necessary for most uses and would be a hack.

I also think that sending RST is far better than sending ACK and then
silently tossing out the data, which is what is currently done. (Since
I believe currently the network devices are brought back up along with
all other devices after the atomic copy is made.) Silently losing data
is something that should only occur on a crash.

This is likely to actually be a somewhat serious problem for servers on
which hibernate is used to move the server between rooms without losing
connections. Of course, you can get around this by adding a hack to not
bring up network devices based on some option or other, but that just
solves one specific case with an ugly solution. In contrast, using the
kexec approach, the network device or any other device would quite
naturally not be brought back up unless it was needed for hibernate, and
even if it is brought back up, no data are silently lost.

> [snip]

> > To me, it seems a lot easier to get right than the current approaches.
>
> Well, you are certainly welcome to create the patch. "suspend3" name
> is still free, AFAICT.

I could be sneaky and call it "hibernate". Probably nicer though to use
the name "kexec hibernate" to be later simplified to just "hibernate".

I was hoping that everyone would like the idea so much that they would
rush to implement it, so that I wouldn't have to try. (I haven't written
much kernel code before, and I have a number of other time-requiring
projects to work on.) It looks like that is not too likely to happen
though ;).

Maybe I'll try implementing it though, and find that it isn't very much
work. It would be very convenient if the current work being done to
improve the driver interfaces for hibernate also results in the proper
interfaces needed for this approach. It looks like the resume path
should be exactly the same with this approach as with the existing
approaches, but the hibernate path is not exactly the same. In
particular, it seems that all devices should be shut down to a greater
extent than merely the quiescing necessary for the current approaches
while making an atomic copy, but also they should not be completely shut
down to the extent that they cannot be restored to the desired state
when resuming or aborting.

> If _I_ were willing to add some runtime overhead to make hibernation
> simpler, I'd just use some virtualization to do that... with added
> advantage of "hibernate here, resume on different hw".

I don't believe there is going to be any runtime overhead.

To some extent, (see some of the explanations I gave in the other e-mail
I sent a few minutes ago in reply to Nigel) I think the kexec approach
can be viewed as a cleaner variant of userspace hibernate.

--
Jeremy Maitin-Shepard
Re: A kexec approach to hibernation
On Mon, Jun 04, 2007 at 03:22:20PM +1000, Nigel Cunningham wrote:
> Hi.
>
> I can see that the idea of writing a kernel image from using another
> kernel sounds nice and clean initially, but the more we get into the
> details (yes, I am listening, even though I said nothing before now),
> the more it's sounding like the cure is worse than the disease.

I think if we look into the details a bit more, we may find that it is
in fact not worse after all. It would be nice if it were also the case
that this approach could be implemented in only a few hours of work, but
unfortunately I doubt that to be the case even though I imagine it may
be somewhat simpler to implement than the current swsusp and suspend2
implementations.

Just to give some perspective on the implementation, I believe the
following functions/procedures provided by the kernel to userspace
(implemented as system calls, sysfs files, ioctls, etc.) would be
sufficient for this hibernation approach:

(Note that I wrote this description after writing my responses to the
other points you make, and so it may make more sense for those to be
read first.)

1. "start hibernation"

   Parameters:
   - "save image" kernel to use (either as the binary data or as a path
     to the file perhaps);
   - extra kernel command-line parameters to the "save image" kernel;
   - an initrd for the "save image" kernel (if needed).

   This function would result in the original kernel loading the "save
   image" kernel into memory, stopping all devices, and jumping to the
   new kernel.

2. "resume from hibernation"

   Parameters: Somehow the block of memory containing the hibernate
   image would need to be provided; it could be specified as a pointer
   to memory in the process invoking this function, or alternatively
   something like /dev/snapshot could be used.

   This function would stop devices, shuffle the pages around in memory,
   and jump back to the original kernel.

3. "abort hibernation"

   Parameters: The address to jump back to the original kernel would
   need to be specified; the new kernel would know this address because
   it would be provided as a kernel command-line parameter.

   This function would act similarly to "resume from hibernation",
   except that the pages are already in memory exactly where they need
   to be, so all that needs to be done is to stop all devices, and jump
   back to the original kernel.

If it is desired to do slightly more in the kernel, the "save image"
kernel could process the kernel command-line arguments to determine the
pages that need to be written, and provide a view of them e.g. as
/dev/snapshot, rather than having the userspace under the "save image"
kernel do that work and then perhaps access the pages using /dev/mem.

> To get rid of process freezing, we're talking about:

Note that the advantage of this approach is not just getting rid of
process freezing and its associated problems. There is also the
advantage of allowing much greater flexibility in how the image is
written, and avoiding disturbing things like the network stack.

> * making hibernation depend on depriving the user of 32 or 64M of
> otherwise perfectly usable memory (thereby making hibernation on
> machines with less memory impossible)

It is not clear that this much memory would really need to be reserved.
I'll admit I don't fully understand the requirements for using kexec to
load a kernel. In particular, I don't know how much memory would really
be required to load a kernel to write an image, and to what extent that
memory needs to be contiguous. Even if a significant amount of
contiguous physical memory needs to be reserved at boot, this memory
could still perhaps be used for the page cache by the original kernel,
since it could be freed up for hibernation (and possibly those cached
pages could be moved to different memory.)

In the best case, though, a significant amount of contiguous memory
would not be required, in which case a certain amount of memory would
need to be freed only for hibernation, and could be used normally while
not hibernating.

(As a side note, with machines typically having 1GB+ of memory these
days, even wasting 64MB of memory is becoming increasingly unimportant,
although I agree it is not a good idea. I actually run an x86 system
with 1GB of memory and no HIGHMEM support, and as a result waste over
100MB of physical memory, which would handily be free for the new
kernel. Changing the VM split broke certain programs that I didn't feel
like fixing.)

> * requiring them to set up kexec or kdump (I don't understand the
> difference, sorry) or some new variation

This new hibernation approach would indeed internally use some or all of
the kexec code, but I don't think this detail would significantly impact
the setup procedure. The only real impact would be that the user would
need to somehow specify how to access the "save image kernel" and the
additional k
Re: A kexec approach to hibernation
On Mon, Jun 04, 2007 at 03:10:00PM +0200, Pavel Machek wrote:
> On Mon 2007-06-04 13:20:54, Matthew Garrett wrote:
> > On Mon, Jun 04, 2007 at 12:46:21PM +0200, Pavel Machek wrote:
> >
> > > sync is perfectly safe way of telling the fs to store data on disk.
> >
> > On disk, yes. On the filesystem, no. It's valid for the data to be left
> > in the journal, for instance.
>
> Yep... then grub needs to grok the journal. It does for ext3, IIRC.

No, it only supports ext2 (and reading ext3 as if it's ext2). Right now,
the assumption that syncing during suspend will cause data to hit
something grub can read isn't a safe one.

--
Matthew Garrett | [EMAIL PROTECTED]
Re: A kexec approach to hibernation
On Mon 2007-06-04 13:20:54, Matthew Garrett wrote:
> On Mon, Jun 04, 2007 at 12:46:21PM +0200, Pavel Machek wrote:
>
> > sync is perfectly safe way of telling the fs to store data on disk.
>
> On disk, yes. On the filesystem, no. It's valid for the data to be left
> in the journal, for instance.

Yep... then grub needs to grok the journal. It does for ext3, IIRC.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: A kexec approach to hibernation
On Mon, Jun 04, 2007 at 12:46:21PM +0200, Pavel Machek wrote:

> sync is perfectly safe way of telling the fs to store data on disk.

On disk, yes. On the filesystem, no. It's valid for the data to be left
in the journal, for instance.

--
Matthew Garrett | [EMAIL PROTECTED]
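To make the distinction in this exchange concrete, here is a minimal Python sketch of what a careful userspace writer could do when changing a file such as grub.conf before hibernating. The helper name `write_durably` and the file name used with it are hypothetical, and the sketch assumes a Linux-like system where a directory can be opened and fsync'ed. It only asks the kernel to push the data to the storage device. As the message above points out, on a journaling filesystem that data may still be sitting in the journal, so a boot loader that cannot replay the journal might still not see it.

```python
import os
import tempfile


def write_durably(path, data):
    """Replace `path` with `data` and flush it to stable storage.

    Hypothetical helper for illustration only. It guarantees "on disk",
    not "visible to a reader that ignores the filesystem journal".
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # flush Python's userspace buffer
            os.fsync(tmp.fileno())   # ask the kernel to write the file out
        os.replace(tmp_path, path)   # atomically put the new file in place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is written out as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Unmounting the filesystem afterwards, as suggested elsewhere in the thread for /boot, is what leaves it in a state that a simple reader can rely on.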
Re: A kexec approach to hibernation
Hi!

> >> But kernel threads also rely on userspace, due to e.g. fuse and usermode
> >> helpers.
>
> > Yes, I know that and I think these issues are solvable within the current
> > approach.
>
> It seems like it would be very hard to get writing of an image to a
> fuse filesystem working under the current scheme.
>
> Trying to image a system while it is running seems fundamentally broken.
> As another example, I believe currently although devices are "quiesced"
> or stopped while the atomic snapshot is made, they are all then started
> again afterward while the image is written to disk. As a result, the
> network drivers will continue acking TCP packets that are received after
> the snapshot, but these packets will be lost.
>
> You might claim then that the solution is to simply keep the network
> driver quiesced or stopped. But then it is impossible to write the
> image over the network. The way to get around this problem is to write
> the image over the network using a fresh network stack.

The "fresh network stack" will RST any connections that were going,
which is ugly, too.

> >> Grub, its configuration, and the kernel used to resume the system had
> >> better be on a "safe" filesystem already (i.e. a separate, unmounted
> >> before hibernation /boot).
>
> > Currently, you don't need to do that.
>
> Some people get away with it, but fundamentally it is broken to do so.
> (The fact that the current software suspend implementations tell the
> filesystems to sync to disk increases its chances of working.) You are
> accessing a filesystem that is in an unknown state. Consider that the
> user might make a change to grub.conf, but the kernel caches the write.
> If the filesystem containing grub.conf is left mounted, the write might
> never reach disk before the system is hibernated. As a result, when
> grub attempts to read it, it doesn't get the expected data.

sync is perfectly safe way of telling the fs to store data on disk.

> >> That is why I think the kexec solution is the elegant solution.
>
> > Frankly, I think it's tricky. ;-)
>
> To me, it seems a lot easier to get right than the current approaches.

Well, you are certainly welcome to create the patch. "suspend3" name is
still free, AFAICT.

If _I_ were willing to add some runtime overhead to make hibernation
simpler, I'd just use some virtualization to do that... with added
advantage of "hibernate here, resume on different hw".

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: A kexec approach to hibernation
Hi again.

On Mon, 2007-06-04 at 10:05 +0200, Rafael J. Wysocki wrote:
> On Monday, 4 June 2007 07:22, Nigel Cunningham wrote:
> > Hi.
> >
> > I can see that the idea of writing a kernel image from using another
> > kernel sounds nice and clean initially, but the more we get into the
> > details (yes, I am listening, even though I said nothing before now),
> > the more it's sounding like the cure is worse than the disease.
> >
> > To get rid of process freezing, we're talking about:
> > * making hibernation depend on depriving the user of 32 or 64M of
> > otherwise perfectly usable memory (thereby making hibernation on
> > machines with less memory impossible)
> > * requiring them to set up kexec or kdump (I don't understand the
> > difference, sorry) or some new variation
> > * adding interfaces to tell kexec/dump/whatever what pages need to be
> > saved and reloaded
> > * adding convolutions in which at resume time we boot one kernel, switch
> > to another kernel to do the loading and then switch back again to the
> > resumed kernel (assuming I understand what you're suggesting).
> >
> > It all sounds terribly complicated and confusing to me, and that's
> > before I even begin to think about how this second kernel could possibly
> > write the image to an encrypted device or LVM or such like that the
> > first kernel knows about and might use now.
> >
> > Can't we just get the freezer right and be done with it?
>
> My feelings about this are pretty much the same. :-)
>
> At least, there still is room for improvements within the current approach,
> so first I'd like to improve it as much as reasonably possible and then to
> think of alternatives, if need be.

Agreed. I'm not for a moment denying that the current freezer could be
better, but biffing it out the window just doesn't seem to be the
appropriate solution at the moment.

Regards,

Nigel
Re: A kexec approach to hibernation
Hi,

On Monday, 4 June 2007 07:22, Nigel Cunningham wrote:
> Hi.
>
> I can see that the idea of writing a kernel image from using another
> kernel sounds nice and clean initially, but the more we get into the
> details (yes, I am listening, even though I said nothing before now),
> the more it's sounding like the cure is worse than the disease.
>
> To get rid of process freezing, we're talking about:
> * making hibernation depend on depriving the user of 32 or 64M of
> otherwise perfectly usable memory (thereby making hibernation on
> machines with less memory impossible)
> * requiring them to set up kexec or kdump (I don't understand the
> difference, sorry) or some new variation
> * adding interfaces to tell kexec/dump/whatever what pages need to be
> saved and reloaded
> * adding convolutions in which at resume time we boot one kernel, switch
> to another kernel to do the loading and then switch back again to the
> resumed kernel (assuming I understand what you're suggesting).
>
> It all sounds terribly complicated and confusing to me, and that's
> before I even begin to think about how this second kernel could possibly
> write the image to an encrypted device or LVM or such like that the
> first kernel knows about and might use now.
>
> Can't we just get the freezer right and be done with it?

My feelings about this are pretty much the same. :-)

At least, there still is room for improvements within the current
approach, so first I'd like to improve it as much as reasonably possible
and then to think of alternatives, if need be.

Greetings,
Rafael

--
"Premature optimization is the root of all evil." - Donald Knuth
Re: A kexec approach to hibernation
Hi.

I can see that the idea of writing a kernel image from using another
kernel sounds nice and clean initially, but the more we get into the
details (yes, I am listening, even though I said nothing before now),
the more it's sounding like the cure is worse than the disease.

To get rid of process freezing, we're talking about:
* making hibernation depend on depriving the user of 32 or 64M of
  otherwise perfectly usable memory (thereby making hibernation on
  machines with less memory impossible)
* requiring them to set up kexec or kdump (I don't understand the
  difference, sorry) or some new variation
* adding interfaces to tell kexec/dump/whatever what pages need to be
  saved and reloaded
* adding convolutions in which at resume time we boot one kernel, switch
  to another kernel to do the loading and then switch back again to the
  resumed kernel (assuming I understand what you're suggesting).

It all sounds terribly complicated and confusing to me, and that's
before I even begin to think about how this second kernel could possibly
write the image to an encrypted device or LVM or such like that the
first kernel knows about and might use now.

Can't we just get the freezer right and be done with it?

Regards,

Nigel
Re: A kexec approach to hibernation
On Fri, Jun 01, 2007 at 07:54:30PM -0400, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>
> > On Saturday, 2 June 2007 00:25, Jeremy Maitin-Shepard wrote:
>
> [snip]
>
> >> Just before jumping into the new kernel, with interrupts disabled, the
> >> old kernel could either prepare a data structure that specifies what
> >> pages are allocated, or alternatively simply provide a pointer to the
> >> relevant data structure in the old kernel.
>
> > But for this purpose the old kernel will actually need to do what is currently
> > done in swsusp while the image is being created (the only difference is that
> > we allocate memory in the process, but that's a detail only).
>
> Okay, but creating a list of pages should be extremely easy.
> Alternatively, with the "save kernel" might be able to read the existing
> data structures directly.
>

Can't we do it the kdump way? Kdump creates ELF headers and stores these
in memory. The address of these ELF headers is passed to the second
kernel through a command-line parameter. These ELF headers contain the
information regarding what memory areas need to be captured by the
second kernel.

Can't we adopt a similar raw approach for hibernation? Reserve a memory
area for the second kernel (this is not used by the first kernel).
During hibernation, load the second kernel in the reserved memory area,
which will also determine what physical memory needs to be saved
(possibly reading /proc/iomem) and create ELF headers, and then the
second kernel can parse these headers and save the memory. The output
file can possibly be an ELF image again so that restoring becomes
easier.

I am just wondering: do we have to create a list of pages etc.? Can't we
just copy the raw memory to disk and restore it back? Information
regarding where a chunk of memory should go back will be provided by the
ELF header.

One downside would be the problem of reserving a memory area
permanently; this memory area is currently reserved at first kernel boot
time. Can we somehow make this reservation dynamic? Something like using
hugepage support so that we can allocate big chunks of contiguous
physical memory from user space (32MB or 64MB) at run time.

Thanks
Vivek
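As a rough illustration of the "possibly reading /proc/iomem" step in the message above, the following Python sketch extracts the "System RAM" ranges from iomem-style text and subtracts a region reserved for the second kernel. The function names and the sample text are assumptions made for this example; they are not taken from kexec-tools or from any patch in this thread, and a real tool would go on to describe the resulting ranges in ELF program headers rather than a Python list.

```python
import re

# One /proc/iomem entry looks like "00100000-3ffeffff : System RAM";
# nested entries are indented. Both ends of a range are inclusive.
_IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")

# Made-up sample in the /proc/iomem layout, used only for illustration.
SAMPLE_IOMEM = """\
00000000-0009fbff : System RAM
000a0000-000bffff : Video RAM area
00100000-3ffeffff : System RAM
  00100000-002fffff : Kernel code
  01000000-04ffffff : Crash kernel
3fff0000-3fffffff : ACPI Tables
"""


def system_ram_ranges(iomem_text):
    """Return (start, end) pairs, inclusive, for every "System RAM" entry."""
    ranges = []
    for line in iomem_text.splitlines():
        match = _IOMEM_LINE.match(line)
        if match and match.group(3).strip() == "System RAM":
            ranges.append((int(match.group(1), 16), int(match.group(2), 16)))
    return ranges


def exclude_region(ranges, reserved_start, reserved_end):
    """Cut the inclusive region [reserved_start, reserved_end] out of ranges."""
    result = []
    for start, end in ranges:
        if reserved_end < start or reserved_start > end:
            result.append((start, end))      # no overlap, keep as is
            continue
        if start < reserved_start:
            result.append((start, reserved_start - 1))
        if end > reserved_end:
            result.append((reserved_end + 1, end))
    return result
```

With the sample text, `exclude_region(system_ram_ranges(SAMPLE_IOMEM), 0x01000000, 0x04ffffff)` leaves three ranges to save: the low memory below 640K, the memory between 1MB and the reserved area, and everything above the reserved area.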
Re: A kexec approach to hibernation
On Saturday, 2 June 2007 03:54, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>
> >> But kernel threads also rely on userspace, due to e.g. fuse and usermode
> >> helpers.
>
> > Yes, I know that and I think these issues are solvable within the current
> > approach.
>
> It seems like it would be very hard to get writing of an image to a
> fuse filesystem working under the current scheme.

If the filesystem is located on a separate partition, that should be
doable (not that I'm going to try it in the foreseeable future).

> Trying to image a system while it is running seems fundamentally broken.

I don't agree.

> As another example, I believe currently although devices are "quiesced"
> or stopped while the atomic snapshot is made, they are all then started
> again afterward while the image is written to disk. As a result, the
> network drivers will continue acking TCP packets that are received after
> the snapshot, but these packets will be lost.
>
> You might claim then that the solution is to simply keep the network
> driver quiesced or stopped. But then it is impossible to write the
> image over the network. The way to get around this problem is to write
> the image over the network using a fresh network stack.

Can we just take the interface down and bring it up just for writing the
image, possibly with another IP address? We don't need another kernel to
do that.

> >> [snip]
> >>
> >> >> > One more thing: How do we restore the system state?
> >> >>
> >> >> The "resume kernel" would be loaded at the same address as the "save
> >> >> kernel" was loaded (it should probably be the same kernel),
> >>
> >> > Well, we'd have to use a relocatable kernel for this purpose, it
> >> > seems.
> >>
> >> Not necessarily relocatable (although that would be the usual
> >> solution). It just needs to be loaded at a different address than the
> >> normal kernel.
>
> > AFAICS, you can't do that with a kernel which is not relocatable (you can load
> > it, of course, but will it work then?).
>
> I seem to recall in recent kernel versions support for both a
> relocatable kernel and also support for non-relocatable kernels which
> load at a non-standard address.
>
> >> If it isn't relocatable, the memory that would be needed by the "save kernel"
> >> would have to be reserved at boot.
>
> > That doesn't seem to be realistic to me.
>
> Okay. I don't see why there would be a problem with using a relocatable
> kernel though.
>
> [snip]
>
> >> >> Presumably it would be most convenient to have the normal boot loader
> >> >> load the resume kernel directly at the desired address. The
> >> >> disadvantage is that at the same time the image is written, something
> >> >> would have to be done so that the boot loader would know to load the
> >> >> resume kernel, rather than the normal kernel. (E.g. the image writing
> >> >> kernel would need to modify the grub config file.)
> >>
> >> > No, it can't do that, unless the file is on a 'safe' filesystem
> >>
> >> Grub, its configuration, and the kernel used to resume the system had
> >> better be on a "safe" filesystem already (i.e. a separate, unmounted
> >> before hibernation /boot).
>
> > Currently, you don't need to do that.
>
> Some people get away with it, but fundamentally it is broken to do so.
> (The fact that the current software suspend implementations tell the
> filesystems to sync to disk increases its chances of working.) You are
> accessing a filesystem that is in an unknown state. Consider that the
> user might make a change to grub.conf, but the kernel caches the write.
> If the filesystem containing grub.conf is left mounted, the write might
> never reach disk before the system is hibernated. As a result, when
> grub attempts to read it, it doesn't get the expected data.

Yes, that can happen in theory (and I believe it's happened for some XFS
users), but _most_ often it just works and people are used to doing it.

> >> >> This shouldn't be a significant problem in practice.
> >>
> >> > I don't agree here.
> >>
> >> I think hibernate-script already includes support for modifying grub's
> >> configuration.
>
> > Yes. It does that _before_ the hibernation begins. ;-)
>
> Either way, it doesn't make much difference. Inside of
> hibernate-script, you need logic like:
>
>   if /boot is not mounted: mount /boot
>   make change
>   umount /boot
>
> If you do it from the "save kernel", you need logic like:
>
>   mount /dev/boot-device /boot (no fstab on "save kernel", most likely)
>   make change
>   umount /boot.
>
> [snip]
>
> >> As far as I understand it, the swsusp resume path involves the boot
> >> kernel loading the entire image from disk to available memory, then
> >> shutting down all the devices, and copying the memory into place, and
> >> then jumping to the original kernel, which reinitializes devices and
> >> starts tasks running. This isn't very different from what I was
> >> proposing as the alte
Re: A kexec approach to hibernation
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

>> But kernel threads also rely on userspace, due to e.g. fuse and usermode
>> helpers.

> Yes, I know that and I think these issues are solvable within the current
> approach.

It seems like it would be very hard to get writing of an image to a fuse
filesystem working under the current scheme.

Trying to image a system while it is running seems fundamentally broken.
As another example, I believe currently although devices are "quiesced"
or stopped while the atomic snapshot is made, they are all then started
again afterward while the image is written to disk. As a result, the
network drivers will continue acking TCP packets that are received after
the snapshot, but these packets will be lost.

You might claim then that the solution is to simply keep the network
driver quiesced or stopped. But then it is impossible to write the image
over the network. The way to get around this problem is to write the
image over the network using a fresh network stack.

>> [snip]
>>
>> >> > One more thing: How do we restore the system state?
>> >>
>> >> The "resume kernel" would be loaded at the same address as the "save
>> >> kernel" was loaded (it should probably be the same kernel),
>>
>> > Well, we'd have to use a relocatable kernel for this purpose, it
>> > seems.
>>
>> Not necessarily relocatable (although that would be the usual
>> solution). It just needs to be loaded at a different address than the
>> normal kernel.

> AFAICS, you can't do that with a kernel which is not relocatable (you can load
> it, of course, but will it work then?).

I seem to recall in recent kernel versions support for both a
relocatable kernel and also support for non-relocatable kernels which
load at a non-standard address.

>> If it isn't relocatable, the memory that would be needed by the "save kernel"
>> would have to be reserved at boot.

> That doesn't seem to be realistic to me.

Okay. I don't see why there would be a problem with using a relocatable
kernel though.

[snip]

>> >> Presumably it would be most convenient to have the normal boot loader
>> >> load the resume kernel directly at the desired address. The
>> >> disadvantage is that at the same time the image is written, something
>> >> would have to be done so that the boot loader would know to load the
>> >> resume kernel, rather than the normal kernel. (E.g. the image writing
>> >> kernel would need to modify the grub config file.)
>>
>> > No, it can't do that, unless the file is on a 'safe' filesystem
>>
>> Grub, its configuration, and the kernel used to resume the system had
>> better be on a "safe" filesystem already (i.e. a separate, unmounted
>> before hibernation /boot).

> Currently, you don't need to do that.

Some people get away with it, but fundamentally it is broken to do so.
(The fact that the current software suspend implementations tell the
filesystems to sync to disk increases its chances of working.) You are
accessing a filesystem that is in an unknown state. Consider that the
user might make a change to grub.conf, but the kernel caches the write.
If the filesystem containing grub.conf is left mounted, the write might
never reach disk before the system is hibernated. As a result, when grub
attempts to read it, it doesn't get the expected data.

>> >> This shouldn't be a significant problem in practice.
>>
>> > I don't agree here.
>>
>> I think hibernate-script already includes support for modifying grub's
>> configuration.

> Yes. It does that _before_ the hibernation begins. ;-)

Either way, it doesn't make much difference. Inside of hibernate-script,
you need logic like:

  if /boot is not mounted: mount /boot
  make change
  umount /boot

If you do it from the "save kernel", you need logic like:

  mount /dev/boot-device /boot (no fstab on "save kernel", most likely)
  make change
  umount /boot.

[snip]

>> As far as I understand it, the swsusp resume path involves the boot
>> kernel loading the entire image from disk to available memory, then
>> shutting down all the devices, and copying the memory into place, and
>> then jumping to the original kernel, which reinitializes devices and
>> starts tasks running. This isn't very different from what I was
>> proposing as the alternative anyway, except that: memory is copied once,
>> which is pretty fast, but means that only up to half of the total memory
>> can be saved.

> No that's not correct. Actually, during the restore we _can_ load much more
> than 50% of RAM, everything needed for that is already in place. :-)

I suppose you do that by using more sophisticated logic to atomically
copy the pages to their final location after loading them from disk. In
particular, I suppose you must order the page copies carefully to avoid
clobbering pages that have not yet been copied. Seems reasonable. In
that case, there is indeed probably no reason to not use that approach
for resuming.

[snip]

>> The whole reason to want to checkpoint filesystems was
Re: A kexec approach to hibernation
On Saturday, 2 June 2007 01:54, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>
> > On Saturday, 2 June 2007 00:25, Jeremy Maitin-Shepard wrote:
>
> [snip]
>
> >> Just before jumping into the new kernel, with interrupts disabled, the
> >> old kernel could either prepare a data structure that specifies what
> >> pages are allocated, or alternatively simply provide a pointer to the
> >> relevant data structure in the old kernel.
>
> > But for this purpose the old kernel will actually need to do what is currently
> > done in swsusp while the image is being created (the only difference is that
> > we allocate memory in the process, but that's a detail only).
>
> Okay, but creating a list of pages should be extremely easy.
> Alternatively, with the "save kernel" might be able to read the existing
> data structures directly.
>
> >> I can't say exactly how this data would be given to the new kernel, but I
> >> can't imagine it being difficult. (For instance, multiboot headers, the
> >> kernel command line, initrd, or some other mechanism could be used.)
>
> > Besides, you need to load the new kernel somehow. If that's to work without
> > problems, that should be done before we switch off devices.
>
> Well, the new kernel can be loaded at any time,

No. By reading from a filesystem, you're modifying its metadata (in
general, of course).

> and would be done in exactly the way kexec loads a kernel. It would probably
> make sense to load the kernel into memory (but not jump to it) as the very
> first step of hibernation.

I think you'd have to do that.

> >> >> 5. The new kernel loads, and then either kernel space or user space
> >> >> writes the necessary data from the old kernel to disk.
> >>
> >> > You also need to reinitialize devices needed to write the image.
> >>
> >> Yes. That would be done, as normal, when the kernel loads. Currently
> >> devices are suspended or stopped anyway before the atomic copy, and then
> >> reinitialized to write the image. In theory, this stopping shouldn't be
> >> needed, and I mentioned that if additional support were added to some
> >> drivers for passing some information about the current state of the
> >> device, the device might only need to be partially shut down before
> >> jumping to the new kernel. This might allow, for instance, avoiding
> >> spinning down and then up again the disks.
>
> > Well, I don't quite agree. I think that for this purpose we'll need devices to
> > be initialized from scratch by the new kernel, so the old kernel should put
> > them into states that allow this to be done.
>
> I agree that the default behavior should be to completely shut down the
> devices. Later, special support could be added to select devices to
> allow them to not be fully shut down.
>
> > We are going to implement something like this anyway, but that's a rather long
> > way to go.
>
> >> >> 6. The new kernel either powers off or suspends to ram. If it suspends
> >> >> to ram, then it would need to be able to jump back to the old kernel
> >> >> when it resumes from ram.
> >>
> >> > What if the user wants to abort the hibernation?
> >>
> >> This would be handled in effectively the same way as if the user wants
> >> to suspend to ram after writing the image: it would be necessary to jump
> >> back to the old kernel. This would effectively be handled in the same
> >> way as a resume, except that the copying back of memory would be
> >> avoided. Presumably the image writing kernel would have devices in
> >> approximately the same state as the image loading kernel, and so the old
> >> kernel needs to be prepared to receive the devices in that state anyway.
>
> > Please see above. I don't think that would be easy to arrange for.
>
> In that case, the devices can indeed be fully shut down, at least
> initially.
>
> >> >> The advantages of this approach include:
> >> >>
> >> >> - having a completely functional system (with a completely functional
> >> >> userspace) from which the image is written, without having to worry
> >> >> about messing up the state that is being saved (hell, the user could
> >> >> even do it via an interactive shell on the new kernel);
> >> >>
> >> >> - no need to worry about trying to use drivers while some processes are
> >> >> frozen;
> >>
> >> > We're rather worried about running processes when the devices are
> >> > frozen. ;-)
> >>
> >> The point is, with this kexec approach, essentially no code at all runs
> >> under the old kernel after the very initial steps of the hibernation
> >> have begun, but any code, kernel or user, can run under the new kernel,
> >> because the new kernel provides a completely functional system, while at
> >> the same time not clobbering any of the memory of the old kernel. In
> >> particular, it will be possible to write the image to a fuse file
> >> system.
>
> > You need to be cautious here. You can't touch any filesystems mounted by
Re: A kexec approach to hibernation
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Saturday, 2 June 2007 00:25, Jeremy Maitin-Shepard wrote:

[snip]

>> Just before jumping into the new kernel, with interrupts disabled, the
>> old kernel could either prepare a data structure that specifies what
>> pages are allocated, or alternatively simply provide a pointer to the
>> relevant data structure in the old kernel.

> But for this purpose the old kernel will actually need to do what is currently
> done in swsusp while the image is being created (the only difference is that
> we allocate memory in the process, but that's a detail only).

Okay, but creating a list of pages should be extremely easy.
Alternatively, the "save kernel" might be able to read the existing data
structures directly.

>> I can't say exactly how this data would be given to the new kernel, but I
>> can't imagine it being difficult. (For instance, multiboot headers, the
>> kernel command line, initrd, or some other mechanism could be used.)

> Besides, you need to load the new kernel somehow. If that's to work without
> problems, that should be done before we switch off devices.

Well, the new kernel can be loaded at any time, and would be done in
exactly the way kexec loads a kernel. It would probably make sense to
load the kernel into memory (but not jump to it) as the very first step
of hibernation.

>> >> 5. The new kernel loads, and then either kernel space or user space
>> >> writes the necessary data from the old kernel to disk.
>>
>> > You also need to reinitialize devices needed to write the image.
>>
>> Yes. That would be done, as normal, when the kernel loads. Currently
>> devices are suspended or stopped anyway before the atomic copy, and then
>> reinitialized to write the image. In theory, this stopping shouldn't be
>> needed, and I mentioned that if additional support were added to some
>> drivers for passing some information about the current state of the
>> device, the device might only need to be partially shut down before
>> jumping to the new kernel. This might allow, for instance, avoiding
>> spinning down and then up again the disks.

> Well, I don't quite agree. I think that for this purpose we'll need devices to
> be initialized from scratch by the new kernel, so the old kernel should put
> them into states that allow this to be done.

I agree that the default behavior should be to completely shut down the
devices. Later, special support could be added to select devices to
allow them to not be fully shut down.

> We are going to implement something like this anyway, but that's a rather long
> way to go.

>> >> 6. The new kernel either powers off or suspends to ram. If it suspends
>> >> to ram, then it would need to be able to jump back to the old kernel
>> >> when it resumes from ram.
>>
>> > What if the user wants to abort the hibernation?
>>
>> This would be handled in effectively the same way as if the user wants
>> to suspend to ram after writing the image: it would be necessary to jump
>> back to the old kernel. This would effectively be handled in the same
>> way as a resume, except that the copying back of memory would be
>> avoided. Presumably the image writing kernel would have devices in
>> approximately the same state as the image loading kernel, and so the old
>> kernel needs to be prepared to receive the devices in that state anyway.

> Please see above. I don't think that would be easy to arrange for.

In that case, the devices can indeed be fully shut down, at least
initially.

>> >> The advantages of this approach include:
>> >>
>> >> - having a completely functional system (with a completely functional
>> >> userspace) from which the image is written, without having to worry
>> >> about messing up the state that is being saved (hell, the user could
>> >> even do it via an interactive shell on the new kernel);
>> >>
>> >> - no need to worry about trying to use drivers while some processes are
>> >> frozen;
>>
>> > We're rather worried about running processes when the devices are
>> > frozen. ;-)
>>
>> The point is, with this kexec approach, essentially no code at all runs
>> under the old kernel after the very initial steps of the hibernation
>> have begun, but any code, kernel or user, can run under the new kernel,
>> because the new kernel provides a completely functional system, while at
>> the same time not clobbering any of the memory of the old kernel. In
>> particular, it will be possible to write the image to a fuse file
>> system.

> You need to be cautious here. You can't touch any filesystems mounted by
> the old kernel, or they will be corrupted after the restore.

Certainly. Note that any filesystems that are available to the "save
state" kernel would have been specifically mounted under that kernel.
There isn't any real possibility of confusion over which filesystems are
safe to access.

>> >> - no need for complicated process freezing;
>>
>> > In fact it's not complicated, at least as
Re: A kexec approach to hibernation
On Saturday, 2 June 2007 00:25, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>
> > On Friday, 1 June 2007 22:39, Jeremy Maitin-Shepard wrote:
> >> I figured I'd throw this idea out, since although it is not perfect, it
> >> has the potential to elegantly solve a lot of issues with hibernate.
> >>
> >> Just as kexec can now be used to write a crashdump after a kernel panic,
> >> a fresh kexec-loaded kernel (loaded into unused memory) could be used to
> >> write the hibernate image of the existing kernel to disk, and then power
> >> off the system (or suspend to ram, or anything else). This avoids the
> >> need for the original kernel to jump through hoops to hibernate itself
> >> in place.
> >>
> >> A hibernate sequence would be approximately as follows:
> >>
> >> 1. Free some memory if needed or desired, and disable the swap device
> >>    if it is going to be used to write the hibernate image.
>
> > Why disable it?
>
> To make sure that the swap data won't get clobbered by the writing of
> the image, if the swap device is to be used to write the hibernate
> image. Presumably something similar is already done. In any case this
> is not an important point.
>
> >> 2. Load the fresh kernel in a chunk of available (possibly
> >>    pre-allocated) memory (there must also be enough available memory
> >>    for this kernel to use).
> >>
> >> 3. Disable interrupts and stop all devices.
>
> > Well, this is one of the hardest parts of hibernation, so no advantage
> > here.
>
> It seems like support for this is mostly already in place though, and it
> needs to be done for suspend to ram, kexec, and shutdown anyway.
>
> >> 4. Jump to the new kernel, passing whatever state information will be
> >>    needed by it to know how to write the image.
>
> > How would we know which data to write (more precisely, which data to
> > tell the other kernel to write)? How do we pass this information to
> > the new kernel?
>
> Just before jumping into the new kernel, with interrupts disabled, the
> old kernel could either prepare a data structure that specifies what
> pages are allocated, or alternatively simply provide a pointer to the
> relevant data structure in the old kernel.

But for this purpose the old kernel will actually need to do what is
currently done in swsusp while the image is being created (the only
difference is that we allocate memory in the process, but that's a detail
only).

> I can't say exactly how this data would be given to the new kernel, but I
> can't imagine it being difficult. (For instance, multiboot headers, the
> kernel command line, initrd, or some other mechanism could be used.)

Besides, you need to load the new kernel somehow. If that's to work
without problems, that should be done before we switch off devices.

> >> 5. The new kernel loads, and then either kernel space or user space
> >>    writes the necessary data from the old kernel to disk.
>
> > You also need to reinitialize devices needed to write the image.
>
> Yes. That would be done, as normal, when the kernel loads. Currently
> devices are suspended or stopped anyway before the atomic copy, and then
> reinitialized to write the image. In theory, this stopping shouldn't be
> needed, and I mentioned that if additional support were added to some
> drivers for passing some information about the current state of the
> device, the device might only need to be partially shut down before
> jumping to the new kernel. This might allow, for instance, avoiding
> spinning down and then up again the disks.

Well, I don't quite agree. I think that for this purpose we'll need
devices to be initialized from scratch by the new kernel, so the old
kernel should put them into states that allow this to be done.

We are going to implement something like this anyway, but that's a rather
long way to go.

> >> 6. The new kernel either powers off or suspends to ram. If it suspends
> >>    to ram, then it would need to be able to jump back to the old kernel
> >>    when it resumes from ram.
>
> > What if the user wants to abort the hibernation?
>
> This would be handled in effectively the same way as if the user wants
> to suspend to ram after writing the image: it would be necessary to jump
> back to the old kernel. This would effectively be handled in the same
> way as a resume, except that the copying back of memory would be
> avoided. Presumably the image writing kernel would have devices in
> approximately the same state as the image loading kernel, and so the old
> kernel needs to be prepared to receive the devices in that state anyway.

Please see above. I don't think that would be easy to arrange for.

> >> The advantages of this approach include:
> >>
> >> - having a completely functional system (with a completely functional
> >>   userspace) from which the image is written, without having to worry
> >>   about messing up the state that is being saved (hell, the user could
> >>   even do it via an interactive she
Re: A kexec approach to hibernation
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes: > On Friday, 1 June 2007 22:39, Jeremy Maitin-Shepard wrote: >> I figured I'd throw this idea out, since although it is not perfect, it >> has the potential to elegantly solve a lot of issues with hibernate. >> >> Just as kexec can now be used to write a crashdump after a kernel panic, >> a fresh kexec-loaded kernel (loaded into unused memory) could be used to >> write the hibernate image of the existing kernel to disk, and then power >> off the system (or suspend to ram, or anything else). This avoids the >> need for the original kernel to jump through hoops to hibernate itself >> in place. >> >> A hibernate sequence would be approximately as follows: >> >> 1. Free some memory if needed or desired, and disable the swap device >> if it is going to be used to write the hibernate image. > Why to disable it? To make sure that the swap data won't get clobbered by the writing of the image, if the swap device is to be used to write the hibernate image. Presumably something similar is already done. In any case this is not an important point. >> 2. Load the fresh kernel in a chunk of available (possibly >> pre-allocated) memory (there must also be enough available memory >> for this kernel to use). >> >> 3. Disable interrupts and stop all devices. > Well, this is one of the hardest parts of hibernation, so no advantage > here. It seems like support for this is mostly already in place though, and it needs to be done for suspend to ram, kexec, and shutdown anyway. >> 4. Jump to the new kernel, passing whatever state information will be >> needed by it to know how to write the image. > How would we know which data to write (more precisely, which data to > tell the other kernel to write)? How do we pass this information to > the new kernel? 
Just before jumping into the new kernel, with interrupts disabled, the old kernel could either prepare a data structure that specifies what pages are allocated, or alternatively simply provide a pointer to the relevant data structure in the old kernel. I can't say exactly how this data would be given to the new kernel, but I can't imagine it being difficult. (For instance, multiboot headers, the kernel command line, initrd, or some other mechanism could be used.) >> 5. The new kernel loads, and then either kernel space or user space >> writes the necessary data from the old kernel to disk. > You also need to reinitialize devices needed to write the image. Yes. That would be done, as normal, when the kernel loads. Currently devices are suspended or stopped anyway before the atomic copy, and then reinitialized to write the image. In theory, this stopping shouldn't be needed, and I mentioned that if additional support were added to some drivers for passing some information about the current state of the device, the device might only need to be partially shut down before jumping to the new kernel. This might allow, for instance, avoiding spinning down and then up again the disks. >> 6. The new kernel either powers off or suspends to ram. If it suspends >> to ram, then it would need to be able to jump back to the old kernel >> when it resumes from ram. > What if the user wants to abort the hibernation? This would be handled in effectively the same way as if the user wants to suspend to ram after writing the image: it would be necessary to jump back to the old kernel. This would effectively be handled in the same way as a resume, except that the copying back of memory would be avoided. Presumably the image writing kernel would have devices in approximately the same state as the image loading kernel, and so the old kernel needs to be prepared to receive the devices in that state anyway. 
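To make the "data structure that specifies what pages are allocated" from
the answer about step 4 a bit more concrete, here is a minimal sketch in
Python. It is only a model, not kernel code: it assumes the old kernel can
produce a per-page "in use" flag, and it collapses that into a compact list
of (first page frame, page count) runs that could be handed to the
image-writing kernel. The names pages_to_extents and Extent are made up for
this illustration and do not correspond to any existing kernel interface.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (first page frame number, number of pages).
Extent = Tuple[int, int]


def pages_to_extents(in_use: Iterable[bool]) -> List[Extent]:
    """Collapse a per-page "in use" bitmap into runs of consecutive pages."""
    extents: List[Extent] = []
    run_start = None  # first page frame of the run currently being built
    pfn = -1          # stays -1 if the bitmap is empty
    for pfn, used in enumerate(in_use):
        if used and run_start is None:
            run_start = pfn                      # a new run begins here
        elif not used and run_start is not None:
            extents.append((run_start, pfn - run_start))
            run_start = None                     # the run ended before pfn
    if run_start is not None:
        # The bitmap ended while a run was still open.
        extents.append((run_start, pfn + 1 - run_start))
    return extents


if __name__ == "__main__":
    bitmap = [True, True, False, True, False, False, True, True, True]
    print(pages_to_extents(bitmap))
```

A run list like this stays small when memory is mostly contiguous, which is
one reason it might be a convenient thing to pass along; simply passing a
pointer to the old kernel's own structures, as also suggested above, would
avoid building it at all.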
>> The advantages of this approach include:
>>
>> - having a completely functional system (with a completely functional
>>   userspace) from which the image is written, without having to worry
>>   about messing up the state that is being saved (hell, the user could
>>   even do it via an interactive shell on the new kernel);
>>
>> - no need to worry about trying to use drivers while some processes are
>>   frozen;

> We're rather worried about running processes when the devices are
> frozen. ;-)

The point is, with this kexec approach, essentially no code at all runs
under the old kernel after the very initial steps of the hibernation
have begun, but any code, kernel or user, can run under the new kernel,
because the new kernel provides a completely functional system, while at
the same time not clobbering any of the memory of the old kernel. In
particular, it will be possible to write the image to a fuse file
system.

>> - no need for complicated process freezing;

> In fact it's not complicated, at least as far as the user land is
> concerned.

I think given all of the issues about whether to freeze kernel or user
tasks, which tasks to freeze, etc., it is hard to argue that there are
not complications, and many possible bugs an
Re: A kexec approach to hibernation
On Friday, 1 June 2007 22:39, Jeremy Maitin-Shepard wrote:
> I figured I'd throw this idea out, since although it is not perfect, it
> has the potential to elegantly solve a lot of issues with hibernate.
>
> Just as kexec can now be used to write a crashdump after a kernel panic,
> a fresh kexec-loaded kernel (loaded into unused memory) could be used to
> write the hibernate image of the existing kernel to disk, and then power
> off the system (or suspend to ram, or anything else). This avoids the
> need for the original kernel to jump through hoops to hibernate itself
> in place.
>
> A hibernate sequence would be approximately as follows:
>
> 1. Free some memory if needed or desired, and disable the swap device
>    if it is going to be used to write the hibernate image.

Why disable it?

> 2. Load the fresh kernel in a chunk of available (possibly
>    pre-allocated) memory (there must also be enough available memory
>    for this kernel to use).
>
> 3. Disable interrupts and stop all devices.

Well, this is one of the hardest parts of hibernation, so no advantage
here.

> 4. Jump to the new kernel, passing whatever state information will be
>    needed by it to know how to write the image.

How would we know which data to write (more precisely, which data to
tell the other kernel to write)? How do we pass this information to
the new kernel?

> 5. The new kernel loads, and then either kernel space or user space
>    writes the necessary data from the old kernel to disk.

You also need to reinitialize devices needed to write the image.

> 6. The new kernel either powers off or suspends to ram. If it suspends
>    to ram, then it would need to be able to jump back to the old kernel
>    when it resumes from ram.

What if the user wants to abort the hibernation?

> The advantages of this approach include:
>
> - having a completely functional system (with a completely functional
>   userspace) from which the image is written, without having to worry
>   about messing up the state that is being saved (hell, the user could
>   even do it via an interactive shell on the new kernel);
>
> - no need to worry about trying to use drivers while some processes are
>   frozen;

We're rather worried about running processes when the devices are
frozen. ;-)

> - no need for complicated process freezing;

In fact it's not complicated, at least as far as the user land is
concerned.

>   the same logic that can be used for suspend to ram should be sufficient;
>
> - no need for an atomic copy of memory, or any other complicated memory
>   copying; the memory of the old kernel, including the page cache, can
>   be written directly;
>
> - instead of needing a significant amount of free memory to store the
>   atomic copy, only a few megabytes would be needed to load and run the
>   new kernel.

Yes, this sounds good in theory.

> It may or may not be necessary to require that the new kernel used to
> write the image is the same as the existing kernel; it will likely be
> useful to require that it is built from the same sources and with a
> similar config. It would likely be useful, however, to either compile
> out or (e.g. via the kernel command-line) disable the initialization of
> drivers that will not be needed to write the image, such as sound
> drivers, cdrom drivers, filesystems, and network drivers (if the image
> is not to be written via the network).

I think that, for average users, this would be difficult.

> Of course, if special initialization was needed under the original
> kernel to set up the devices that will be used to write the image, such
> as device mapper setup, or network initialization, that will have to be
> repeated under the new kernel as well. This is the principal
> disadvantage to this approach, but since it must be done during resume
> from hibernation in any case, it doesn't seem like a very significant
> disadvantage. The other disadvantage is that there would be the
> delay of loading the fresh kernel; this may, however, only take a second
> or two, which is relatively insignificant compared to the time required
> to actually write the image, and the delay could be reduced by stripping
> out unnecessary drivers from the image-writing kernel.

One more thing: How do we restore the system state?

Greetings,
Rafael

--
"Premature optimization is the root of all evil." - Donald Knuth
A kexec approach to hibernation
I figured I'd throw this idea out, since although it is not perfect, it
has the potential to elegantly solve a lot of issues with hibernate.

Just as kexec can now be used to write a crashdump after a kernel panic,
a fresh kexec-loaded kernel (loaded into unused memory) could be used to
write the hibernate image of the existing kernel to disk, and then power
off the system (or suspend to ram, or anything else). This avoids the
need for the original kernel to jump through hoops to hibernate itself
in place.

A hibernate sequence would be approximately as follows:

1. Free some memory if needed or desired, and disable the swap device
   if it is going to be used to write the hibernate image.

2. Load the fresh kernel in a chunk of available (possibly
   pre-allocated) memory (there must also be enough available memory
   for this kernel to use).

3. Disable interrupts and stop all devices.

4. Jump to the new kernel, passing whatever state information will be
   needed by it to know how to write the image.

5. The new kernel loads, and then either kernel space or user space
   writes the necessary data from the old kernel to disk.

6. The new kernel either powers off or suspends to ram. If it suspends
   to ram, then it would need to be able to jump back to the old kernel
   when it resumes from ram.
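The six steps above can be written down as a small model, which may help
when discussing ordering questions. This is a Python sketch of the control
flow only; nothing here touches a real kernel, and the function name and
the step names are invented for illustration.

```python
from typing import List


def hibernate_sequence(final_action: str = "poweroff") -> List[str]:
    """Return the ordered steps of the proposed kexec-based hibernation.

    final_action is either "poweroff" or "suspend_to_ram" (step 6).
    """
    if final_action not in ("poweroff", "suspend_to_ram"):
        raise ValueError(f"unknown final action: {final_action!r}")

    steps = [
        "free_memory_and_disable_swap",     # step 1
        "load_fresh_kernel",                # step 2
        "disable_interrupts_stop_devices",  # step 3
        "jump_to_new_kernel",               # step 4
        "write_image_from_new_kernel",      # step 5
        final_action,                       # step 6
    ]
    if final_action == "suspend_to_ram":
        # After resuming from RAM, the image-writing kernel has to hand
        # control back to the original kernel.
        steps.append("jump_back_to_old_kernel")
    return steps


if __name__ == "__main__":
    for name in hibernate_sequence("suspend_to_ram"):
        print(name)
```

Note that in this ordering the fresh kernel is loaded while devices are
still running, and only then are interrupts disabled and devices stopped.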
The advantages of this approach include:

- having a completely functional system (with a completely functional
  userspace) from which the image is written, without having to worry
  about messing up the state that is being saved (hell, the user could
  even do it via an interactive shell on the new kernel);

- no need to worry about trying to use drivers while some processes are
  frozen;

- no need for complicated process freezing; the same logic that can be
  used for suspend to ram should be sufficient;

- no need for an atomic copy of memory, or any other complicated memory
  copying; the memory of the old kernel, including the page cache, can
  be written directly;

- instead of needing a significant amount of free memory to store the
  atomic copy, only a few megabytes would be needed to load and run the
  new kernel.

It may or may not be necessary to require that the new kernel used to
write the image is the same as the existing kernel; it will likely be
useful to require that it is built from the same sources and with a
similar config. It would likely be useful, however, to either compile
out or (e.g. via the kernel command-line) disable the initialization of
drivers that will not be needed to write the image, such as sound
drivers, cdrom drivers, filesystems, and network drivers (if the image
is not to be written via the network).

Of course, if special initialization was needed under the original
kernel to set up the devices that will be used to write the image, such
as device mapper setup, or network initialization, that will have to be
repeated under the new kernel as well. This is the principal
disadvantage to this approach, but since it must be done during resume
from hibernation in any case, it doesn't seem like a very significant
disadvantage.
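As an aside on the free-memory point in the list of advantages, the
difference can be put into rough numbers. The Python sketch below is a
deliberately simplified model and rests on assumptions that are not in the
proposal itself: the atomic copy is taken to be uncompressed and held
entirely in RAM next to the data it copies, and the 16 MB reserved for the
image-writing kernel is just a stand-in for "a few megabytes". The function
names are made up for illustration.

```python
def max_saveable_mb_atomic_copy(total_ram_mb: int) -> int:
    """Largest amount of in-use memory that can be saved when an
    uncompressed atomic copy must sit in RAM beside the original."""
    # Original data plus its copy must both fit, so at most half of RAM.
    return total_ram_mb // 2


def max_saveable_mb_kexec(total_ram_mb: int, save_kernel_mb: int) -> int:
    """Largest amount of in-use memory that can be saved when only the
    image-writing kernel needs room and pages are written in place."""
    if not 0 <= save_kernel_mb <= total_ram_mb:
        raise ValueError("reserved size must be between 0 and total RAM")
    return total_ram_mb - save_kernel_mb


if __name__ == "__main__":
    total = 2048     # MB of RAM, example value
    reserved = 16    # MB for the image-writing kernel, assumed value
    print(max_saveable_mb_atomic_copy(total))
    print(max_saveable_mb_kexec(total, reserved))
```

Under these assumptions, a machine with 2048 MB could save about 1024 MB
of state with an atomic copy, versus about 2032 MB when the old kernel's
memory is written out directly.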
The other disadvantage is that there would be the delay of loading the
fresh kernel; this may, however, only take a second or two, which is
relatively insignificant compared to the time required to actually write
the image, and the delay could be reduced by stripping out unnecessary
drivers from the image-writing kernel.

--
Jeremy Maitin-Shepard