Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Friday, 21 September 2007 23:08, Jeremy Maitin-Shepard wrote:
>> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>>
>> > On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
>> >> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>> >>
>> >> [snip]
>> >>
>> >> > The ACPI NVS area is explicitly marked as reserved and we don't save it.
>> >> > On x86_64 we don't save any memory areas marked as reserved and yet the
>> >> > above happens.
>> >>
>> >> I think you have mentioned before, though, that ACPI is first
>> >> initialized by the boot kernel, before it is later initialized by
>> >> the resuming kernel.  This could well be the source of the problem.
>>
>> > No, it's not.  I have tested that too with an ACPI-less boot kernel.
>>
>> Well, it seems that there just must be some other bug.  I would define
>> anything that differs between the post-resume initialization of ACPI

> I'm not sure what you mean.

>> and the normal boot initialization of ACPI as a bug.  If the interaction
>> with the hardware is the same, then the behavior will be the same.

> The ACPI platform firmware is allowed to preserve information across the
> hibernation-resume cycle, so this need not be the same.

All of my comments related to the case where S4 is not being used
(instead the system is just powered off normally), and a boot kernel
that does not initialize ACPI is used.  In that case, the ACPI platform
firmware should not be able to distinguish a normal boot from a resume
from hibernation.

--
Jeremy Maitin-Shepard
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
>> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>>
>> [snip]
>>
>> > The ACPI NVS area is explicitly marked as reserved and we don't save it.
>> > On x86_64 we don't save any memory areas marked as reserved and yet the
>> > above happens.
>>
>> I think you have mentioned before, though, that ACPI is first
>> initialized by the boot kernel, before it is later initialized by
>> the resuming kernel.  This could well be the source of the problem.

> No, it's not.  I have tested that too with an ACPI-less boot kernel.

Well, it seems that there just must be some other bug.  I would define
anything that differs between the post-resume initialization of ACPI
and the normal boot initialization of ACPI as a bug.  If the interaction
with the hardware is the same, then the behavior will be the same.

--
Jeremy Maitin-Shepard
Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> The ACPI NVS area is explicitly marked as reserved and we don't save it.
> On x86_64 we don't save any memory areas marked as reserved and yet the
> above happens.

I think you have mentioned before, though, that ACPI is first
initialized by the boot kernel, before it is later initialized by the
resuming kernel.  This could well be the source of the problem.

In particular, isn't it the case that you also switch the devices to low
power mode before resuming?

--
Jeremy Maitin-Shepard
Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Friday, 21 September 2007 15:14, huang ying wrote:
>> On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
>> > On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
>> > > Nigel Cunningham <[EMAIL PROTECTED]> writes:

> [--snip--]

>> > > No one has yet attacked the hard problem of coming up with separate
>> > > hibernate methods for drivers.
>> >
>> > Well, I've been playing a bit with that for some time, but it's not easy
>> > by any means.
>> >
>> > In short, I'm seeing some problems related to the handling of ACPI that
>> > seem to shatter the entire idea of having separate hibernate methods, at
>> > least as far as ACPI systems are concerned.
>>
>> So sad to hear this.  Can you detail it a little?  Or a link?

> Well, the problem is that apparently some systems (e.g. my HP nx6325) expect
> us to execute the _PTS ACPI global control method before creating the image
> _and_ to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally
> put the system into the sleep state.  In particular, on nx6325, if we don't
> do that, then after the restore the status of the AC power will not be
> reported correctly (and if you replace the battery while in the sleep state,
> the battery status will not be updated correctly after the restore).  Similar
> issues have been reported for other machines.

Suppose that instead of using the ACPI S4 state at all, you just power
off.  Yes, you'll lose wakeup event functionality, and flashy LEDs, but
doesn't this take care of the problem?  The firmware shouldn't see the
hibernate as anything other than a shutdown and reboot.  ACPI should be
initialized normally when resuming, which should take care of getting AC
power status reported properly.  This should be the behavior, anyway, on
the many systems that do not support S4.

> Now, the ACPI specification requires us to put devices into low power states
> before executing _PTS and that's exactly what we're doing before a suspend to
> RAM.  Thus, it seems that in general we need to do the same for hibernation
> on ACPI systems.

It seems that if ACPI S4 is going to be used, switching to the low power
state should be done only immediately before entering that state
(i.e. after the image has already been saved).  In particular, it should
not be done just before the atomic copy.  It is true that (during
resume) after the atomic copy snapshot is restored, drivers will need to
be prepared (i.e. have saved whatever information is necessary) to
_resume_ devices from the low power state, but that does not mean they
have to actually be put into that low power state before the copy is
made.

I agree that for the kexec implementation there may be additional
issues.  For swsusp, uswsusp, and tuxonice, though, I don't see why
there should be a problem.

I think that, as was recognized before, all of the issues are resolved
by properly considering exactly what each callback should do and when it
should be called.  The problems stem from ambiguous specifications, or
from trying to use the same callback for two different purposes or in
two different cases.

Let me know if I'm mistaken.

--
Jeremy Maitin-Shepard
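The ordering argued for in the message above (quiesce devices, take the atomic copy, save the image, and only then switch devices to the low power state right before entering S4) can be modeled in a few lines of userspace C. This is only a sketch: `struct sketch_dev`, the `dev_*` helpers, and `hibernate_sketch()` are hypothetical names invented for illustration and do not correspond to any kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device states, used only for this sketch. */
enum dev_state { DEV_ACTIVE, DEV_QUIESCED, DEV_LOW_POWER };

/* Hypothetical per-device record: the live state plus the state that
 * ends up captured in the hibernation image. */
struct sketch_dev {
    enum dev_state state;
    enum dev_state state_in_image;
};

/* Stop DMA and interrupts, but leave the device powered. */
static void dev_quiesce(struct sketch_dev *d) { d->state = DEV_QUIESCED; }

/* Stand-in for the atomic copy: record the state as the image sees it. */
static void dev_snapshot(struct sketch_dev *d) { d->state_in_image = d->state; }

/* Put the device into its low power state. */
static void dev_power_down(struct sketch_dev *d) { d->state = DEV_LOW_POWER; }

/* Proposed ordering: quiesce -> atomic copy -> save image -> low power. */
static void hibernate_sketch(struct sketch_dev *devs, size_t n)
{
    size_t i;

    assert(devs != NULL);
    for (i = 0; i < n; i++)
        dev_quiesce(&devs[i]);
    for (i = 0; i < n; i++)
        dev_snapshot(&devs[i]);      /* devices are still powered here */
    /* ... the image would be written to disk at this point ... */
    for (i = 0; i < n; i++)
        dev_power_down(&devs[i]);    /* only now, just before S4 */
}
```

After `hibernate_sketch()` returns, every device is in `DEV_LOW_POWER` while the image records `DEV_QUIESCED`, which is the situation the message describes: on resume the driver must be prepared to bring the device back from the low power state even though the snapshot never saw it there.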
Re: GPL / MPL license issues.
Dave Jones <[EMAIL PROTECTED]> writes:

> There are a number of files in the kernel that have in their
> headers a notice that the file is under the Mozilla Public License,
> which alone, is incompatible with the GPL.

> This itself is fine, as long as the resulting code claims
> to be Dual MPL/GPL, however there are a few cases where this
> doesn't seem to be happening.

All of the files that you cite include a notice that they are licensed
under the GPLv2, in addition to the MPL.  There is no reason that
MODULE_LICENSE needs to indicate that some portions of code may also be
available under an alternative license.  Furthermore, for some modules
that contain both code licensed under the GPLv2 exclusively, and code
dual-licensed under both the GPLv2 and the MPL, it would be incorrect to
state that the combined work is dual-licensed under the GPLv2 and the
MPL.

As far as providing a convenience to users, I can't see why anyone would
really care that a particular module includes some code that may be
licensed under the MPL as well.  Anyone actually looking through the
kernel for code to incorporate into an MPL project would surely read the
copyright headers at the top of the source files, rather than try to use
the MODULE_LICENSE notes.

[snip]

--
Jeremy Maitin-Shepard
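To make the point about MODULE_LICENSE concrete, here is a userspace sketch of the kind of string check the module loader applies to the license tag. The function name `sketch_license_is_gpl_compatible()` is hypothetical, and the list of accepted strings is an assumption reconstructed from memory of the kernel's `include/linux/license.h`; it may differ between kernel versions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace model of the loader's license check.
 * The accepted strings are an assumption and may vary by kernel version. */
static int sketch_license_is_gpl_compatible(const char *license)
{
    static const char *const accepted[] = {
        "GPL",
        "GPL v2",
        "GPL and additional rights",
        "Dual BSD/GPL",
        "Dual MIT/GPL",
        "Dual MPL/GPL",
    };
    size_t i;

    assert(license != NULL);
    for (i = 0; i < sizeof(accepted) / sizeof(accepted[0]); i++)
        if (strcmp(license, accepted[i]) == 0)
            return 1;   /* treated as GPL-compatible */
    return 0;           /* anything else is treated as not GPL-compatible */
}
```

Under this model a module tagged "GPL v2" and one tagged "Dual MPL/GPL" are treated identically, so the tag only needs to state a license that covers the module as a whole; it carries no per-file licensing detail, which lives in the source headers.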
Re: [RFC 12/26] ext2 white-out support
Jörn Engel <[EMAIL PROTECTED]> writes:

> On Wed, 1 August 2007 15:33:30 -0400, Josef Sipek wrote:
>>
>> This brings up a very interesting (but painful) question...which makes more
>> sense? Allowing the modifications in only the top-most branch, or any branch
>> (given the user allows it at mount-time)?
>>
>> This is really a question to the community at large, not just you, Dave :)

> Only write to top-most layer.

> There are two reasons for this.  First it allows users to create a union
> mount, test something (e.g. update the distribution) and remove every
> trace from the test by umounting the top-most layer.  Such a thing can
> be quite valuable.

Josef did specifically state that modification to the lower layers would
be allowed only if a special mount flag is given.

> The second reason is simplicity.  I personally couldn't even start to
> describe the semantics.  If the user does a rename, which layer will the
> change end up in?  What if source or target exist in multiple layers?
> How to rename a directory in a lower layer containing a new file in an
> upper layer?

> Finding new and interesting corner cases for such a beast can be quite
> entertaining.  And until someone has properly documented the semantics
> for _all_ the corner cases, my enthusiasm is below freezing point.  Does
> such documentation exist?

I think that if someone can come up with consistent (and useful)
semantics for a mount option that allows modifications to other layers
as well, it would be a useful additional feature to support.  It seems
that it should be possible to add this feature at a later time in any
case.  Perhaps referring to the Plan 9 semantics could be helpful.

--
Jeremy Maitin-Shepard
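The "only write to the top-most layer" rule discussed above relies on white-outs to hide entries that live in lower layers. The following userspace sketch shows one way the lookup and unlink logic could look; all names here (`struct layer`, `union_lookup()`, `union_unlink()`, `MAX_ENTRIES`) are hypothetical and are not taken from the patch series under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8   /* arbitrary limit for this sketch */

enum entry_kind { ENTRY_FILE, ENTRY_WHITEOUT };

struct entry { const char *name; enum entry_kind kind; };

/* One branch of the union; layers[0] is the top-most, writable one. */
struct layer { struct entry entries[MAX_ENTRIES]; int count; };

static struct entry *layer_find(struct layer *l, const char *name)
{
    for (int i = 0; i < l->count; i++)
        if (strcmp(l->entries[i].name, name) == 0)
            return &l->entries[i];
    return NULL;
}

/* Return the index of the layer that provides `name`, or -1 if the name
 * is absent or hidden by a white-out in a higher layer. */
static int union_lookup(struct layer *layers, int nlayers, const char *name)
{
    assert(layers != NULL && nlayers > 0);
    for (int i = 0; i < nlayers; i++) {
        struct entry *e = layer_find(&layers[i], name);
        if (e)
            return e->kind == ENTRY_WHITEOUT ? -1 : i;
    }
    return -1;
}

/* Unlink by touching only the top layer: the entry there becomes (or is
 * added as) a white-out.  Lower layers are never modified.  A white-out
 * is left even if no lower layer has the name; that is redundant but
 * harmless in this model. */
static int union_unlink(struct layer *layers, int nlayers, const char *name)
{
    struct layer *top = &layers[0];
    struct entry *e;

    if (union_lookup(layers, nlayers, name) < 0)
        return -1;                    /* nothing visible to remove */
    e = layer_find(top, name);
    if (e) {
        e->kind = ENTRY_WHITEOUT;
        return 0;
    }
    if (top->count == MAX_ENTRIES)
        return -1;                    /* top layer is full */
    top->entries[top->count].name = name;
    top->entries[top->count].kind = ENTRY_WHITEOUT;
    top->count++;
    return 0;
}
```

Because every change lands in `layers[0]`, discarding that layer restores the original view, which is the property Jörn describes. The sketch deliberately ignores rename and directory handling, which is exactly where the unanswered corner cases lie.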
Re: [linux-pm] Re: Hibernation considerations
It seems that you could still potentially get a failure to freeze if one
FUSE process depends on another, and the one that is frozen second just
happens to be waiting on the one that is frozen first when it is
frozen.  I admit that this situation is unlikely, and perhaps
acceptable.

A larger concern is that it seems that freezing FUSE processes at all
_will_ generate deadlocks if a non-synchronous or memory-map-supporting
filesystem is loopback mounted from a FUSE filesystem.  In that case, if
you attempt to sync or free memory once FUSE is frozen, you are sure to
get a deadlock.

--
Jeremy Maitin-Shepard
Re: Hibernation considerations
Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

> So it will break at least battery status and "AC plugged in"
> status, because those are handled by ACPI and we do not know how to
> control them by hand.

It seems that it should be possible to initialize ACPI as if the system
just booted up normally.  Then battery status and such should be
correct, since they are correct after normal initialization.  It should
be possible to make hibernate look just like a reboot to all of the
devices, including ACPI stuff.

--
Jeremy Maitin-Shepard
Re: [linux-pm] Re: Hibernation considerations
Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
>> >> when doing a suspend-to-ram you get to a point where you just don't use
>> >> any userspace.
>>
>> > What do you mean?  How can you prevent user tasks from running?  That's
>> > basically what the freezer does, and the whole point of this approach
>> > is to eliminate the freezer.  Right?
>>
>> Presumably no tasks at all would be scheduled.

> How would you prevent tasks from being scheduled?  How would you
> prevent drivers from deadlocking because in order to put their device
> in a low-power state they need to acquire a lock which is held by a
> user task?

Perhaps this isn't an issue once the device is already quiesced.  I'm
just conjecturing.

[snip]

--
Jeremy Maitin-Shepard
Re: [linux-pm] Re: Hibernation considerations
Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote:
>> > Userspace can submit I/O requests.  Someone will have to audit every
>> > driver to make sure that such I/O requests don't cause a quiesced
>> > device to become active.  If the device is active, it will make the
>> > memory snapshot inconsistent with the on-device data.
>>
>> assuming this is the suspend-from-ram after a kexec back from the
>> write-to-disk kernel I don't think you are correct.
>>
>> when doing a suspend-to-ram you get to a point where you just don't use
>> any userspace.

> What do you mean?  How can you prevent user tasks from running?  That's
> basically what the freezer does, and the whole point of this approach
> is to eliminate the freezer.  Right?

Presumably no tasks at all would be scheduled.

>> from that point on you are just walking the device tree
>> putting things into low-power mode. This is the point where we are talking
>> about jumping to.

> Yes.  And putting things into low-power mode requires the ability to
> run the scheduler, which means that user tasks can be scheduled, which
> means that they can run.

Does it really (fundamentally) require scheduling tasks, particularly in
the case that the devices have already been put in the "quiesced" state?

--
Jeremy Maitin-Shepard
Re: [linux-pm] Re: Hibernation considerations
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

>> Or add a small bit of infrastructure that errors writes at make_request
>> if you don't have a magic "i am a direct block device write from
>> userspace" flag on the bio.
>>
>> The hibernate may fail, but you don't corrupt the media.
>>
>> If you don't get the image out, resume back to the "this is resume"
>> instead of the power-down path.

> Well, I don't think that is much prettier than the freezer ...

It seems that a better solution to the "how do we write to a file on an
in-use partition" problem has been suggested, which also handles swap
partitions and swap files, and does not require mounting filesystems, so
it seems that the filesystem issue need not be considered.

[snip]

> No.  I'm saying that when you go back from the image-saving kernel to the
> hibernated kernel, you need to make sure that no task will cause any
> filesystem's on-disk state to be actually updated.  If you can't make such
> a guarantee, you just can't do that.

> With the current state of the drivers, it's not doable without the
> freezer.

It seems that it should be feasible to fix the drivers so that

1. they can be taken from normal state to quiesced state without
   requiring the freezer;

2. they can be taken from normal state to low power state without
   requiring the freezer;

3. they can be taken from quiesced state to low power state without
   requiring the freezer.

In particular, it seems that it should be possible to do (3) without
needing to schedule tasks.  It seems likely that (2) may in fact be
almost exactly the same as, or at least similar to, (1) followed by (3),
at least for many drivers.

(1) is required by the kexec hibernate approach even ignoring suspend to
both or S4.  (2) is required for suspend to ram without the freezer,
which seems to be desired anyway.

[snip]

--
Jeremy Maitin-Shepard
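The three transitions listed above, and the suggestion that (2) is roughly (1) followed by (3), can be written down as a tiny state machine. This is a userspace sketch under that assumption only; `enum drv_state` and the `drv_*` functions are hypothetical names and do not represent existing driver callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver states matching the three named in the message. */
enum drv_state { DRV_NORMAL, DRV_QUIESCED, DRV_LOW_POWER };

/* (1) normal -> quiesced.  Returns 0 on success, -1 on a bad start state. */
static int drv_quiesce(enum drv_state *s)
{
    assert(s != NULL);
    if (*s != DRV_NORMAL)
        return -1;
    *s = DRV_QUIESCED;
    return 0;
}

/* (3) quiesced -> low power.  The device is already idle here, so in this
 * model the step needs no cooperation from any other task. */
static int drv_quiesced_to_low_power(enum drv_state *s)
{
    assert(s != NULL);
    if (*s != DRV_QUIESCED)
        return -1;
    *s = DRV_LOW_POWER;
    return 0;
}

/* (2) normal -> low power, expressed as (1) followed by (3). */
static int drv_suspend(enum drv_state *s)
{
    if (drv_quiesce(s) != 0)
        return -1;
    return drv_quiesced_to_low_power(s);
}
```

A kexec-style hibernate would call `drv_quiesce()` before the snapshot and `drv_quiesced_to_low_power()` later, while suspend to RAM would call `drv_suspend()` directly. Whether real drivers can be split this cleanly is the open question in the thread, not something this sketch settles.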
Re: [linux-pm] Re: Hibernation considerations
Milton Miller <[EMAIL PROTECTED]> writes:

[snip]

>>>> (7) how to avoid corrupting filesystems mounted by the hibernated kernel
>>>
>>> I didn't realize this was a discussion item.  I thought the options were
>>> clear, for some filesystem types you can mount them read-only, but for
>>> ext3 (and possibly other less common ones) you just plain cannot touch
>>> them.
>>
>> That's correct.  And since you cannot touch ext3, you need either to assume
>> that you won't touch filesystems at all, or to have code to recognize the
>> filesystem you're dealing with.

> Or add a small bit of infrastructure that errors writes at make_request if
> you don't have a magic "i am a direct block device write from userspace"
> flag on the bio.

I still don't understand why there is this fixation on accessing dirty
filesystems in use by the hibernated system.  Even if you avoid
corrupting the filesystem by avoiding writing to the block device, there
isn't any real guarantee about the state of the data, except for a
filesystem that specifically makes guarantees about such data (and I
don't believe any of the existing ones do).  It isn't necessary to be
able to access such filesystems: everything can be done from an
initramfs/initrd.

[snip]

--
Jeremy Maitin-Shepard
Re: [linux-pm] Re: Hibernation considerations
Milton Miller [EMAIL PROTECTED] writes: [snip] (7) how to avoid corrupting filesystems mounted by the hibernated kernel I didn't realize this was a discussion item. I thought the options were clear, for some filesystem types you can mount them read-only, but for ext3 (and possilby other less common ones) you just plain cannot touch them. That's correct. And since you cannot thouch ext3, you need either to assume that you won't touch filesystems at all, or to have a code to recognize the filesystem you're dealing with. Or add a small bit of infrastructure that errors writes at make_request if you don't have a magic i am a direct block device write from userspace flag on the bio. I still don't understand why there is this fixation on accessing dirty filesystems in use by the hibernated system. Even if you avoid corrupting the filesystem by avoiding writing to the block device, there isn't any real guarantee about the state of the data, except for a filesystem that specifically makes guarantees about such data (and I don't believe any of the existing ones do). It isn't necessary to be able to access such filesystems: everything can be done from an initramfs/initrd. [snip] -- Jeremy Maitin-Shepard - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: Hibernation considerations
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

>> Or add a small bit of infrastructure that errors writes at make_request
>> if you don't have a magic "i am a direct block device write from
>> userspace" flag on the bio. The hibernate may fail, but you don't corrupt
>> the media. If you don't get the image out, resume back to the "this is
>> resume" instead of the power-down path.

> Well, I don't think that is much prettier than the freezer ...

It seems that a better solution to the "how do we write to a file on an
in-use partition" question has been suggested, which also handles swap
partitions and swap files, and does not require mounting filesystems, so
it seems that the filesystem issue need not be considered.

[snip]

> No. I'm saying that when you go back from the image-saving kernel to the
> hibernated kernel, you need to make sure that no task will cause any
> filesystem's on-disk state to be actually updated. If you can't make such
> a guarantee, you just can't do that. With the current state of the
> drivers, it's not doable without the freezer.

It seems that it should be feasible to fix the drivers so that

1. they can be taken from normal state to quiesced state without requiring
   the freezer;

2. they can be taken from normal state to low power state without
   requiring the freezer;

3. they can be taken from quiesced state to low power state without
   requiring the freezer.

In particular, it seems that it should be possible to do (3) without
needing to schedule tasks. It seems likely that (2) may in fact be almost
exactly the same as, or at least similar to, (1) followed by (3), at least
for many drivers.

(1) is required by the kexec hibernate approach even ignoring suspend to
both or S4. (2) is required for suspend to ram without the freezer, which
seems to be desired anyway.

[snip]

--
Jeremy Maitin-Shepard
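The three driver transitions listed in the message above can be written down as a tiny state machine. This is a minimal sketch, assuming each driver has exactly the three states named in the message; `State`, `TRANSITIONS`, `apply` and `run` are illustrative names, not kernel interfaces. It only models the conjecture that (2) reaches the same state as (1) followed by (3); it says nothing about whether real drivers can do so without scheduling tasks.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    QUIESCED = "quiesced"
    LOW_POWER = "low power"


# Transition numbers follow the numbering used in the message.
TRANSITIONS = {
    1: (State.NORMAL, State.QUIESCED),
    2: (State.NORMAL, State.LOW_POWER),
    3: (State.QUIESCED, State.LOW_POWER),
}


def apply(state: State, number: int) -> State:
    """Apply one numbered transition, refusing it from the wrong state."""
    source, target = TRANSITIONS[number]
    if state is not source:
        raise ValueError(f"transition {number} not valid from {state.value}")
    return target


def run(state: State, numbers: list) -> State:
    """Apply a sequence of numbered transitions in order."""
    for number in numbers:
        state = apply(state, number)
    return state
```

Running `[1, 3]` and `[2]` from `State.NORMAL` ends in the same state in this model, which is the sense in which (2) could be built from (1) and (3).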
Re: [linux-pm] Re: Hibernation considerations
Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote:

>>> Userspace can submit I/O requests. Someone will have to audit every
>>> driver to make sure that such I/O requests don't cause a quiesced device
>>> to become active. If the device is active, it will make the memory
>>> snapshot inconsistent with the on-device data.
>>
>> assuming this is the suspend-from-ram after a kexec back from the
>> write-to-disk kernel I don't think you are correct.
>>
>> when doing a suspend-to-ram you get to a point where you just don't use
>> any userspace.

> What do you mean? How can you prevent user tasks from running? That's
> basically what the freezer does, and the whole point of this approach is
> to eliminate the freezer. Right?

Presumably no tasks at all would be scheduled.

>> from that point on you are just walking the device tree putting things
>> into low-power mode. This is the point where we are talking about jumping
>> to.

> Yes. And putting things into low-power mode requires the ability to run
> the scheduler, which means that user tasks can be scheduled, which means
> that they can run.

Does it really (fundamentally) require scheduling tasks, particularly in
the case that the devices have already been put in the quiesced state?

--
Jeremy Maitin-Shepard
Re: [linux-pm] Re: Hibernation considerations
Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:

>>>> when doing a suspend-to-ram you get to a point where you just don't use
>>>> any userspace.
>>>
>>> What do you mean? How can you prevent user tasks from running? That's
>>> basically what the freezer does, and the whole point of this approach is
>>> to eliminate the freezer. Right?
>>
>> Presumably no tasks at all would be scheduled.

> How would you prevent tasks from being scheduled? How would you prevent
> drivers from deadlocking because in order to put their device in a
> low-power state they need to acquire a lock which is held by a user task?

Perhaps this isn't an issue once the device is already quiesced. I'm just
conjecturing.

[snip]

--
Jeremy Maitin-Shepard
Re: Hibernation considerations
[EMAIL PROTECTED] writes:

[snip]

>> There are NO ACPI LIMITS! There only are things that you need to
>> implement if you're going to support ACPI, but they need not be used
>> ALWAYS, no?

> yes there are limits. the fact that you can't remove the battery in S4
> mode without messing things up is a limit,

You won't mess things up as long as the resuming kernel knows that it
should resume as if the system were shut down, rather than sent to S4
state. Maybe it is even possible to detect what type of resuming is needed
automatically.

Similarly, booting another OS shouldn't be a problem, except that if you
do it without powering off the system first, some devices might not work
under the other OS if the other OS doesn't initialize them properly.

[snip]

--
Jeremy Maitin-Shepard
Re: Hibernation considerations
[EMAIL PROTECTED] writes:

[snip]

>> How do you guarantee that no tasks are scheduled when you get back to the
>> hibernated kernel?

> just don't schedule any userspace tasks. all you need to do is to execute
> the ACPI sleep functions. you normally do that after stopping userspace
> anyway.

What does "stopping userspace" mean? You already said it does not mean
disabling interrupts. But using the freezer is also not an option, since
the avoidance of that is the main reason for the kexec approach in the
first place.

[snip]

>> Well, not exactly. If your battery runs out of power while you're
>> suspended, but you have the image saved, it's still better to restore
>> from the image, even if something may not work correctly after the
>> restore, than to risk a loss of data.

> if things don't work correctly you are still risking the loss of data,
> the user just doesn't know it.

It should be possible on any system to do a hibernate followed by a
shutdown (and then resume properly, without any problems). Thus, for
handling suspend to both, you resume as if the system had been shut down,
rather than resuming as if the system came from S4.

--
Jeremy Maitin-Shepard
Re: Hibernation considerations
[EMAIL PROTECTED] writes:

[snip]

>>> * figure out which devices can wake up
>>> * put devices into low power states (wake-up devices are placed in the
>>>   Dx states compatible with the wake capability, the others are powered
>>>   off)

> this can't be done by the image-saving kernel if that kernel doesn't know
> about the device.

The image-saving kernel can be made to know about all of the "wake up"
devices; all other devices should have already been powered off by the
"hibernated" kernel.

--
Jeremy Maitin-Shepard
Re: Hibernation considerations
[EMAIL PROTECTED] writes:

[snip]

> this is where we disagree.

> why not? if all that the hibernated kernel does is to suspend-to-ram and
> makes no changes to disks or TCP connections anything that it does do
> would be lost if power were to fail and you instead did a restore from
> disk.

It would be okay to switch to the "hibernated" kernel in order to e.g.
initiate a suspend to ram provided that everything is done atomically with
interrupts off, for instance. It is not clear, though, that it is possible
to suspend to ram atomically like that. There is also the question of what
state the devices will be in when switching back from the "save image"
kernel to the "hibernated" kernel.

[snip]

--
Jeremy Maitin-Shepard
Re: Hibernation considerations
[EMAIL PROTECTED] writes:

[snip]

> the non-ACPI hibernate behaves very differently, and for some people (and
> I think I am one of them) it will meet their needs better than _any_ of
> the ACPI suspends.

It may have certain differences from the user point of view, but from the
implementation view, it seems that it is nearly exactly the same. The only
differences seem to be:

- rather than shutting down, do whatever is necessary to stick the system
  in S4 state.

- make sure ACPI isn't initialized by the "load image" kernel

- rather than "resume from hibernate" ACPI by initializing it normally,
  issue the special hibernate-related methods.

Thus, it seems that supporting ACPI S4 will have a very minimal effect on
the hibernate implementation.

--
Jeremy Maitin-Shepard
Re: Hibernation considerations
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

>> Rafael, for those of us who aren't thoroughly familiar with all the ins
>> and outs of the ACPI spec, could you please summarize a list of the
>> ACPI calls needed in the second and third cases above? Indicate which
>> ones need to be done from within the original kernel and which should
>> be done from within a kexec'd hibernation kernel.

> Sure.

> In the third case (ie. transition to S4) we are supposed to do the
> following:

> (1) Upon entering the sleep state, which IMO can be done _after_ the image
>     has been saved:

I assume you mean "in order to enter the sleep state", rather than "upon
entering the sleep state". I still don't understand what you mean by
"which IMO can be done _after_ the image has been saved"; as far as I
understand, the last step of this process, "make the platform enter S4",
is almost like a shutdown as far as the kernel is concerned (except for
the tiny detail of having to call those special ACPI methods on resume);
consequently, it would seem that nothing can be done after that step.

> * figure out which devices can wake up
> * put devices into low power states (wake-up devices are placed in the Dx
>   states compatible with the wake capability, the others are powered off)
> * execute the _PTS global control method
> * switch off the nonlocal CPUs (eg. nonboot CPUs on x86)
> * execute the _GTS global control method
> * set the GPE enable registers corresponding to the wake-up devices
> * make the platform enter S4 (there's a well defined procedure for that)

> I think that this should be done by the image-saving kernel.

I agree.

> (2) Upon start-up (by which I mean what happens after the user has pressed
>     the power button or something like that):
> * check if the image is present (and valid) _without_ enabling ACPI (we
>   don't do that now, but I see no reason for not doing it in the new
>   framework)
> * if the image is present (and valid), load it
> * turn on ACPI (unless already turned on by the BIOS, that is)
> * execute the _BFS global control method
> * execute the _WAK global control method
> * continue

> Here, the first two things should be done by the image-loading kernel, but
> the remaining operations have to be carried out by the restored kernel.

It doesn't seem like a problem for that to be the case, but out of
curiosity why do those methods need to be executed by the "restored"
kernel, rather than the "image loading" kernel? Do they require some
information from ACPI-related kernel data structures that were populated
by the normal ACPI initialization?

[snip]

> ... we can't return to the hibernated kernel unless we are going to cancel
> the hibernation.

I agree.

> That's why I think that for the suspend-to-both the image-saving kernel
> will need to support the same set of devices as the hibernated kernel.

If all of the devices that the image writing kernel doesn't know about
have already been shut down/powered off by the hibernated kernel, then
does the "image writing" kernel still need to know about them in order to
suspend to RAM properly (i.e. without leaving some devices on wasting
power)?

--
Jeremy Maitin-Shepard
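The two step lists quoted above can be captured as ordered data, which makes the division of work between kernels explicit. This is a descriptive sketch only: the step strings paraphrase the quoted lists, and `ENTER_S4`, `RESUME`, `steps_for` and `handoff_index` are illustrative names that execute no ACPI methods.

```python
IMAGE_SAVING = "image-saving kernel"
IMAGE_LOADING = "image-loading kernel"
RESTORED = "restored kernel"

# (step, kernel that performs it), in the order given in the quoted text.
ENTER_S4 = [
    ("figure out which devices can wake up", IMAGE_SAVING),
    ("put devices into low power states", IMAGE_SAVING),
    ("execute _PTS", IMAGE_SAVING),
    ("switch off the nonboot CPUs", IMAGE_SAVING),
    ("execute _GTS", IMAGE_SAVING),
    ("set the GPE enable registers for wake-up devices", IMAGE_SAVING),
    ("make the platform enter S4", IMAGE_SAVING),
]

RESUME = [
    ("check for a valid image without enabling ACPI", IMAGE_LOADING),
    ("load the image", IMAGE_LOADING),
    ("turn on ACPI", RESTORED),
    ("execute _BFS", RESTORED),
    ("execute _WAK", RESTORED),
    ("continue", RESTORED),
]


def steps_for(sequence, owner):
    """Return the steps of a sequence that a given kernel performs."""
    return [step for step, who in sequence if who == owner]


def handoff_index(sequence):
    """Index of the first step done by a different kernel, or None."""
    first_owner = sequence[0][1]
    for index, (_, who) in enumerate(sequence):
        if who != first_owner:
            return index
    return None
```

Here `handoff_index(RESUME)` marks the jump from the image-loading kernel to the restored kernel, the point after which ACPI is first turned on in the quoted ordering.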
Re: Hibernation considerations
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> Well, first, the fact is that _some_ systems _will_ be powered while in
> hibernation (the majority of notebooks, for example) and you should assume
> that the platform _may_ retain some information across the
> hibernation/restore cycle. In that case you _should_ _not_ trash the
> information retained by the platform.

I'm not sure the majority of notebook users will want wakeup support in
exchange for some power consumption while the system is off. I think many
people would not consider the trouble of having to press the power button
instead of merely opening the lid too great. Furthermore, S4 mode is of
course also not suitable if you intend to replace the battery while the
system is hibernated. It does seem that it is useful to provide S4 as an
option, but certainly just shutting down should also be an option on all
systems.

> Now, with that in mind, ACPI requires us to make the system enter the S4
> sleep state as a result of the hibernation procedure. In my opinion this
> may be done after saving the image, but still this means, for example,
> that the image-saving kernel needs to support ACPI.

It seems that it most certainly must be done AFTER saving the image, as
the image obviously cannot be saved after entering S4 state, since S4
state is nearly the same as powering off completely and all memory will be
lost.

> Next, during the restore, we should first check if the image is present
> (and valid) _without_ turning ACPI on (note that this is not done by the
> current hibernation code and that leads to strange problems on some
> systems). Then, if the image is present (and valid), we should first load
> it, jump to the hibernated kernel and _then_ turn ACPI on and execute the
> _BFS and _WAK ACPI global methods (again, this is not done by the current
> code in that order, which is wrong). Only after that is the hibernated
> kernel supposed to continue.

It seems that the implementation of that behavior for Linux cannot be
quite so simple, since resume from hibernation is driven (in general) from
an initrd/initramfs rather than directly from the kernel initialization
sequence, in order to support modular drivers and features like DM and
LVM.

Thus, there would have to be a new "delay_acpi_initialization" kernel
command-line option. Additionally, there would be a sysfs interface to
tell the kernel to proceed with the ACPI initialization as normal. This
would be used by an initrd/initramfs after determining that a resume from
hibernate will not be done. If a resume from hibernate is done, this hook
won't be used, and instead the resumed kernel will call the ACPI hibernate
resume stuff if S4 state was used; otherwise, the resumed kernel will just
re-initialize ACPI as normal. Also, if the in-kernel code for checking if
a resume can be done does not find a hibernate image, it will also invoke
the delayed ACPI initialization.

--
Jeremy Maitin-Shepard
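The decision flow proposed in the message above can be sketched as a single function. This is a hedged illustration of the proposal as worded there, not existing kernel behavior: `AcpiPath` and `choose_acpi_path` are hypothetical names, and the booleans stand in for the image check, the initrd/initramfs decision, and whether S4 was entered at hibernate time.

```python
from enum import Enum


class AcpiPath(Enum):
    # Boot kernel runs the delayed, ordinary ACPI initialization.
    NORMAL_INIT_IN_BOOT_KERNEL = 1
    # Resumed kernel runs the S4 wake-up methods (_BFS, _WAK).
    S4_WAKE_IN_RESUMED_KERNEL = 2
    # Resumed kernel re-initializes ACPI as after a plain shutdown.
    NORMAL_REINIT_IN_RESUMED_KERNEL = 3


def choose_acpi_path(image_found: bool, resume_requested: bool,
                     entered_s4: bool) -> AcpiPath:
    """Pick the ACPI path when booting with delayed ACPI initialization."""
    if not image_found or not resume_requested:
        # No image, or the initramfs decided not to resume and used the
        # proposed sysfs hook: proceed with ACPI initialization as normal.
        return AcpiPath.NORMAL_INIT_IN_BOOT_KERNEL
    if entered_s4:
        return AcpiPath.S4_WAKE_IN_RESUMED_KERNEL
    return AcpiPath.NORMAL_REINIT_IN_RESUMED_KERNEL
```

The third outcome is the "resume as if the system had been shut down" case, which would also cover a battery swap or a suspend-to-both that lost power.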
Re: Hibernation considerations
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> I'm afraid of one thing, though.

> If we create a framework without ACPI (well, ACPI needs to be enabled in
> the kernel anyway for other reasons, like the ability to suspend to RAM)
> and then it turns out that we have to add some ACPI hooks to it, that
> might be difficult to do cleanly.

> Thus, it seems reasonable to think of the ACPI handling in advance.

As far as I understand, ACPI support is only useful for hibernate to the
extent that it allows some or all of the following features:

- possibly shows a nice looking "hibernate" LED

- possibly allows the BIOS to show something about hibernate

- possibly allows the lid or keyboard to "wake up" (turn on) the system

Note that properly restoring device state (or even properly determining
whether on external/mains power vs. battery) on resume is not something
that should require special hibernate ACPI support, since it should be
possible to make hibernate (and in general it will be the case that
hibernate will) look exactly like a reboot to the BIOS/ACPI/devices.

The problem that you mentioned on your system regarding power source
information would seem to just be a problem with how ACPI is reinitialized
after resuming from hibernation, which is not at all surprising since we
know it (the use of driver calls for hibernate) is currently broken in
many ways.

It seems that enabling S4 mode should just be treated as a special
shutdown mode, independent of hibernate. In practice, it may likely only
be useful in conjunction with hibernate, but there doesn't seem to be any
reason it needs to be coupled.

It would be useful to determine whether it is necessary to initialize ACPI
specially after "resuming" from S4 mode, though, or whether it can be
initialized normally (i.e. by a normal kernel for instance, completely
unaware of hibernate). If it can be initialized normally, then it seems
that it is unnecessary to have any ACPI S4 mode support in the resume
path, and it can merely exist as a special shutdown mode. Note that it
seems a bit odd if ACPI can't be initialized normally after resume from S4
(and still work), since the "load image" kernel initializes everything
normally before attempting to resume the hibernated system.

--
Jeremy Maitin-Shepard
Re: Hibernation considerations
[EMAIL PROTECTED] writes:

[snip]

>> Isn't it possible to avoid this problem by mounting an ext3 filesystem as readonly ext2? Provided the filesystem isn't dirty it should be doable. (And provided the filesystem doesn't use any ext3 extensions that are incompatible with ext2.)

> from the last discussion I saw on the kernel mailing list, no. the act of mounting the ext3 filesystem as ext2 read-only will change it as the unsupported extensions get turned off (and I think the journal contents at least are lost as part of this)

It really doesn't matter whether mounting it read-only actually corrupts the data on disk or not. Either way, it should not be done, because you would be accessing a dirty filesystem that is still in use, and consequently there are no guarantees that either the metadata or the file contents are consistent.

It isn't necessary for hibernation to be able to access mounted partitions anyway.

-- 
Jeremy Maitin-Shepard
Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> Okay, I have thought it through and I think that, as an initial step, we can do something like this:
> - preload the image-saving kernel before hibernation
> - in the hibernation code path replace device_suspend() with the shutting down of all devices without unregistering them (not very nice, but should be sufficient for a while)

It seems that the effect of what the current hibernate implementations do is to shut down all of the devices while making it look, according to kernel data structures, as though the devices were merely suspended (i.e. device_suspend). Then in the resume path, the "restore image" kernel also calls device_suspend just before jumping to the hibernated kernel, so all of the devices the "restore image" kernel knew about are in the state device_suspend expects them to be in, except that they were actually suspended by a different kernel, so they might not be in quite the right state.

There is also the issue that the "restore image" kernel might not know about all of the devices; for instance, if USB support is modular and, as is likely, the user didn't load the USB modules in the "restore image" kernel from an initrd or something, then the USB devices will actually be powered off, rather than "suspended".

Despite these apparent discrepancies, it seems that for many devices (I'm not sure USB devices are included, though), device_resume happens to do the right thing: the device is placed back in the state it needs to be in so that the driver can begin talking to it as it did before, and the device is recognized as the same device that was there before (otherwise, mounted filesystems backed by block devices that came back as a different device would cause great havoc). Since I recall there being issues with USB devices being recognized as the same devices post-hibernate-resume, without looking at the code I'm inclined to believe that the USB drivers still don't end up resuming from hibernate correctly.

Note that I am describing what is done currently, not what is planned to be done (i.e. change device_suspend to quiesce and device_resume to unquiesce).

Ironically, despite everyone believing that device_suspend/device_resume is incorrect for hibernate, many of the things that those functions do (like saving the PCI configuration, perhaps, and then restoring it later, or re-initializing the device) are actually necessary, especially for modular drivers that won't be loaded by the "restore image" kernel.

What needs to be done is for the devices to be shut down (or possibly just quiesced for a select few, but we won't worry about that complication until later; in the case of the current implementations, they should all be quiesced rather than shut down), but whatever information will be needed later to reinitialize the device and recognize it as the same device must be saved. Ideally the reinitialization should be able to handle the device either being in a quiesced state or completely off. This probably means they cannot be "unregistered", as otherwise there would be nothing with which to associate the saved information.

The resume path needs to use the saved state to reinitialize the device and recognize it as the same device. It seems that the existing reprobing code may not be sufficient for this. Note that exactly the same thing must be done on resume for both the current hibernate implementations and the kexec approach.

It seems that properly restoring the devices should be relatively easy for the devices that already get this correct, like IDE devices and basic PCI devices (and perhaps SATA and SCSI devices as well), and possibly harder for Firewire or USB devices.

> - when we've called device_power_down() and save_processor_state(), jump to the image-saving kernel and let it run
> - make the image-saving kernel set up everything, save the image without starting any user space (we may use the existing image-saving code for this purpose, with some modifications) and power off the system (or make it enter S4)

I suppose this has the advantage of not requiring that a kernel-to-userspace interface be created for this purpose.

> - use the existing restoration code to load the image and jump to the hibernated kernel

This would again avoid the need for a separate userspace-kernelspace interface for the purpose, so I agree it could be a useful thing to do initially.

> - in the restore code path replace device_resume() with the reprobing of all devices.

See my comments above.

-- 
Jeremy Maitin-Shepard
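The save-then-reinitialize scheme argued for in this message can be illustrated with a toy model. This is only a sketch in Python, not kernel code: `Device`, `saved_state`, `shut_down`, and `reinitialize` are hypothetical names invented for the illustration, and the "identity" string stands in for whatever a real driver could re-read from the hardware to recognize it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Device:
    """Toy stand-in for a device plus the kernel's record of it (hypothetical)."""
    identity: str                          # something the driver can re-read later
    config: dict = field(default_factory=dict)
    powered: bool = True
    saved_state: Optional[dict] = None     # survives because the device stays registered


def shut_down(dev: Device) -> None:
    """Hibernate path: save what resume will need, then power the device off."""
    dev.saved_state = {"identity": dev.identity, "config": dict(dev.config)}
    dev.config.clear()                     # register contents are lost with power
    dev.powered = False


def reinitialize(dev: Device, probed_identity: str) -> bool:
    """Resume path: does not care whether the device was off or only quiesced.

    Returns False when the hardware found now is not the device that was
    saved, in which case the caller must treat it as a new device.
    """
    if dev.saved_state is None or dev.saved_state["identity"] != probed_identity:
        return False
    dev.powered = True
    dev.config = dict(dev.saved_state["config"])
    dev.saved_state = None
    return True
```

The point of keeping `saved_state` on the still-registered `Device` is the one made above: if the device were unregistered, there would be nothing to attach the saved information to, and a block device that came back under a different identity could not be detected.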
Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> Not necessarily. If we don't put devices into low power states before creating the image, that should work just fine (quiesce devices, create the image or kexec the new kernel, reprobe devices, save the image, suspend to RAM, resume from RAM, continue - or restore from the image if power failed in the meantime). Still, for this purpose, both kernels need to be able to handle the same set of devices.

I don't know much about suspend to RAM, but it seems that it would indeed be necessary to have a device driver for a device in order to switch it from e.g. a quiesced state to a low power state. If, however, the original kernel already completely turned off the device, then it seems that the "save image" kernel shouldn't have to do anything to it in order to suspend to RAM. The drawback, though, is that since the old kernel would have no way (unless the user tells it) to know which devices should be left quiesced and which should be turned off, it would have to turn them all off, which would mean spinning the disks down and back up. On the other hand, being able to build the "save image" kernel with only minimal hardware support could save a significant amount of the time required to boot it.

[snip]

> No, it can't. For example, it can't access filesystems mounted by the hibernated kernel, or they may get corrupted after the restore (if they are journaling, it can't even read from them).

That is true, but this also holds for the current hibernate implementations.

> Which reminds me of one more issue, which is that the image-saving kernel won't be able to use these filesystems either, so its modules and user space will have to be available from somewhere else (like a RAM disk or dedicated partition). So things get ugly.

This is not the issue that it appears to be, though. Under the current hibernate implementations, this very same userspace and set of modules must be available "somewhere else" (i.e. an initrd) because it is needed by the restore path. Note that under the kexec approach, save and restore become rather symmetric operations.

> Apart from this, the new kernel's user space cannot blindly modify swap space that might be in use by the hibernated kernel.

But it seems easy enough to swapoff in order to completely free up the swap space. I suppose the disadvantage is that instead of failing cleanly if there is insufficient memory, the OOM killer will be invoked and cause all sorts of havoc. This suggests that it may indeed be important to support "cooperation" with the old kernel on saving the image sooner, rather than later.

[snip]

-- 
Jeremy Maitin-Shepard
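One way to get the "fail cleanly" behavior mentioned above would be to check, before calling swapoff, whether everything currently in swap would fit in free RAM. The following is a minimal sketch under stated assumptions: it relies only on the `MemFree`, `SwapTotal`, and `SwapFree` fields of `/proc/meminfo`, the function names and the `reserve_kb` parameter are made up for illustration, and treating `MemFree` as the available memory is deliberately pessimistic because reclaimable page cache is ignored.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swapoff_fits(meminfo: dict, reserve_kb: int = 0) -> bool:
    """True if the pages now in swap would fit in free RAM.

    Assumption: MemFree is used as a conservative estimate of available
    memory; reserve_kb is extra headroom the caller wants to keep.
    """
    swap_used = meminfo["SwapTotal"] - meminfo["SwapFree"]
    return swap_used + reserve_kb <= meminfo["MemFree"]
```

A hibernate script could read `/proc/meminfo`, call `swapoff_fits()`, and refuse to hibernate with a clear error when it returns False, rather than running swapoff and leaving the outcome to the OOM killer. The check is inherently racy, since memory use can change between the check and the swapoff.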
Re: Hibernating To Swap Considered Harmful
[EMAIL PROTECTED] (Joseph Fannin) writes:

[snip]

> Intel Macs use GPT partition tables, which support a huge number of primary partitions, and so don't support secondary partitions. 32bit Windows does not support GPT, so PC-style MBR partition tables must also be used. GPT was designed to coexist with MBR tools, so this mostly works, but you're limited to the union of supported features -- 4 primary partitions, no secondaries.

There is a very simple solution to this obscure problem (if I understand correctly, you want to dual boot Mac OS X and Linux, and maybe also Windows?): use LVM, which allows you to have as many volumes as you like in the partition.

-- 
Jeremy Maitin-Shepard
Re: Hibernation Redesign
Al Boldi <[EMAIL PROTECTED]> writes:

> Mark Lord wrote:
>> Jeremy Maitin-Shepard wrote:
>> > I'll certainly admit the kexec idea is vaporware currently,

> Your idea is starting to become a reality with this thread:
> "[PATCH 0/2] Kexec jump: The first step to kexec base hibernation"

Someone else pointed out that the idea was actually proposed by Andrew Morton over a year ago, but it didn't get very much consideration then. It is good to see that quite a few people are thinking about it now, though, and that Ying Huang has started writing some code.

-- 
Jeremy Maitin-Shepard
Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
Mark Lord <[EMAIL PROTECTED]> writes:

[snip]

> Whoops.. wrong half of the script.
> For TuxOnIce in 10 seconds, it does this:

[snip]

I'd argue that for most usage patterns, it doesn't matter all that much how long it takes to hibernate and power off the system. What really matters is that hibernation is extremely reliable, and how quickly the system resumes. The reason for this is as follows:

A typical usage pattern of hibernate on a laptop is to shut the lid, causing the system to start to hibernate, and to place the machine in the bag. This is fine, as long as you aren't too rough moving it into the bag, and the hibernation is extremely reliable (i.e. there is no chance that it fails to hibernate and remains powered on). Presumably some additional userspace logic could help here, like beeping loudly if the hibernate fails, or perhaps just initiating a shutdown, to avoid the machine overheating in the bag.

Note that in this usage pattern, it doesn't matter how long it takes to hibernate, because you don't actually wait for it to finish. The only waiting occurs when you turn it on, and the resume path should be essentially the same under kexec hibernate as with the existing hibernate.

Thus, if kexec hibernate improves reliability (as it might, given that it eliminates the need for the freezer), it may be worth the slightly increased hibernate time. I think the actual amount of extra time may be very small; a stripped down kernel may only take a second or two to initialize.

-- 
Jeremy Maitin-Shepard
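The "additional userspace logic" suggested above could look roughly like the following. This is a hedged sketch, not an existing tool: `hibernate_with_fallback` and its three hook arguments are hypothetical, and it both alerts and shuts down, although the message offers those as alternatives. The only real interface used is writing `disk` to `/sys/power/state`, shown as one possible hibernate hook.

```python
def hibernate_with_fallback(hibernate, alert, shut_down):
    """Try to hibernate; if that fails, warn the user and power off.

    All three arguments are caller-supplied callables (hypothetical hooks).
    `hibernate` is expected to raise OSError when hibernation fails.
    """
    try:
        hibernate()
        return "hibernated"
    except OSError:
        alert()        # e.g. beep loudly so the user notices
        shut_down()    # do not stay powered on inside the bag
        return "shut down"


def write_sys_power_state():
    """One possible `hibernate` hook: ask the kernel to suspend to disk."""
    with open("/sys/power/state", "w") as f:
        f.write("disk")
```

Passing the hooks in as arguments keeps the policy (what counts as an alert, how to power off) out of the function, so a distribution or user script could choose beeping, shutting down, or both.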
Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
e the uswsusp kernel code for this purpose.

>> >> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can be hibernate/resume by the normal kernel too. This way, a real kexec/boot-up is only needed for the first time.
>> >
>> > I'm not sure what you mean.
>>
>> he's trying to get fancy again, the best way to speed up the boot of the kexec kernel is make it smaller and avoid probing for devices (hotplug should NOT be used for normal suspend situations)

> Still, I believe that we should do our best to use only one kernel (meaning one kernel image) here.

It does not seem very difficult to leave the choice of whether to use a different kernel up to the user. The only extra thing required to allow a different kernel to be used is to save and restore the text sections.

-- 
Jeremy Maitin-Shepard
Re: sysrq-t dumps of s2ram/fuse deadlock
Matthew Garrett <[EMAIL PROTECTED]> writes:

> On Mon, Jul 09, 2007 at 01:29:05PM +, Pavel Machek wrote:
>> Hi!
>>
>> Can we get them? They are necessary for debugging 'what in suspend calls fuse' problem. And yes, that problem is there even when you remove freezer.

> I can produce them, but haven't managed to do that in any way that lets me get them off the system yet.

If you can see them, then perhaps you could use a digital camera or just copy the text manually.

[snip]

-- 
Jeremy Maitin-Shepard
Re: Hibernation Redesign
Nigel Cunningham <[EMAIL PROTECTED]> writes:

[snip]

> No other _proper_ solutions have been proposed. Everyone who suggests removing the freezer also suggests implementing it all over again. It might be sending SIGSTOP to everything. It might be shifting the desk chairs around and creating a completely new kernel context, but they always have the same goal - stopping the existing activity, and they all come with their own issues (even if they're not obvious yet because the alternatives are currently vapourware to one extent or another).

I'll certainly admit the kexec idea is vaporware currently, but it does differ in a significant way from freezer-based approaches, such that I don't think it should be referred to as just another implementation of a freezer.

Specifically, it doesn't require that the "old kernel" be in a "consistent" state to a greater extent than suspend to RAM does. All of the devices must still be quiesced or shut down to some extent, but doing this without races and deadlocks (and without the freezer) is very similar to what needs to be done for suspend to RAM, which will need to be solved anyway. Unlike the existing hibernate approaches, however, it will not be necessary to use any of the driver infrastructure once switched to the "save image" kernel, and thus it will not matter what locks are held, for instance.

[snip]

-- 
Jeremy Maitin-Shepard
Re: Hibernation Redesign
Al Boldi <[EMAIL PROTECTED]> writes:

[snip]

> Exactly, there may well be overlap between Xen and the kexec hibernate approach, for which code structures should definitely be leveraged.

> And, I wasn't suggesting to use Xen as an HV, which wouldn't really solve anything, but was trying to point out that there is no need to maintain two separate kernels, much like Xen, which inlines two modes into the kernel: host and guest.

With relocatable kernels, or by simply using the "back up the first 16 or 64 MB of physical memory" approach, the same kernel image could be used both as the normal kernel and as the "save image" kernel. The actual behavior of the system would likely depend on kernel command-line parameters or an initrd, rather than being hard-coded into the kernel image. If it is made a requirement that the same kernel be used, then, as is done currently, the text sections need not be touched at all.

There is a significant advantage, however, to using a different kernel: unneeded drivers can be compiled out, leading to faster load times.

> So kexec really seems the way to go, which mimics the way APM used to do it, which is known to work flawlessly with minimal OS involvement.

Now all that is needed is someone with enough time and interest to implement it. :)

-- Jeremy Maitin-Shepard
Re: Hibernation Redesign
Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> I don't know a whole lot about xen, but it seems that one issue with this approach is that it requires you run your system under a hypervisor at all times, which may introduce some overhead.

> No, I don't think that's what Al is proposing. The kernel-internal interfaces we've put in place to make Xen work could be reused to do some of the things you're talking about. In particular, a kernel running under Xen has to be able to deal with non-contiguous physical pages, and reusing the same pagetable hooks would allow a kexeced kernel to run happily out of any random assortment of pages you manage to allocate for it.

I suppose that would be an interesting thing to look into. Another possible approach for having the kernel run in non-contiguous memory is to specify a memmap exactly to the kernel on the command line, as I believe is done for the crashdump kernels currently. It would, of course, require an extremely long and complicated memmap specification in general.

I recall reading, though, that even with the relocatable kernel support, there are still significant alignment requirements for loading the kernel. In particular, I seem to recall that it is necessary to load an x86 kernel at maybe a 16MB boundary, and on other platforms the alignment requirements may be even more restrictive. In addition, I recall that the Linux boot procedure on x86 and on some other platforms necessarily uses certain low-address memory, like the first 640K, which must be backed up regardless.

For these reasons, it seems that it would be easiest to simply back up the first 16 or 64 MB of memory (for example), and not have to worry about loading the kernel at a non-standard address and specifying a complicated exact memmap. Someone might prove me wrong, though.
-- Jeremy Maitin-Shepard
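To give a feel for why an exact memmap for scattered pages gets "extremely long", here is a minimal sketch that turns a list of page frame numbers into a command-line fragment. It assumes 4 KiB pages and the x86 `memmap=exactmap` / `memmap=nn[KMG]@ss[KMG]` parameter forms; the helper names `coalesce` and `memmap_args` are made up for illustration and are not part of kexec-tools or the kernel.

```python
PAGE_KB = 4  # assumed page size in KiB


def coalesce(frames):
    """Merge page frame numbers into (start_frame, page_count) runs."""
    runs = []
    for frame in sorted(set(frames)):
        if runs and runs[-1][0] + runs[-1][1] == frame:
            runs[-1][1] += 1          # frame extends the current run
        else:
            runs.append([frame, 1])   # frame starts a new run
    return [(start, count) for start, count in runs]


def memmap_args(frames):
    """Build an illustrative 'memmap=' fragment, one entry per contiguous run."""
    parts = ["memmap=exactmap"]
    for start, count in coalesce(frames):
        parts.append(f"memmap={count * PAGE_KB}K@{start * PAGE_KB}K")
    return " ".join(parts)
```

Every gap between free pages adds another `memmap=` entry, so a heavily fragmented allocation of 16 MB could in the worst case need thousands of entries, which is the practical argument for handing the new kernel one contiguous region instead.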
Re: Hibernation Redesign
Al Boldi <[EMAIL PROTECTED]> writes:

[snip]

> Who said we need two kernels? You could inline it like Xen, which would give you one kernel with two modes: normal and hibernate.

I don't know a whole lot about Xen, but it seems that one issue with this approach is that it requires that you run your system under a hypervisor at all times, which may introduce some overhead.

-- Jeremy Maitin-Shepard
Re: [RFC][PATCH -mm] Freezer: Handle uninterruptible tasks
Alan Stern <[EMAIL PROTECTED]> writes:

> On Mon, 9 Jul 2007, Jeremy Maitin-Shepard wrote:
>> Pavel Machek <[EMAIL PROTECTED]> writes:
>>
>> [snip]
>>
>> > I don't know how to do that mechanism... but if we knew where to trap filesystem writes, we could simply freeze at that point, and at that point only, no?
>>
>> Any operation at all that has an external effect must not occur after the snapshot is made; otherwise, there will be random hard-to-find corruptions and other problems occurring as a result. Thus, for example, any writes (either directly or indirectly through e.g. a filesystem) to non-volatile storage, any network traffic, any communication with hardware like a printer must be prevented after the snapshot.

> You have forgotten one critical point: The writes to save the snapshot image must be allowed. That's what makes it really hard.

Well, I didn't forget about that, although my language may have been a bit ambiguous. I was referring only to the operations that are done by normal (i.e. non-hibernate) portions of the system and which are not explicitly for the purpose of hibernating the system.

It is very difficult to maintain this guarantee while also attempting to write the snapshot through the same infrastructure that is supposed to not be processing any "normal" requests. The kdump approach handily avoids this problem by *not* reusing the same infrastructure while still allowing complete flexibility (i.e. not depending on a drivers/suspend/ide-simple).

-- Jeremy Maitin-Shepard
Re: Hibernation Redesign
Oliver Neukum <[EMAIL PROTECTED]> writes:

> On Monday, 9 July 2007, Jeremy Maitin-Shepard wrote:
>> Oliver Neukum <[EMAIL PROTECTED]> writes:
>>
>> [snip]
>>
>> > Hm, once the new kernel is booted, this decision is irrevocable, isn't it? Is there any way to deal with errors by handing control back?
>>
>> Returning to the old kernel can be done by telling drivers to set the hardware to the appropriate state, then copying the backed up memory back to the beginning of physical memory, and finally jumping to the old kernel. It would be much like what is done to resume from hibernation.

> If you can do that, why load a new kernel image?

The challenges in doing that are analogous to the challenges in suspending to RAM, for which it has been agreed that drivers should be fixed such that the freezer is not necessary.

The hard part of hibernate is not creating the snapshot; rather, the hard part is writing the snapshot, and allowing the user some flexibility in how and where the snapshot is written. The kdump approach allows complete flexibility in writing the snapshot (essentially any kernel or user space facility can be used), while not interfering at all with the snapshot state.

-- Jeremy Maitin-Shepard
Re: hibernation/snapshot design
Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

>> why do you say that neither would work for the "lets hibernate my notebook" case?

> Both would work. One would eat 8-64MB of your RAM, permanently;

As I have stated in other messages, the kdump approach would not waste any RAM permanently. The reason that kdump must reserve memory at boot is that on panic, it cannot attempt to nicely stop drivers, and consequently there might be ongoing DMAs that could clobber anything but the reserved area; this reason does not apply to hibernate, though.

I'll quote a previous message in which I stated a solution that can be used:

  Immediately before jumping to the new kernel, the first X bytes (where X is the amount of memory the new kernel will get, typically 16MB or 64MB) of physical memory are backed up into the arbitrary discontiguous pages that are made available. This will not take very long, because copying even 64MB of memory is extremely fast. Then the new kernel is free to use the first X bytes of contiguous physical memory. Problem solved.

-- Jeremy Maitin-Shepard
Re: Hibernation Redesign
Oliver Neukum <[EMAIL PROTECTED]> writes:

[snip]

> Hm, once the new kernel is booted, this decision is irrevocable, isn't it? Is there any way to deal with errors by handing control back?

Returning to the old kernel can be done by telling drivers to set the hardware to the appropriate state, then copying the backed up memory back to the beginning of physical memory, and finally jumping to the old kernel. It would be much like what is done to resume from hibernation.

-- Jeremy Maitin-Shepard
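The "copy the backed up memory back" step can be modeled in a few lines. This is only a user-space toy, not kernel code: physical memory is a `bytearray`, pages are assumed to be 4 KiB, and `backup_map` is a hypothetical list of (original frame, backup frame) pairs that would have been recorded before the jump to the new kernel.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def restore_low_memory(memory, backup_map):
    """Copy each backed-up page back to the low frame it originally came from.

    memory:     bytearray standing in for physical memory
    backup_map: (original_frame, backup_frame) pairs recorded at backup time
    """
    for original, backup in backup_map:
        src = backup * PAGE_SIZE
        dst = original * PAGE_SIZE
        # The right-hand slice is copied out before assignment, so this is safe.
        memory[dst:dst + PAGE_SIZE] = memory[src:src + PAGE_SIZE]
```

In the scheme described above, this copy would sit between telling the drivers to put the hardware into the expected state and jumping back into the old kernel; neither of those steps is modeled here.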
Re: Hibernation Redesign
Nick Piggin <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> Nick Piggin <[EMAIL PROTECTED]> writes:
>>> This is the Morton method, isn't it? :) I remember it sounding like a very good idea when he brought it up, but I can't remember the details of why it was rejected or what the problems were.
>>
>> Perhaps he did bring it up before I did. Please forward me a link to the thread or other reference if you can find it, as I'd be interested in reading it.

> Sent in the next mail.

Thanks. I've started reading over the thread.

>>> I suspect that freeing memory on the fly for the new kernel would be non-trivial (but possible), however simply having a reserve RAM region for the new kernel would be fine for a first step.
>>
>> Freeing memory on the fly should be extremely easy for the kernel (this is precisely what it does when it needs to satisfy an allocation). Note that the memory allocated need not be contiguous.

> Yes, I have a rough idea about how page reclaim works. But I just mean it would not be trivial to load the new kernel into physically discontiguous memory. Possible of course, but I don't think kexec or the setup code could quite cope ATM.

It would indeed be a pain for the new kernel to be loaded and have to use discontiguous memory. The trick is, though, that this is not necessary. Immediately before jumping to the new kernel, the first X bytes (where X is the amount of memory the new kernel will get, typically 16MB or 64MB) of physical memory are backed up into the arbitrary discontiguous pages that are made available. This will not take very long, because copying even 64MB of memory is extremely fast. Then the new kernel is free to use the first X bytes of contiguous physical memory. Problem solved.
-- Jeremy Maitin-Shepard
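The backup trick described in the message above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: physical memory is a `bytearray`, pages are 4 KiB, `free_frames` is a hypothetical list of page frames the old kernel has freed, and, as a simplification, every low page is backed up even if it happens to be free.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def backup_low_memory(memory, reserve_bytes, free_frames):
    """Copy the first reserve_bytes of memory into scattered free frames.

    Returns a list of (original_frame, backup_frame) pairs so the copy
    can be undone later. The free frames need not be contiguous.
    """
    if reserve_bytes % PAGE_SIZE:
        raise ValueError("reserve_bytes must be a multiple of PAGE_SIZE")
    n_pages = reserve_bytes // PAGE_SIZE
    # Frames inside the region being vacated cannot hold the backup.
    targets = [f for f in free_frames if f >= n_pages]
    if len(targets) < n_pages:
        raise MemoryError("not enough free frames outside the low region")
    backup_map = []
    for original, backup in zip(range(n_pages), targets):
        src = original * PAGE_SIZE
        dst = backup * PAGE_SIZE
        memory[dst:dst + PAGE_SIZE] = memory[src:src + PAGE_SIZE]
        backup_map.append((original, backup))
    return backup_map
```

After this returns, the low region can be handed to the new kernel as one contiguous block, while the old kernel's contents survive in whatever scattered frames were available.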
Re: Hibernation Redesign
Nick Piggin <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> Al Boldi <[EMAIL PROTECTED]> writes:
>>> Pavel Machek wrote:
>>>> We are stuck with refrigerator for now, and at least for hibernation, I don't see any feasible alternative.
>>
>>> Feasible alternative?
>>
>> I posted such an alternative to the list a short time ago: hibernating from a *new* kernel space/user space that is created by loading a new kernel in a manner similar to what is done for kexec crashdumps. Unlike kexec crashdumps, however, it would not require reserving any memory at boot, because the necessary memory (maybe 16MB or 64MB) can be freed just before hibernating, and device drivers can be properly stopped so that DMAs don't stomp over certain memory.

> This is the Morton method, isn't it? :) I remember it sounding like a very good idea when he brought it up, but I can't remember the details of why it was rejected or what the problems were.

Perhaps he did bring it up before I did. Please forward me a link to the thread or other reference if you can find it, as I'd be interested in reading it.

>> This approach eliminates the need for the freezer, as it would make hibernate look a lot like suspend to ram from the perspective of the "old" kernel (the kernel being hibernated), as the hibernate operation itself would be completely atomic from the perspective of the "old" kernel. That is not to say, of course, that any code paths would actually be shared, or that the drivers would do the same things (because they probably would not).

> Well it basically is suspend to RAM with the additional step that a new kernel gets booted and writes out the data from RAM to disk then shuts down.

There is the key difference, though, that the drivers should do rather different things. In particular, rather than place the hardware in a low-power mode, they should place it in some state such that the new kernel being loaded can handle it.

> I suspect that freeing memory on the fly for the new kernel would be non-trivial (but possible), however simply having a reserve RAM region for the new kernel would be fine for a first step.

Freeing memory on the fly should be extremely easy for the kernel (this is precisely what it does when it needs to satisfy an allocation). Note that the memory allocated need not be contiguous.

-- Jeremy Maitin-Shepard
Re: [RFC][PATCH -mm] Freezer: Handle uninterruptible tasks
Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

> I don't know how to do that mechanism... but if we knew where to trap filesystem writes, we could simply freeze at that point, and at that point only, no?

Any operation at all that has an external effect must not occur after the snapshot is made; otherwise, there will be random hard-to-find corruptions and other problems occurring as a result. Thus, for example, any writes (either directly or indirectly through e.g. a filesystem) to non-volatile storage, any network traffic, any communication with hardware like a printer must be prevented after the snapshot. It seems, though, that in general the kernel will have no way to know which operations are safe, and which are not safe. (This is why the whole "proper filesystem snapshot support is the solution" argument is bogus.)

-- Jeremy Maitin-Shepard
Re: Hibernation Redesign
Al Boldi <[EMAIL PROTECTED]> writes:

> Pavel Machek wrote:
>> We are stuck with refrigerator for now, and at least for hibernation, I don't see any feasible alternative.

> Feasible alternative?

I posted such an alternative to the list a short time ago: hibernating from a *new* kernel space/user space that is created by loading a new kernel in a manner similar to what is done for kexec crashdumps. Unlike kexec crashdumps, however, it would not require reserving any memory at boot, because the necessary memory (maybe 16MB or 64MB) can be freed just before hibernating, and device drivers can be properly stopped so that DMAs don't stomp over certain memory.

This approach eliminates the need for the freezer, as it would make hibernate look a lot like suspend to ram from the perspective of the "old" kernel (the kernel being hibernated), as the hibernate operation itself would be completely atomic from the perspective of the "old" kernel. That is not to say, of course, that any code paths would actually be shared, or that the drivers would do the same things (because they probably would not).

[snip]

-- Jeremy Maitin-Shepard
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

[snip]

> At the end of the day, I stand my ground, the freezer cannot be made
> reliable without massive infrastructure changes or giving up on very
> useful features such as fuse among others. Besides, it only partially
> "hides" the problem of requests going to drivers, thus it's a bad
> solution.

I agree that the freezer absolutely should not be used for suspend to
ram ("suspend"), since it is unnecessary with properly written drivers,
which are important to have anyway. It seems that it is indeed the
consensus that it will be phased out sooner or later.

It does seem that the current device suspend interface does not tell the
drivers enough, since as discussed, they need to know whether to merely
block if they receive a request while suspended (as should be done while
initiating a suspend to ram), or if they should wake up the device (as
should be done if a suspend to ram is not in progress). Clearly these
two cases need to be addressed by every driver supporting suspend/resume
(but possibly indirectly if the subsystem handles it for them).

The current hibernate approach used by all of the existing
implementations for Linux seems to depend fundamentally on the freezer,
though, in order to actually save the system state. Thus, it will still
be necessary to fix all of the issues with the freezer, or adopt an
alternate hibernate approach (which is unlikely). Unfortunately, even
leaving kernel threads and certain drivers running after the snapshot is
taken means that the saved image isn't completely correct, and the
freezer cannot help with these issues.

[snip]

-- 
Jeremy Maitin-Shepard
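The two cases a driver has to distinguish can be sketched as a small
state machine. This is a toy Python model under stated assumptions, not
the kernel's driver interface: ToyDriver, SuspendReason and the return
strings are invented for illustration. The point is only that the
suspend call has to carry the reason, so that a later request can either
be queued (system-wide suspend in progress) or wake the device (device
was merely idle).

```python
from collections import deque
from enum import Enum


class SuspendReason(Enum):
    SYSTEM = "system"  # a suspend to RAM is being initiated
    IDLE = "idle"      # device powered down on its own, system still running


class ToyDriver:
    """Hypothetical driver that is told *why* it is being suspended."""

    def __init__(self):
        self.reason = None      # None means the device is active
        self.pending = deque()  # requests held back during a system suspend
        self.handled = []       # requests that reached the device

    def suspend(self, reason):
        self.reason = reason

    def resume(self):
        self.reason = None
        while self.pending:
            self.handled.append(self.pending.popleft())

    def submit(self, request):
        if self.reason is None:
            self.handled.append(request)
            return "handled"
        if self.reason is SuspendReason.SYSTEM:
            # Case 1: merely block until the system resumes.
            self.pending.append(request)
            return "blocked"
        # Case 2: no system suspend in progress, so wake the device.
        self.resume()
        self.handled.append(request)
        return "woke"
```

A subsystem could implement this logic once on behalf of its drivers,
which is the "possibly indirectly" case mentioned above.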
Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway
Matthew Garrett <[EMAIL PROTECTED]> writes:

> On Thu, Jul 05, 2007 at 04:09:24PM +0200, Rafael J. Wysocki wrote:
>> On Thursday, 5 July 2007 15:46, Matthew Garrett wrote:
>> > I have a model for STD that avoids the need to freeze the entirety of
>> > userspace, but I need to find some more time to flesh it out.
>>
>> You can just describe it, as far as I'm concerned. :-)

[snip: new hibernate idea]

I think my kexec-based hibernate idea is simpler and more feasible than
this approach, and also avoids the freezer.

-- 
Jeremy Maitin-Shepard
Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3
Carlo Wood <[EMAIL PROTECTED]> writes:

> On Thu, Jun 14, 2007 at 01:09:46PM -0700, Linus Torvalds wrote:
>> I'm the original author, and I selected the GPLv2 for Linux.
> [...]
>> I'm not going to bother discussing this any more. You don't seem to
>> respect my right to choose the license for my own code.

> This is the main reason I dislike GPLwhatever: there is no notion
> of "original author". You might have written 99% of the code, that
> doesn't matter. You have no rights whatsoever once you release
> something under the GPL (no more than ANYOne else).

You retain the copyright, and in particular the right to relicense.
Only if you make the mistake of including the "or any later version"
phrase do you allow others to redistribute the work under a different
version of the GPL. Although this provision may seem slightly
convenient to authors, its effect is to grant a very large amount of
relicensing permission to the FSF. It almost certainly doesn't make
sense to place that much trust in a single organization.

> The GPL is nice for the community, and for the users - but very,
> very bad towards its authors (taking all and every right you might
> have). If John Doe wants to re-release the whole kernel under
> GPLv3, then all he needs is a website and some bandwidth.

Well, he also needs one tiny little extra thing: the permission of every
copyright holder in Linux.

-- 
Jeremy Maitin-Shepard
Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3
Alexandre Oliva <[EMAIL PROTECTED]> writes:

> On Jun 14, 2007, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>> On Thu, 14 Jun 2007, Alexandre Oliva wrote:
>>>
>>> Hmm... So, if someone takes one of the many GPLv2+ contributions and
>>> makes improvements under GPLv3+, you're going to make an effort to
>>> accept them, rather than rejecting them because they're under the
>>> GPLv3?

>> You *cannot* make GPLv3-only contributions to the kernel.

> I can make improvements to GPLv2+ files under GPLv3 (or rather will,
> after GPLv3 is published).

You can do that, but you won't be able to distribute those changes along
with the rest of the kernel.

-- 
Jeremy Maitin-Shepard
Re: A kexec approach to hibernation
Xavier Bestel <[EMAIL PROTECTED]> writes:

[snip]

> If I were helping you coding I'd suggest to only concentrate on having
> your project work on standard filesystems, and then when it works maybe
> think about suspending on crypto-over-loop-over-fuse-over-vpn-over-wifi.
> But talk is cheap so I'm shutting up. Right now. :)

Well, the whole idea of the kexec approach is that the hibernate system
doesn't need to know anything at all about filesystems or any particular
device. So if it works at all, it will work for

crypto-over-loop-over-fuse-over-vpn-over-wifi
-over-pigeon-carrier-protocol-over-printer-and-scanner.

-- 
Jeremy Maitin-Shepard
Re: A kexec approach to hibernation
Xavier Bestel <[EMAIL PROTECTED]> writes:

> On Mon, 2007-06-11 at 11:01 -0400, Jeremy Maitin-Shepard wrote:
>> >> You might claim then that the solution is to simply keep the network
>> >> driver quiesced or stopped. But then it is impossible to write the
>> >> image over the network. The way to get around this problem is to write
>> >> the image over the network using a fresh network stack.
>>
>> > Or teach the driver stack about the difference/reset it. Remember that
>> > even if you get a fresh network stack, you'll still be getting packets
>> > for the old stack. Getting a new ip (assuming one is available) won't
>> > stop other connections getting killed, either because we send resets
>> > from the kexec'd kernel, or because they timeout looking for the old
>> > ip.
>>
>> I could be mistaken, but I think that bringing up the network interface
>> with a different IP address would prevent it from resetting existing TCP
>> connections, because it would never receive the packets for those
>> existing connections.

> That can't work. There are networks where the client must have a fixed
> IP, or must accept the address given by dhcp in order to talk to
> fileservers. And you still have the same mac address, which may cause
> problems.

I wasn't suggesting that using a different IP address would be a general
solution. It might be a solution for a few people. In general, I'd
imagine that most people would not bring up the network interface at
all, and most of the people that do would bring it up with the same IP
address, causing some existing TCP connections to possibly be reset. I
think that causing connections to be reset is, however, far better than
acking packets that are then silently thrown away.

-- 
Jeremy Maitin-Shepard
Re: A kexec approach to hibernation
Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

>> > If _I_ were willing to add some runtime overhead to make hibernation
>> > simpler, I'd just use some virtualization to do that... with added
>> > advantage of "hibernate here, resume on different hw".
>>
>> I don't believe there is going to be any runtime overhead.

> 64MB less memory seems like runtime overhead for me. If you know how
> to do kexec without pre-reserving memory, I believe kexec/kdump team
> will be interested.

kdump needs to reserve memory at boot for two main reasons. First, it
needs to preload the crashdump kernel into memory so that it will be
available on panic; however much memory the crashdump kernel needs to
run must also be available at all times, since a panic can occur at any
time. Second, no attempt is made to shut down devices on panic, so
devices may clobber existing memory with ongoing DMA, and the crashdump
kernel must therefore use a reserved area of memory.

For hibernate via kexec, however, these issues do not exist. The
simplest solution would be to back up the first, say, 16MB or 64MB (or
however much is desired for the "save" kernel to have) of memory into
free pages just before copying the "save" kernel into the desired
position and jumping to it. Due to the speed of memory copying, this
should not add any significant overhead.

[snip]

-- 
Jeremy Maitin-Shepard
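To put a rough number on "no significant overhead", the copy time can be
estimated as size divided by memory bandwidth. The sketch below is plain
arithmetic in Python; the 1 GiB/s figure used in the example is an
assumed round number for illustration, not a measurement from this
thread, and copy_time_ms is a made-up helper name.

```python
def copy_time_ms(size_mib, bandwidth_gib_per_s):
    """Estimated time in milliseconds to copy size_mib MiB of memory.

    bandwidth_gib_per_s is an assumed sustained copy bandwidth; it is
    the only input that depends on the actual hardware.
    """
    if bandwidth_gib_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    seconds = size_mib / (bandwidth_gib_per_s * 1024)
    return seconds * 1000
```

Under that assumption, copy_time_ms(64, 1) gives 62.5 ms and
copy_time_ms(16, 1) gives about 15.6 ms, paid once per hibernate rather
than continuously, which is the distinction from kdump's permanent
boot-time reservation.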
Re: A kexec approach to hibernation
rrent approach.

> [snip]

>> To me, it seems a lot easier to get right than the current approaches.

> But you can't get what you said you wanted - a fully functional system
> with a fully functional userspace isn't possible. You're running a
> different kernel and can't safely mount filesystems that were mounted by
> the first kernel. You'll have to set up a limited userspace that runs
> from some sort of initrd/ramfs and will end up (so far as I can see now)
> with similar restrictions to what we have now with uswsusp or suspend2's
> userui. (Reads more... oh, I see you said that below :>)

Well, it is fully functional in the sense that everything works as
advertised. I don't know exactly how uswsusp works, but the kexec
approach would have the advantage that you don't have to follow any
special rules like:

- better not write to the mounted filesystems, or you'll corrupt things

- better not try to talk to any other processes, because they're frozen
  and you'll just hang

- better not fork any other processes, because only specially listed
  processes get to run (maybe this isn't the case, I don't know).

Essentially, with the current approaches, you end up with two
independent userspaces anyway, but you just try to run them under a
single kernel (and really it would be preferable to have two independent
kernel spaces as well in the case of certain device drivers, but of
course this cannot be done under one kernel, hence the reason for
kexec).

>> > Moreover, I think it would require some problems that we don't even
>> > anticipate to be solved.
>>
>> Possibly. The alternative, though, seems to be to add hack after hack
>> to get certain functionality to work.

> As I argued above, both systems involve some degree of 'hack'. Kexec
> only seems clean until you realize that you wanted some of the context
> you just switched away from.

(Perhaps see my comments above.) Also, perhaps see the reply to Pavel
about the need to reserve memory, which I'm about to write. ;)

Please don't take my comments in this thread too harshly. I'm not
trying to undermine the work that you and the other hibernate
developers have done. I just think this kexec approach is an
interesting idea, and I brought it up so that it might get explored. I
still don't know if it actually makes sense (although I've managed to
mostly convince myself), and discussing it with you and the other
hibernate developers helps in figuring that out. If I didn't strongly
advocate it, it wouldn't get any thought.

-- 
Jeremy Maitin-Shepard
Re: A kexec approach to hibernation
On Mon, Jun 04, 2007 at 12:46:21PM +0200, Pavel Machek wrote:

> [snip]

> > You might claim then that the solution is to simply keep the network
> > driver quiesced or stopped. But then it is impossible to write the
> > image over the network. The way to get around this problem is to write
> > the image over the network using a fresh network stack.
>
> The "fresh network stack" will RST any connections that were going,
> which is ugly, too.

It will only do this if you bring up the network device with the same IP
address in the new kernel (which you would have no reason to do if you
don't need to write the image over the network). Maybe the ideal
behavior would be to tell the network stack to just ignore unexpected
TCP packets, rather than send RST, while saving or reading the image,
but that is probably not necessary for most uses and would be a hack.

I also think that sending RST is far better than sending ACK and then
silently tossing out the data, which is what is currently done. (I
believe that currently the network devices are brought back up along
with all other devices after the atomic copy is made.) Silently losing
data is something that should only occur on a crash. This is likely to
actually be a somewhat serious problem for servers on which hibernate is
used to move the server between rooms without losing connections. Of
course, you can get around this by adding a hack to not bring up network
devices based on some option or other, but that just solves one specific
case with an ugly solution. In contrast, using the kexec approach, the
network device or any other device would quite naturally not be brought
back up unless it was needed for hibernate, and even if it is brought
back up, no data are silently lost.

> [snip]

> > To me, it seems a lot easier to get right than the current approaches.
>
> Well, you are certainly welcome to create the patch. "suspend3" name
> is still free, AFAICT.

I could be sneaky and call it "hibernate". Probably nicer though to use
the name "kexec hibernate" to be later simplified to just "hibernate".

I was hoping that everyone would like the idea so much that they would
rush to implement it, so that I wouldn't have to try. (I haven't
written much kernel code before, and I have a number of other
time-requiring projects to work on.) It looks like that is not too
likely to happen though ;). Maybe I'll try implementing it though, and
find that it isn't very much work.

It would be very convenient if the current work being done to improve
the driver interfaces for hibernate also results in the proper
interfaces needed for this approach. It looks like the resume path
should be exactly the same with this approach as with the existing
approaches, but the hibernate path is not exactly the same. In
particular, it seems that all devices should be shut down to a greater
extent than merely the quiescing necessary for the current approaches
while making an atomic copy, but also they should not be completely shut
down to the extent that they cannot be restored to the desired state
when resuming or aborting.

>
> If _I_ were willing to add some runtime overhead to make hibernation
> simpler, I'd just use some virtualization to do that... with added
> advantage of "hibernate here, resume on different hw".

I don't believe there is going to be any runtime overhead. To some
extent, (see some of the explanations I gave in the other e-mail I sent
a few minutes ago in reply to Nigel) I think the kexec approach can be
viewed as a cleaner variant of userspace hibernate.

-- 
Jeremy Maitin-Shepard
Re: A kexec approach to hibernation
re. The only real impact would be that the user would need to somehow
specify how to access the "save image kernel" and the additional kernel
command-line arguments to include. If an initrd is to be used instead
of an initramfs, then that would have to be specified as well. I don't
think this setup requirement is significantly more taxing than having to
specify the path to the user interface program, for instance.

> * adding interfaces to tell kexec/dump/whatever what pages need to be
> saved and reloaded

Any hibernation mechanism needs to know which pages to save. This
approach is no different. The "interface" could likely be one of the
following:

1. Just before jumping to the new kernel, with interrupts disabled and
   devices already stopped, the original kernel prepares a list of pages
   to write somewhere in memory. The old kernel passes the address of
   this list as a kernel command-line argument to the new kernel. The
   initramfs or initrd userspace (or the kernel itself, although there
   would be no advantage in doing this in the kernel) gets this address
   from the kernel command-line and then reads that list to determine
   which pages to write. Presumably preparing the list would be a small
   amount of code, and presumably both suspend2 and the in-kernel swsusp
   already need to do something like this.

2. The old kernel prepares no new data structures, and simply provides a
   few pointers as kernel command-line arguments to the new kernel to
   the existing data structures that describe the pages that are used.
   The code running under the new kernel responsible for writing the
   hibernation image simply accesses these data structures using the
   pointers from the kernel command-line to determine which pages to
   write.

> * adding convolutions in which at resume time we boot one kernel, switch
> to another kernel to do the loading and then switch back again to the
> resumed kernel (assuming I understand what you're suggesting).

This shouldn't actually be necessary. It should be possible to do the
resume in exactly the same way the in-kernel swsusp resumes currently
(except that userspace could be used to actually load the image into
memory, and then tell the kernel to do the necessary manipulations to
stop devices, shuffle the pages around so they are in the right
positions, and then jump to the resumed kernel).

>
> It all sounds terribly complicated and confusing to me, and that's
> before I even begin to think about how this second kernel could possibly
> write the image to an encrypted device or LVM or such like that the
> first kernel knows about and might use now.

I find in some ways it is much simpler than the current approaches. The
"save kernel" has to re-initialize device mapper devices that are needed
to write the image in exactly the same way that the resume kernel needs
to reinitialize those devices. In fact, it could probably use the very
same initramfs/initrd code to do it. The fact that it imposes this
symmetry is arguably an advantage.

> Can't we just get the freezer right and be done with it?

The question is: can the freezer ever be right? As far as I can see, no
level of correctness of the freezer is going to allow you to save the
hibernation image to something on a fuse filesystem, because essentially
any code that is run while writing the image needs to live in a special
box that is totally isolated from the rest of the system in order to
avoid problems; thus, it seems like it makes sense to implement this box
by simply using a separate kernel, rather than adding hacks.

-- 
Jeremy Maitin-Shepard
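Option 1 above (a page list in memory, with its address handed over on
the kernel command line) can be mocked up as follows. This is a toy
Python model, not a proposal for an actual interface: the parameter name
hib_pagelist, the list layout (a 64-bit count followed by 64-bit page
frame numbers, little-endian) and all function names are assumptions
made purely for illustration, and "RAM" is a bytearray.

```python
import struct

PARAM = "hib_pagelist"  # hypothetical command-line parameter name


def write_page_list(ram, addr, pfns):
    """Old kernel side: store a count followed by the page frame numbers."""
    struct.pack_into("<Q", ram, addr, len(pfns))
    for i, pfn in enumerate(pfns):
        struct.pack_into("<Q", ram, addr + 8 * (i + 1), pfn)


def build_cmdline(base, addr):
    """Old kernel side: append the list address to the new kernel's cmdline."""
    return "%s %s=%#x" % (base, PARAM, addr)


def parse_pagelist_addr(cmdline):
    """Save-kernel userspace side: find the list address, or None if absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == PARAM and sep:
            return int(value, 16)
    return None


def read_page_list(ram, addr):
    """Save-kernel userspace side: read back the page frame numbers."""
    (count,) = struct.unpack_from("<Q", ram, addr)
    return list(struct.unpack_from("<%dQ" % count, ram, addr + 8))
```

For example, after write_page_list(ram, 0x2000, [3, 17, 42]) the old
kernel would boot the save kernel with a command line ending in
hib_pagelist=0x2000, and the initramfs code would recover the same three
page numbers through parse_pagelist_addr and read_page_list. Option 2
differs only in that the pointers refer to the old kernel's existing
data structures, so the reader must understand their layout.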
Re: A kexec approach to hibernation
On Mon, Jun 04, 2007 at 12:46:21PM +0200, Pavel Machek wrote:

[snip]

>> You might claim then that the solution is to simply keep the network
>> driver quiesced or stopped. But then it is impossible to write the
>> image over the network. The way to get around this problem is to
>> write the image over the network using a fresh network stack.

> The fresh network stack will RST any connections that were going, which
> is ugly, too.

It will only do this if you bring up the network device with the same IP
address in the new kernel (which you would have no reason to do if you
don't need to write the image over the network.) Maybe the ideal
behavior would be to tell the network stack to just ignore unexpected
TCP packets, rather than send RST, while saving or reading the image,
but that is probably not necessary for most uses and would be a hack.

I also think that sending RST is far better than sending ACK and then
silently tossing out the data, which is what is currently done. (Since I
believe currently the network devices are brought back up along with all
other devices after the atomic copy is made.) Silently losing data is
something that should only occur on a crash. This is likely to actually
be a somewhat serious problem for servers on which hibernate is used to
move the server between rooms without losing connections. Of course, you
can get around this by adding a hack to not bring up network devices
based on some option or other, but that just solves one specific case
with an ugly solution. In contrast, using the kexec approach, the
network device or any other device would quite naturally not be brought
back up unless it was needed for hibernate, and even if it is brought
back up, no data are silently lost.

[snip]

>> To me, it seems a lot easier to get right than the current approaches.

> Well, you are certainly welcome to create the patch. suspend3 name is
> still free, AFAICT.

I could be sneaky and call it hibernate.
Probably nicer though to use the name kexec hibernate to be later
simplified to just hibernate.

I was hoping that everyone would like the idea so much that they would
rush to implement it, so that I wouldn't have to try. (I haven't written
much kernel code before, and I have a number of other time-requiring
projects to work on.) It looks like that is not too likely to happen
though ;). Maybe I'll try implementing it though, and find that it isn't
very much work.

It would be very convenient if the current work being done to improve
the driver interfaces for hibernate also results in the proper
interfaces needed for this approach. It looks like the resume path
should be exactly the same with this approach as with the existing
approaches, but the hibernate path is not exactly the same. In
particular, it seems that all devices should be shut down to a greater
extent than merely the quiescing necessary for the current approaches
while making an atomic copy, but also they should not be completely shut
down to the extent that they cannot be restored to the desired state
when resuming or aborting.

> If _I_ were willing to add some runtime overhead to make hibernation
> simpler, I'd just use some virtualization to do that... with added
> advantage of hibernate here, resume on different hw.

I don't believe there is going to be any runtime overhead. To some
extent, (see some of the explanations I gave in the other e-mail I sent
a few minutes ago in reply to Nigel) I think the kexec approach can be
viewed as a cleaner variant of userspace hibernate.

--
Jeremy Maitin-Shepard
Re: A kexec approach to hibernation
ady in place. :-)

I suppose you do that by using more sophisticated logic to atomically
copy the pages to their final location after loading them from disk. In
particular, I suppose you must order the page copies carefully to avoid
clobbering pages that have not yet been copied. Seems reasonable. In
that case, there is indeed probably no reason to not use that approach
for resuming.

[snip]

>> The whole reason to want to checkpoint filesystems was so that the
>> original kernel would remain a fully-functional system with a
>> fully-functional userspace that can continue to access the filesystems
>> while the hibernate image is being written. In addition to the lack of
>> checkpoint support, however, there are a number of other issues that
>> this would create: Even if you can checkpoint filesystems, you can't
>> checkpoint the entire world. The kernel will keep acking network
>> packets, and userspace as well will send any normal replies. If a
>> document was sent off to be printed right before the checkpoint, it
>> might end up printing while the image is being saved, and then printed
>> again when the system resumes.

> That's correct.

>> Fundamentally, I don't think checkpointing is the right answer. What is
>> desired is a fully functional system with a fully functional userspace
>> during the image writing. But we don't want this to be the _same_
>> system that is actually being imaged.
>>
>> That is why I think the kexec solution is the elegant solution.

> Frankly, I think it's tricky. ;-)

To me, it seems a lot easier to get right than the current approaches.

> Moreover, I think it would require some problems that we don't even
> anticipate to be solved.

Possibly. The alternative, though, seems to be to add hack after hack to
get certain functionality to work.

>> > I see two basic advantages of your approach:
>> > 1) We don't need to freeze tasks.
>> > 2) We can create images larger than 50% of RAM.
>>
>> There is also the key benefit of allowing an arbitrary userspace in a
>> fully functional system to be used to both save and load the image. As
>> far as I understand, uswsusp allows a single userspace process to run
>> to handle the loading and saving, but the process runs in a rather
>> fragile userspace with most things disabled; in particular, this
>> userspace process can't access a fuse filesystem and probably can't do
>> other things like fork.

> The user space running on top of the new kernel would be limited by the
> fact that the old kernel's filesystems would be inaccessible to it. That
> would, effectively, require the user to have special filesystems for the
> image-saving kernel and its user space, which isn't very realistic
> IMO.

Fundamentally, saving of the image can't access any of the normal
filesystems anyway. The userspace would likely be provided as an
initramfs or initrd, exactly as is done for userspace resume from
hibernate currently. The same initramfs could probably be used for both
saving the image and restoring the image, since exactly the same
procedure would be used to set up the necessary devices for both the
save and restore case, and the GUI that is used might also be the same.

>> > Still, I don't think we could implement it quickly and easily.
>>
>> It is hard to say how hard it would be. I think a lot of the existing
>> kexec and hibernate code could be leveraged.

> Yes, I think so, but at least we need to fix the quiescing of devices before
> we think of implementing that.

It seems like fixing of device stopping/suspend/quiescing is an
orthogonal issue to the actual hibernate implementation. It would
probably be most reliable and simplest if on every jump between kernels,
all devices are fully stopped by the jumping kernel, and then fully
reinitialized by the jumped-to kernel. Presumably the time spent doing
this initialization will not be very significant compared to the time
required to write the image.
--
Jeremy Maitin-Shepard
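The supposition near the top of this message, that page copies on resume
must be ordered so that no page is clobbered before it has been copied,
can be illustrated with the small sketch below. It is not a description
of what swsusp actually does; it is a generic "parallel move" ordering
under stated assumptions: every source page is distinct, every
destination page is distinct, and one scratch page that is neither a
source nor a destination is available. The names relocate_pages, struct
move and PAGE_BYTES are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 8 /* tiny "pages" keep the sketch easy to follow */

/* One pending copy: page index src must end up at page index dst. */
struct move {
    size_t src;
    size_t dst;
    int done;
};

/* True if some move that has not run yet still needs to read page idx. */
static int is_pending_source(const struct move *m, size_t n, size_t idx)
{
    for (size_t i = 0; i < n; i++)
        if (!m[i].done && m[i].src == idx)
            return 1;
    return 0;
}

/*
 * Perform all moves as if they happened simultaneously.
 * A move runs only once nothing still needs to read its destination.
 * If every remaining move is blocked, only cycles are left, and one
 * source is parked in the scratch page to break the cycle.
 */
void relocate_pages(unsigned char (*mem)[PAGE_BYTES],
                    struct move *m, size_t n, size_t scratch)
{
    size_t remaining = 0;

    for (size_t i = 0; i < n; i++) {
        assert(m[i].src != scratch && m[i].dst != scratch);
        m[i].done = (m[i].src == m[i].dst); /* nothing to copy */
        if (!m[i].done)
            remaining++;
    }

    while (remaining > 0) {
        int progressed = 0;

        for (size_t i = 0; i < n; i++) {
            if (m[i].done || is_pending_source(m, n, m[i].dst))
                continue;
            memcpy(mem[m[i].dst], mem[m[i].src], PAGE_BYTES);
            m[i].done = 1;
            remaining--;
            progressed = 1;
        }

        if (!progressed) {
            /* Break one cycle: save a source page, then read it from there. */
            for (size_t i = 0; i < n; i++) {
                if (!m[i].done) {
                    memcpy(mem[scratch], mem[m[i].src], PAGE_BYTES);
                    m[i].src = scratch;
                    break;
                }
            }
        }
    }
}
```

The quadratic scan is chosen for brevity rather than speed. An approach
that loads the image only into pages that are not destinations would
avoid the ordering problem altogether, at the cost of needing enough
such free pages.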