from:"Jeremy Maitin-Shepard"

Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Friday, 21 September 2007 23:08, Jeremy Maitin-Shepard wrote:
>> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>> 
>> > On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
>> >> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>> >> 
>> >> [snip]
>> >> 
>> >> > The ACPI NVS area is explicitly marked as reserved and we don't save it.
>> >> > On x86_64 we don't save any memory areas marked as reserved and yet the
>> >> > above happens.
>> >> 
>> >> I think you have mentioned before, though, that ACPI is first
>> >> initialized by the boot kernel, before it is later initialized by
>> >> resuming kernel.  This could well be the source of the problem.
>> 
>> > No, it's not.  I have tested that too with an ACPI-less boot kernel.
>> 
>> Well, it seems that there just must be some other bug.  I would define
>> anything that differs between the post-resume initialization of ACPI

> I'm not sure what you mean.

>> from the normal boot initialization of ACPI as a bug.  If the interaction
>> with the hardware is the same, then the behavior will be the same.

> The ACPI platform firmware is allowed to preserve information across the
> hibernation-resume cycle, so this need not be the same.

All of my comments related to the case where S4 is not being used
(instead the system is just powered off normally), and a boot kernel
that does not initialize ACPI is used.  In that case, the ACPI platform
firmware should not be able to distinguish a normal boot from a resume
from hibernation.

-- 
Jeremy Maitin-Shepard
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
>> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>> 
>> [snip]
>> 
>> > The ACPI NVS area is explicitly marked as reserved and we don't save it.
>> > On x86_64 we don't save any memory areas marked as reserved and yet the
>> > above happens.
>> 
>> I think you have mentioned before, though, that ACPI is first
>> initialized by the boot kernel, before it is later initialized by
>> resuming kernel.  This could well be the source of the problem.

> No, it's not.  I have tested that too with an ACPI-less boot kernel.

Well, it seems that there just must be some other bug.  I would define
anything that differs between the post-resume initialization of ACPI and
the normal boot initialization of ACPI as a bug.  If the interaction
with the hardware is the same, then the behavior will be the same.

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> The ACPI NVS area is explicitly marked as reserved and we don't save it.
> On x86_64 we don't save any memory areas marked as reserved and yet the above
> happens.

I think you have mentioned before, though, that ACPI is first
initialized by the boot kernel, before it is later initialized by
resuming kernel.  This could well be the source of the problem.

In particular, isn't it the case that you also switch the devices to low
power mode before resuming?

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Friday, 21 September 2007 15:14, huang ying wrote:
>> On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
>> > On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
>> > > Nigel Cunningham <[EMAIL PROTECTED]> writes:
> [--snip--]
>> > >
>> > > No one has yet attacked the hard problem of coming up with separate
>> > > hibernate methods for drivers.
>> >
>> > Well, I've been playing a bit with that for some time, but it's not easy
>> > by any means.
>> >
>> > In short, I'm seeing some problems related to the handling of ACPI that
>> > seem to shatter the entire idea of having separate hibernate methods, at
>> > least as far as ACPI systems are concerned.
>> 
>> So sad to hear this. Can you detail it a little? Or give a link?

> Well, the problem is that apparently some systems (eg. my HP nx6325) expect us
> to execute the _PTS ACPI global control method before creating the image _and_
> to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
> system into the sleep state.  In particular, on nx6325, if we don't do that,
> then after the restore the status of the AC power will not be reported
> correctly (and if you replace the battery while in the sleep state, the
> battery status will not be updated correctly after the restore).  Similar
> issues have been reported for other machines.

Suppose that instead of using ACPI S4 state at all, you instead just
power off.  Yes, you'll lose wakeup event functionality, and flashy
LEDs, but doesn't this take care of the problem?  The firmware shouldn't
see the hibernate as anything other than a shutdown and reboot.  ACPI
should be initialized normally when resuming, which should take care of
getting AC power status reported properly.

This should be the behavior, anyway, on the many systems that do not
support S4.

> Now, the ACPI specification requires us to put devices into low power states
> before executing _PTS and that's exactly what we're doing before a suspend to
> RAM.  Thus, it seems that in general we need to do the same for hibernation on
> ACPI systems.

It seems that if ACPI S4 is going to be used, switching to low power
state is something that should be done only immediately before entering
that state (i.e. after the image has already been saved).  In
particular, it should not be done just before the atomic copy.  It is
true that (during resume) after the atomic copy snapshot is restored,
drivers will need to be prepared (i.e. have saved whatever information
is necessary) to _resume_ devices from the low power state, but that
does not mean they have to actually be put into that low power state
before the copy is made.
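
To make the proposed ordering concrete, here is a minimal sketch in C.  It
is only a toy model under stated assumptions: the step names and
hibernate_plan() are hypothetical (they are not kernel symbols), and the
model leaves out details such as re-enabling devices in order to write the
image.  The only point it illustrates is that the low power transition
comes after the image is saved, and only when S4 is actually entered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step names; none of these are real kernel symbols. */
enum hib_step {
    STEP_QUIESCE_DEVICES,   /* stop I/O, devices stay powered */
    STEP_ATOMIC_COPY,       /* take the memory snapshot */
    STEP_SAVE_IMAGE,        /* write the snapshot to disk */
    STEP_DEVICES_LOW_POWER, /* only needed when S4 is entered */
    STEP_ENTER_S4,
    STEP_POWER_OFF
};

/* Fill out[] with the proposed order and return the number of steps. */
size_t hibernate_plan(int use_s4, enum hib_step *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL && cap >= 5);
    out[n++] = STEP_QUIESCE_DEVICES;
    out[n++] = STEP_ATOMIC_COPY;
    out[n++] = STEP_SAVE_IMAGE;
    if (use_s4) {
        /* Low power state is entered only immediately before S4. */
        out[n++] = STEP_DEVICES_LOW_POWER;
        out[n++] = STEP_ENTER_S4;
    } else {
        /* Plain power off: firmware sees an ordinary shutdown. */
        out[n++] = STEP_POWER_OFF;
    }
    return n;
}
```

In this model the atomic copy always happens while devices are merely
quiesced, never while they are in a low power state.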

I agree that for the kexec implementation there may be additional
issues.  For swsusp, uswsusp, and tuxonice, though, I don't see why
there should be a problem.  I think that, as was recognized before, all
of the issues are resolved by properly considering exactly what each
callback should do and when it should be called.  The problems stem from
ambiguous specifications, or trying to use the same callback for two
different purposes or in two different cases.

Let me know if I'm mistaken.

-- 
Jeremy Maitin-Shepard

Re: GPL / MPL license issues.

2007-08-07 Thread Jeremy Maitin-Shepard

Dave Jones <[EMAIL PROTECTED]> writes:

> There are a number of files in the kernel that have in their
> headers a notice that the file is under the Mozilla Public License,
> which alone, is incompatible with the GPL.

> This itself is fine, as long as the resulting code claims
> to be Dual MPL/GPL, however there are a few cases where this
> doesn't seem to be happening.

All of the files that you cite include a notice that they are licensed
under the GPLv2, in addition to the MPL.  There is no reason that
MODULE_LICENSE needs to indicate that some portions of code may also be
available under an alternative license.  Furthermore, for some modules
that contain both code licensed under the GPLv2 exclusively, and code
dual-licensed under both the GPLv2 and the MPL, it would be incorrect to
state that the combined work is dual-licensed under the GPLv2 and the
MPL.
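
As an illustration of why the exact MODULE_LICENSE string matters less
than the file headers, here is a user-space sketch of the kind of string
check the kernel applies when deciding whether a module counts as GPL
compatible.  The list below is what I recall from include/linux/license.h
in current kernels and should be treated as an assumption; the function
name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* License strings assumed to be treated as GPL compatible. */
static const char *const gpl_compatible_strings[] = {
    "GPL",
    "GPL v2",
    "GPL and additional rights",
    "Dual BSD/GPL",
    "Dual MIT/GPL",
    "Dual MPL/GPL",
};

/* Return 1 if the MODULE_LICENSE string is in the accepted list. */
int license_string_is_gpl_compatible(const char *license)
{
    size_t i;
    size_t n = sizeof(gpl_compatible_strings) /
               sizeof(gpl_compatible_strings[0]);

    assert(license != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(license, gpl_compatible_strings[i]) == 0)
            return 1;
    }
    return 0;
}
```

Under this check a module declared as "GPL" and one declared as
"Dual MPL/GPL" are treated the same way, so a module that mixes GPLv2-only
code with dual-licensed code loses nothing by declaring plain "GPL".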

As far as providing a convenience to users, I can't see why anyone would
really care that a particular module includes some code that may be
licensed under the MPL as well.  Anyone actually looking through the
kernel for code to incorporate into an MPL project would surely read the
copyright headers at the top of the source files, rather than try to use
the MODULE_LICENSE notes.

[snip]

-- 
Jeremy Maitin-Shepard

Re: [RFC 12/26] ext2 white-out support

2007-08-02 Thread Jeremy Maitin-Shepard

Jörn Engel <[EMAIL PROTECTED]> writes:

> On Wed, 1 August 2007 15:33:30 -0400, Josef Sipek wrote:
>> 
>> This brings up a very interesting (but painful) question...which makes more
>> sense? Allowing the modifications in only the top-most branch, or any branch
>> (given the user allows it at mount-time)?
>> 
>> This is really a question to the community at large, not just you, Dave :)

> Only write to top-most layer.

> There are two reasons for this.  First it allows users to create a union
> mount, test something (e.g. update the distribution) and remove every
> trace from the test by umounting the top-most layer.  Such a thing can
> be quite valuable.

Josef did specifically state that modification to the lower layers would
be allowed only if a special mount flag is given.

> The second reason is simplicity.  I personally couldn't even start to
> describe the semantics.  If the user does a rename, which layer will the
> change end up in?  What if source or target exist in multiple layers?
> How to rename a directory in a lower layer containing a new file in an
> upper layer?

> Finding new and interesting corner cases for such a beast can be quite
> entertaining.  And until someone has properly documented the semantics
> for _all_ the corner cases, my enthusiasm is below freezing point.  Does
> such a documentation exist?

I think that if someone can come up with consistent (and useful)
semantics for a mount option that allows modifications to other layers
as well, it would be a useful additional feature to support.  It seems
that it should be possible to add this feature at a later time in any
case.
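
To show what "consistent semantics" has to pin down even in the simplest
case, here is a toy model of name lookup across union layers with
white-outs.  It is a sketch only: the types and union_lookup() are
hypothetical and are not taken from the RFC patches or from unionfs.  It
assumes layers[0] is the top-most layer and that a white-out in an upper
layer hides the same name in every layer below it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum entry_kind { ENTRY_FILE, ENTRY_WHITEOUT };

struct entry {
    const char *name;
    enum entry_kind kind;
};

struct layer {
    const struct entry *entries;
    size_t count;
};

/*
 * Return the index of the layer that provides 'name', or -1 if the
 * name is absent or hidden by a white-out in a higher layer.
 */
int union_lookup(const struct layer *layers, size_t nlayers,
                 const char *name)
{
    size_t i, j;

    assert(layers != NULL && name != NULL);
    for (i = 0; i < nlayers; i++) {          /* top-most layer first */
        for (j = 0; j < layers[i].count; j++) {
            const struct entry *e = &layers[i].entries[j];

            if (strcmp(e->name, name) != 0)
                continue;
            /* First match wins; a white-out ends the search. */
            return e->kind == ENTRY_WHITEOUT ? -1 : (int)i;
        }
    }
    return -1;
}
```

With writes restricted to the top-most layer, every modification either
adds an entry or a white-out to layers[0].  Allowing writes to lower
layers means deciding, for each operation, which index the change lands
in, and that is exactly the part that still needs documented semantics.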

Perhaps referring to the plan9 semantics could be helpful.

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Jeremy Maitin-Shepard

It seems that you could still potentially get a failure to freeze if one
FUSE process depends on another, and the one that is frozen second just
happens to be waiting on the one that is frozen first when it is frozen.
I admit that this situation is unlikely, and perhaps acceptable.

A larger concern is that it seems that freezing FUSE processes at all
_will_ generate deadlocks if a non-synchronous or memory-map-supporting
filesystem is loopback mounted from a FUSE filesystem.  In that case, if
you attempt to sync or free memory once FUSE is frozen, you are sure to
get a deadlock.

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-21 Thread Jeremy Maitin-Shepard

Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

> So it will break at least battery status and "AC plugged in"
> status, because those are handled by ACPI and we do not know how to
> control them by hand.

It seems that it should be possible to initialize ACPI as if the system
just booted up normally.  Then battery status and such should be
correct, since they are correct after normal initialization.

It should be possible to make hibernate look just like a reboot to all
of the devices, including ACPI stuff.

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Jeremy Maitin-Shepard

Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
>> >> when doing a suspend-to-ram you get to a point where you just don't use 
>> >> any userspace.
>> 
>> > What do you mean?  How can you prevent user tasks from running?  That's 
>> > basically what the freezer does, and the whole point of this approach 
>> > is to eliminate the freezer.  Right?
>> 
>> Presumably no tasks at all would be scheduled.

> How would you prevent tasks from being scheduled?  How would you
> prevent drivers from deadlocking because in order to put their device
> in a low-power state they need to acquire a lock which is held by a
> user task?

Perhaps this isn't an issue once the device is already quiesced.  I'm
just conjecturing.

[snip]

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Jeremy Maitin-Shepard

Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote:
>> > Userspace can submit I/O requests.  Someone will have to audit every
>> > driver to make sure that such I/O requests don't cause a quiesced
>> > device to become active.  If the device is active, it will make the
>> > memory snapshot inconsistent with the on-device data.
>> 
>> assuming this is the suspend-from-ram after a kexec back from the 
>> write-to-disk kernel I don't think you are correct.
>> 
>> when doing a suspend-to-ram you get to a point where you just don't use 
>> any userspace.

> What do you mean?  How can you prevent user tasks from running?  That's 
> basically what the freezer does, and the whole point of this approach 
> is to eliminate the freezer.  Right?

Presumably no tasks at all would be scheduled.

>> from that point on you are just walking the device tree 
>> putting things into low-power mode. This is the point where we are talking 
>> about jumping to.

> Yes.  And putting things into low-power mode requires the ability to 
> run the scheduler, which means that user tasks can be scheduled, which 
> means that they can run.

Does it really (fundamentally) require scheduling tasks, particularly in
the case that the devices have already been put in the "quiesced" state?

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

>> Or add a small bit of infrastructure that errors writes at make_request 
>> if you don't have a magic "i am a direct block device write from 
>> userspace" flag on the bio.
>> 
>> The hibernate may fail, but you don't corrupt the media.
>> 
>> If you don't get the image out, resume back to the "this is resume" 
>> instead of the power-down path.

> Well, I don't think that is much prettier than the freezer ...

It seems that a better solution to the "how do we write to a file on an
in-use partition" has been suggested, which also handles swap partitions
and swap files, and does not require mounting filesystems, so it seems
that the filesystem issue need not be considered.

[snip]

> No.  I'm saying that when you go back from the image-saving kernel to the
> hibernated kernel, you need to make sure that no task will cause any
> filesystem's on-disk state to be actually updated.  If you can't make such
> a guarantee, you just can't do that.

> With the current state of the drivers, it's not doable without the
> freezer.

It seems that it should be feasible to fix the drivers so that

1. they can be taken from normal state to quiesced state without
   requiring the freezer;

2. they can be taken from normal state to low power state without
   requiring the freezer;

3. they can be taken from quiesced state to low power state without
   requiring the freezer.

In particular, it seems that it should be possible to do (3) without
needing to schedule tasks.

It seems likely that (2) may in fact be almost exactly the same as, or
at least similar to, (1) followed by (3), at least for many drivers.
(1) is required by the kexec hibernate approach even ignoring suspend to
both or S4.  (2) is required for suspend to ram without the freezer,
which seems to be desired anyway.
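
The relationship between the three transitions can be sketched as a small
state machine in C.  This is a toy model, not driver code: struct
device_model and the dev_* functions are hypothetical names, and draining
I/O is reduced to clearing a counter.  It shows (2) implemented as (1)
followed by (3), with no step that waits on a user task.

```c
#include <assert.h>
#include <stddef.h>

enum dev_state { DEV_NORMAL, DEV_QUIESCED, DEV_LOW_POWER };

struct device_model {
    enum dev_state state;
    int io_in_flight;       /* stand-in for outstanding requests */
};

/* (1) normal -> quiesced: stop I/O, device stays powered. */
int dev_quiesce(struct device_model *d)
{
    assert(d != NULL);
    if (d->state != DEV_NORMAL)
        return -1;
    d->io_in_flight = 0;    /* model of draining outstanding I/O */
    d->state = DEV_QUIESCED;
    return 0;
}

/* (3) quiesced -> low power: nothing left that needs a task to run. */
int dev_power_down(struct device_model *d)
{
    assert(d != NULL);
    if (d->state != DEV_QUIESCED)
        return -1;
    d->state = DEV_LOW_POWER;
    return 0;
}

/* (2) normal -> low power, expressed as (1) followed by (3). */
int dev_suspend(struct device_model *d)
{
    if (dev_quiesce(d) != 0)
        return -1;
    return dev_power_down(d);
}
```

If a driver can be structured this way, the kexec approach only needs
dev_quiesce() before the jump and dev_power_down() afterwards, while
suspend to RAM calls dev_suspend() directly.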

[snip]

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Jeremy Maitin-Shepard

Milton Miller <[EMAIL PROTECTED]> writes:

[snip]

>>>> (7) how to avoid corrupting filesystems mounted by the hibernated kernel
>>> 
>>> I didn't realize this was a discussion item. I thought the options were
>>> clear, for some filesystem types you can mount them read-only, but for
>>> ext3 (and possibly other less common ones) you just plain cannot touch
>>> them.
>> 
>> That's correct.  And since you cannot touch ext3, you need either to assume
>> that you won't touch filesystems at all, or to have code to recognize the
>> filesystem you're dealing with.

> Or add a small bit of infrastructure that errors writes at make_request if
> you don't have a magic "i am a direct block device write from userspace"
> flag on the bio.

I still don't understand why there is this fixation on accessing dirty
filesystems in use by the hibernated system.  Even if you avoid
corrupting the filesystem by avoiding writing to the block device, there
isn't any real guarantee about the state of the data, except for a
filesystem that specifically makes guarantees about such data (and I
don't believe any of the existing ones do).

It isn't necessary to be able to access such filesystems: everything can
be done from an initramfs/initrd.
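
For reference, the write-filtering idea quoted above might look roughly like
the following.  This is a toy model in Python rather than block-layer code;
Bio, direct_from_userspace and WriteRejected are hypothetical names, and
only make_request echoes a name from the quoted text.  Note that it only
blocks writes; it does nothing about whether data read from a dirty
filesystem is consistent.

```python
from dataclasses import dataclass


class WriteRejected(Exception):
    """Raised when a write lacks the (hypothetical) direct-write flag."""


@dataclass
class Bio:
    sector: int
    is_write: bool
    direct_from_userspace: bool = False  # hypothetical flag, not a real bio field


def make_request(bio, submitted):
    """Queue the request, erroring any write that is not flagged as direct."""
    if bio.is_write and not bio.direct_from_userspace:
        raise WriteRejected(f"write to sector {bio.sector} refused")
    submitted.append(bio)
```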

[snip]

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Jeremy Maitin-Shepard

Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote:
>>> Userspace can submit I/O requests.  Someone will have to audit every
>>> driver to make sure that such I/O requests don't cause a quiesced
>>> device to become active.  If the device is active, it will make the
>>> memory snapshot inconsistent with the on-device data.
>> 
>> assuming this is the suspend-from-ram after a kexec back from the
>> write-to-disk kernel I don't think you are correct.
>> 
>> when doing a suspend-to-ram you get to a point where you just don't use
>> any userspace.

> What do you mean?  How can you prevent user tasks from running?  That's
> basically what the freezer does, and the whole point of this approach
> is to eliminate the freezer.  Right?

Presumably no tasks at all would be scheduled.

>> from that point on you are just walking the device tree
>> putting things into low-power mode. This is the point where we are talking
>> about jumping to.

> Yes.  And putting things into low-power mode requires the ability to
> run the scheduler, which means that user tasks can be scheduled, which
> means that they can run.

Does it really (fundamentally) require scheduling tasks, particularly in
the case that the devices have already been put in the quiesced state?

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Jeremy Maitin-Shepard

Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
>>>> when doing a suspend-to-ram you get to a point where you just don't use
>>>> any userspace.
>> 
>>> What do you mean?  How can you prevent user tasks from running?  That's
>>> basically what the freezer does, and the whole point of this approach
>>> is to eliminate the freezer.  Right?
>> 
>> Presumably no tasks at all would be scheduled.

> How would you prevent tasks from being scheduled?  How would you
> prevent drivers from deadlocking because in order to put their device
> in a low-power state they need to acquire a lock which is held by a
> user task?

Perhaps this isn't an issue once the device is already quiesced.  I'm
just conjecturing.

[snip]

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]

>> There are NO ACPI LIMITS!  There only are things that you need to implement
>> if you're going to support ACPI, but they need not be used ALWAYS, no?

> yes there are limits. the fact that you can't remove the battery in S4 mode
> without messing things up is a limit,

You won't mess things up as long as the resuming kernel knows that it
should resume as if the system had been shut down, rather than sent to S4
state.  Maybe it is even possible to detect what type of resuming is
needed automatically.  Similarly, booting another OS shouldn't be a
problem, except that if you do it without powering off the system first,
some devices might not work under the other OS if the other OS doesn't
initialize them properly.

[snip]

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]

>> How do you guarantee that no tasks are scheduled when you get back to the
>> hibernated kernel?

> just don't schedule any userspace tasks. all you need to do is to execute the
> ACPI sleep functions. you normally do that after stopping userspace
> anyway.

What does "stopping userspace" mean?  You already said it does not mean
disabling interrupts.  But using the freezer is also not an option,
since the avoidance of that is the main reason for the kexec approach in
the first place.

[snip]

>> Well, not exactly.  If your battery runs out of power while you're
>> suspended, but you have the image saved, it's still better to restore from
>> the image, even if something may not work correctly after the restore, than
>> to risk a loss of data.

> if things don't work correctly you are still risking the loss of data, the
> user just doesn't know it.

It should be possible on any system to do a hibernate followed by a
shutdown (and then resume properly, without any problems).  Thus, for
handling suspend to both, you resume as if the system had been shut down,
rather than resuming as if the system came from S4.

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]

>>> * figure out which devices can wake up
>>> * put devices into low power states (wake-up devices are placed in the Dx
>>> states compatible with the wake capability, the others are powered off)

> this can't be done by the image-saving kernel if that kernel doesn't know
> about the device.

The image-saving kernel can be made to know about all of the "wake up"
devices; all other devices should have already been powered off by the
"hibernated" kernel.

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]

> this is where we disagree.

> why not? if all that the hibernated kernel does is to suspend-to-ram and makes
> no changes to disks or TCP connections anything that it does do would be lost
> if power were to fail and you instead did a restore from disk.

It would be okay to switch to the "hibernated" kernel in order to
e.g. initiate a suspend to ram provided that everything is done
atomically with interrupts off, for instance.  It is not clear, though,
that it is possible to suspend to ram atomically like that.

There is also the question of what state the devices will be in when
switching back from the "save image" kernel to the "hibernated" kernel.

[snip]

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]

> the non-ACPI hibernate behaves very differently, and for some people (and I
> think I am one of them) it will meet their needs better than _any_ of the ACPI
> suspends.

It may have certain differences from the user point of view, but from
the implementation view, it seems that it is nearly exactly the same.
The only differences seem to be: 

 - rather than shutting down, do whatever is necessary to stick the
   system in S4 state.

 - make sure ACPI isn't initialized by the "load image" kernel

 - rather than "resume from hibernate" ACPI by initializing it normally,
   issue the special hibernate-related methods.

Thus, it seems that supporting ACPI S4 will have a very minimal effect
on the hibernate implementation.
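
To make the comparison concrete, here is a rough sketch of the two paths as
lists of step labels.  It is Python used purely as pseudocode that happens
to run; the function name and the step labels are made up for illustration
and are not taken from any existing implementation.

```python
def hibernate_plan(use_s4):
    """Return (power_down_steps, resume_steps) as lists of step labels."""
    # Difference 1: enter S4 instead of shutting down.
    power_down = ["save image"]
    power_down.append("enter S4" if use_s4 else "shut down")

    resume = []
    if use_s4:
        # Difference 2: the "load image" kernel must not initialize ACPI.
        resume.append("load-image kernel: skip ACPI initialization")
    resume.append("load image and jump to hibernated kernel")
    # Difference 3: special hibernate-related methods instead of normal init.
    if use_s4:
        resume.append("issue ACPI hibernate-resume methods")
    else:
        resume.append("initialize ACPI normally")
    return power_down, resume
```

Everything else in the two plans is shared, which is the point being made
above.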

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

>> Rafael, for those of us who aren't thoroughly familiar with all the ins
>> and outs of the ACPI spec, could you please summarize a list of the
>> ACPI calls needed in the second and third cases above?  Indicate which
>> ones need to be done from within the original kernel and which should
>> be done from within a kexec'd hibernation kernel.

> Sure.

> In the third case (ie. transition to S4) we are supposed to do the following:

> (1) Upon entering the sleep state, which IMO can be done _after_ the image
> has been saved:

I assume you mean "in order to enter the sleep state", rather than "upon
entering the sleep state".  I still don't understand what you mean by
"which IMO can be done _after_ the image has been saved"; as far as I
understand, the last step of this process, "make the platform enter S4",
is almost like a shutdown as far as the kernel is concerned (except for
the tiny detail of having to call those special ACPI methods on resume);
consequently, it would seem that nothing can be done after that step.

>   * figure out which devices can wake up
>   * put devices into low power states (wake-up devices are placed in the Dx
>     states compatible with the wake capability, the others are powered off)
>   * execute the _PTS global control method
>   * switch off the nonlocal CPUs (eg. nonboot CPUs on x86)
>   * execute the _GTS global control method
>   * set the GPE enable registers corresponding to the wake-up devices
>   * make the platform enter S4 (there's a well defined procedure for that)
>   I think that this should be done by the image-saving kernel.

I agree.

> (2) Upon start-up (by which I mean what happens after the user has pressed
> the power button or something like that):
>   * check if the image is present (and valid) _without_ enabling ACPI (we
>     don't do that now, but I see no reason for not doing it in the new
>     framework)
>   * if the image is present (and valid), load it
>   * turn on ACPI (unless already turned on by the BIOS, that is)
>   * execute the _BFS global control method
>   * execute the _WAK global control method
>   * continue
>   Here, the first two things should be done by the image-loading kernel, but
>   the remaining operations have to be carried out by the restored kernel.

It doesn't seem like a problem for that to be the case, but out of
curiosity why do those methods need to be executed by the "restored"
kernel, rather than the "image loading" kernel?  Do they require some
information from ACPI-related kernel data structures that were populated
by the normal ACPI initialization?
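
As a summary of the division of labour described in the quoted lists, the
steps can be tabulated by the kernel that is supposed to run them.  The
sketch below is Python used only to hold the table; the names S4_ENTRY,
START_UP and steps_for are invented here, and the step labels are
abbreviations of the quoted text.

```python
# Steps for entering S4; per the quoted text, all run in the image-saving kernel.
S4_ENTRY = [
    "figure out which devices can wake up",
    "put devices into low power states",
    "execute _PTS",
    "switch off the nonboot CPUs",
    "execute _GTS",
    "set the GPE enable registers for the wake-up devices",
    "make the platform enter S4",
]

# Start-up steps, each paired with the kernel that is supposed to run it.
START_UP = [
    ("check for a valid image without enabling ACPI", "image-loading"),
    ("load the image", "image-loading"),
    ("turn on ACPI", "restored"),
    ("execute _BFS", "restored"),
    ("execute _WAK", "restored"),
    ("continue", "restored"),
]


def steps_for(kernel):
    """List the steps assigned to one kernel, in order."""
    steps = list(S4_ENTRY) if kernel == "image-saving" else []
    steps += [step for step, owner in START_UP if owner == kernel]
    return steps
```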

[snip]

> ... we can't return to the hibernated kernel unless we are going to cancel the
> hibernation.

I agree.

> That's why I think that for the suspend-to-both the image-saving kernel will
> need to support the same set of devices as the hibernated kernel.

If all of the devices that the image writing kernel doesn't know about
have already been shut down/powered off by the hibernated kernel, then
does the "image writing" kernel still need to know about them in order
to suspend to RAM properly (i.e. without leaving some devices on wasting
power)?

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> Well, first, the fact is that _some_ systems _will_ be powered while in
> hibernation (the majority of notebooks, for example) and you should assume
> that the platform _may_ retain some information across the
> hibernation/restore cycle.  In that case you _should_ _not_ trash the
> information retained by the platform.

I'm not sure the majority of notebook users will want wakeup support in
exchange for some power consumption while the system is off.  I think
many people would not consider the trouble of having to press the power
button instead of merely opening the lid too great.

Furthermore, S4 mode is of course also not suitable if you intend to
replace the battery while the system is hibernated.

It does seem that it is useful to provide S4 as an option, but certainly
just shutting down should also be an option on all systems.

> Now, with that in mind, ACPI requires us to make the system enter the S4 sleep
> state as a result of the hibernation procedure.  In my opinion this may be
> done after saving the image, but still this means, for example, that the
> image-saving kernel needs to support ACPI.

It seems that it most certainly must be done AFTER saving the image, as
the image obviously cannot be saved after entering S4 state, since S4
state is nearly the same as powering off completely and all memory will
be lost.

> Next, during the restore, we should first check if the image is present (and
> valid) _without_ turning ACPI on (note that this is not done by the current
> hibernation code and that leads to strange problems on some systems).  Then,
> if the image is present (and valid), we should first load it, jump to the
> hibernated kernel and _then_ turn ACPI on and execute the _BFS and
> _WAK ACPI global methods (again, this is not done by the current code in that
> order, which is wrong).  Only after that is the hibernated kernel supposed to
> continue.

It seems that the implementation of that behavior for Linux cannot be
quite so simple, since resume from hibernation is driven (in general)
from an initrd/initramfs rather than directly from the kernel
initialization sequence, in order to support modular drivers and
features like DM and LVM.

Thus, there would have to be a new "delay_acpi_initialization" kernel
command-line option.  Additionally, there would be a sysfs interface to
tell the kernel to proceed with the ACPI initialization as normal.  This
would be used by an initrd/initramfs after determining that a resume
from hibernate will not be done.  If a resume from hibernate is done,
this hook won't be used, and instead the resumed kernel will call the
ACPI hibernate resume stuff if S4 state was used; otherwise, the resumed
kernel will just re-initialize ACPI as normal.  Also, if the in-kernel
code for checking if a resume can be done does not find a hibernate
image, it will also invoke the delayed ACPI initialization.
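
The decision logic proposed above could be sketched as follows.  This is a
Python model of the control flow only; "delay_acpi_initialization" is the
option name suggested in this message, while the function name, its
parameters and the returned labels are hypothetical.

```python
def acpi_plan(delay_option, image_found, used_s4=False):
    """Return (who, action) for ACPI setup under the proposed scheme."""
    if not delay_option:
        # Without the proposed option nothing changes from today.
        return ("boot kernel", "initialize ACPI during normal boot")
    if not image_found:
        # Triggered either by the in-kernel image check or by the
        # initrd/initramfs through the proposed sysfs interface.
        return ("boot kernel", "run the delayed ACPI initialization")
    if used_s4:
        return ("resumed kernel", "call the ACPI hibernate-resume methods")
    return ("resumed kernel", "re-initialize ACPI as normal")
```

For example, acpi_plan(True, True, used_s4=True) assigns the work to the
resumed kernel, whereas acpi_plan(True, False) falls back to the delayed
initialization in the boot kernel.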

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> I'm afraid of one thing, though.

> If we create a framework without ACPI (well, ACPI needs to be enabled in the
> kernel anyway for other reasons, like the ability to suspend to RAM) and then
> it turns out that we have to add some ACPI hooks to it, that might be
> difficult to do cleanly.

> Thus, it seems reasonable to think of the ACPI handling in advance.

As far as I understand, ACPI support is only useful for hibernate to the
extent that it allows some or all of the following features:

 - possibly shows a nice looking "hibernate" LED
 - possibly allows the BIOS to show something about hibernate
 - possibly allows the lid or keyboard to "wake up" (turn on) the system

Note that properly restoring device state (or even properly determining
whether on external/mains power vs. battery) on resume is not something
that should require special hibernate ACPI support, since it should be
possible to make hibernate (and in general it will be the case that
hibernate will) look exactly like a reboot to the BIOS/ACPI/devices.
The problem that you mentioned on your system regarding power source
information would seem to just be a problem with how ACPI is
reinitialized after resuming from hibernation, which is not at all
surprising since we know it (the use of driver calls for hibernate) is
currently broken in many ways.

It seems that enabling S4 mode should just be treated as a special
shutdown mode, independent of hibernate.  In practice, it may likely
only be useful in conjunction with hibernate, but there doesn't seem to
be any reason it needs to be coupled.

It would be useful to determine whether it is necessary to initialize
ACPI specially after "resuming" from S4 mode, though, or whether it
can be initialized normally (i.e. by a normal kernel for instance,
completely unaware of hibernate).  If it can be initialized normally,
then it seems that it is unnecessary to have any ACPI S4 mode support in
the resume path, and it can merely exist as a special shutdown mode.
Note that it seems a bit odd if ACPI can't be initialized normally after
resume from S4 (and still work), since the "load image" kernel
initializes everything normally before attempting to resume the
hibernated system.

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]

 this is where we disagree.

 why not? if all that the hibernated kernel does is to suspend-to-ram and makes
 no changes to disks or TCP connections anything that it does do would be lost 
 if
 power were to fail and you instead did a restore from disk.

It would be okay to switch back to the hibernated kernel in order to
e.g. initiate a suspend to RAM, provided that everything is done
atomically, with interrupts off, for instance.  It is not clear, though,
that it is possible to suspend to RAM atomically like that.

There is also the question of what state the devices will be in when
switching back from the save image kernel to the hibernated kernel.

[snip]

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]

>> * figure out which devices can wake up
>> * put devices into low power states (wake-up devices are placed in the Dx
>>   states compatible with the wake capability, the others are powered off)

> this can't be done by the image-saving kernel if that kernel doesn't know
> about the device.

The image-saving kernel can be made to know about all of the wake-up
devices; all other devices should have already been powered off by the
hibernated kernel.
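
As a rough illustration of that division of labor, here is a toy Python
model (not kernel code; the function name and the device names in the
usage note are invented for this sketch):

```python
def split_power_down_work(devices):
    """Divide devices between the two kernels.

    `devices` maps a device name to True if the device can wake the system.
    Returns (powered_off_by_hibernated_kernel, left_for_image_saving_kernel),
    each as a sorted list of device names.
    """
    # Devices that cannot wake the system are simply powered off early,
    # so the image-saving kernel never needs a driver for them.
    powered_off = sorted(name for name, can_wake in devices.items() if not can_wake)
    # Only wake-capable devices still need a driver in the image-saving kernel.
    wake_capable = sorted(name for name, can_wake in devices.items() if can_wake)
    return powered_off, wake_capable
```

For example, calling it with {"keyboard": True, "nic": True, "sound": False}
leaves only the keyboard and the NIC for the image-saving kernel to
handle.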

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]

>> How do you guarantee that no tasks are scheduled when you get back to the
>> hibernated kernel?

> just don't schedule any userspace tasks. all you need to do is to execute the
> ACPI sleep functions. you normally do that after stopping userspace
> anyway.

What does stopping userspace mean?  You already said it does not mean
disabling interrupts.  But using the freezer is also not an option,
since the avoidance of that is the main reason for the kexec approach in
the first place.

[snip]

>> Well, not exactly.  If your battery runs out of power while you're suspended,
>> but you have the image saved, it's still better to restore from the image,
>> even if something may not work correctly after the restore, than to risk a
>> loss of data.

> if things don't work correctly you are still risking the loss of data, the
> user just doesn't know it.

It should be possible on any system to do a hibernate followed by a
shutdown (and then resume properly, without any problems).  Thus, for
handling suspend to both, you resume as if the system had been shut down,
rather than resuming as if the system came from S4.
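
To make the distinction concrete, here is a toy Python sketch (not
kernel code).  The function resume_plan and the "s4"/"shutdown" tag
stored with the image are hypothetical; the _BFS and _WAK steps are the
ACPI wake methods mentioned elsewhere in this thread, and "initialize
ACPI as on a normal boot" is my assumption about what a shutdown-style
resume amounts to:

```python
def resume_plan(image_mode):
    """Return the ordered resume steps for a saved image.

    `image_mode` is a hypothetical tag saved with the image: "s4" if the
    platform was put into ACPI S4, or "shutdown" if the machine was simply
    powered off (which is also how a suspend to both looks once power has
    been lost).
    """
    if image_mode not in ("s4", "shutdown"):
        raise ValueError("unknown image mode: %r" % (image_mode,))
    steps = ["load image"]
    if image_mode == "s4":
        # The firmware knows the system was in S4, so run the wake methods.
        steps += ["execute _BFS", "execute _WAK"]
    else:
        # The firmware saw an ordinary power-off, so treat ACPI as freshly booted.
        steps += ["initialize ACPI as on a normal boot"]
    steps.append("continue")
    return steps
```

The point of the sketch is only that the choice is made from
information saved with the image, not from the firmware.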

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-17 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]
 
>> There are NO ACPI LIMITS!  There only are things that you need to implement
>> if you're going to support ACPI, but they need not be used ALWAYS, no?

> yes there are limits. the fact that you can't remove the battery in S4 mode
> without messing things up is a limit,

You won't mess things up as long as the resuming kernel knows that it
should resume as if the system had been shut down, rather than put into
the S4 state.  Maybe it is even possible to detect automatically which
type of resume is needed.  Similarly, booting another OS shouldn't be a
problem, except that if you do it without powering off the system first,
some devices might not work under the other OS if the other OS doesn't
initialize them properly.

[snip]

-- 
Jeremy Maitin-Shepard

Re: Hibernation considerations

2007-07-15 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] writes:

[snip]

>> Isn't it possible to avoid this problem by mounting an ext3 filesystem
>> as readonly ext2?  Provided the filesystem isn't dirty it should be
>> doable.  (And provided the filesystem doesn't use any ext3 extensions
>> that are incompatible with ext2.)

> from the last discussion I saw on the kernel mailing list, no. the act of
> mounting the ext3 filesystem as ext2 read-only will change it as the
> unsupported extensions get turned off (and I think the journal contents at
> least are lost as part of this)

The fact of the matter is that it really doesn't matter whether mounting
it read-only actually corrupts the data on disk or not.  Regardless, it
should not be done, because you are accessing a dirty filesystem that is
still in use, and consequently there are no guarantees that either the
metadata or the file contents are consistent.  It isn't necessary for
hibernation to be able to access mounted partitions anyway.

-- 
Jeremy Maitin-Shepard

Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation

2007-07-13 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> Okay, I have thought it through and I think that, as an initial step, we can
> do something like this:

> - preload the image-saving kernel before hibernation
> - in the hibernation code path replace device_suspend() with the shutting down
>   of all devices without unregistering them (not very nice, but should be
>   sufficient for a while)

It seems that the effect of what is done by the current hibernate
implementations is to shut down all of the devices while having the
kernel data structures make it look like the devices were merely
suspended (i.e. device_suspend).  Then in the resume path, the "restore
image" kernel also calls device_suspend just before jumping to the
hibernated kernel, so all of the devices the "restore image" kernel knew
about are in the state device_suspend expects them to be in, except that
they were actually suspended by a different kernel, so they might not be
in quite the right state.

There is also the issue that the "restore image" kernel might not know
about all of the devices; for instance, if USB support is modular, and,
as is likely to be the case, the user didn't load the USB modules in the
"restore image" kernel from an initrd or something, then the USB devices
will actually be powered off, rather than "suspended".

Despite these apparent discrepancies, it seems that for many devices
(I'm not sure USB devices are included, though), device_resume happens
to do the right thing: the device is placed back in the state it needs
to be in so that the driver can begin talking to it as it did before,
and the device is recognized as the same device that was there before
(otherwise, mounted filesystems backed by block devices that came back
as a different device would cause great havoc).

Since I recall there being issues with USB devices being recognized as
the same devices post-hibernate-resume, without looking at the code I'm
inclined to believe that the USB drivers still don't end up resuming
from hibernate correctly.

Note that I am describing what is done currently, not what is planned to
be done (i.e. change device_suspend to quiesce and device_resume to
unquiesce).  It seems that ironically, despite everyone believing that
device_suspend/device_resume is incorrect for hibernate, many of the
things that those functions do (like saving the PCI configuration,
perhaps, and then restoring it later, or re-initializing the device) are
actually necessary, especially for modular drivers that won't be loaded
by the "restore image" kernel.

What needs to be done is for the devices to be shut down (or possibly
just quiesced for a select few, but we won't worry about that
complication until later; in the case of the current implementations,
they should all be quiesced rather than shut down).  Whatever
information will be needed later to reinitialize the device and
recognize it as the same device must be saved, though.  Ideally the
reinitialization should be able to handle the device being either in a
quiesced state or completely off.  This probably means the devices
cannot be "unregistered", as otherwise there would be nothing with which
to associate the saved information.

The resume path needs to use the saved state to reinitialize the device
and recognize it as the same device.  It seems that the existing
reprobing code may not be sufficient for this.  Note that exactly the
same thing must be done on resume for both the current hibernate
implementations and the kexec approach.  It seems that properly
restoring the devices should be relatively easy for the devices that
already get this correct, like IDE devices and basic PCI devices (and
SATA and SCSI devices as well perhaps?), and possibly harder for
Firewire or USB devices.
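
Here is a toy Python model of that save-then-reinitialize idea, just to
make the requirement concrete.  Nothing here is kernel code: Device,
SavedState, shut_down and reinitialize are names invented for this
sketch, and recognizing a device by a bus path plus an identity string
is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    bus_path: str                      # where the device sits (hypothetical format)
    identity: str                      # e.g. a serial number
    config: dict = field(default_factory=dict)
    powered: bool = True


@dataclass
class SavedState:
    bus_path: str
    identity: str
    config: dict


def shut_down(dev):
    """Power the device off, keeping what is needed to bring it back."""
    saved = SavedState(dev.bus_path, dev.identity, dict(dev.config))
    dev.powered = False
    dev.config = {}                    # model the hardware losing its configuration
    return saved


def reinitialize(saved, present):
    """Bring back the device described by `saved`.

    `present` maps bus paths to the devices found at resume time.  The
    device may be quiesced or completely off; either way its configuration
    is rewritten from the saved copy.
    """
    dev = present.get(saved.bus_path)
    if dev is None or dev.identity != saved.identity:
        raise LookupError("device at %s is missing or was replaced" % saved.bus_path)
    dev.config = dict(saved.config)
    dev.powered = True
    return dev
```

Because the saved state stays attached to a registered device record,
a disk that comes back with a different identity is refused instead of
being silently treated as the one that backs a mounted filesystem.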

> - when we've called device_power_down() and save_processor_state(), jump to
>   the image-saving kernel and let it run
> - make the image-saving kernel set up everything, save the image without
>   starting any user space (we may use the existing image-saving code for this
>   purpose, with some modifications) and power off the system (or make it enter
>   S4)

I suppose this has the advantage of not requiring that a
kernel-to-userspace interface be created for this purpose.

> - use the existing restoration code to load the image and jump to the
>   hibernated kernel

This would again avoid the need for a separate userspace-kernelspace
interface for the purpose, so I agree it could be a useful thing to do
initially.

> - in the restore code path replace device_resume() with the reprobing of all
>   devices.

See my comments above.

-- 
Jeremy Maitin-Shepard

Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation

2007-07-13 Thread Jeremy Maitin-Shepard

"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> Not necessarily.  If we don't put devices into low power states before
> creating the image, that should work just fine (quiesce devices, create the
> image or kexec the new kernel, reprobe devices, save the image, suspend to
> RAM, resume from RAM, continue - or restore from the image if power failed in
> the meantime).  Still, for this purpose, both kernels need to be able to
> handle the same set of devices.

I don't know much about the suspend to RAM, but it seems that it would
indeed be necessary to have a device driver for a device in order to
switch it from e.g. a quiesced state to a low power state.  If, however,
the original kernel already completely turned off the device, then it
seems that the "save image" kernel shouldn't have to do anything to it
in order to suspend to RAM.  The drawback, though, is that since the old
kernel would have no way (unless the user tells it) to know which
devices should be left quiesced and which should be turned off, it would
have to turn them all off, which would mean spinning up and down the
disks.
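
A toy Python sketch of that decision (not kernel code; the function
name, the keep_quiesced parameter, and the device names in the usage
note are all invented for illustration):

```python
def plan_device_states(devices, keep_quiesced=None):
    """Decide what the old kernel does with each device before the jump.

    `devices` is an iterable of device names.  `keep_quiesced` is the set of
    names the user says the "save image" kernel has drivers for; None means
    the old kernel was told nothing, so it has to turn everything off.
    """
    keep = set(keep_quiesced or ())
    return {name: ("quiesce" if name in keep else "power off") for name in devices}
```

With no hint from the user, plan_device_states(["sda", "usb0"]) marks
both devices "power off", which is the case where the disks get spun
down and back up.  Passing keep_quiesced={"sda"} leaves the disk
quiesced instead.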

On the other hand, being able to build the "save image" kernel with only
minimal hardware support could save a significant amount of the time
required to boot it.

[snip]

> No, it can't.  For example, it can't access filesystems mounted by the
> hibernated kernel, or they may get corrupted after the restore (if they are
> journaling, it can't even read from them).

That is true, but this also holds for the current hibernate
implementations.

> Which reminds me of one more issue, which is that the image-saving kernel
> won't be able to use these filesystems either, so its modules and user space
> will have to be available from somewhere else (like a RAM disk or dedicated
> partition).  So things get ugly.

This is not the issue that it appears to be, though.  Under the current
hibernate implementations, this very same userspace and set of modules
must be available "somewhere else" (i.e. an initrd) because it is needed
by the restore path.  Note that under the kexec approach, save and
restore become rather symmetric operations.

> Apart from this, the new kernel's user space cannot blindly modify swap space
> that might be in use by the hibernated kernel.

But it seems easy enough to swapoff in order to completely free up the
swap space.  I suppose the disadvantage is that instead of failing
cleanly if there is insufficient memory, the OOM killer will be invoked
and cause all sorts of havoc.  This suggests that it may indeed be
important to support "cooperation" with the old kernel on saving the
image sooner, rather than later.
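
For what it's worth, userspace could at least attempt a crude check
before calling swapoff.  The Python sketch below is only an
illustration: MemFree, SwapTotal and SwapFree are real /proc/meminfo
fields, but the function names are made up, and comparing the swap in
use against MemFree alone is a deliberately conservative assumption of
mine, not an established rule.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue                   # skip lines without a "Name:" prefix
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swapoff_looks_safe(meminfo_text, margin_kb=0):
    """Return True if everything currently in swap would fit in free RAM.

    This ignores reclaimable page cache, so it errs on the side of
    refusing rather than on the side of waking the OOM killer.
    """
    info = parse_meminfo(meminfo_text)
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return swap_used + margin_kb <= info["MemFree"]
```

A caller would read /proc/meminfo, pass the text in, and fail the
hibernate cleanly if the answer is False.  The check is inherently
racy, since memory use can change between the check and the swapoff.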

[snip]

-- 
Jeremy Maitin-Shepard

Re: Hibernating To Swap Considered Harmful

2007-07-13 Thread Jeremy Maitin-Shepard

[EMAIL PROTECTED] (Joseph Fannin) writes:

[snip]

> Intel Macs use GPT partition tables, which support a huge number
> of primary partitions, and so don't support secondary partitions.

> 32bit Windows does not support GPT, so PC-style MBR partition tables
> must also be used.  GPT was designed to coexist with MBR tools, so
> this mostly works, but you're limited to the union of supported
> features -- 4 primary partitions, no secondaries.

There is a very simple solution to this obscure problem (if I
understand correctly, you want to dual boot Mac OS X and Linux, and
maybe also Windows?):

use LVM, which allows you to have as many volumes as you like within a
single partition.

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-12 Thread Jeremy Maitin-Shepard

Al Boldi <[EMAIL PROTECTED]> writes:

> Mark Lord wrote:
>> Jeremy Maitin-Shepard wrote:
>> > I'll certainly admit the kexec idea is vaporware currently,

> Your idea is starting to become a reality with this thread:
> "[PATCH 0/2] Kexec jump: The first step to kexec base hibernation"

Someone else pointed out that the idea was actually proposed by Andrew
Morton over a year ago, but it didn't get very much consideration then.

It is good to see that quite a few people are thinking about it now,
though, and that Ying Huang has started writing some code.

-- 
Jeremy Maitin-Shepard

Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation

2007-07-12 Thread Jeremy Maitin-Shepard

Mark Lord <[EMAIL PROTECTED]> writes:

[snip]

> Whoops.. wrong half of the script.
> For TuxOnIce in 10 seconds, it does this:

[snip]

I'd argue that for most usage patterns, it doesn't matter all that much
how long it takes to hibernate and power off the system.  What really
matters is that it is extremely reliable, and how long it takes to
resume.

The reason for this is as follows:

A typical usage pattern of hibernate on a laptop is to shut the lid,
causing the system to start to hibernate, and to place the machine in
the bag.  This is fine, as long as you aren't too rough moving it into
the bag, and the hibernation is extremely reliable (i.e. there is no
chance that it fails to hibernate, and remains powered on.)  Presumably
some additional userspace logic could help here, like start beeping
loudly if the hibernate fails, or perhaps just initiate a shut down, to
avoid the machine overheating in the bag.
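
The kind of userspace logic I have in mind could look roughly like the
following Python sketch.  The three callables are placeholders that the
caller must supply (how to actually trigger the hibernate, beep, or
shut down is deliberately left out), and the function name is invented:

```python
def hibernate_with_fallback(try_hibernate, alarm, shut_down):
    """Attempt to hibernate; if that fails, make noise and power off.

    `try_hibernate` returns True once the hibernate has succeeded (that is,
    after a later resume) and returns False or raises if it failed.
    `alarm` and `shut_down` are called, in that order, only on failure.
    """
    try:
        ok = bool(try_hibernate())
    except Exception:
        ok = False
    if ok:
        return "hibernated"
    alarm()        # e.g. beep loudly so the user notices before bagging the laptop
    shut_down()    # last resort, so the machine does not stay on in the bag
    return "shut down"
```

Whether to beep, shut down, or do both is a policy decision; the
sketch simply does both in sequence.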

Note that in this usage pattern, it doesn't matter how long it takes to
hibernate, because you don't actually wait for it to finish.  The only
waiting occurs when you turn it on, and the resume path should be
essentially exactly the same under kexec hibernate as with the existing
hibernate.

Thus, if kexec hibernate improves reliability (as it might, given that
it eliminates the need for the freezer), it may be worth the slightly
increased hibernate time.  I think the actual amount of extra time it
will take may be very small; a stripped down kernel may only take a
second or two to initialize.

-- 
Jeremy Maitin-Shepard

Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation

2007-07-12 Thread Jeremy Maitin-Shepard

e the uswsusp kernel code for this purpose.

>> >> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
>> >> be hibernate/resume by the normal kernel too. This way, a real
>> >> kexec/boot-up is only needed for the first time.
>> >
>> > I'm not sure what you mean.
>> 
>> he's trying to get fancy again, the best way to speed up the boot of the 
>> kexec kernel is make it smaller and avoid probing for devices (hotplug 
>> should NOT be used for normal suspend situations)

> Still, I believe that we should do our best to use only one kernel (meaning
> one kernel image) here.

It seems that it would not be very difficult to let the user choose
whether or not a different kernel is used.  The only extra thing
required to allow a different kernel to be used is to save and restore
the text sections.

-- 
Jeremy Maitin-Shepard

Re: sysrq-t dumps of s2ram/fuse deadlock

2007-07-11 Thread Jeremy Maitin-Shepard

Matthew Garrett <[EMAIL PROTECTED]> writes:

> On Mon, Jul 09, 2007 at 01:29:05PM +, Pavel Machek wrote:
>> Hi!
>> 
>> Can we get them? They are neccessary for debugging 'what in suspend
>> calls fuse' problem. And yes, that problem is there even when you
>> remove freezer.

> I can produce them, but haven't managed to do that in any way that lets 
> me get them off the system yet.

If you can see them, then perhaps you could use a digital camera or just
copy the text manually.

[snip]


-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-11 Thread Jeremy Maitin-Shepard

Nigel Cunningham <[EMAIL PROTECTED]> writes:

[snip]

> No other _proper_ solutions have been proposed. Everyone who suggests 
> removing 
> the freezer also suggests implementing it all over again. It might be sending 
> SIGSTOP to everything. It might be shifting the desk chairs around and 
> creating a completely new kernel context, but they always have the same 
> goal - stopping the existing activity, and they all come with their own 
> issues (even if they're not obvious yet because the alternatives are 
> currently vapourware to one extent or another).

I'll certainly admit the kexec idea is vaporware currently, but it does
differ in a significant way from freezer-based approaches, such that I
don't think it should be referred to as just another implementation of a
freezer.  Specifically, it doesn't require that the "old kernel" be in a
"consistent" state to a greater extent than suspend to ram; it is the
case that all of the devices must be quiesced or shut down to some
extent, but doing this without races and deadlocks (and without the
freezer) is certainly very, very similar to what needs to be done for
suspend to ram, which will need to be solved anyway.  Unlike the
existing hibernate approaches, however, it will not be necessary to use
any of the driver infrastructure once switched to the "save image"
kernel, and thus it will not matter what locks are held, for instance.

[snip]

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-10 Thread Jeremy Maitin-Shepard

Al Boldi <[EMAIL PROTECTED]> writes:

[snip]

> Exactly, there may well be overlap between Xen and the kexec hibernate 
> approach, for which code structures should definitely be leveraged.

> And, I wasn't suggesting to use Xen as an HV, which wouldn't really solve 
> anything, but was trying to point out that there is no need to maintain two 
> separate kernels, much like Xen, which inlines two modes into the kernel:  
> host and guest.

With relocatable kernels, or by simply using the "back up the first 16 or
64 MB of physical memory" approach, the same kernel image could be used
both as the normal kernel and as the "save image" kernel.  The actual
behavior of the system would likely depend on kernel command-line
parameters or an initrd, rather than being hard-coded into the kernel
image.  If it is made a requirement that the same kernel be used, then
as is done currently, the text sections need not be touched at all.
There is a significant advantage, however, to using a different kernel:
unneeded drivers can be compiled out, leading to faster load times.

> So kexec really seems the way to go, which mimics the way APM used to do it, 
> which is known to work flawlessly with minimal OS involvement.

Now all that is needed is someone with enough time and interest to
implement it.  :)

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-10 Thread Jeremy Maitin-Shepard

Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> I don't know a whole lot about xen, but it seems that one issue with
>> this approach is that it requires you run your system under a hypervisor
>> at all times, which may introduce some overhead.
>> 

> No, I don't think that's what Al is proposing.  The kernel-internal interfaces
> we've put in place to make Xen work could be reused to do some of the things
> you're talking about.  In particular, a kernel running under Xen has to be 
> able
> to deal with non-contiguous physical pages, and reusing the same pagetable 
> hooks
> would allow a kexeced kernel to run happily out of any random assortment of
> pages you manage to allocate for it.

I suppose that would be an interesting thing to look into.  Another
possible approach for having the kernel run in non-contiguous memory is
to specify a memmap exactly to the kernel on the command-line, as I
believe is done for the crashdump kernels currently.  It would, of
course, require an extremely long and complicated memmap specification
in general.  I recall reading, though, that even with the relocatable
kernel support, there are still significant alignment requirements for
loading the kernel.  In particular, I seem to recall that it is
necessary to load an x86 kernel at maybe a 16MB boundary, and on other
platforms the alignment requirements may be even more restrictive.  In
addition, I recall that the Linux boot procedure on x86 and on some
other platforms necessarily uses certain low-address memory, like the
first 640K, which must be backed up regardless.

For these reasons, it seems that it would be easiest to simply back up
the first e.g. 16 or 64 MB of memory, and not have to worry about
loading the kernel at a non-standard address and specifying a
complicated exact memmap.  Someone might prove me wrong, though.
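
To give a feel for why an exact memmap gets unwieldy, here is a rough
sketch in Python.  It only builds the command-line text, using the
memmap=exactmap and memmap=nn[KMG]@ss[KMG] forms of the kernel
parameter; the helper name exact_memmap and the example ranges are made
up for illustration, and nothing here checks alignment rules or
command-line length limits.

```python
def exact_memmap(ranges):
    """Build a memmap=exactmap fragment from (start, size) byte ranges.

    Hypothetical helper: ranges are assumed to be KiB-aligned and
    non-overlapping; no merging of adjacent ranges is attempted.
    """
    parts = ["memmap=exactmap"]
    for start, size in sorted(ranges):
        if start % 1024 or size % 1024:
            raise ValueError("ranges must be KiB-aligned")
        parts.append(f"memmap={size // 1024}K@{start // 1024}K")
    return " ".join(parts)


# Two tidy regions give a short specification ...
print(exact_memmap([(16 << 20, 4 << 20), (0, 640 * 1024)]))

# ... but 1000 scattered 4K pages need 1000 separate entries.
scattered = [(pfn * 4096, 4096) for pfn in range(0, 2000, 2)]
print(len(exact_memmap(scattered).split()) - 1, "memmap entries")
```

With arbitrary freed pages the specification grows by one entry per
discontiguous run, which is the main reason backing up a fixed low
region looks simpler to me.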

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-09 Thread Jeremy Maitin-Shepard

Al Boldi <[EMAIL PROTECTED]> writes:

[snip]

> Who said we need two kernels?  You could inline it like Xen, which would give 
> you one kernel with two modes:  normal and hibernate.

I don't know a whole lot about xen, but it seems that one issue with
this approach is that it requires you run your system under a hypervisor
at all times, which may introduce some overhead.

-- 
Jeremy Maitin-Shepard

Re: [RFC][PATCH -mm] Freezer: Handle uninterruptible tasks

2007-07-09 Thread Jeremy Maitin-Shepard

Alan Stern <[EMAIL PROTECTED]> writes:

> On Mon, 9 Jul 2007, Jeremy Maitin-Shepard wrote:
>> Pavel Machek <[EMAIL PROTECTED]> writes:
>> 
>> [snip]
>> 
>> > I don't know how to do that mechanism... but if we knew where to trap
>> > filesystem writes, we could simply freeze at that point, and at that
>> > point only, no?
>> 
>> Any operation at all that has an external effect must not occur after
>> the snapshot is made; otherwise, there will be random hard-to-find
>> corruptions and other problems occurring as a result.  Thus, for
>> example, any writes (either directly or indirectly through e.g. a
>> filesystem) to non-volatile storage, any network traffic, any
>> communication with hardware like a printer must be prevented after the
>> snapshot.

> You have forgotten one critical point: The writes to save the snapshot 
> image must be allowed.  That's what makes it really hard.

Well, I didn't forget about that, although my language may have been a
bit ambiguous.  I was referring only to the operations that are done by
normal (i.e. non-hibernate) portions of the system and which are not
explicitly for the purpose of hibernating the system.  It is very
difficult to maintain this guarantee while also attempting to reuse the
same infrastructure that is supposed to not be processing any "normal"
requests in order to write the snapshot.  The kdump approach handily
avoids this problem by *not* reusing the same infrastructure while still
allowing complete flexibility (i.e. not depending on a
drivers/suspend/ide-simple).

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-09 Thread Jeremy Maitin-Shepard

Oliver Neukum <[EMAIL PROTECTED]> writes:

> Am Montag, 9. Juli 2007 schrieb Jeremy Maitin-Shepard:
>> Oliver Neukum <[EMAIL PROTECTED]> writes:
>> 
>> [snip]
>> 
>> > Hm, once the new kernel is booted, this decision is irrevocable, isn't it?
>> > Is there any way to deal with errors by handing control back?
>> 
>> Returning to the old kernel can be done by telling drivers to set the
>> hardware to the appropriate state, then copying the backed up memory
>> back to the beginning of physical memory, and finally jumping to the old
>> kernel.  It would be much like what is done to resume from hibernation.

> If you can do that, why load a new kernel image?

The challenges in doing that are analogous to the challenges in
suspending to RAM, for which it has been agreed that drivers should be
fixed such that the freezer is not necessary.

The hard part of hibernate is not creating the snapshot; rather, the
hard part is writing the snapshot, and allowing the user some
flexibility in how and where the snapshot is written.  The kdump
approach allows complete flexibility in writing the snapshot
(essentially any kernel or user space facility can be used), while not
interfering at all with the snapshot state.

-- 
Jeremy Maitin-Shepard

Re: hibernation/snapshot design

2007-07-09 Thread Jeremy Maitin-Shepard

Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

>> why do you say that neither would work for the "lets hibernate my 
>> notebook" case?

> Both would work. One would eat 8-64MB of your RAM, permanently;

As I have stated in other messages, the kdump approach would not waste
any RAM permanently.  The reason that kdump must reserve memory at boot
is that on panic, it cannot attempt to nicely stop drivers, and
consequently there might be ongoing DMAs that could clobber anything but
the reserved area; this reason does not apply to hibernate, though.
I'll quote a previous message in which I stated a solution that can be
used:

Immediately before jumping to the new kernel, the first X bytes (where X
is the amount of memory the new kernel will get, typically 16MB or 64MB)
of physical memory are backed up into the arbitrary discontiguous pages
that are made available.  This will not take very long, because copying
even 64MB of memory is extremely fast.  Then the new kernel is free to
use the first X bytes of contiguous physical memory.  Problem solved.
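
A toy model of that copy, in Python rather than kernel code, may make
the idea concrete.  Physical memory is just a bytearray here, and the
names (PAGE, backup_low_memory, restore_low_memory) and page numbers are
invented for the example; a real implementation would also have to deal
with page tables, DMA and the like, which this sketch ignores.

```python
PAGE = 4096  # assumed page size for the model


def backup_low_memory(mem, low_bytes, free_pages):
    """Copy mem[0:low_bytes] page by page into scattered free pages.

    Returns the page frame numbers used, in order, so that the copy
    can be undone later.
    """
    needed = low_bytes // PAGE
    if low_bytes % PAGE or len(free_pages) < needed:
        raise ValueError("need whole pages and enough free pages")
    used = free_pages[:needed]
    if any(pfn < needed for pfn in used):
        raise ValueError("backup pages must lie outside the low region")
    for i, pfn in enumerate(used):
        mem[pfn * PAGE:(pfn + 1) * PAGE] = mem[i * PAGE:(i + 1) * PAGE]
    return used


def restore_low_memory(mem, used):
    """Copy the backed-up pages back to the start of memory."""
    for i, pfn in enumerate(used):
        mem[i * PAGE:(i + 1) * PAGE] = mem[pfn * PAGE:(pfn + 1) * PAGE]


# 16 pages of "physical memory", each filled with its own page number.
mem = bytearray(b"".join(bytes([p]) * PAGE for p in range(16)))
low = bytes(mem[:4 * PAGE])

used = backup_low_memory(mem, 4 * PAGE, [9, 5, 14, 7, 11])
mem[:4 * PAGE] = bytes(4 * PAGE)      # the new kernel scribbles here
restore_low_memory(mem, used)         # hand control back to the old one
assert bytes(mem[:4 * PAGE]) == low
```

The restore step is the same copy run in reverse, which is why returning
to the old kernel on error should be possible as well.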

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-09 Thread Jeremy Maitin-Shepard

Oliver Neukum <[EMAIL PROTECTED]> writes:

[snip]

> Hm, once the new kernel is booted, this decision is irrevocable, isn't it?
> Is there any way to deal with errors by handing control back?

Returning to the old kernel can be done by telling drivers to set the
hardware to the appropriate state, then copying the backed up memory
back to the beginning of physical memory, and finally jumping to the old
kernel.  It would be much like what is done to resume from hibernation.

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-08 Thread Jeremy Maitin-Shepard

Nick Piggin <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> Nick Piggin <[EMAIL PROTECTED]> writes:

>>> This is the Morton method, isn't it? :) I remember it sounding like a
>>> very good idea when he brought it up, but I can't remember the details
>>> of why it was rejected or what the problems were.
>> 
>> 
>> Perhaps he did bring it up before I did.  Please forward me a link to
>> the thread or other reference if you can find it, as I'd be interested
>> in reading it.

> Sent in the next mail.

Thanks.  I've started reading over the thread.

>>> I suspect that freeing memory on the fly for the new kernel
>>> would be non-trivial (but possible), however simply having a reserve
>>> RAM region for the new kernel would be fine for a first step.
>> 
>> 
>> Freeing memory on the fly should be extremely easy for the kernel (this
>> is precisely what it does when it needs to satisfy an allocation).  Note
>> that the memory allocated need not be contiguous.

> Yes, I have a rough idea about how page reclaim works. But I just
> mean it would not be trivial to load the new kernel into physically
> discontiguous memory. Possible of course, but I don't think kexec or
> the setup code could quite cope ATM.

It would indeed be a pain for the new kernel to be loaded and have to
use discontiguous memory.  The trick is, though, that this is not
necessary.  Immediately before jumping to the new kernel, the first X
bytes (where X is the amount of memory the new kernel will get,
typically 16MB or 64MB) of physical memory are backed up into the
arbitrary discontiguous pages that are made available.  This will not
take very long, because copying even 64MB of memory is extremely fast.
Then the new kernel is free to use the first X bytes of contiguous
physical memory.  Problem solved.

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-08 Thread Jeremy Maitin-Shepard

Nick Piggin <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> Al Boldi <[EMAIL PROTECTED]> writes:
>> 
>> 
>>> Pavel Machek wrote:
>>> 
>>>> We are stuck with refrigerator for now, and at least for hibernation,
>>>> I don't see any feasible alternative.
>> 
>> 
>>> Feasible alternative?
>> 
>> 
>> I posted such an alternative to the list a short time ago: hibenrating
>> from a *new* kernel space/user space that is created by loading a new
>> kernel in a manner similar to what is done for kexec crashdumps.  Unlike
>> kexec crashdumps, however, it would not require reserving any memory at
>> boot, because the necessary memory (maybe 16MB or 64MB) can be freed
>> just before hibernating, and device drivers can be properly stopped so
>> that DMAs don't stomp over certain memory.

> This is the Morton method, isn't it? :) I remember it sounding like a
> very good idea when he brought it up, but I can't remember the details
> of why it was rejected or what the problems were.

Perhaps he did bring it up before I did.  Please forward me a link to
the thread or other reference if you can find it, as I'd be interested
in reading it.


>> This approach eliminates the need for the freezer, as it would make
>> hibernate look a lot a bit like suspend to ram from the perspective of
>> the "old" kernel (the kernel being hibernated), as the hibernate
>> operation itself would be completely atomic from the perspective of the
>> "old" kernel.  That is not to say, of course, that any code paths would
>> actually be shared, or that the drivers would do the same things
>> (because they probably would not).

> Well it basically is suspend to RAM with the additional step that a
> new kernel gets booted and writes out the data from RAM to disk then
> shuts down.

There is the key difference, though, that the drivers should do rather
different things.  In particular, rather than place the hardware in a
low-power mode, they should place it in some state such that the new
kernel being loaded can handle it.

> I suspect that freeing memory on the fly for the new kernel
> would be non-trivial (but possible), however simply having a reserve
> RAM region for the new kernel would be fine for a first step.

Freeing memory on the fly should be extremely easy for the kernel (this
is precisely what it does when it needs to satisfy an allocation).  Note
that the memory allocated need not be contiguous.

-- 
Jeremy Maitin-Shepard

Re: [RFC][PATCH -mm] Freezer: Handle uninterruptible tasks

2007-07-08 Thread Jeremy Maitin-Shepard

Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

> I don't know how to do that mechanism... but if we knew where to trap
> filesystem writes, we could simply freeze at that point, and at that
> point only, no?

Any operation at all that has an external effect must not occur after
the snapshot is made; otherwise, there will be random hard-to-find
corruptions and other problems occurring as a result.  Thus, for
example, any writes (either directly or indirectly through e.g. a
filesystem) to non-volatile storage, any network traffic, any
communication with hardware like a printer must be prevented after the
snapshot.  It seems, though, that in general the kernel will have no way
to know which operations are safe, and which are not safe.

(This is why the whole "proper filesystem snapshot support is the
solution" argument is bogus.)
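
A deliberately tiny model in Python shows the failure mode for the
storage case.  The dictionaries standing in for RAM and disk, and the
function names, are invented for the example; network traffic and other
hardware cannot be modelled this simply, which is rather the point.

```python
import copy


def hibernate_and_resume(ram, disk, late_writes):
    """Take a snapshot of ram, let some writes slip through, then resume.

    Returns the state the resumed kernel believes to be current.
    """
    image = copy.deepcopy(ram)         # the snapshot
    for name, data in late_writes:     # external effects after the snapshot
        disk[name] = data
    return image


def stale_entries(ram, disk):
    """Names whose cached contents no longer match what is on disk."""
    return sorted(n for n, d in ram["page_cache"].items() if disk.get(n) != d)


disk = {"a": "old", "b": "old"}
ram = {"page_cache": dict(disk)}

resumed = hibernate_and_resume(ram, disk, [("a", "new")])
print(stale_entries(resumed, disk))    # the resumed kernel's cache is wrong for "a"
```

The resumed kernel has no record that the write ever happened, so it
will happily act on the stale data.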

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-08 Thread Jeremy Maitin-Shepard

Al Boldi <[EMAIL PROTECTED]> writes:

> Pavel Machek wrote:
>> We are stuck with refrigerator for now, and at least for hibernation,
>> I don't see any feasible alternative.

> Feasible alternative?

I posted such an alternative to the list a short time ago: hibernating
from a *new* kernel space/user space that is created by loading a new
kernel in a manner similar to what is done for kexec crashdumps.  Unlike
kexec crashdumps, however, it would not require reserving any memory at
boot, because the necessary memory (maybe 16MB or 64MB) can be freed
just before hibernating, and device drivers can be properly stopped so
that DMAs don't stomp over certain memory.

This approach eliminates the need for the freezer, as it would make
hibernate look a lot like suspend to ram from the perspective of
the "old" kernel (the kernel being hibernated), as the hibernate
operation itself would be completely atomic from the perspective of the
"old" kernel.  That is not to say, of course, that any code paths would
actually be shared, or that the drivers would do the same things
(because they probably would not).

[snip]

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-08 Thread Jeremy Maitin-Shepard

Nick Piggin <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> Al Boldi <[EMAIL PROTECTED]> writes:
>>
>>> Pavel Machek wrote:
>>>
>>>> We are stuck with refrigerator for now, and at least for hibernation,
>>>> I don't see any feasible alternative.
>>>
>>> Feasible alternative?
>>
>> I posted such an alternative to the list a short time ago: hibernating
>> from a *new* kernel space/user space that is created by loading a new
>> kernel in a manner similar to what is done for kexec crashdumps.  Unlike
>> kexec crashdumps, however, it would not require reserving any memory at
>> boot, because the necessary memory (maybe 16MB or 64MB) can be freed
>> just before hibernating, and device drivers can be properly stopped so
>> that DMAs don't stomp over certain memory.

> This is the Morton method, isn't it? :) I remember it sounding like a
> very good idea when he brought it up, but I can't remember the details
> of why it was rejected or what the problems were.

Perhaps he did bring it up before I did.  Please forward me a link to
the thread or other reference if you can find it, as I'd be interested
in reading it.

>> This approach eliminates the need for the freezer, as it would make
>> hibernate look a lot like suspend to ram from the perspective of
>> the "old" kernel (the kernel being hibernated), as the hibernate
>> operation itself would be completely atomic from the perspective of the
>> "old" kernel.  That is not to say, of course, that any code paths would
>> actually be shared, or that the drivers would do the same things
>> (because they probably would not).

> Well it basically is suspend to RAM with the additional step that a
> new kernel gets booted and writes out the data from RAM to disk then
> shuts down.

There is the key difference, though, that the drivers should do rather
different things.  In particular, rather than placing the hardware in a
low-power mode, they should place it in a state that the new kernel
being loaded can handle.

> I suspect that freeing memory on the fly for the new kernel
> would be non-trivial (but possible), however simply having a reserve
> RAM region for the new kernel would be fine for a first step.

Freeing memory on the fly should be extremely easy for the kernel (this
is precisely what it does when it needs to satisfy an allocation).  Note
that the memory allocated need not be contiguous.

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-08 Thread Jeremy Maitin-Shepard

Nick Piggin <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> Nick Piggin <[EMAIL PROTECTED]> writes:

>>> This is the Morton method, isn't it? :) I remember it sounding like a
>>> very good idea when he brought it up, but I can't remember the details
>>> of why it was rejected or what the problems were.
>>
>> Perhaps he did bring it up before I did.  Please forward me a link to
>> the thread or other reference if you can find it, as I'd be interested
>> in reading it.

> Sent in the next mail.

Thanks.  I've started reading over the thread.

>>> I suspect that freeing memory on the fly for the new kernel
>>> would be non-trivial (but possible), however simply having a reserve
>>> RAM region for the new kernel would be fine for a first step.
>>
>> Freeing memory on the fly should be extremely easy for the kernel (this
>> is precisely what it does when it needs to satisfy an allocation).  Note
>> that the memory allocated need not be contiguous.

> Yes, I have a rough idea about how page reclaim works. But I just
> mean it would not be trivial to load the new kernel into physically
> discontiguous memory. Possible of course, but I don't think kexec or
> the setup code could quite cope ATM.

It would indeed be a pain for the new kernel to be loaded and have to
use discontiguous memory.  The trick is, though, that this is not
necessary.  Immediately before jumping to the new kernel, the first X
bytes (where X is the amount of memory the new kernel will get,
typically 16MB or 64MB) of physical memory are backed up into the
arbitrary discontiguous pages that are made available.  This will not
take very long, because copying even 64MB of memory is extremely fast.
Then the new kernel is free to use the first X bytes of contiguous
physical memory.  Problem solved.
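
The backup step described above can be sketched in user space as
follows.  This is only an illustration under stated assumptions, not
kernel code and not anything posted in the thread: the 4096-byte page
size, `struct backup_map`, and the three function names are all
hypothetical, and `malloc` stands in for whatever free pages the kernel
would actually hand out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical bookkeeping: pages[i] holds the saved copy of low page i.
 * Each page is allocated separately, so the backup is not contiguous. */
struct backup_map {
    size_t npages;
    uint8_t **pages;
};

/* Free every backup page recorded in the map. */
static void release_backup(struct backup_map *map)
{
    for (size_t i = 0; i < map->npages; i++)
        free(map->pages[i]);
    free(map->pages);
    map->pages = NULL;
    map->npages = 0;
}

/* Save the first npages pages of `low` into scattered pages.
 * npages must be nonzero.  Returns 0 on success, -1 on allocation
 * failure (in which case nothing is left allocated). */
static int backup_low_memory(const uint8_t *low, size_t npages,
                             struct backup_map *map)
{
    map->npages = 0;
    map->pages = calloc(npages, sizeof(*map->pages));
    if (map->pages == NULL)
        return -1;
    for (size_t i = 0; i < npages; i++) {
        uint8_t *page = malloc(SKETCH_PAGE_SIZE);
        if (page == NULL) {
            release_backup(map);
            return -1;
        }
        memcpy(page, low + i * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
        map->pages[i] = page;
        map->npages = i + 1;
    }
    return 0;
}

/* Copy the saved pages back to their original low addresses. */
static void restore_low_memory(uint8_t *low, const struct backup_map *map)
{
    for (size_t i = 0; i < map->npages; i++)
        memcpy(low + i * SKETCH_PAGE_SIZE, map->pages[i], SKETCH_PAGE_SIZE);
}
```

A caller would run `backup_low_memory` just before the jump, let the new
kernel use the low region freely, and call `restore_low_memory` when
control returns.  The point of the per-page map is that only the saved
copies need to be scattered; the region given to the new kernel stays
contiguous.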

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Jeremy Maitin-Shepard

Benjamin Herrenschmidt <[EMAIL PROTECTED]> writes:

[snip]

> At the end of the day, I stand my ground, the freezer cannot be made
> reliable without massive infrastructure changes or giving up on very
> useful features such as fuse among others. Besides, it only partially
> "hides" the problem of requests going to drivers, thus it's a bad
> solution.

I agree that the freezer absolutely should not be used for suspend to
ram ("suspend"), since it is unnecessary with properly written drivers,
which are important to have anyway.  It seems that it is indeed the
consensus that it will be phased out sooner or later.

It does seem that the current device suspend interface does not tell the
drivers enough, since as discussed, they need to know whether to merely
block if they receive a request while suspended (as should be done while
initiating a suspend to ram), or if they should wake up the device (as
should be done if a suspend to ram is not in progress).  Clearly these
two cases need to be addressed by every driver supporting suspend/resume
(but possibly indirectly if the subsystem handles it for them).
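
The two cases can be made concrete with a small sketch.  Everything
here is hypothetical: the enum names and `classify_request` are
invented for illustration and are not the kernel's device suspend
interface.  Handling a request normally when the device is awake is
also an assumption, since the paragraph above only covers a device that
is already suspended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device states; not the kernel's actual PM interface. */
enum dev_power_state { DEV_ACTIVE, DEV_SUSPENDED };

/* What a driver (or its subsystem) should do with an incoming request. */
enum request_action {
    REQ_PROCESS,     /* device is awake: handle the request now */
    REQ_BLOCK,       /* suspend to ram in progress: hold until resume */
    REQ_WAKE_DEVICE  /* device suspended on its own: wake it, then handle */
};

/* Decide how to treat a request given the device state and whether a
 * system-wide suspend to ram is currently being initiated. */
static enum request_action
classify_request(enum dev_power_state state, bool system_suspend_in_progress)
{
    if (state == DEV_ACTIVE)
        return REQ_PROCESS;
    return system_suspend_in_progress ? REQ_BLOCK : REQ_WAKE_DEVICE;
}
```

The second argument is exactly the piece of information that the
suspend callback would need to pass down; without it a suspended
driver cannot tell the last two cases apart.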

The current hibernate approach used by all of the existing
implementations for Linux seems to depend fundamentally on the freezer,
though, in order to actually save the system state.  Thus, it will still
be necessary to fix all of the issues with the freezer, or adopt an
alternate hibernate approach (which is unlikely).  Unfortunately, even
leaving kernel threads and certain drivers running after the snapshot is
taken means that the saved image isn't completely correct, and the
freezer cannot help with these issues.

[snip]

-- 
Jeremy Maitin-Shepard

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-05 Thread Jeremy Maitin-Shepard

Matthew Garrett <[EMAIL PROTECTED]> writes:

> On Thu, Jul 05, 2007 at 04:09:24PM +0200, Rafael J. Wysocki wrote:
>> On Thursday, 5 July 2007 15:46, Matthew Garrett wrote:
>> > I have a model for STD that avoids the need to freeze the entirety of
>> > userspace, but I need to find some more time to flesh it out.
>> 
>> You can just describe it, as far as I'm concerned. :-)

[snip: new hibernate idea]

I think my kexec-based hibernate idea is simpler and more feasible than
this approach, and also avoids the freezer.

-- 
Jeremy Maitin-Shepard

Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3

2007-06-14 Thread Jeremy Maitin-Shepard

Carlo Wood <[EMAIL PROTECTED]> writes:

> On Thu, Jun 14, 2007 at 01:09:46PM -0700, Linus Torvalds wrote:
>> I'm the original author, and I selected the GPLv2 for Linux.
> [...]
>> I'm not going to bother discussing this any more. You don't seem to 
>> respect my right to choose the license for my own code.

> This is the main reason I dislike GPLwhatever: there is no notion
> of "original author". You might have written 99% of the code, that
> doesn't matter. You have no rights whatsoever once you release
> something under the GPL (no more than ANYOne else).

You retain the copyright, and in particular the right to relicense.
Only if you make the mistake of including the "or any later version"
phrase do you allow others to redistribute the work under a different
version of the GPL.  Although this provision may seem slightly
convenient to authors, its effect is to grant a very large amount of
relicensing permission to the FSF.  It almost certainly doesn't make
sense to place that much trust in a single organization.

> The GPL is nice for the community, and for the users - but very,
> very bad towards its authors (taking all and every right you might
> have). If John Doe wants to re-release the whole kernel under
> GPLv3, then all he needs is a website and some bandwidth.

Well, he also needs one tiny little extra thing: the permission of every
copyright holder in Linux.

-- 
Jeremy Maitin-Shepard

Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3

2007-06-14 Thread Jeremy Maitin-Shepard

Alexandre Oliva <[EMAIL PROTECTED]> writes:

> On Jun 14, 2007, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>> On Thu, 14 Jun 2007, Alexandre Oliva wrote:
>>> 
>>> Hmm...  So, if someone takes one of the many GPLv2+ contributions and
>>> makes improvements under GPLv3+, you're going to make an effort to
>>> accept them, rather than rejecting them because they're under the
>>> GPLv3?

>> You *cannot* make GPLv3-only contributions to the kernel.

> I can make improvements to GPLv2+ files under GPLv3 (or rather will,
> after GPLv3 is published).

You can do that, but you won't be able to distribute those changes along
with the rest of the kernel.

-- 
Jeremy Maitin-Shepard

Re: A kexec approach to hibernation

2007-06-11 Thread Jeremy Maitin-Shepard

Xavier Bestel <[EMAIL PROTECTED]> writes:

[snip]

> If I were helping you coding I'd suggest to only concentrate on having
> your project work on standard filesystems, and then when it works maybe
> think about suspending on crypto-over-loop-over-fuse-over-vpn-over-wifi.
> But talk is cheap so I'm shutting up. Right now. :)

Well, the whole idea of the kexec approach is that the hibernate system
doesn't need to know anything at all about filesystems or any particular
device.  So if it works at all, it will work for
crypto-over-loop-over-fuse-over-vpn-over-wifi
-over-pigeon-carrier-protocol-over-printer-and-scanner.

-- 
Jeremy Maitin-Shepard

Re: A kexec approach to hibernation

2007-06-11 Thread Jeremy Maitin-Shepard

Xavier Bestel <[EMAIL PROTECTED]> writes:

> On Mon, 2007-06-11 at 11:01 -0400, Jeremy Maitin-Shepard wrote:
>> >> You might claim then that the solution is to simply keep the network
>> >> driver quiesced or stopped.  But then it is impossible to write the
>> >> image over the network.  The way to get around this problem is to write
>> >> the image over the network using a fresh network stack.
>> 
>> > Or teach the driver stack about the difference/reset it. Remember that
>> > even if you get a fresh network stack, you'll still be getting packets
>> > for the old stack. Getting a new ip (assuming one is available) won't
>> > stop other connections getting killed, either because we send resets
>> > from the kexec'd kernel, or because they timeout looking for the old
>> > ip.
>> 
>> I could be mistaken, but I think that bringing up the network interface
>> with a different IP address would prevent it from resetting existing TCP
>> connections, because it would never receive the packets for those
>> existing connections.  

> That can't work. There are networks where the client must have a fixed
> IP, or must accept the address given by dhcp in order to talk to
> fileservers. And you still have the same mac address, which may cause
> problems.

I wasn't suggesting that using a different IP address would be a general
solution.  It might be a solution for a few people.

In general, I'd imagine that most people would not bring up the network
interface at all, and most of the people that do would bring it up with
the same IP address, causing some existing TCP connections to possibly
be reset.

I think that causing connections to be reset is, however, far better
than acking packets that are then silently thrown away.

-- 
Jeremy Maitin-Shepard

Re: A kexec approach to hibernation

2007-06-11 Thread Jeremy Maitin-Shepard

Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

>> > If _I_ were willing to add some runtime overhead to make hibernation
>> > simpler, I'd just use some virtualization to do that... with added
>> > advantage of "hibernate here, resume on different hw".
>> 
>> I don't believe there is going to be any runtime overhead.

> 64MB less memory seems like runtime overhead for me. If you know how
> to do kexec without pre-reserving memory, I believe kexec/kdump team
> will be interested.

kdump needs to reserve memory at boot for two main reasons.  First, the
crashdump kernel must be preloaded so that it is available on panic, and
however much memory it needs to run must also be available at all times,
since a panic can occur at any time.  Second, no attempt is made to shut
down devices on panic, so devices may clobber existing memory with
ongoing DMA; the crashdump kernel must therefore run from a reserved
area of memory.

For hibernate via kexec, however, these issues do not exist.  The
simplest solution would be to back up the first, say, 16MB or 64MB (or
however much the "save" kernel should have) of memory into free pages
just before copying the "save" kernel into the desired position and
jumping to it.

Due to the speed of memory copying, this should not add any significant
overhead.
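
As a rough back-of-the-envelope check of that claim, the copy time is
just size divided by bandwidth.  The helper below is a hypothetical
illustration; the 1 GiB/s figure used in the example afterwards is an
assumed round number, not a measurement from the thread or from any
particular machine.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated time in milliseconds to copy `bytes` at `bytes_per_sec`.
 * The bandwidth is a caller-supplied assumption, not a measured value. */
static double copy_time_ms(uint64_t bytes, uint64_t bytes_per_sec)
{
    return (double)bytes * 1000.0 / (double)bytes_per_sec;
}
```

With an assumed 1 GiB/s, `copy_time_ms(64ull << 20, 1ull << 30)` gives
62.5 ms for a 64MB region and `copy_time_ms(16ull << 20, 1ull << 30)`
gives 15.625 ms for 16MB, which is small next to the time needed to
write an image to disk.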

[snip]

-- 
Jeremy Maitin-Shepard

Re: A kexec approach to hibernation

2007-06-11 Thread Jeremy Maitin-Shepard

rrent approach.

> [snip]

>> To me, it seems a lot easier to get right than the current approaches.

> But you can't get what you said you wanted - a fully functional system
> with a fully functional userspace isn't possible. You're running a
> different kernel and can't safely mount filesystems that were mounted by
> the first kernel. You'll have to set up a limited userspace that runs
> from some sort of initrd/ramfs and will end up (so far as I can see now)
> with similar restrictions to what we have now with uswsusp or suspend2's
> userui. (Reads more... oh, I see you said that below :>)

Well, it is fully functional in the sense that everything works as
advertised.  I don't know exactly how uswsusp works, but the kexec
approach would have the advantage that you don't have to follow any
special rules like:

 - better not write to the mounted filesystems, or you'll corrupt things

 - better not try to talk to any other processes, because they're frozen
   and you'll just hang

 - better not fork any other processes, because only specially listed
   processes get to run (maybe this isn't the case, I don't know).

Essentially, with the current approaches, you end up with two
independent userspaces anyway, but you just try to run them under a
single kernel (and really it would be preferable to have two independent
kernel spaces as well in the case of certain device drivers, but of
course this cannot be done under one kernel, hence the reason for
kexec).

>> > Moreover, I think it would require some problems that we don't even
>> > anticipate to be solved.
>> 
>> Possibly.  The alternative, though, seems to be to add hack after hack
>> to get certain functionality to work.

> As I argued above, both systems involve some degree of 'hack'. Kexec
> only seems clean until you realize that you wanted some of the context
> you just switched away from.

(Perhaps see my comments above.)

Also, perhaps see the reply to Pavel about the need to reserve memory,
which I'm about to write. ;)

Please don't take my comments in this thread too harshly.  I'm not
trying to undermine the work that you and the other hibernate
developers have done.  I just think this kexec approach is an
interesting idea, and I brought it up so that it might get explored.  I
still don't know if it actually makes sense (although I've managed to
mostly convince myself), and discussing it with you and the other
hibernate developers helps in figuring that out.  If I didn't strongly
advocate it, it wouldn't get any thought.

-- 
Jeremy Maitin-Shepard
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: A kexec approach to hibernation

2007-06-11 Thread Jeremy Maitin-Shepard

 sort of initrd/ramfs and will end up (so far as I can see now)
 with similar restrictions to what we have now with uswsusp or suspend2's
 userui. (Reads more... oh, I see you said that below :)

Well, it is fully functional in the sense that everything works as
advertised.  I don't know exactly how uswsusp works, but the kexec
approach would have the advantage that you don't have to follow any
special rules like:

 - better not write to the mounted filesystems, or you'll corrupt things

 - better not try to talk to any other processes, because they're frozen
   and you'll just hang

 - better not fork any other processes, because only specially listed
   processes get to run (maybe this isn't the case, I don't know).

Essentially, with the current approaches, you end up with two
independent userspaces anyway, but you just try to run them under a
single kernel (and really it would be preferable to have two independent
kernel spaces as well in the case of certain device drivers, but of
course this cannot be done under one kernel, hence the reason for
kexec).

  Moreover, I think it would require some problems that we don't even
  anticipate to be solved.
 
 Possibly.  The alternative, though, seems to be to add hack after hack
 to get certain functionality to work.

 As I argued above, both systems involve some degree of 'hack'. Kexec
 only seems clean until you release that you wanted some of the context
 you just switched away from.

(Perhaps see my comments above.)

Also, perhaps see the reply to Pavel about the need to reserve memory,
which I'm about to write. ;)

Please don't take my comments in this thread too harshly.  I'm not
trying to undermine that work that you and the other hibernate
developers have done.  I just think this kexec approach is an
interesting idea, and I brought it up so that it might get explored.  I
still don't know if it actually makes sense (although I've managed to
mostly convince myself), and discussing it with you and the other
hibernate developers helps in figuring that out.  If I didn't strongly
advocate it, it wouldn't get any thought.

-- 
Jeremy Maitin-Shepard
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: A kexec approach to hibernation

2007-06-11 Thread Jeremy Maitin-Shepard

Pavel Machek [EMAIL PROTECTED] writes:

[snip]

  If _I_ were willing to add some runtime overhead to make hibernation
  simpler, I'd just use some virtualization to do that... with added
  advantage of hibernate here, resume on different hw.
 
 I don't believe there is going to be any runtime overhead.

 64MB less memory seems like runtime overhead for me. If you know how
 to do kexec without pre-reserving memory, I believe kexec/kdump team
 will be interested.

The main reason kdump needs to reserve memory at boot is that it needs
to preload the crashdump kernel into memory so that it will be available
on panic (and however much memory the crashdump kernel will need to run
will also need to be available at all times, since a panic can occur at
any time), and also because no attempt is made to shutdown devices on
panic, and consequently devices may clobber existing memory with ongoing
DMA, so a reserved area of memory must be used by the crashdump kernel.

For hibernate via kexec, however, these issues do not exist.  The
simplest solution would be to simply backup the first say 16MB or 64MB
(or however much is desired for the save kernel to have) of memory
into free pages just before copying the save kernel into the desired
position and jumping to it.

Due to the speed of memory copying, this should not add any significant
overhead.

[snip]

-- 
Jeremy Maitin-Shepard
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: A kexec approach to hibernation

2007-06-11 Thread Jeremy Maitin-Shepard

Xavier Bestel <[EMAIL PROTECTED]> writes:

> On Mon, 2007-06-11 at 11:01 -0400, Jeremy Maitin-Shepard wrote:
>> >> You might claim then that the solution is to simply keep the network
>> >> driver quiesced or stopped.  But then it is impossible to write the
>> >> image over the network.  The way to get around this problem is to write
>> >> the image over the network using a fresh network stack.
>> >
>> > Or teach the driver stack about the difference/reset it. Remember that
>> > even if you get a fresh network stack, you'll still be getting packets
>> > for the old stack. Getting a new ip (assuming one is available) won't
>> > stop other connections getting killed, either because we send resets
>> > from the kexec'd kernel, or because they timeout looking for the old
>> > ip.
>>
>> I could be mistaken, but I think that bringing up the network interface
>> with a different IP address would prevent it from resetting existing TCP
>> connections, because it would never receive the packets for those
>> existing connections.

> That can't work. There are networks where the client must have a fixed
> IP, or must accept the address given by dhcp in order to talk to
> fileservers. And you still have the same mac address, which may cause
> problems.

I wasn't suggesting that using a different IP address would be a general
solution.  It might be a solution for a few people.

In general, I'd imagine that most people would not bring up the network
interface at all, and most of the people that do would bring it up with
the same IP address, causing some existing TCP connections to possibly
be reset.

I think that causing connections to be reset is, however, far better
than acking packets that are then silently thrown away.

-- 
Jeremy Maitin-Shepard

Re: A kexec approach to hibernation

2007-06-11 Thread Jeremy Maitin-Shepard

Xavier Bestel <[EMAIL PROTECTED]> writes:

[snip]

> If I were helping you coding I'd suggest to only concentrate on having
> your project work on standard filesystems, and then when it works maybe
> think about suspending on crypto-over-loop-over-fuse-over-vpn-over-wifi.
> But talk is cheap so I'm shutting up. Right now. :)

Well, the whole idea of the kexec approach is that the hibernate system
doesn't need to know anything at all about filesystems or any particular
device.  So if it works at all, it will work for
crypto-over-loop-over-fuse-over-vpn-over-wifi
-over-pigeon-carrier-protocol-over-printer-and-scanner.

-- 
Jeremy Maitin-Shepard

Re: A kexec approach to hibernation

2007-06-04 Thread Jeremy Maitin-Shepard

On Mon, Jun 04, 2007 at 12:46:21PM +0200, Pavel Machek wrote:

> [snip]

> > You might claim then that the solution is to simply keep the network
> > driver quiesced or stopped.  But then it is impossible to write the
> > image over the network.  The way to get around this problem is to write
> > the image over the network using a fresh network stack.
> 
> The "fresh network stack" will RST any connections that were going,
> which is ugly, too.

It will only do this if you bring up the network device with the same IP 
address in the new kernel (which you would have no reason to do if you don't 
need to write the image over the network.)  Maybe the ideal behavior would be 
to tell the network stack to just ignore unexpected TCP packets, rather than 
send RST, while saving or reading the image, but that is probably not necessary 
for most uses and would be a hack.

I also think that sending RST is far better than sending ACK and then silently 
tossing out the data, which is what is currently done (since I believe the 
network devices are currently brought back up along with all other devices 
after the atomic copy is made).  Silently losing data is 
something that should only occur on a crash.  This is likely to actually be a 
somewhat serious problem for servers on which hibernate is used to move the 
server between rooms without losing connections.

Of course, you can get around this by adding a hack to not bring up network 
devices based on some option or other, but that just solves one specific 
case with an ugly solution.  In contrast, using the kexec approach, the network 
device or any other device would quite naturally not be brought back up unless 
it was needed for hibernate, and even if it is brought back up, no data are 
silently lost.

> [snip]

> > To me, it seems a lot easier to get right than the current approaches.
> 
> Well, you are certainly welcome to create the patch. "suspend3" name
> is still free, AFAICT.

I could be sneaky and call it "hibernate".  Probably nicer though to use the 
name "kexec hibernate" to be later simplified to just "hibernate".

I was hoping that everyone would like the idea so much that they would rush to 
implement it, so that I wouldn't have to try.  (I haven't written much kernel 
code before, and I have a number of other time-requiring projects to work on.) 
It looks like that is not too likely to happen though ;).

Maybe I'll try implementing it though, and find that it isn't very much work.

It would be very convenient if the current work being done to improve the 
driver interfaces for hibernate also results in the proper interfaces needed 
for this approach.  It looks like the resume path should be exactly the same 
with this approach as with the existing approaches, but the hibernate path is 
not exactly the same.  In particular, it seems that all devices should be shut 
down to a greater extent than merely the quiescing necessary for the current 
approaches while making an atomic copy, but also they should not be completely 
shut down to the extent that they cannot be restored to the desired state when 
resuming or aborting.

> 
> If _I_ were willing to add some runtime overhead to make hibernation
> simpler, I'd just use some virtualization to do that... with added
> advantage of "hibernate here, resume on different hw".

I don't believe there is going to be any runtime overhead.

To some extent (see some of the explanations I gave in the other e-mail I
sent a few minutes ago in reply to Nigel), I think the kexec approach can be
viewed as a cleaner variant of userspace hibernate.

-- 
Jeremy Maitin-Shepard

Re: A kexec approach to hibernation

2007-06-04 Thread Jeremy Maitin-Shepard

re.  The only real impact would be that the user would need to somehow 
specify how to access the "save image kernel" and the additional kernel 
command-line arguments to include.  If an initrd is to be used instead of an 
initramfs, then that would have to be specified as well.  I don't think this 
setup requirement is significantly more taxing than having to specify the 
path to the user interface program, for instance.

> * adding interfaces to tell kexec/dump/whatever what pages need to be
> saved and reloaded

Any hibernation mechanism needs to know which pages to save.  This approach is 
no different.  The "interface" could likely be one of the following:

1. Just before jumping to the new kernel, with interrupts disabled and devices 
already stopped, the original kernel prepares a list of pages to write 
somewhere in memory.  The old kernel passes the address of this list as a 
kernel command-line argument to the new kernel.  The initramfs or initrd 
userspace (or the kernel itself, although there would be no advantage in doing 
this in the kernel) gets this address from the kernel command-line and then 
reads that list to determine which pages to write.  Presumably preparing the 
list would be a small amount of code, and presumably both suspend2 and the 
in-kernel swsusp already need to do something like this.

2. The old kernel prepares no new data structures, and simply provides a few 
pointers as kernel command-line arguments to the new kernel to the existing 
data structures that describe the pages that are used.  The code running under 
the new kernel responsible for writing the hibernation image simply accesses 
these data structures using the pointers from the kernel command-line to 
determine which pages to write.
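
A minimal sketch of how userspace under the new kernel might pick up
such an address, assuming a hypothetical command-line parameter named
"hibernate_pagelist" and a hypothetical list layout (a little-endian
64-bit count followed by that many 64-bit page frame numbers).  Neither
the parameter name nor the layout is an existing kernel interface; both
are invented here for illustration.

```python
import struct


def parse_pagelist_address(cmdline, param="hibernate_pagelist"):
    """Return the address given as param=<hex> on a kernel command line.

    `param` is a hypothetical parameter name.  Returns None if absent.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == param:
            return int(value, 16)
    return None


def decode_pfn_list(raw):
    """Decode the assumed layout: u64 count, then `count` u64 PFNs."""
    (count,) = struct.unpack_from("<Q", raw, 0)
    return list(struct.unpack_from("<%dQ" % count, raw, 8))
```

In practice the command line would be read from /proc/cmdline, and the
raw bytes would have to be read from physical memory at the parsed
address by whatever means the save kernel provides.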

> * adding convolutions in which at resume time we boot one kernel, switch
> to another kernel to do the loading and then switch back again to the
> resumed kernel (assuming I understand what you're suggesting).

This shouldn't actually be necessary.  It should be possible to do the resume 
in exactly the same way the in-kernel swsusp resumes currently (except that 
userspace could be used to actually load the image into memory, and then tell 
the kernel to do the necessary manipulations to stop devices, shuffle the 
pages around so they are in the right positions, and then jump to the resumed 
kernel).

> 
> It all sounds terribly complicated and confusing to me, and that's
> before I even begin to think about how this second kernel could possibly
> write the image to an encrypted device or LVM or such like that the
> first kernel knows about and might use now.

I find in some ways it is much simpler than the current approaches.  The "save 
kernel" has to re-initialize device mapper devices that are needed to write the
image in exactly the same way that the resume kernel needs to reinitialize those
devices.  In fact, it could probably use the very same initramfs/initrd code to 
do it.  The fact that it imposes this symmetry is arguably an advantage.

> Can't we just get the freezer right and be done with it?

The question is: can the freezer ever be right?  As far as I can see, no level 
of correctness of the freezer is going to allow you to save the hibernation 
image to something on a fuse filesystem, because essentially any code that is 
run while writing the image needs to live in a special box that is totally 
isolated from the rest of the system in order to avoid problems; thus, it seems 
like it makes sense to implement this box by simply using a separate kernel, 
rather than adding hacks.

-- 
Jeremy Maitin-Shepard

Re: A kexec approach to hibernation

2007-06-01 Thread Jeremy Maitin-Shepard

ady in place. :-)

I suppose you do that by using more sophisticated logic to atomically
copy the pages to their final location after loading them from disk.  In
particular, I suppose you must order the page copies carefully to avoid
clobbering pages that have not yet been copied.  Seems reasonable.  In
that case, there is indeed probably no reason to not use that approach
for resuming.
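
A minimal sketch of the kind of ordering supposed above, assuming each
loaded page has a current location and a final location, and that one
spare scratch page is available for breaking cycles.  This is an
illustration of the general idea only, not a description of how swsusp
actually orders its copies; the function names are hypothetical.

```python
def order_page_copies(moves, scratch):
    """Order (src, dst) page copies so no unread source is clobbered.

    `moves` holds (src, dst) pairs with distinct sources and distinct
    destinations.  `scratch` is a spare page that is neither a source
    nor a destination.  Returns a list of (src, dst) copies to perform
    in sequence.
    """
    pending = {src: dst for src, dst in moves if src != dst}
    ordered = []
    while pending:
        sources = set(pending)
        # A copy is safe when its destination is not still waiting to
        # be read as the source of another pending copy.
        safe = [s for s, d in pending.items() if d not in sources]
        if safe:
            for s in safe:
                ordered.append((s, pending.pop(s)))
            continue
        # Only cycles remain: park one source in the scratch page so
        # that its old location can be overwritten.
        s = next(iter(pending))
        d = pending.pop(s)
        ordered.append((s, scratch))
        pending[scratch] = d
    return ordered


def apply_copies(memory, ordered):
    """Perform the copies in order on a dict mapping page -> contents."""
    for src, dst in ordered:
        memory[dst] = memory[src]
```

For example, swapping two pages requires three copies through the
scratch page, while a simple chain of moves needs no scratch page at
all as long as the last link is copied first.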

[snip]

>> The whole reason to want to checkpoint filesystems was so that the
>> original kernel would remain a fully-functional system with a
>> fully-functional userspace that can continue to access the filesystems
>> while the hibernate image is being written.  In addition to the lack of
>> checkpoint support, however, there are a number of other issues that
>> this would create: Even if you can checkpoint filesystems, you can't
>> checkpoint the entire world.  The kernel will keep acking network
>> packets, and userspace as well will send any normal replies.  If a
>> document was sent off to be printed right before the checkpoint, it
>> might end up printing while the image is being saved, and then printed
>> again when the system resumes.

> That's correct.

>> Fundamentally, I don't think checkpointing is the right answer.  What is
>> desired is a fully functional system with a fully functional userspace
>> during the image writing.  But we don't want this to be the _same_
>> system that is actually being imaged.
>> 
>> That is why I think the kexec solution is the elegant solution.

> Frankly, I think it's tricky. ;-)

To me, it seems a lot easier to get right than the current approaches.

> Moreover, I think it would require some problems that we don't even
> anticipate to be solved.

Possibly.  The alternative, though, seems to be to add hack after hack
to get certain functionality to work.

>> > I see two basic advantages of your approach:
>> > 1) We don't need to freeze tasks.
>> > 2) We can create images larger than 50% of RAM.
>> 
>> There is also the key benefit of allowing an arbitrary userspace in a
>> fully functional system to be used to both save and load the image.  As
>> far as I understand, uswsusp allows a single userspace process to run
>> to handle the loading and saving, but the process runs in a rather
>> fragile userspace with most things disabled; in particular, this
>> userspace process can't access a fuse filesystem and probably can't do
>> other things like fork.

> The user space running on top of the new kernel would be limited by the
> fact that the old kernel's filesystems would be inaccessible to it.  That
> would, effectively, require the user to have special filesystems for the
> image-saving kernel and its user space, which isn't very realistic
> IMO.

Fundamentally, saving of the image can't access any of the normal
filesystems anyway.  The userspace would likely be provided as an
initramfs or initrd, exactly as is done for userspace resume from
hibernate currently.  The same initramfs could probably be used for both
saving the image and restoring the image, since exactly the same
procedure would be used to set up the necessary devices for both the
save and restore case, and the GUI that is used might also be the same.

>> > Still, I don't think we could implement it quickly and easily.
>> 
>> It is hard to say how hard it would be.  I think a lot of the existing
>> kexec and hibernate code could be leveraged.

> Yes, I think so, but at least we need to fix the quiescing of devices before
> we think of implementing that.

It seems like fixing of device stopping/suspend/quiescing is an
orthogonal issue to the actual hibernate implementation.  It would
probably be most reliable and simplest if on every jump between kernels,
all devices are fully stopped by the jumping kernel, and then fully
reinitialized by the jumped-to kernel.  Presumably the time spent doing
this initialization will not be very significant compared to the time
required to write the image.

-- 
Jeremy Maitin-Shepard
