Re: Back to the future.
Pavel Machek <[EMAIL PROTECTED]> writes:

> Hi!
>
>> > While that would certainly be nifty, I think we're arguably starting
>> > from the wrong point here. Why are we booting a kernel, trying to poke
>> > the hardware back into some sort of mock-quiescent state, freeing memory
>> > and then (finally) overwriting the entire contents of RAM rather than
>> > just doing all of this from the bootloader?
>
> Doing it from the bootloader sounds attractive... but it is a lot of
> work. I'm essentially using linux as a bootloader.
>
> Patch for grub welcome.

Well. We actually have first class support for using linux as a
bootloader. So you could use linux and do whatever dance you are doing
from a bootloader if you felt the desire. That might make the dance a
little easier.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Back to the future.
We used it (with great success) to replace bad UPSs on single-PSU
database servers under (light) load. No need for scheduled downtime,
etc.

The whole point of hibernation (or suspend to disk, or whatever you
call it) is that the system goes to a zero-power state and then can be
brought back to its original state. Closing in-progress network
connections has nothing to do with pausing a machine any more than
setting IM clients to 'away' would, or locking an X session. That sort
of side-effect needs to be handled outside the core of "put state out
to disk and read it back".

On 5/7/07, Pavel Machek <[EMAIL PROTECTED]> wrote:
> Hi!
>
>> I don't dispute that it sometimes works today.
>>
>> what I dispute is that making it work should be a constraint on a cleaner
>> design that happens to cause tcp connections to fail on suspend-to-disk
>> (hibernate).
>>
>> if you are doing suspend-to-disk for such a short period that TCP
>> connections are able to recover (typically <15 min for most firewalls, in
>> some cases <2 min for connections with keep-alive) is it really
>> worth it?
>
> People were using swsusp to move a server from one room to another.
>
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
Hi!

> I don't dispute that it sometimes works today.
>
> what I dispute is that making it work should be a constraint on a cleaner
> design that happens to cause tcp connections to fail on suspend-to-disk
> (hibernate).
>
> if you are doing suspend-to-disk for such a short period that TCP
> connections are able to recover (typically <15 min for most firewalls, in
> some cases <2 min for connections with keep-alive) is it really
> worth it?

People were using swsusp to move a server from one room to another.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
On Mon, 7 May 2007, Pavel Machek wrote:

>>>> Really? It works today... if the suspend is short enough. And that's
>>>> how it should be.
>>>
>>> If we get very good at Wake-on-Lan it should work for any length
>>> of time.
>>
>> for suspend-to-ram this would work, I stand corrected.
>>
>> for hibernate this would almost certainly not work, and I
>> don't think that it's worth raising false hopes.
>
> Check the facts. It used to work, and it should work today.

I don't dispute that it sometimes works today.

what I dispute is that making it work should be a constraint on a
cleaner design that happens to cause tcp connections to fail on
suspend-to-disk (hibernate).

if you are doing suspend-to-disk for such a short period that TCP
connections are able to recover (typically <15 min for most firewalls,
in some cases <2 min for connections with keep-alive) is it really
worth it?

and once you pass the timeframes where the connections are still alive
then it shouldn't matter, and in fact the server should gracefully
close the connections to be nice to other devices and servers on the
network.

I dispute the idea that doing a suspend-to-disk and expecting that your
network connections will recover when you wake up is a sane
expectation.

David Lang
Re: Back to the future.
Hi!

>>> Really? It works today... if the suspend is short
>>> enough. And that's how it should be.
>>
>> If we get very good at Wake-on-Lan it should work for
>> any length of time.
>
> for suspend-to-ram this would work, I stand corrected.
>
> for hibernate this would almost certainly not work, and I
> don't think that it's worth raising false hopes.

Check the facts. It used to work, and it should work today.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
On Mon, 7 May 2007, Oliver Neukum wrote:

> On Monday, 7 May 2007 14:48, Pavel Machek wrote:
>>>> Including network? Your tcp peers will be really confused, then, if
>>>> you ACK packets then claim you did not get them. No, you do not want
>>>> to start network.
>>>
>>> anyone who is doing a hibernate or suspend who expects all the network
>>> connections to be working afterwards is dreaming or smoking
>>> something.
>>
>> Really? It works today... if the suspend is short enough. And that's
>> how it should be.
>
> If we get very good at Wake-on-Lan it should work for any length
> of time.

for suspend-to-ram this would work, I stand corrected.

for hibernate this would almost certainly not work, and I don't think
that it's worth raising false hopes.

David Lang
Re: Back to the future.
Hi!

> nobody is suggesting that you leave processes running
> while you do the snapshot, what is being proposed is
>
> 1. pause userspace (prevent scheduling)
> 2. make snapshot image of memory
> 3. make mounted filesystems read-only (possibly with
>    snapshot/checkpoint)
> 4. unpause
> 5. save image (with full userspace available, including
>    network)

Including network? Your tcp peers will be really confused, then, if
you ACK packets then claim you did not get them. No, you do not want
to start network.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
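[Editorial illustration, not part of the original message.] The objection above can be made concrete with a toy model. This is only a sketch under loud assumptions: `Receiver` and `rcv_nxt` are hypothetical names for this example, the class is a stand-in for one end of a connection and not real TCP, and `copy.deepcopy` stands in for "make snapshot image of memory". It shows that anything ACKed after step 2 but before power-off is missing from the state that gets restored.

```python
import copy
from dataclasses import dataclass


@dataclass
class Receiver:
    """Toy stand-in for one end of a TCP connection (hypothetical, not real TCP)."""
    rcv_nxt: int = 0  # next byte expected, i.e. the last cumulative ACK sent

    def receive(self, nbytes: int) -> int:
        """Accept nbytes and return the cumulative ACK the peer would see."""
        self.rcv_nxt += nbytes
        return self.rcv_nxt


live = Receiver()
live.receive(1000)

# Step 2 of the proposal: snapshot memory while userspace is paused.
snapshot = copy.deepcopy(live)

# Steps 4-5: userspace and network run again while the image is saved,
# so the host keeps ACKing data that the snapshot will never contain.
peer_saw_ack = live.receive(500)

# Resume: memory comes back from the snapshot, not from the live state.
restored = copy.deepcopy(snapshot)
lost = peer_saw_ack - restored.rcv_nxt

print(f"peer was told {peer_saw_ack} bytes arrived; "
      f"restored host only has {restored.rcv_nxt} ({lost} bytes unaccounted for)")
```

In this model the peer will not retransmit the 500 bytes it saw acknowledged, which is the confusion the message describes.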
Re: Back to the future.
Hi!

>> The "let's stop all kernel threads" is superstition. It's the same kind of
>> superstition that made people write "sync" three times before turning off
>> the power in the olden times. It's the kind of superstition that comes
>> from "we don't do things right, so let's be vewy vewy quiet and _pray_
>> that it works when we are being quiet".
>
> Side note: while I think things should probably *work* even with user
> processes going full bore while a snapshot is taken, I'll freely admit
> that I'll follow that superstition far enough that I think it's probably a
> good idea to try to quiesce the system to _some_ degree, and that stopping
> user programs is a good idea. Partly because of the whole memory shrinking
> thing, and partly just because we should do the snapshot with hw IO queues
> empty.
>
> But I don't think it would necessarily be wrong (and in many ways it would
> probably be *right*) to do that IO queue stopping at the queue level
> rather than at a process level. Why stop processes just because you want
> to clean out IO queues? They are two totally different things!

Actually, I'd like to stop I/O queues; if there were an easy way to do
that, I'd happily switch. Notice that we'll need to stop 'I/O queues' of
the char devices, too...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
On Monday, 7 May 2007 14:48, Pavel Machek wrote:

>>> Including network? Your tcp peers will be really confused, then, if
>>> you ACK packets then claim you did not get them. No, you do not want
>>> to start network.
>>
>> anyone who is doing a hibernate or suspend who expects all the network
>> connections to be working afterwards is dreaming or smoking
>> something.
>
> Really? It works today... if the suspend is short enough. And that's
> how it should be.

If we get very good at Wake-on-Lan it should work for any length
of time.

Regards
Oliver
Re: Back to the future.
Hi!

>>> nobody is suggesting that you leave processes running
>>> while you do the snapshot, what is being proposed is
>>>
>>> 1. pause userspace (prevent scheduling)
>>> 2. make snapshot image of memory
>>> 3. make mounted filesystems read-only (possibly with
>>>    snapshot/checkpoint)
>>> 4. unpause
>>> 5. save image (with full userspace available, including
>>>    network)
>>
>> Including network? Your tcp peers will be really confused, then, if
>> you ACK packets then claim you did not get them. No, you do not want
>> to start network.
>
> anyone who is doing a hibernate or suspend who expects all the network
> connections to be working afterwards is dreaming or smoking
> something.

Really? It works today... if the suspend is short enough. And that's
how it should be.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
On May 06, 2007, at 22:13:51, David Lang wrote:

> anyone who is doing a hibernate or suspend who expects all the network
> connections to be working afterwards is dreaming or smoking
> something. this is just another way that the failure can show up.
>
> in fact, I would say that it would probably be a nice thing to do for
> intervening firewalls and external servers if a suspend closed all
> external TCP connections rather than leaving them dangling (eating up
> resources until they time out)
>
> if your software can't tolerate the network connection going away on
> you it will have problems in normal operation anyway, let alone when
> you suspend/hibernate your machine.

Yeah, for suspend-to-ram+resume and for snapshot+restore you probably
want userspace to support some kind of initscript-like mechanism which
is triggered by the lid-switch or something before calling into the
kernel. That way it can close network connections mostly-nicely and
down network interfaces before suspending, then re-run
DHCP/802.11/whatever configuration after resume/restore. That might
not be a bad place to handle NFS mounts and such too.

Cheers,
Kyle Moffett
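[Editorial illustration, not part of the original message.] The initscript-like mechanism described above might look roughly like the following sketch. Everything here is an assumption for illustration: `suspend_with_hooks`, the hook pairs, and `enter_sleep` are hypothetical names, and the hooks only record what they would do instead of touching real interfaces or mounts.

```python
from typing import Callable, List, Tuple

# Each hook is a (before_suspend, after_resume) pair of callables.
Hook = Tuple[Callable[[], None], Callable[[], None]]


def suspend_with_hooks(hooks: List[Hook], enter_sleep: Callable[[], None]) -> None:
    """Run pre-suspend hooks in order, sleep, then undo them in reverse order."""
    undo: List[Callable[[], None]] = []
    try:
        for before, after in hooks:
            before()
            undo.append(after)  # only undo what actually ran
        enter_sleep()           # hypothetical call into the kernel
    finally:
        for after in reversed(undo):
            after()


events: List[str] = []
hooks: List[Hook] = [
    # NFS goes first so it is unmounted while the network is still up.
    (lambda: events.append("unmount NFS"), lambda: events.append("mount NFS")),
    (lambda: events.append("close connections, interfaces down"),
     lambda: events.append("interfaces up, re-run DHCP/802.11")),
]

suspend_with_hooks(hooks, lambda: events.append("sleep"))
print("\n".join(events))
```

Undoing in reverse order means the network is configured again before anything that depends on it, such as the NFS mounts, is restored.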
Re: Back to the future.
On Thu, 3 May 2007, Pavel Machek wrote:

> Hi!
>
>> nobody is suggesting that you leave processes running
>> while you do the snapshot, what is being proposed is
>>
>> 1. pause userspace (prevent scheduling)
>> 2. make snapshot image of memory
>> 3. make mounted filesystems read-only (possibly with
>>    snapshot/checkpoint)
>> 4. unpause
>> 5. save image (with full userspace available, including
>>    network)
>
> Including network? Your tcp peers will be really confused, then, if
> you ACK packets then claim you did not get them. No, you do not want
> to start network.

anyone who is doing a hibernate or suspend who expects all the network
connections to be working afterwards is dreaming or smoking something.
this is just another way that the failure can show up.

in fact, I would say that it would probably be a nice thing to do for
intervening firewalls and external servers if a suspend closed all
external TCP connections rather than leaving them dangling (eating up
resources until they time out)

if your software can't tolerate the network connection going away on
you it will have problems in normal operation anyway, let alone when
you suspend/hibernate your machine.

David Lang
Re: Back to the future.
Hello,

On Sat, May 5, 2007 11:16, Pavel Machek wrote:
>> But the same functionality can be achieved by doing:
>>
>> 1) Define a user password (e.g. /etc/shadow thing). (Once)
>>
>> 2) When a user logs in: get random data and encrypt it with the password,
>> this becomes the AES key. Store both the data and key in a secure way in
>> memory, e.g. using the existing kernel key infrastructure.
>
>> Advantage of this scheme is that it only need AES and can be done (mostly)
>> in kernel space. It's also faster and simpler than the current RSA scheme.
>> Disadvantage is that it wastes at least 32 bytes of memory when the system
>> is running, to store the data and key.
>
> Another disadvantage is that you need to hack into PAM infrastructure,
> that your suspend password needs to be same as someone's login
> password, and that it will really only work with single-user machine.

The first two are only true if you want to integrate it with user login,
so that a user only needs to sign in once, which seems like a convenient
thing. But if you don't want to integrate with the existing login
infrastructure, then just don't. And those disadvantages are true for any
system that wants users to log in once.

Then the disadvantage is reduced to a user needing to provide the password
at suspend if the system wasn't booted from a snapshot. But there is no
need for users to generate any files, just to choose a resume password.

If the resume key is stored per user instead of as a single global
instance, it will work with a multi-user system too.

A more interesting question is what should happen when one user did the
suspend and another wants to resume. Throw away the snapshot? Refuse
booting? Or boot and switch "active user"?

If you don't want people to resume each other's suspends then a key per
user works. If you want them to, then it becomes a bit tricky, especially
if you don't integrate with the login system. You don't want a user to be
able to resume someone else's snapshot and have access to everything that
other user left open. Nor do you want users to give a password twice.

If you want users to be able to resume each other's snapshots, you
probably also want the system to switch users after the resume. No matter
what scheme is used, this becomes hairy and hard to get watertight.
(Perhaps "impossible" is more realistic: how can one read the suspend
image and copy it back to RAM without having access to all the data
within?)

But if it's an "us" against "them" case, and you want users to resume each
other's snapshots, you're right that the scheme I proposed will fall
apart. In which case it needs to be adjusted a bit to handle this case:

Have one global suspend/resume key, and for each user store it on disk,
encrypted with that user's password. Also store the key in memory as
before.

Now some user needs to have provided his password once before the system
is suspended; after that, everyone can suspend without giving a password.
Also everyone can resume, if they have access to the file with the list of
encrypted keys and provide the right password.

(Notice that this looks more like the current scheme, where the private
part of the RSA key is encrypted with a passphrase and all stored in a
file.)

Though it seems that using suspend to disk on a real multi-user system is
always asking for problems, because the suspend image may contain valuable
data which shouldn't be thrown away, but easily can be by other users. Nor
do you want users to claim the machine, so it's a lose/lose situation.

Also, with resume every user effectively gets root access, because of all
the memory access. So inter-user security is down the drain anyway.

The only sane usage I can see is when the users trust each other, in which
case they may as well agree on one resume password. ;-)

Greetings,

Indan
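
The adjusted scheme in the mail above (one global suspend/resume key, stored on disk once per user and encrypted with that user's password) could be sketched roughly as follows. This is only an illustration under stated assumptions: the mail does not say how the key would be encrypted, so the XOR-with-PBKDF2-output wrapping and the HMAC check are stand-ins picked because they need nothing beyond the Python standard library. The names wrap_key, unwrap_key and the idea of a per-user entry tuple are hypothetical.

```python
import hashlib
import hmac
import os

KEY_LEN = 32  # length of the global suspend/resume key in bytes (assumption)


def _pad_and_mac_key(password: str, salt: bytes):
    """Derive a one-time pad and a MAC key from the user's password."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, 50_000, dklen=2 * KEY_LEN
    )
    return material[:KEY_LEN], material[KEY_LEN:]


def wrap_key(global_key: bytes, password: str):
    """Return the on-disk entry for one user: (salt, wrapped key, tag)."""
    salt = os.urandom(16)
    pad, mac_key = _pad_and_mac_key(password, salt)
    # Stand-in for real encryption: XOR the key with the derived pad.
    wrapped = bytes(a ^ b for a, b in zip(global_key, pad))
    tag = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    return salt, wrapped, tag


def unwrap_key(entry, password: str):
    """Recover the global key, or return None if the password is wrong."""
    salt, wrapped, tag = entry
    pad, mac_key = _pad_and_mac_key(password, salt)
    expected = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return bytes(a ^ b for a, b in zip(wrapped, pad))
```

With a dictionary mapping user names to such entries, any listed user who supplies the right password recovers the same global key, which is the property the adjusted scheme relies on. It does nothing about the point raised in the mail that whoever resumes ends up with access to everything in the image.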
Re: Back to the future.
Hi!

> But the same functionality can be achieved by doing:
>
> 1) Define a user password (e.g. /etc/shadow thing). (Once)
>
> 2) When a user logs in: get random data and encrypt it with the password,
> this becomes the AES key. Store both the data and key in a secure way in
> memory, e.g. using the existing kernel key infrastructure.

> Advantage of this scheme is that it only need AES and can be done (mostly)
> in kernel space. It's also faster and simpler than the current RSA scheme.
> Disadvantage is that it wastes at least 32 bytes of memory when the system
> is running, to store the data and key.

Another disadvantage is that you need to hack into PAM infrastructure,
that your suspend password needs to be same as someone's login
password, and that it will really only work with single-user machine.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
On Thu, May 3, 2007 14:06, Pavel Machek wrote:
>> > > The kernel can already do compression and encryption.
>> >
>> > Yes, if we all could agree on _which_ compression and encryption
>>
>> Any of those available in the kernel. Where's the problem?
>
> gzip is too slow for this. lzf works okay. Oh and swsusp wants rsa
> crypto.

Then port lzf to the kernel, or help with the lzo port.

Swsusp might want RSA crypto, but it doesn't really need it. Currently it
only uses it to be able to suspend without asking for a passphrase.

So the current sequence is:

1) Generate RSA keys + ask for a passphrase. (Once)
...
2) Suspend. (Encrypt snapshot with public RSA key.)
...
3) Ask for the passphrase.
4) Resume.

RSA is used so that the passphrase can be thrown away between 1 and 2.

But the same functionality can be achieved by doing:

1) Define a user password (e.g. /etc/shadow thing). (Once)

2) When a user logs in: get random data and encrypt it with the password,
this becomes the AES key. Store both the data and key in a secure way in
memory, e.g. using the existing kernel key infrastructure.
...
3) Suspend. (Encrypt snapshot with the AES key and store the random data.)
...
4) Ask for the passphrase. (To get the AES key, encrypt the stored random
data.)
5) Resume.

Variants are possible of course, but this is the main idea.

This is secure because the key infrastructure is secure, and even if it
isn't, the system must be compromised before the suspend is done in order
to get the suspend key. But at that point the attacker already has all the
information that can be found in the suspend image, and could have done
all kinds of things to inflict damage (like installing a key logger).

The advantage of this scheme is that it only needs AES and can be done
(mostly) in kernel space. It's also faster and simpler than the current
RSA scheme. The disadvantage is that it wastes at least 32 bytes of memory
while the system is running, to store the data and key.

The only thing that needs to be done in userspace is setting the random
data and AES key, but there exists a suitable interface for that (the key
system). As user login is already done in user space, this can be
integrated with that in a nice way.

Greetings,

Indan
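
As a rough illustration of the proposed sequence in the mail above, the login step and the ask-for-the-passphrase step could look like the sketch below. It rests on a loud assumption: the mail derives the key by encrypting the random data with the password using AES, whereas this sketch substitutes PBKDF2 from the Python standard library, which has no AES. The function names and the 16-byte sizes are hypothetical; 16 bytes of data plus a 16-byte key would add up to the 32 bytes mentioned in the mail, but the mail does not spell out that split.

```python
import hashlib
import os

DATA_LEN = 16  # bytes of random data kept with the image (assumption)
KEY_LEN = 16   # bytes of key material, e.g. for AES-128 (assumption)


def derive_suspend_key(password: str, random_data: bytes) -> bytes:
    """Combine the password and the random data into the image key.

    Stand-in for "encrypt the random data with the password": PBKDF2
    is used only because it is available without extra packages.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode(), random_data, 100_000, dklen=KEY_LEN
    )


def at_login(password: str):
    """Login step: pick random data, derive the key, keep both in memory."""
    random_data = os.urandom(DATA_LEN)
    return random_data, derive_suspend_key(password, random_data)


def at_resume(password: str, stored_random_data: bytes) -> bytes:
    """Passphrase step: re-derive the key from the data stored at suspend."""
    return derive_suspend_key(password, stored_random_data)
```

The point of the design is visible in the signatures: suspend needs only what at_login left in memory, so no passphrase prompt is required then, while resume needs the passphrase plus the random data written out next to the image.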
Re: Back to the future.
On May 04, 2007, at 03:52:03, David Greaves wrote:
> Kyle Moffett wrote:
>> On May 03, 2007, at 11:10:47, Pavel Machek wrote:
>>> What happens if you try to boot and filesystems are frozen from
>>> previous run?
>>
>> If you're just doing a fresh boot then the filesystem is already clean
>> due to the dm freeze and so it mounts up normally. All you need to do
>> then is have a little startup script which purges the saved image before
>> you fsck or remount things read-write since either case means the image
>> is no longer safe to resume.
>
> Wouldn't it be better if freeze wrote a freeze-ID to the fs and returned
> it? This would naturally be kept in the image and a UUID mismatch would
> be detectable - seems safer and more flexible than 'a script'.
>
> "This isn't the freeze you're looking for, move along"

Possibly, but I was referring to the _current_ behavior of the
device-mapper freezing. While perhaps not ideal, it's currently very
easily usable.

Cheers,
Kyle Moffett
Re: Back to the future.
Kyle Moffett wrote:
> On May 03, 2007, at 11:10:47, Pavel Machek wrote:
>> How mature is freezing filesystems -- will it work on at least ext2/3
>> and vfat?
>
> I'm pretty sure it works on ext2/3 and xfs and possibly others, I don't
> know either way about VFAT though. Essentially the "freeze" part
> involves telling the filesystem to sync all data, flush the journal, and
> mark the filesystem clean. The intent under dm/LVM was to allow you to
> make snapshots without having to fsck the just-created snapshot before
> you mounted it.
>
>> What happens if you try to boot and filesystems are frozen from
>> previous run?
>
> If you're just doing a fresh boot then the filesystem is already clean
> due to the dm freeze and so it mounts up normally. All you need to do
> then is have a little startup script which purges the saved image before
> you fsck or remount things read-write since either case means the image
> is no longer safe to resume.

Wouldn't it be better if freeze wrote a freeze-ID to the fs and returned
it? This would naturally be kept in the image and a UUID mismatch would be
detectable - seems safer and more flexible than 'a script'.

"This isn't the freeze you're looking for, move along"

David
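
The freeze-ID idea in the mail above can be illustrated with a small toy model. Nothing here corresponds to an existing kernel or device-mapper interface: the Filesystem class, its freeze_id field and image_is_resumable are hypothetical names, and the assumption that a read-write mount clears the ID is just one way of making the mismatch detectable.

```python
import uuid


class Filesystem:
    """Toy filesystem that remembers the ID of its most recent freeze."""

    def __init__(self):
        self.freeze_id = None

    def freeze(self) -> str:
        # Write a fresh ID into the filesystem and hand it to the caller,
        # who would store it inside the suspend image.
        self.freeze_id = uuid.uuid4().hex
        return self.freeze_id

    def mount_read_write(self):
        # Assumption: a read-write mount (or fsck) changes the on-disk
        # state, so the old freeze no longer describes it.
        self.freeze_id = None


def image_is_resumable(image_freeze_id, fs: Filesystem) -> bool:
    """Resume only if the image was taken from exactly this freeze."""
    return fs.freeze_id is not None and fs.freeze_id == image_freeze_id
```

Compared with a startup script that purges the image, the check is made at resume time against the filesystem itself, so a stale image is rejected even if the purge step was skipped.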
Re: Back to the future.
On May 03, 2007, at 11:10:47, Pavel Machek wrote:
> How mature is freezing filesystems -- will it work on at least ext2/3
> and vfat?

I'm pretty sure it works on ext2/3 and xfs and possibly others, I don't
know either way about VFAT though. Essentially the "freeze" part involves
telling the filesystem to sync all data, flush the journal, and mark the
filesystem clean. The intent under dm/LVM was to allow you to make
snapshots without having to fsck the just-created snapshot before you
mounted it.

> What happens if you try to boot and filesystems are frozen from
> previous run?

If you're just doing a fresh boot then the filesystem is already clean due
to the dm freeze and so it mounts up normally. All you need to do then is
have a little startup script which purges the saved image before you fsck
or remount things read-write since either case means the image is no
longer safe to resume.

If the kernel is later modified to purge all filesystem data
(dcache/pagecache) during snapshot and effectively remount and reopen all
the files by path during restore then you could remove that requirement.
You'd just need to make sure that the restore-from-disk scripts did an
fsck or journal-restore before reloading the old kernel data.

Cheers,
Kyle Moffett
Re: Back to the future.
Hi!

> > 1) if the kernel threads are frozen, we know that they don't hold any locks
> > that could interfere with the freezing of device drivers,
> > 2) if they are frozen, we know, for example, that they won't call user mode
> > helpers or do similar things,
> > 3) if they are frozen, we know that they won't submit I/O to disks and
> > potentially damage filesystems (suspend2 has much more problems with that
> > than swsusp, but still. And yes, there have been bug reports related to it,
> > so it's not just my fantasy).
>
> NONE of these are valid explanations at all. You're listing totally
> theoretical problems, and ignoring all the _real_ problems that trying to
> freeze kernel threads has _caused_.

The xfs problem was real. And I do not see that many problems caused by
freezing kernel threads: at least you get deadlocks, not silent fs
corruption.

> And no, kernel threads do not submit IO to disks on their own. You just
> made that up. Yes, they can be involved in that whole disk submission
> thing, but in a good way - they can be required in order to make disk
> writing work!

Yep, so we have md doing io while we are doing the atomic copy. That
probably means it will continue when the atomic copy is done... getting
the image out of sync with the disk. (Plus we used to have bdflush, doing
periodic writes to disk.)

> The problem that suspend has had is that it's done everything totally the
> wrong way around. Do kernel threads do disk IO? Sure, if asked to do so.
> For example, kernel threads can be involved in md etc, but that's a *good*
> thing. The way to shut them up is not to freeze the threads, but to freeze
> the *disk*.

Well, if freezing the disk was available, I'd gladly do it. Is there an
easy way to implement that?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
Hi!

> >>It makes it harder to debug (wouldn't it be *nice* to
> >>just ssh in, and do
> >>gdb -p <snapshotter>
> >
> >Make the machine being suspended a VM and you can
> >already do that.
>
> >>when something goes wrong?) but we also *depend* on
> >>user space for various things (the same way we depend
> >>on kernel threads, and why it has been such a total
> >>disaster to try to freeze the kernel threads too!).
> >>For example, if you want to do graphical stuff, just
> >>using X would be quite nice, wouldn't it?
> >
> >But in doing so you make the contents of the disk
> >inconsistent with the state you've just snapshotted,
> >leading to filesystem corruption. Even if you modify
> >filesystems to do checkpointing (which is what we're
> >really talking about), you still also have the problem
> >that your snapshot has to be stored somewhere before
> >you write it to disk, so you also have to either [snip]
>
> Actually, it's a lot simpler than that. We can just
> combine the device-mapper snapshot with a VM+kernel
> snapshot system call and be almost done:
>
> sys_snapshot(dev_t snapblockdev, int __user *snapshotfd);
>
> When sys_snapshot is run, the kernel does:
>
> 1) Sequentially freeze mounted filesystems using
> blockdev freezing. If it's an fs that doesn't support
> freezing then either fail or force-remount-ro that fs
> and downgrade all its filedescriptors to RO. Doesn't
> need extra locking since process which try to do IO
> either succeed before the freeze call returns for that
> blockdev or sleep on the unfreeze of that blockdev.
> Filesystems are synchronized and made clean.

How mature is freezing filesystems -- will it work on at least ext2/3
and vfat?

What happens if you try to boot and filesystems are frozen from
previous run?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
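
Step 1 of the quoted sys_snapshot proposal (freeze each mounted filesystem in turn, and either fail or fall back to a forced read-only remount where freezing is unsupported) can be modelled in a few lines. This is a userspace toy, not kernel code: the Mount class, the state strings and freeze_all are hypothetical, and undoing earlier freezes when the call fails is an assumption that the quoted text does not state.

```python
from dataclasses import dataclass


@dataclass
class Mount:
    name: str
    supports_freeze: bool
    state: str = "rw"  # one of "rw", "frozen", "ro"


def freeze_all(mounts, remount_ro_fallback=True) -> bool:
    """Freeze mounts one after another; return False if the call fails."""
    done = []
    for mount in mounts:
        if mount.supports_freeze:
            mount.state = "frozen"
        elif remount_ro_fallback:
            # Forced read-only remount; open descriptors would be
            # downgraded to read-only as well.
            mount.state = "ro"
        else:
            # Assumption: on failure, undo what was already frozen.
            for earlier in done:
                earlier.state = "rw"
            return False
        done.append(mount)
    return True
```

Whether to fail or to remount read-only is left open in the quoted text, which is why the sketch exposes it as a flag rather than picking one behaviour.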
Re: Back to the future.
Hi!

> > > The kernel can already do compression and encryption.
> >
> > Yes, if we all could agree on _which_ compression and encryption
>
> Any of those available in the kernel. Where's the problem?

gzip is too slow for this. lzf works okay. Oh and swsusp wants rsa
crypto.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
Hi!

> > While that would certainly be nifty, I think we're arguably starting
> > from the wrong point here. Why are we booting a kernel, trying to poke
> > the hardware back into some sort of mock-quiescent state, freeing memory
> > and then (finally) overwriting the entire contents of RAM rather than
> > just doing all of this from the bootloader?

Doing it from the bootloader sounds attractive... but it is a lot of
work. I'm essentially using linux as a bootloader.

Patch for grub welcome.

> Sure, you could make suspend generate a complete bootable kernel image
> containing all RAM. Doesn't sound too hard to me. You know, from over
> here on the sidelines.

Ah, so we have a volunteer :-).

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
On Sunday, 29 April 2007 10:59, Pavel Machek wrote:
> Hi!
>
> > > Ie we do have history of _not_ freezing things. The freezing came later,
> > > and came with the subsystem that had more problems..
> >
> > It doesn't have that many problems as you are trying to suggest. At
> > present, the only problems with it happen if someone tries to "improve"
> > it in the way I did with the workqueues.
> >
> > Anyway, the freezing of tasks, including kernel threads, is one of the few
> > things on which Pavel, Nigel and me completely agree that they should be
> > done, so perhaps you could accept that?
>
> Actually, if we want to support OLPC _nicely_, we'll need to get rid
> of freezer from suspend-to-RAM. Of course, that _will_ put more
> pressure at the drivers -- and break few of them...

I think the removal of sys_sync() from freeze_processes() in the s2ram
case might help.

I'm really afraid of dropping the freezing of kernel threads from the
hibernation/suspend altogether before we know we won't break drivers,
because we can introduce some very subtle and difficult to debug problems
this way.

Moreover, apart from speeding up the suspend slightly (kernel threads are
frozen very quickly) this won't buy us anything, since kprobes uses the
freezer and all of the infrastructure is needed anyway.

Greetings,
Rafael
Re: Back to the future.
On Sunday, 29 April 2007 10:23, Pavel Machek wrote: > Hi! > > > > > The freezer has *caused* those deadlocks (eg by stopping threads that > > > > were > > > > needed for the suspend writeouts to succeed!), not solved them. > > > > > > I can't remember anything like this, but I believe you have a specific > > > test > > > case in mind. > > > > Ehh.. Why do you thik we _have_ that PF_NOFREEZE thing in the first place? > > > > Rafael, you really don't know what you're talking about, do you? > > > > Just _look_ at them. It's the IO threads etc that shouldn't be frozen, > > exactly *because* they do IO. You claim that kernel threads shouldn't do > > IO, but that's the point: if you cannot do IO when snapshotting to disk, > > here's a damn big clue for you: how do you think that snapshot is going to > > get written? > > > > I *guarantee* you that we've had a lot more problems with threads that > > should *not* have been frozen than with those hypothetical threads that > > you think should have been frozen. > > Well, we had nasty corruption on XFS, caused by thread that was not > frozen and should be. (While the other case leads "only" to deadlocks, > so it is easier to debug.) > > The locking point.. when I added freezing to swsusp, I knew very > little about kernel locking, so I "simply" decided to avoid the > problem altogether... using the freezer. > > You may be right that locks are not a big problem for the hibernation > after all; I just do not know. Still, I think, if a kernel thread is a part of a device driver, then _in_ _principle_ it needs _some_ synchronization with the driver's suspend/freeze and resume/thaw callbacks. For example, it's reasonable to assume that the thread should be quiet between suspend/freeze and resume/thaw. With the freezing of kernel threads we provide a simple means of such synchronization: use try_to_freeze() in a suitable place of your kernel thread and you're done. 
[Well, there should be a second part for making the thread die if the thaw callback doesn't find the device, but that's in the works.] Without it, there may be race conditions that we are not even aware of and that may trigger in, say, 1 in 10 suspends or so and I wish you good luck with debugging such things. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
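The cooperative synchronization Rafael describes, where a thread checks in at a point of its own choosing and stays quiet until it is thawed, can be modelled in userspace. The sketch below is a toy stand-in, not the kernel freezer: the `Freezer` class, `driver_thread` and `demo` are hypothetical names, and only the name `try_to_freeze` is borrowed from the message above.

```python
import threading
import time


class Freezer:
    """Userspace stand-in for the freezer; not the kernel implementation."""

    def __init__(self):
        self._cond = threading.Condition()
        self._freezing = False
        self._parked = 0

    def try_to_freeze(self):
        """Park the calling thread if a freeze has been requested.

        The thread calls this only where it holds no locks and has no
        work in flight, which is what makes parking it there safe.
        """
        with self._cond:
            if not self._freezing:
                return False
            self._parked += 1
            self._cond.notify_all()      # tell freeze() that we are parked
            while self._freezing:
                self._cond.wait()
            self._parked -= 1
            return True

    def freeze(self, nthreads, timeout=5.0):
        """Request a freeze; return True once nthreads have parked."""
        with self._cond:
            self._freezing = True
            return self._cond.wait_for(
                lambda: self._parked >= nthreads, timeout)

    def thaw(self):
        """Let all parked threads continue."""
        with self._cond:
            self._freezing = False
            self._cond.notify_all()


def driver_thread(freezer, stop, log):
    """Hypothetical driver thread: one freeze check per loop iteration."""
    while not stop.is_set():
        freezer.try_to_freeze()
        log.append("touch device")       # stands in for hardware access
        time.sleep(0.001)


def demo():
    """Freeze one thread, watch it stay quiet, then thaw and stop it."""
    freezer, stop, log = Freezer(), threading.Event(), []
    t = threading.Thread(target=driver_thread, args=(freezer, stop, log))
    t.start()
    frozen = freezer.freeze(1)           # the "suspend/freeze" side
    before = len(log)
    time.sleep(0.05)                     # the thread must stay quiet here
    during = len(log)
    freezer.thaw()                       # the "resume/thaw" side
    stop.set()
    t.join()
    return frozen, before, during
```

In this model `demo()` reports that the thread parked and that the log did not grow between freeze and thaw, which is the "quiet between suspend/freeze and resume/thaw" property. A thread that never reaches its check would make `freeze()` time out instead.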
Re: Back to the future.
Hi!

> > Ie we do have history of _not_ freezing things. The freezing came later, and came with the subsystem that had more problems..
>
> It doesn't have that many problems as you are trying to suggest. At present, the only problems with it happen if someone tries to "improve" it in the way I did with the workqueues.
>
> Anyway, the freezing of tasks, including kernel threads, is one of the few things on which Pavel, Nigel and me completely agree that they should be done, so perhaps you could accept that?

Actually, if we want to support OLPC _nicely_, we'll need to get rid of freezer from suspend-to-RAM. Of course, that _will_ put more pressure at the drivers -- and break few of them...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
On Sunday, 29 April 2007 01:45, Linus Torvalds wrote: > > On Sun, 29 Apr 2007, Rafael J. Wysocki wrote: > > > > OK, more precisely: fs-related threads should not try to process their > > queues, > > etc., after the snapshot is done, because that may cause some fs data to be > > written at that time and then the fs in question may be corrupted after the > > restore. Not all of the I/O in general, fs data. > > But that's not true _either_. That's only true because right now I think > we cannot even suspend to a swapfile (I might be wrong). You are. > If you have a swapfile on a filesystem, you'd need those fs queues > running! No, I don't. It's done by bmapping the file and writing directly to the underlying blockdev. Otherwise we'd have corrupted filesystems after the restore. Swapfiles are handled this way anyway, so we just use the same code. > > Well, I'm not sure whether or not that still would have been the case if we > > had > > stopped to freeze kernel threads for the hibernation/suspend. > > Did you miss the email where Paul pointed out that Mac/PowerPC didn't use > to do any of this? No, I didn't. > And apparently never had any issues with it? On one platform with a limited subset of device drivers. > And probably worked more reliably several years ago than suspend/hibernation > does _today_? I have no problems with the hibernation on my test boxes (six of them), except for one network driver that doesn't bother to define a .suspend() callback. There are problems with the suspend (s2ram), but they are _not_ related to the freezing of kernel threads. Some of them are related to the other issue that you have risen, which is that the same callbacks should not be used for the suspend and hibernation, and which I think is absolutely valid. The remaining ones are related to the fact that graphic card vendors don't care for us at all. > Ie we do have history of _not_ freezing things. The freezing came later, > and came with the subsystem that had more problems.. 
It doesn't have that many problems as you are trying to suggest. At present, the only problems with it happen if someone tries to "improve" it in the way I did with the workqueues. Anyway, the freezing of tasks, including kernel threads, is one of the few things on which Pavel, Nigel and me completely agree that they should be done, so perhaps you could accept that? Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
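"Bmapping" here means resolving each logical block of the swapfile to a block on the underlying device ahead of time, so that the image writer can submit I/O straight to the block device without the filesystem being involved. The sketch below shows only the bookkeeping step of merging such a map into contiguous runs. It is an illustration under assumptions, not the swsusp code: `coalesce_extents` is a hypothetical helper, and obtaining the map is left out (on Linux the FIBMAP ioctl returns it one block at a time and normally needs elevated privileges).

```python
from typing import Iterable, List, Tuple


def coalesce_extents(physical_blocks: Iterable[int]) -> List[Tuple[int, int]]:
    """Merge a per-block map into (first_physical_block, length) extents.

    physical_blocks yields, for logical block 0, 1, 2, ... of the file,
    the device block that backs it. How that map is obtained is outside
    this sketch.
    """
    extents: List[Tuple[int, int]] = []
    for block in physical_blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)   # extends the current run
        else:
            extents.append((block, 1))          # starts a new run
    return extents


# A file whose six blocks live in three separate runs on the device:
print(coalesce_extents([100, 101, 102, 500, 501, 7]))
# → [(100, 3), (500, 2), (7, 1)]
```

Once the extents are known, writing the image is a matter of device offsets only, which is why the fs queues do not need to run for it.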
Re: Back to the future.
Hi!

> > > The freezer has *caused* those deadlocks (eg by stopping threads that were needed for the suspend writeouts to succeed!), not solved them.
> >
> > I can't remember anything like this, but I believe you have a specific test case in mind.
>
> Ehh.. Why do you thik we _have_ that PF_NOFREEZE thing in the first place?
>
> Rafael, you really don't know what you're talking about, do you?
>
> Just _look_ at them. It's the IO threads etc that shouldn't be frozen, exactly *because* they do IO. You claim that kernel threads shouldn't do IO, but that's the point: if you cannot do IO when snapshotting to disk, here's a damn big clue for you: how do you think that snapshot is going to get written?
>
> I *guarantee* you that we've had a lot more problems with threads that should *not* have been frozen than with those hypothetical threads that you think should have been frozen.

Well, we had nasty corruption on XFS, caused by thread that was not frozen and should be. (While the other case leads "only" to deadlocks, so it is easier to debug.)

The locking point.. when I added freezing to swsusp, I knew very little about kernel locking, so I "simply" decided to avoid the problem altogether... using the freezer.

You may be right that locks are not a big problem for the hibernation after all; I just do not know.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: Back to the future.
On Apr 28, 2007, at 19:45:01, Linus Torvalds wrote:
> On Sun, 29 Apr 2007, Rafael J. Wysocki wrote:
> > Well, I'm not sure whether or not that still would have been the case if we had stopped to freeze kernel threads for the hibernation/suspend.
>
> Did you miss the email where Paul pointed out that Mac/PowerPC didn't use to do any of this? And apparently never had any issues with it? And probably worked more reliably several years ago than suspend/hibernation does _today_?

Still works pretty reliably; the last time my PowerBook G4 was rebooted was 6 weeks ago. Once every 60 suspends or so the kernel USB driver gets really confused and doesn't wake up the USB controller properly, leading to dead keyboard/mouse, but other than that I never have problems. I wouldn't be surprised if I could comment out 90% of the "suspend" code and still have it work; the hardware in it is incredibly robust. I can even swap batteries while it's in suspend-to-RAM, as long as I do it in less than 45 sec or so; I get around 6-7 days of suspend-to-RAM time on a full charge.

Cheers,
Kyle Moffett
Re: Back to the future.
Hi.

On Sat, 2007-04-28 at 16:45 -0700, Linus Torvalds wrote:
> On Sun, 29 Apr 2007, Rafael J. Wysocki wrote:
> >
> > OK, more precisely: fs-related threads should not try to process their queues, etc., after the snapshot is done, because that may cause some fs data to be written at that time and then the fs in question may be corrupted after the restore. Not all of the I/O in general, fs data.
>
> But that's not true _either_. That's only true because right now I think we cannot even suspend to a swapfile (I might be wrong).
>
> If you have a swapfile on a filesystem, you'd need those fs queues running!

For Suspend2, and I think for swsusp too, we bmap the locations when allocating the storage, and then submit our own bios. Even if swsusp isn't using this method, I'm pretty sure the swap code does bmapping at swapon time to avoid raciness later.

> > Well, I'm not sure whether or not that still would have been the case if we had stopped to freeze kernel threads for the hibernation/suspend.
>
> Did you miss the email where Paul pointed out that Mac/PowerPC didn't use to do any of this? And apparently never had any issues with it? And probably worked more reliably several years ago than suspend/hibernation does _today_?
>
> Ie we do have history of _not_ freezing things. The freezing came later, and came with the subsystem that had more problems..

It also came because of problems. Not working perfectly isn't necessarily a sign of a faulty reason for being added in the first place.

I should also add, not freezing things is fine if you're happy with getting half an image at most. If you want a full just-as-if-I'd-never-turned-the-power-off image, you need freezing so that you can have some pages which can be saved before others are atomically copied, to ensure the whole image is consistent.

Nigel
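Why a page-by-page copy needs everything else to hold still can be shown with a toy model. This is a deliberately simplified sketch, not how Suspend2 or swsusp copy pages: `snapshot` and `transfer` are hypothetical, "memory" is a two-element list, and the invariant is that the two pages always sum to 100.

```python
def snapshot(memory, writer_steps, frozen):
    """Copy memory page by page and return the copy.

    writer_steps is a list of callables that mutate memory. When frozen is
    False, one of them runs between consecutive page copies, as a running
    thread might. When frozen is True, none run during the copy.
    """
    pending = list(writer_steps)
    image = []
    for page in range(len(memory)):
        image.append(memory[page])
        if not frozen and pending:
            pending.pop(0)(memory)       # a writer sneaks in mid-copy
    return image


def transfer(memory):
    """Hypothetical writer: moves 10 units from page 1 to page 0."""
    memory[1] -= 10
    memory[0] += 10


print(snapshot([50, 50], [transfer], frozen=True))    # → [50, 50]
print(snapshot([50, 50], [transfer], frozen=False))   # → [50, 40]
```

The unfrozen copy holds page 0 from before the transfer and page 1 from after it, so the image sums to 90 and describes a state the system was never in.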
Re: Back to the future.
On Sun, 29 Apr 2007, Rafael J. Wysocki wrote:
>
> OK, more precisely: fs-related threads should not try to process their queues, etc., after the snapshot is done, because that may cause some fs data to be written at that time and then the fs in question may be corrupted after the restore. Not all of the I/O in general, fs data.

But that's not true _either_. That's only true because right now I think we cannot even suspend to a swapfile (I might be wrong).

If you have a swapfile on a filesystem, you'd need those fs queues running!

> Well, I'm not sure whether or not that still would have been the case if we had stopped to freeze kernel threads for the hibernation/suspend.

Did you miss the email where Paul pointed out that Mac/PowerPC didn't use to do any of this? And apparently never had any issues with it? And probably worked more reliably several years ago than suspend/hibernation does _today_?

Ie we do have history of _not_ freezing things. The freezing came later, and came with the subsystem that had more problems..

Linus
Re: Back to the future.
On Saturday, 28 April 2007 23:25, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > > > The freezer has *caused* those deadlocks (eg by stopping threads that > > > were > > > needed for the suspend writeouts to succeed!), not solved them. > > > > I can't remember anything like this, but I believe you have a specific test > > case in mind. > > Ehh.. Why do you thik we _have_ that PF_NOFREEZE thing in the first place? Well, I don't know why exactly it had been originally introduced. Currently, it is used by the threads that should be running after the snapshot is done (they are not only I/O threads). > Rafael, you really don't know what you're talking about, do you? I think I know. > Just _look_ at them. It's the IO threads etc that shouldn't be frozen, > exactly *because* they do IO. You claim that kernel threads shouldn't do > IO, but that's the point: if you cannot do IO when snapshotting to disk, > here's a damn big clue for you: how do you think that snapshot is going to > get written? OK, more precisely: fs-related threads should not try to process their queues, etc., after the snapshot is done, because that may cause some fs data to be written at that time and then the fs in question may be corrupted after the restore. Not all of the I/O in general, fs data. Still, that alone probably is not a good enough reason for freezing all kernel threads. > I *guarantee* you that we've had a lot more problems with threads that > should *not* have been frozen than with those hypothetical threads that > you think should have been frozen. Well, I'm not sure whether or not that still would have been the case if we had stopped to freeze kernel threads for the hibernation/suspend. I just see potential problems that I've mentioned in the previous message and I don't see any evidence that they cannot occur. 
Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Back to the future.
On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:
> >
> > The freezer has *caused* those deadlocks (eg by stopping threads that were needed for the suspend writeouts to succeed!), not solved them.
>
> I can't remember anything like this, but I believe you have a specific test case in mind.

Ehh.. Why do you think we _have_ that PF_NOFREEZE thing in the first place?

Rafael, you really don't know what you're talking about, do you?

Just _look_ at them. It's the IO threads etc that shouldn't be frozen, exactly *because* they do IO. You claim that kernel threads shouldn't do IO, but that's the point: if you cannot do IO when snapshotting to disk, here's a damn big clue for you: how do you think that snapshot is going to get written?

I *guarantee* you that we've had a lot more problems with threads that should *not* have been frozen than with those hypothetical threads that you think should have been frozen.

Linus
Re: Back to the future.
On Saturday, 28 April 2007 20:43, David Lang wrote:
> On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:
> > On Friday, 27 April 2007 12:12, Pekka J Enberg wrote:
> > > The problem with writing in the kernel is obvious: we need to add new code to the kernel for compression, encryption, and userspace interaction (graphical progress bar) that are important for user experience.
> >
> > Yes, and that's why we wanted to introduce the userland part. The problem with this approach, as it's turned out, is that the userland part must be a very specialized piece of software, really careful of what it's doing, mainly because of the inability to checkpoint filesystems. If we could checkpoint filesystems and were able to unfreeze the user space after creating the snapshot without the risk of corrupting filesystems in the restore phase, the userland part could be much simpler (even as simple as Linus suggested).
>
> this sounds like a really good argument for having a useable userspace running.
>
> we already have the LVM snapshot code in the kernel, so we have the pieces available to protect the filesystems, we just need to figure out how to put them togeather. (the simpliest way would be to make a new suspend package that required the user to use LVM so that snapshots are available, but this is also the most disruptive approach)

Yes.

I personally know very little about the LVM snapshot code and I wasn't aware of its capabilities. If we can make it possible to run the user space safely after we've created the memory snapshot, I'm all for it.

As far as the package is concerned, we can just add the new user space tools to the suspend package containing our existing userland part.

Greetings,
Rafael
Re: Back to the future.
On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:
> On Saturday, 28 April 2007 20:32, David Lang wrote:
> > On Sat, 28 Apr 2007, Pavel Machek wrote:
> > > > We freeze user space processes for the reasons that you have quoted above.
> > > >
> > > > Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think.
> > >
> > > We do not want kernel threads running:
> > >
> > > a) they may hold some locks and deadlock suspend
> > >
> > > b) they may do some writes to disk, leading to corruption
> > >
> > > We could solve a) by carefully auditing suspend lock usage to make sure deadlocks are impossible even with kernel threads running.
> >
> > remember that we are doing suspend-to-disk, after we do the snapshot we will be doing a shutdown. that should simplify the locking issues
>
> That's assuming that we won't need to cancel the hibernation.

true, but if we cancel the hibernation then why are the locks an issue? they are appropriate for the system state.

David Lang
Re: Back to the future.
On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:
> On Friday, 27 April 2007 12:12, Pekka J Enberg wrote:
> > The problem with writing in the kernel is obvious: we need to add new code to the kernel for compression, encryption, and userspace interaction (graphical progress bar) that are important for user experience.
>
> Yes, and that's why we wanted to introduce the userland part. The problem with this approach, as it's turned out, is that the userland part must be a very specialized piece of software, really careful of what it's doing, mainly because of the inability to checkpoint filesystems. If we could checkpoint filesystems and were able to unfreeze the user space after creating the snapshot without the risk of corrupting filesystems in the restore phase, the userland part could be much simpler (even as simple as Linus suggested).

this sounds like a really good argument for having a usable userspace running.

we already have the LVM snapshot code in the kernel, so we have the pieces available to protect the filesystems, we just need to figure out how to put them together. (the simplest way would be to make a new suspend package that required the user to use LVM so that snapshots are available, but this is also the most disruptive approach)

David Lang
Re: Back to the future.
On Saturday, 28 April 2007 20:32, David Lang wrote:
> On Sat, 28 Apr 2007, Pavel Machek wrote:
> > > We freeze user space processes for the reasons that you have quoted above.
> > >
> > > Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think.
> >
> > We do not want kernel threads running:
> >
> > a) they may hold some locks and deadlock suspend
> >
> > b) they may do some writes to disk, leading to corruption
> >
> > We could solve a) by carefully auditing suspend lock usage to make sure deadlocks are impossible even with kernel threads running.
>
> remember that we are doing suspend-to-disk, after we do the snapshot we will be doing a shutdown. that should simplify the locking issues

That's assuming that we won't need to cancel the hibernation.

Greetings,
Rafael
Re: Back to the future.
On Sat, 28 Apr 2007, Pavel Machek wrote:
> > We freeze user space processes for the reasons that you have quoted above.
> >
> > Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think.
>
> We do not want kernel threads running:
>
> a) they may hold some locks and deadlock suspend
>
> b) they may do some writes to disk, leading to corruption
>
> We could solve a) by carefully auditing suspend lock usage to make sure deadlocks are impossible even with kernel threads running.

remember that we are doing suspend-to-disk, after we do the snapshot we will be doing a shutdown. that should simplify the locking issues

David Lang
Re: Back to the future.
Nigel Cunningham wrote:

> Please, go apply that logic elsewhere, then cut out (or at least stop
> adding) support for users with less common needs in other areas. I fully
> acknowledge that most users have only one place to store their image and
> it's a swap device. But that doesn't mean one size fits all.

I think to some extent that's part of the problem. Consider for a moment that
a /dev/hibernate would be required, and that it must be (a) a disk, or (b) a
partition, or (c) other devices in the future, like an nbd, USB flash or DVD.
Don't have a device like that, then can't hibernate. Stop trying to be smart
and use swap for two different things. Stop trying to have an interface
between user space and kernel which does things not required to preserve the
system. A progress indicator is not needed, power off is my progress
indicator, and should be the sole valid end of a hibernate.

> A full image implies that you need to figure out what's not going to change
> while you're writing it and save that separately. At the moment, I'm
> treating most of the LRU contents as that list. If we're going to start
> trying to let every man and his dog run while we're trying to snapshot the
> system, that's not going to work anymore - or the logic will get a lot more
> complicated. Sorry. I never thought I'd say this, but I think you're being
> naive about how simple the process of snapshotting a system is.

Hibernate is useful to avoid complex boot, it's useful as the UPS gets tired,
and putting features in the process beyond saving the snap (possibly
compressed and/or encrypted) just adds complexity. Put it all in the kernel
and use /sys/power/state as the user interface. Stop oversolving the problem.

No, that doesn't avoid other hard issues, but for the most part suspend2 has
addressed them.

-- 
Bill Davidsen <[EMAIL PROTECTED]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
Re: Back to the future.
On Sat, 28 Apr 2007, Oliver Neukum wrote:

> On Saturday, 28 April 2007 01:50, David Lang wrote:
>> 3. make mounted filesystems read-only (possibly with snapshot/checkpoint)
>> 4. unpause
>> 5. save image (with full userspace available, including network)
>> 6. shutdown system (throw away all userspace memory, no need to do graceful
>> shutdown or nice kill signals, revert filesystem to snapshot/checkpoint if
>> needed)
>
> And then you'll have people wonder why the server which sent out all
> those files has no log entries. You'd have to selectively unfreeze user
> space, which is a cure worse than the disease.
>
> Simply throwing away user space work is a bug. And no, you cannot say that
> it'll be redone anyway, as you are throwing away accepted input, too.

when you are doing a suspend-to-disk I disagree with you. whoever is doing
the suspend knows what is going on, and they can decide what needs to be
done.

the only case where you have 'unexpected' work being thrown away is if you
are suspending a network server, and the process of suspending it is going to
cut all the network connections anyway so it's not a seamless process. In
this case it's fair to let the sysadmin choose between losing some logs or
doing some other step to prevent this from happening (which could be to
shutdown the network service, or load an iptables rule to block the service)

however, most of the uses of suspend-to-disk are going to be single-user
machines and in that case telling the user that anything that they do after
issuing the suspend is going to be lost is a perfectly sane thing to do.

and for that matter, if the snapshot is cheap enough, some people may choose
to cron the snapshot portion of a suspend-to-disk every few min as a safety
net for something going wrong. In this case they really do want all of
userspace to keep working after the snapshot.

David Lang
Re: Back to the future.
On Saturday, 28 April 2007 18:28, Linus Torvalds wrote:
>
> On Sat, 28 Apr 2007, Pavel Machek wrote:
> >
> > We do not want kernel threads running:
> >
> > a) they may hold some locks and deadlock suspend
> >
> > b) they may do some writes to disk, leading to corruption
>
> You're really just making both of those up.
>
> If a kernel thread holds a lock and deadlocks suspend, that would deadlock
> anything else _too_. Suspend isn't *that* special. Everything it does are
> things other people do too.
>
> And no, kernel threads do not write to disk on their own. Name one.

xfssyncd, or at least it seems so at a quick look.

> They help *others* write to disk, but those disk writes need to happen.
>
> The freezer has *caused* those deadlocks (eg by stopping threads that were
> needed for the suspend writeouts to succeed!), not solved them.

I can't remember anything like this, but I believe you have a specific test
case in mind.

> So stop making these totally bogus arguments up.

Well, they may be bogus, but there's something else.

I have reviewed some kernel threads used by device drivers that currently are
frozen to see if it would be safe not to freeze them, and I'm worried. What,
for example, if such a thread schedules a timeout and waits for something to
happen (eg. the airo driver does something like this), but instead the
hibernation/suspend happens and the device is frozen/suspended under it?
Shouldn't the thread be notified by the driver's freeze/suspend callback?

Moreover, what if after the restore the device is not present (for example,
it may be a pcmcia card that the user has removed) and the thread is
scheduled before the device's unfreeze callback has a chance to run?
Shouldn't the thread check that the device is present? In that case it would
have to be notified by someone that the check is necessary, but who can do
that?

Greetings,
Rafael
Re: Back to the future.
On Sat, 28 Apr 2007, Pavel Machek wrote:
>
> We do not want kernel threads running:
>
> a) they may hold some locks and deadlock suspend
>
> b) they may do some writes to disk, leading to corruption

You're really just making both of those up.

If a kernel thread holds a lock and deadlocks suspend, that would deadlock
anything else _too_. Suspend isn't *that* special. Everything it does are
things other people do too.

And no, kernel threads do not write to disk on their own. Name one.

They help *others* write to disk, but those disk writes need to happen.

The freezer has *caused* those deadlocks (eg by stopping threads that were
needed for the suspend writeouts to succeed!), not solved them.

So stop making these totally bogus arguments up.

Linus
Re: Back to the future.
On Saturday, 28 April 2007 11:22, Pekka Enberg wrote:
> Hi Oliver,
>
> On Friday, 27 April 2007 12:12, Pekka J Enberg wrote:
> > > The problem with writing in the kernel is obvious: we need to add new code
> > > to the kernel for compression, encryption, and userspace interaction
> > > (graphical progress bar) that are important for user experience.
>
> On 4/27/07, Oliver Neukum <[EMAIL PROTECTED]> wrote:
> > The kernel can already do compression and encryption.
>
> Yes, if we all could agree on _which_ compression and encryption

Any of those available in the kernel. Where's the problem?

> algorithm(s) we want to use. It goes beyond that too, where do you
> want to save the image? In the swap device or a regular file? And

A swap device is doubtlessly easier. But isn't the problem of using a swap
file already fixed? The writeout seems the easiest part of hibernation.

> don't forget about debuggability either. It's faster to do a
> snapshot/resume without shutdown/restart in the middle or just do a
> snapshot, and examine its contents.

Then use a "fake reboot" option and save the image to a ramdisk. It isn't
that hard. You must be able to survive that, as I/O errors during write out
are possible.

Regards
Oliver
Re: Back to the future.
Pavel Machek <[EMAIL PROTECTED]> wrote:

>> I also don't like the idea of storing this in the swap partition for a
>> couple of reasons.
>>
>> 1. on many modern linux systems the swap partition is not large enough.
>>
>> for example, on my boxes with 16G of ram I only allocate 2G of swap
>> space
>
> WTF? So allocate larger swap partition. You just told me disks are big
> enough.

1) Repartitioning is sometimes not an option.
2) What happens if the swap space gets used? I want to be sure I can suspend
   my {server,laptop} in case of power running out. Using swap is only an
   option for desktops.

>> 2. it's too easy for other things to stomp on your swap partition.
>>
>> for example: booting from a live CD that finds and uses swap
>> partitions
>
> That's a feature. If you are booting from live CD, you _want_ to erase
> any hibernation image.

NACK. You want to keep all partitions related to the hibernated system
read-only. That's completely different from destroying all your unsaved data
and possibly long-running tasks.

-- 
Top 100 things you don't want the sysadmin to say:
51. YEEEHA!!! What a CRASH!!!

Eat this, spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]
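The swap-size objection above (16G of RAM against 2G of swap) can be checked before attempting a hibernate. What follows is only a rough sketch, not how the kernel sizes its image: it assumes the image is about as large as the RAM currently in use, and `image_fraction` is a made-up knob to model compression or discarded caches. The field names `MemTotal`, `MemFree` and `SwapFree` are the ones found in /proc/meminfo; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_can_hold_image(fields, image_fraction=1.0):
    """Return True if free swap covers the *assumed* image size.

    image_fraction is a hypothetical parameter: the share of used RAM that
    is assumed to end up in the image (1.0 = all of it, uncompressed).
    """
    used_ram_kb = fields["MemTotal"] - fields["MemFree"]
    needed_kb = int(used_ram_kb * image_fraction)
    return fields["SwapFree"] >= needed_kb
```

On a live system the input would come from reading /proc/meminfo, for example `swap_can_hold_image(parse_meminfo(open("/proc/meminfo").read()))`. A result of False corresponds to the case where swap is already in use or simply too small.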
Re: Back to the future.
On Friday, 27 April 2007 12:12, Pekka J Enberg wrote:
> On Friday, 27 April 2007 08:18, Pekka J Enberg wrote:
> > > No. The snapshot is just that. A snapshot in time. From kernel point of
> > > view, it doesn't matter one bit when you did it or if the state has
> > > changed before you resume. It's up to userspace to make sure the user
> > > doesn't do real work while the snapshot is being written to disk and
> > > machine is shut down.
>
> On Fri, 27 Apr 2007, Oliver Neukum wrote:
> > And where is the benefit in that? How is such user space freezing logic
> > simpler than having the kernel do the write?
> >
> > What can you do in user space if all filesystems are r/o that is worth the
> > hassle?
>
> I am talking about snapshot_system() here. It's not given that the
> filesystems need to be read-only (you can snapshot them too). The benefit
> here is that you can do whatever you want with the snapshot (encrypt,
> compress, send over the network) and have a clean well-defined interface
> in the kernel. In addition, aborting the snapshot is simpler, simply
> munmap() the snapshot.

Well, swsusp currently does almost the same, except that you can read the
image from the kernel as a stream of bytes, using read() and, during the
restore phase, upload the same image using write(). The advantage of this is
that the interface is symmetrical from the user space's point of view.
[You're cancelling the hibernation by closing /dev/snapshot, which also is
quite natural.]

If you look at the interface in user.c, there are only two ioctls really
needed for that in there, SNAPSHOT_ATOMIC_SNAPSHOT and
SNAPSHOT_ATOMIC_RESTORE. Two more are handy for freezing tasks,
SNAPSHOT_FREEZE and SNAPSHOT_UNFREEZE. The others were added later, to make
the user space part simpler or capable of doing some fancy stuff, which I am
ready to admit was a mistake.
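To make the symmetry of that interface concrete, here is a toy in-memory model of the call order. It is not a driver for the real /dev/snapshot and performs no ioctls: `FakeSnapshotDevice`, `hibernate` and `resume` are hypothetical names invented for this sketch, and the only things taken from the message above are the four ioctl names, the read()/write() streaming, and cancel-by-close. The rule that a snapshot requires frozen tasks is an assumption of the model.

```python
import zlib


class FakeSnapshotDevice:
    """In-memory stand-in for /dev/snapshot (hypothetical, illustration only)."""

    def __init__(self, memory):
        self.memory = bytearray(memory)  # pretend system state
        self.frozen = False
        self.image = None                # filled by atomic_snapshot()
        self.uploaded = bytearray()      # filled by write() during restore

    def freeze(self):                    # models SNAPSHOT_FREEZE
        self.frozen = True

    def unfreeze(self):                  # models SNAPSHOT_UNFREEZE
        self.frozen = False

    def atomic_snapshot(self):           # models SNAPSHOT_ATOMIC_SNAPSHOT
        if not self.frozen:
            raise RuntimeError("model assumption: freeze tasks first")
        self.image = bytes(self.memory)

    def read(self):                      # image leaves as a stream of bytes
        if self.image is None:
            raise RuntimeError("no snapshot taken")
        return self.image

    def write(self, data):               # image comes back as a stream of bytes
        self.uploaded += data

    def atomic_restore(self):            # models SNAPSHOT_ATOMIC_RESTORE
        self.memory = bytearray(self.uploaded)

    def close(self):                     # closing the device cancels everything
        self.image = None
        self.uploaded = bytearray()
        self.frozen = False


def hibernate(dev, transform=zlib.compress):
    """Freeze, snapshot, and return the (transformed) image for storage."""
    dev.freeze()
    dev.atomic_snapshot()
    return transform(dev.read())


def resume(dev, stored, inverse=zlib.decompress):
    """Upload a stored image into a freshly started system and restore it."""
    dev.freeze()
    dev.write(inverse(stored))
    dev.atomic_restore()
    dev.unfreeze()
```

The point of the sketch is that compression (or encryption, or shipping the bytes over a network) sits entirely in the caller's `transform`/`inverse` pair, which is the argument for keeping that part in user space.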
> The problem with writing in the kernel is obvious: we need to add new code
> to the kernel for compression, encryption, and userspace interaction
> (graphical progress bar) that are important for user experience.

Yes, and that's why we wanted to introduce the userland part.

The problem with this approach, as it's turned out, is that the userland part
must be a very specialized piece of software, really careful of what it's
doing, mainly because of the inability to checkpoint filesystems. If we could
checkpoint filesystems and were able to unfreeze the user space after
creating the snapshot without the risk of corrupting filesystems in the
restore phase, the userland part could be much simpler (even as simple as
Linus suggested).

Greetings,
Rafael
Re: Back to the future.
Hi Oliver,

On Friday, 27 April 2007 12:12, Pekka J Enberg wrote:
> > The problem with writing in the kernel is obvious: we need to add new code
> > to the kernel for compression, encryption, and userspace interaction
> > (graphical progress bar) that are important for user experience.

On 4/27/07, Oliver Neukum <[EMAIL PROTECTED]> wrote:
> The kernel can already do compression and encryption.

Yes, if we all could agree on _which_ compression and encryption
algorithm(s) we want to use. It goes beyond that too, where do you
want to save the image? In the swap device or a regular file? And
don't forget about debuggability either. It's faster to do a
snapshot/resume without shutdown/restart in the middle or just do a
snapshot, and examine its contents.
Re: Back to the future.
On Saturday, 28 April 2007 10:50, Pavel Machek wrote:
> Hi!
>
> > > In many ways, "at all".
> > >
> > > I _do_ realize the IO request queue issues, and that we cannot actually
> > > do s2ram with some devices in the middle of a DMA. So we want to be able
> > > to avoid *that*, there's no question about that. And I suspect that
> > > stopping user threads and then waiting for a sync is practically one of
> > > the easier ways to do so.
> > >
> > > So in practice, the "at all" may become a "why freeze kernel threads?"
> > > and freezing user threads I don't find really objectionable.
> > >
> > > But as Paul pointed out, Linux on the old powerpc Mac hardware was
> > > actually rather famous for having working (and reliable) suspend long
> > > before it worked even remotely reliably on PC's. And they didn't do even
> > > that.
> > >
> > > (They didn't have ACPI, and they had a much more limited set of devices,
> > > but the whole process freezer is really about neither of those issues.
> > > The wild and wacky PC hardware has its problems, but that's _one_ thing
> > > we can't blame PC hardware for ;)
> >
> > We freeze user space processes for the reasons that you have quoted above.
> >
> > Why we freeze kernel threads in there too is a good question, but not for
> > me to answer. I don't know. Pavel should know, I think.
>
> We do not want kernel threads running:
>
> a) they may hold some locks and deadlock suspend

Yeah, the same issue as with the hibernation and I do think it's _real_.

> b) they may do some writes to disk, leading to corruption

Hmm, is that an issue in the suspend (aka s2ram) case?

> We could solve a) by carefully auditing suspend lock usage to make
> sure deadlocks are impossible even with kernel threads running.

Yes, we can, but for now it's not been done yet.

Greetings,
Rafael
Re: Back to the future.
On Sat, 28 Apr 2007, Oliver Neukum wrote:
> And then you'll have people wonder why the server which sent out all
> those files has no log entries. You'd have to selectively unfreeze user
> space, which is a cure worse than the disease.
>
> Simply throwing away user space work is a bug. And no, you cannot say that
> it'll be redone anyway, as you are throwing away accepted input, too.

It's not a bug, it's a feature =). While I totally agree with you that for
the common case, you probably do want to avoid work in the userspace after
taking the snapshot, it is something that should be solved separately. There
is absolutely nothing wrong with taking a snapshot, doing some work, and then
resuming to the snapshot and thus "losing" some of the work (this is useful
for debugging, for example).

Pekka
Re: Back to the future.
On Saturday, 28 April 2007 01:50, David Lang wrote:
> 3. make mounted filesystems read-only (possibly with snapshot/checkpoint)
> 4. unpause
> 5. save image (with full userspace available, including network)
> 6. shutdown system (throw away all userspace memory, no need to do graceful
> shutdown or nice kill signals, revert filesystem to snapshot/checkpoint if
> needed)

And then you'll have people wonder why the server which sent out all
those files has no log entries. You'd have to selectively unfreeze user
space, which is a cure worse than the disease.

Simply throwing away user space work is a bug. And no, you cannot say that
it'll be redone anyway, as you are throwing away accepted input, too.

Regards
Oliver
Re: Back to the future.
Hi!

> > In many ways, "at all".
> >
> > I _do_ realize the IO request queue issues, and that we cannot actually do
> > s2ram with some devices in the middle of a DMA. So we want to be able to
> > avoid *that*, there's no question about that. And I suspect that stopping
> > user threads and then waiting for a sync is practically one of the easier
> > ways to do so.
> >
> > So in practice, the "at all" may become a "why freeze kernel threads?" and
> > freezing user threads I don't find really objectionable.
> >
> > But as Paul pointed out, Linux on the old powerpc Mac hardware was
> > actually rather famous for having working (and reliable) suspend long
> > before it worked even remotely reliably on PC's. And they didn't do even
> > that.
> >
> > (They didn't have ACPI, and they had a much more limited set of devices,
> > but the whole process freezer is really about neither of those issues. The
> > wild and wacky PC hardware has its problems, but that's _one_ thing we
> > can't blame PC hardware for ;)
>
> We freeze user space processes for the reasons that you have quoted above.
>
> Why we freeze kernel threads in there too is a good question, but not for me
> to answer. I don't know. Pavel should know, I think.

We do not want kernel threads running:

a) they may hold some locks and deadlock suspend

b) they may do some writes to disk, leading to corruption

We could solve a) by carefully auditing suspend lock usage to make
sure deadlocks are impossible even with kernel threads running.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
progress meter in s2disk (was Re: Back to the future.)
Hi!

> > > > > It's doubly bad, because that idiocy has also infected s2ram. Again,
> > > > > another thing that really makes no sense at all - and we do it not
> > > > > just for snapshotting, but for s2ram too. Can you tell me *why*?
> > > >
> > > > Why we freeze tasks at all or why we freeze kernel threads?
> > >
> > > In many ways, "at all".
> > >
> > > I _do_ realize the IO request queue issues, and that we cannot actually
> > > do s2ram with some devices in the middle of a DMA. So we want to be able
> > > to avoid *that*, there's no question about that. And I suspect that
> > > stopping user threads and then waiting for a sync is practically one of
> > > the easier ways to do so.
>
> <snip>
>
> Apparently I *CANNOT* wrap my head around this - if just because my laptop,
> running a vendor 2.6.17 kernel does s2ram perfectly, at least, it does when
> using the "Upstart" init system rather than the classical SysV init system. I
> have tried it with the classical init and the suspend isn't triggered by the
> buttons that used to do it. I didn't try 'echo mem > /sys/power/state', but I
> have a feeling that would have worked as well. I have problems with s2disk,
> but that's because I keep my swap partition small - I try to keep it at or
> around 256M when I have more than half a gig of RAM in a system. Perhaps one
> of these days I'll grab a multi-gig flash disk, set it up as a swap partition
> and try it again. (every time I've tried s2disk I wind up running out of disk
> space - and this is with nothing but X running. Any kind of progress meter
> for when the system is doing s2disk would be nice - every time I've tried it
> all I see for the nearly 2 minutes before the s2disk attempt ends is a black
> screen. I say 2 minutes because that's how long it takes for it to learn that
> there isn't enough space on the swap-partition to save the image)

Just turn up console loglevel to see the messages.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
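For readers unsure what "turn up console loglevel" means in practice: the current setting is the first of the four integers in /proc/sys/kernel/printk, and a kernel message reaches the console only when its level is numerically lower than that value. The sketch below only parses and interprets that file. The function names are hypothetical, and whether the hibernation messages in question are logged at a level the default setting filters out is an assumption here.

```python
PRINTK_FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk(text):
    """Split the four integers found in /proc/sys/kernel/printk."""
    values = [int(v) for v in text.split()]
    if len(values) != len(PRINTK_FIELDS):
        raise ValueError("expected four integers, got %d" % len(values))
    return dict(zip(PRINTK_FIELDS, values))


def console_shows(message_level, printk):
    """True if a message at message_level (0..7) is printed on the console."""
    return message_level < printk["console_loglevel"]
```

Raising the level itself needs root, for example with `dmesg -n 8` or by writing a new first value to /proc/sys/kernel/printk, after which informational (6) and debug (7) messages appear on the console.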
progress meter in s2disk (was Re: Back to the future.)
Hi! It's doubly bad, because that idiocy has also infected s2ram. Again, another thing that really makes no sense at all - and we do it not just for snapshotting, but for s2ram too. Can you tell me *why*? Why we freeze tasks at all or why we freeze kernel threads? In many ways, at all. I _do_ realize the IO request queue issues, and that we cannot actually do s2ram with some devices in the middle of a DMA. So we want to be able to avoid *that*, there's no question about that. And I suspect that stopping user threads and then waiting for a sync is practically one of the easier ways to do so. snip Apparently I *CANNOT* wrap my head around this - if just because my laptop, running a vendor 2.6.17 kernel does s2ram perfectly, at least, it does when using the Upstart init system rather than the classical SysV init system. I have tried it with the classical init and the suspend isn't triggered by the buttons that used to do it. I didn't try 'echo ram /sys/power/state', but I have a feeling that would have worked as well. I have problems with s2disk, but thats because I keep my swap partition small - I try to keep it at or around 256M when I have more than half a gig of Ram in a system. Perhaps one of these days I'll grab a multi-gig flash disk, set it up as a swap partition and try it again. (every time I've tried s2disk I wind up running out of disk space - and this is with nothing but X running. Any kind of progress meter for when the system is doing s2disk would be nice - every time I've tried it all I see for the nearly 2 minutes before the s2disk attempt ends is a black screen. I say 2 minutes because thats how long it takes for it to learn that there isn't enough space on the swap-partition to save the image) Just turn up console loglevel to see the messages. 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Back to the future.
Hi! In many ways, at all. I _do_ realize the IO request queue issues, and that we cannot actually do s2ram with some devices in the middle of a DMA. So we want to be able to avoid *that*, there's no question about that. And I suspect that stopping user threads and then waiting for a sync is practically one of the easier ways to do so. So in practice, the at all may become a why freeze kernel threads? and freezing user threads I don't find really objectionable. But as Paul pointed out, Linux on the old powerpc Mac hardware was actually rather famous for having working (and reliable) suspend long before it worked even remotely reliably on PC's. And they didn't do even that. (They didn't have ACPI, and they had a much more limited set of devices, but the whole process freezer is really about neither of those issues. The wild and wacky PC hardware has its problems, but that's _one_ thing we can't blame PC hardware for ;) We freeze user space processes for the reasons that you have quoted above. Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think. We do not want kernel threads running: a) they may hold some locks and deadlock suspend b) they may do some writes to disk, leading to corruption We could solve a) by carefully auditing suspend lock usage to make sure deadlocks are impossible even with kernel threads running. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Back to the future.
Am Samstag, 28. April 2007 01:50 schrieb David Lang: 3. make mounted filesystems read-only (possibly with snapshot/checkpoint) 4. unpause 5. save image (with full userspace available, including network) 6. shutdown system (throw away all userspace memory, no need to do graceful shutdown or nice kill signals, revert filesystem to snapshot/checkpoint if needed) And then you'll have people wonder why the server which sent out all those files has no log entries. You'd have to selectively unfreeze user space, which is a cure worse than the desease. Simply throwing away user space work is a bug. And no, you cannot say that it'll be redone away, as you are throwing away accepted input, too. Regards Oliver - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Back to the future.
On Sat, 28 Apr 2007, Oliver Neukum wrote: And then you'll have people wonder why the server which sent out all those files has no log entries. You'd have to selectively unfreeze user space, which is a cure worse than the desease. Simply throwing away user space work is a bug. And no, you cannot say that it'll be redone away, as you are throwing away accepted input, too. It's not a bug, it's a feature =). While I totally agree with you that for the common case, you probably do want to avoid work in the userspace after taking the snapshot, it is something that should be solved separately. There is absolutely nothing wrong with taking a snapshot, doing some work, and then resuming to the snapshot and thus losing some the work (this is useful for debugging, for example). Pekka - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Back to the future.
On Saturday, 28 April 2007 10:50, Pavel Machek wrote: Hi! In many ways, at all. I _do_ realize the IO request queue issues, and that we cannot actually do s2ram with some devices in the middle of a DMA. So we want to be able to avoid *that*, there's no question about that. And I suspect that stopping user threads and then waiting for a sync is practically one of the easier ways to do so. So in practice, the at all may become a why freeze kernel threads? and freezing user threads I don't find really objectionable. But as Paul pointed out, Linux on the old powerpc Mac hardware was actually rather famous for having working (and reliable) suspend long before it worked even remotely reliably on PC's. And they didn't do even that. (They didn't have ACPI, and they had a much more limited set of devices, but the whole process freezer is really about neither of those issues. The wild and wacky PC hardware has its problems, but that's _one_ thing we can't blame PC hardware for ;) We freeze user space processes for the reasons that you have quoted above. Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think. We do not want kernel threads running: a) they may hold some locks and deadlock suspend Yeah, the same issue as with the hibernation and I do think it's _real_. b) they may do some writes to disk, leading to corruption Hmm, is that an issue in the suspend (aka s2ram) case? We could solve a) by carefully auditing suspend lock usage to make sure deadlocks are impossible even with kernel threads running. Yes, we can, but for now it's not been done yet. Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Back to the future.
Hi Oliver, Am Freitag, 27. April 2007 12:12 schrieb Pekka J Enberg: The problem with writing in the kernel is obvious: we need to add new code to the kernel for compression, encryption, and userspace interaction (graphical progress bar) that are important for user experience. On 4/27/07, Oliver Neukum [EMAIL PROTECTED] wrote: The kernel can already do compression and encryption. Yes, if we all could agree on _which_ compression and encryption algorithm(s) we want to use. It goes beyond that too, where do you want to save the image? In the swap device or a regular file? And don't forget about debuggability either. It's faster to do a snapshot/resume without shutdown/restart in the middle or just do a snapshot, and examine its contents. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Back to the future.
On Friday, 27 April 2007 12:12, Pekka J Enberg wrote: Am Freitag, 27. April 2007 08:18 schrieb Pekka J Enberg: No. The snapshot is just that. A snapshot in time. From kernel point of view, it doesn't matter one bit what when you did it or if the state has changed before you resume. It's up to userspace to make sure the user doesn't do real work while the snapshot is being written to disk and machine is shut down. On Fri, 27 Apr 2007, Oliver Neukum wrote: And where is the benefit in that? How is such user space freezing logic simpler than having the kernel do the write? What can you do in user space if all filesystems are r/o that is worth the hassle? I am talking about snapshot_system() here. It's not given that the filesystems need to be read-only (you can snapshot them too). The benefit here is that you can do whatever you want with the snapshot (encrypt, compress, send over the network) and have a clean well-defined interface in the kernel. In addition, aborting the snapshot is simpler, simply munmap() the snapshot. Well, swsusp currently does almost the same, except that you can read the image from the kernel as a stream of bytes, using read() and, during the restore phase, upload the same image using write(). The advantage of this is that the interface is symmetrical from the user space's point of view. [You're cancelling the hibernation by closing /dev/snapshot, which also is quite natural.] If you look at the interface in user.c, there are only two ioctls really needed for that in there, SNAPSHOT_ATOMIC_SNAPSHOT and SNAPSHOT_ATOMIC_RESTORE. Two more are handy for freezing tasks, SNAPSHOT_FREEZE and SNAPSHOT_UNFREEZE. The others were added later, to make the user space part simpler or capable of doing some fancy stuff, which I am ready to admit was a mistake. 
> The problem with writing in the kernel is obvious: we need to add new code to the kernel for compression, encryption, and userspace interaction (graphical progress bar) that are important for user experience.

Yes, and that's why we wanted to introduce the userland part. The problem with this approach, as it has turned out, is that the userland part must be a very specialized piece of software, really careful about what it's doing, mainly because of the inability to checkpoint filesystems. If we could checkpoint filesystems and were able to unfreeze the user space after creating the snapshot without the risk of corrupting filesystems in the restore phase, the userland part could be much simpler (even as simple as Linus suggested).

Greetings,
Rafael
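[Editorial illustration] The symmetric stream interface described in this message (read() the image out, write() the same bytes back in) can be sketched roughly as follows. This is only a toy model under stated assumptions: it does not open the real /dev/snapshot or issue any SNAPSHOT_* ioctl, FakeSnapshotDevice, save_image and load_image are hypothetical names, and zlib merely stands in for whichever compressor a userland tool would choose.

```python
import io
import zlib


class FakeSnapshotDevice:
    """In-memory stand-in for /dev/snapshot (hypothetical, not the real device)."""

    def __init__(self, image=b""):
        self._buf = io.BytesIO(image)

    def read(self, size):
        return self._buf.read(size)

    def write(self, data):
        return self._buf.write(data)

    def contents(self):
        return self._buf.getvalue()


def save_image(dev, storage, chunk=4096):
    """Hibernate side: read() the image as a byte stream, compress, store."""
    comp = zlib.compressobj()
    while True:
        block = dev.read(chunk)
        if not block:
            break
        storage.write(comp.compress(block))
    storage.write(comp.flush())


def load_image(storage, dev, chunk=4096):
    """Restore side: decompress the stored stream and write() it back."""
    decomp = zlib.decompressobj()
    while True:
        block = storage.read(chunk)
        if not block:
            break
        dev.write(decomp.decompress(block))
    dev.write(decomp.flush())


image = bytes(range(256)) * 64      # pretend memory image
storage = io.BytesIO()              # pretend swap partition or regular file
save_image(FakeSnapshotDevice(image), storage)

storage.seek(0)
restored = FakeSnapshotDevice()
load_image(storage, restored)
```

Because both directions are plain byte streams, the compression step (or encryption, or a network hop) lives entirely in user space, which is the property the thread is arguing over.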
Re: Back to the future.
Pavel Machek [EMAIL PROTECTED] wrote:

>> I also don't like the idea of storing this in the swap partition for a couple of reasons.
>>
>> 1. on many modern linux systems the swap partition is not large enough. for example, on my boxes with 16G of RAM I only allocate 2G of swap space
>
> WTF? So allocate larger swap partition. You just told me disks are big enough.

1) Repartitioning is sometimes not an option.
2) What happens if the swap space gets used? I want to be sure I can suspend my {server,laptop} in case of power running out. Using swap is only an option for desktops.

>> 2. it's too easy for other things to stomp on your swap partition. for example: booting from a live CD that finds and uses swap partitions
>
> That's a feature. If you are booting from live CD, you _want_ to erase any hibernation image.

NACK. You want to keep all partitions related to the hibernated system read-only. That's completely different from destroying all your unsaved data and possibly long-running tasks.

--
Top 100 things you don't want the sysadmin to say:
51. YEEEHA!!! What a CRASH!!!
Re: Back to the future.
On Saturday, 28 April 2007 11:22, Pekka Enberg wrote:
> Hi Oliver,
>
> On Friday, 27 April 2007 12:12, Pekka J Enberg wrote:
>>> The problem with writing in the kernel is obvious: we need to add new code to the kernel for compression, encryption, and userspace interaction (graphical progress bar) that are important for user experience.
>
> On 4/27/07, Oliver Neukum [EMAIL PROTECTED] wrote:
>> The kernel can already do compression and encryption.
>
> Yes, if we all could agree on _which_ compression and encryption algorithm(s) we want to use.

Any of those available in the kernel. Where's the problem?

> It goes beyond that too, where do you want to save the image? In the swap device or a regular file?

A swap device is doubtlessly easier. But isn't the problem of using a swap file already fixed? The writeout seems the easiest part of hibernation.

> And don't forget about debuggability either. It's faster to do a snapshot/resume without shutdown/restart in the middle or just do a snapshot, and examine its contents.

Then use a fake reboot option and save the image to a ramdisk. It isn't that hard. You must be able to survive that, as I/O errors during writeout are possible.

Regards
Oliver
Re: Back to the future.
On Sat, 28 Apr 2007, Pavel Machek wrote:
> We do not want kernel threads running:
> a) they may hold some locks and deadlock suspend
> b) they may do some writes to disk, leading to corruption

You're really just making both of those up.

If a kernel thread holds a lock and deadlocks suspend, that would deadlock anything else _too_. Suspend isn't *that* special. Everything it does are things other people do too.

And no, kernel threads do not write to disk on their own. Name one. They help *others* write to disk, but those disk writes need to happen.

The freezer has *caused* those deadlocks (eg by stopping threads that were needed for the suspend writeouts to succeed!), not solved them.

So stop making these totally bogus arguments up.

Linus
Re: Back to the future.
On Saturday, 28 April 2007 18:28, Linus Torvalds wrote:
> On Sat, 28 Apr 2007, Pavel Machek wrote:
>> We do not want kernel threads running:
>> a) they may hold some locks and deadlock suspend
>> b) they may do some writes to disk, leading to corruption
>
> You're really just making both of those up.
>
> If a kernel thread holds a lock and deadlocks suspend, that would deadlock anything else _too_. Suspend isn't *that* special. Everything it does are things other people do too.
>
> And no, kernel threads do not write to disk on their own. Name one.

xfssyncd, or at least it seems so at a quick look.

> They help *others* write to disk, but those disk writes need to happen.
>
> The freezer has *caused* those deadlocks (eg by stopping threads that were needed for the suspend writeouts to succeed!), not solved them.

I can't remember anything like this, but I believe you have a specific test case in mind.

> So stop making these totally bogus arguments up.

Well, they may be bogus, but there's something else.

I have reviewed some kernel threads used by device drivers that currently are frozen to see if it would be safe not to freeze them, and I'm worried. What, for example, if such a thread schedules a timeout and waits for something to happen (eg. the airo driver does something like this), but instead the hibernation/suspend happens and the device is frozen/suspended under it? Shouldn't the thread be notified by the driver's freeze/suspend callback?

Moreover, what if after the restore the device is not present (for example, it may be a pcmcia card that the user has removed) and the thread is scheduled before the device's unfreeze callback has a chance to run? Shouldn't the thread check that the device is present? In that case it would have to be notified by someone that the check is necessary, but who can do that?
Greetings,
Rafael
Re: Back to the future.
On Sat, 28 Apr 2007, Oliver Neukum wrote:
> On Saturday, 28 April 2007 01:50, David Lang wrote:
>> 3. make mounted filesystems read-only (possibly with snapshot/checkpoint)
>> 4. unpause
>> 5. save image (with full userspace available, including network)
>> 6. shutdown system (throw away all userspace memory, no need to do graceful shutdown or nice kill signals, revert filesystem to snapshot/checkpoint if needed)
>
> And then you'll have people wonder why the server which sent out all those files has no log entries. You'd have to selectively unfreeze user space, which is a cure worse than the disease. Simply throwing away user space work is a bug. And no, you cannot say that it'll be redone anyway, as you are throwing away accepted input, too.

when you are doing a suspend-to-disk I disagree with you. whoever is doing the suspend knows what is going on, and they can decide what needs to be done.

the only case where you have 'unexpected' work being thrown away is if you are suspending a network server, and the process of suspending it is going to cut all the network connections anyway so it's not a seamless process. In this case it's fair to let the sysadmin choose between losing some logs or doing some other step to prevent this from happening (which could be to shut down the network service, or load an iptables rule to block the service)

however, most of the uses of suspend-to-disk are going to be single-user machines and in that case telling the user that anything that they do after issuing the suspend is going to be lost is a perfectly sane thing to do.

and for that matter, if the snapshot is cheap enough, some people may choose to cron the snapshot portion of a suspend-to-disk every few minutes as a safety net for something going wrong. In this case they really do want all of userspace to keep working after the snapshot.

David Lang
Re: Back to the future.
On Sat, 28 Apr 2007, Pavel Machek wrote:
>> We freeze user space processes for the reasons that you have quoted above. Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think.
>
> We do not want kernel threads running:
> a) they may hold some locks and deadlock suspend
> b) they may do some writes to disk, leading to corruption
>
> We could solve a) by carefully auditing suspend lock usage to make sure deadlocks are impossible even with kernel threads running.

remember that we are doing suspend-to-disk, after we do the snapshot we will be doing a shutdown. that should simplify the locking issues

David Lang
Re: Back to the future.
Nigel Cunningham wrote:
> Please, go apply that logic elsewhere, then cut out (or at least stop adding) support for users with less common needs in other areas. I fully acknowledge that most users have only one place to store their image and it's a swap device. But that doesn't mean one size fits all.

I think to some extent that's part of the problem. Consider for a moment that a /dev/hibernate would be required, and that it must be (a) a disk, or (b) a partition, or (c) other devices in the future, like an nbd, USB flash or DVD. Don't have a device like that, then can't hibernate. Stop trying to be smart and use swap for two different things. Stop trying to have an interface between user space and kernel which does things not required to preserve the system. A progress indicator is not needed, power off is my progress indicator, and should be the sole valid end of a hibernate.

> A full image implies that you need to figure out what's not going to change while you're writing it and save that separately. At the moment, I'm treating most of the LRU contents as that list. If we're going to start trying to let every man and his dog run while we're trying to snapshot the system, that's not going to work anymore - or the logic will get a lot more complicated.
>
> Sorry. I never thought I'd say this, but I think you're being naive about how simple the process of snapshotting a system is.

Hibernate is useful to avoid complex boot, it's useful as the UPS gets tired, and putting features in the process beyond saving the snap (possibly compressed and/or encrypted) just adds complexity. Put it all in the kernel and use /sys/power/state as the user interface. Stop oversolving the problem.

No, that doesn't avoid other hard issues, but for the most part suspend2 has addressed them.

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from the machinations of the wicked.
  - from Slashdot
Re: Back to the future.
On Saturday, 28 April 2007 20:32, David Lang wrote:
> On Sat, 28 Apr 2007, Pavel Machek wrote:
>>> We freeze user space processes for the reasons that you have quoted above. Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think.
>>
>> We do not want kernel threads running:
>> a) they may hold some locks and deadlock suspend
>> b) they may do some writes to disk, leading to corruption
>>
>> We could solve a) by carefully auditing suspend lock usage to make sure deadlocks are impossible even with kernel threads running.
>
> remember that we are doing suspend-to-disk, after we do the snapshot we will be doing a shutdown. that should simplify the locking issues

That's assuming that we won't need to cancel the hibernation.

Greetings,
Rafael
Re: Back to the future.
On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:
> On Friday, 27 April 2007 12:12, Pekka J Enberg wrote:
>> The problem with writing in the kernel is obvious: we need to add new code to the kernel for compression, encryption, and userspace interaction (graphical progress bar) that are important for user experience.
>
> Yes, and that's why we wanted to introduce the userland part. The problem with this approach, as it's turned out, is that the userland part must be a very specialized piece of software, really careful of what it's doing, mainly because of the inability to checkpoint filesystems. If we could checkpoint filesystems and were able to unfreeze the user space after creating the snapshot without the risk of corrupting filesystems in the restore phase, the userland part could be much simpler (even as simple as Linus suggested).

this sounds like a really good argument for having a usable userspace running.

we already have the LVM snapshot code in the kernel, so we have the pieces available to protect the filesystems, we just need to figure out how to put them together. (the simplest way would be to make a new suspend package that required the user to use LVM so that snapshots are available, but this is also the most disruptive approach)

David Lang
Re: Back to the future.
On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:
> On Saturday, 28 April 2007 20:32, David Lang wrote:
>> On Sat, 28 Apr 2007, Pavel Machek wrote:
>>>> We freeze user space processes for the reasons that you have quoted above. Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think.
>>>
>>> We do not want kernel threads running:
>>> a) they may hold some locks and deadlock suspend
>>> b) they may do some writes to disk, leading to corruption
>>>
>>> We could solve a) by carefully auditing suspend lock usage to make sure deadlocks are impossible even with kernel threads running.
>>
>> remember that we are doing suspend-to-disk, after we do the snapshot we will be doing a shutdown. that should simplify the locking issues
>
> That's assuming that we won't need to cancel the hibernation.

true, but if we cancel the hibernation then why are the locks an issue? they are appropriate for the system state.

David Lang
Re: Back to the future.
On Saturday, 28 April 2007 20:43, David Lang wrote:
> On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:
>> On Friday, 27 April 2007 12:12, Pekka J Enberg wrote:
>>> The problem with writing in the kernel is obvious: we need to add new code to the kernel for compression, encryption, and userspace interaction (graphical progress bar) that are important for user experience.
>>
>> Yes, and that's why we wanted to introduce the userland part. The problem with this approach, as it's turned out, is that the userland part must be a very specialized piece of software, really careful of what it's doing, mainly because of the inability to checkpoint filesystems. If we could checkpoint filesystems and were able to unfreeze the user space after creating the snapshot without the risk of corrupting filesystems in the restore phase, the userland part could be much simpler (even as simple as Linus suggested).
>
> this sounds like a really good argument for having a usable userspace running.
>
> we already have the LVM snapshot code in the kernel, so we have the pieces available to protect the filesystems, we just need to figure out how to put them together. (the simplest way would be to make a new suspend package that required the user to use LVM so that snapshots are available, but this is also the most disruptive approach)

Yes. I personally know very little about the LVM snapshot code and I wasn't aware of its capabilities. If we can make it possible to run the user space safely after we've created the memory snapshot, I'm all for it.

As far as the package is concerned, we can just add the new user space tools to the suspend package containing our existing userland part.

Greetings,
Rafael
Re: Back to the future.
On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:
>> The freezer has *caused* those deadlocks (eg by stopping threads that were needed for the suspend writeouts to succeed!), not solved them.
>
> I can't remember anything like this, but I believe you have a specific test case in mind.

Ehh.. Why do you think we _have_ that PF_NOFREEZE thing in the first place?

Rafael, you really don't know what you're talking about, do you?

Just _look_ at them. It's the IO threads etc that shouldn't be frozen, exactly *because* they do IO. You claim that kernel threads shouldn't do IO, but that's the point: if you cannot do IO when snapshotting to disk, here's a damn big clue for you: how do you think that snapshot is going to get written?

I *guarantee* you that we've had a lot more problems with threads that should *not* have been frozen than with those hypothetical threads that you think should have been frozen.

Linus
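[Editorial illustration] The argument in this message can be modelled with a toy freezer. This is a sketch only, not kernel code: Task, freeze_tasks and can_write_image are hypothetical names, and the nofreeze field is merely a loose analogue of the PF_NOFREEZE flag mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    does_io: bool = False    # helper that image writeout depends on
    nofreeze: bool = False   # toy analogue of PF_NOFREEZE
    frozen: bool = False


def freeze_tasks(tasks):
    """Freeze every task that has not opted out."""
    for task in tasks:
        task.frozen = not task.nofreeze


def can_write_image(tasks):
    """Writeout can only progress if every I/O helper is still running."""
    return all(not task.frozen for task in tasks if task.does_io)


# Freeze everything: the I/O helper stops, so the image cannot be written.
naive = [Task("editor"), Task("io_helper", does_io=True)]
freeze_tasks(naive)

# Exempt the I/O helper: user tasks are frozen, writeout still works.
flagged = [Task("editor"), Task("io_helper", does_io=True, nofreeze=True)]
freeze_tasks(flagged)

print(can_write_image(naive))    # False
print(can_write_image(flagged))  # True
```

In this model the opt-out flag exists precisely because a blanket freeze stops the threads the writeout needs, which is the point being made about why the flag was introduced.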
Re: Back to the future.
On Saturday, 28 April 2007 23:25, Linus Torvalds wrote:
> On Sat, 28 Apr 2007, Rafael J. Wysocki wrote:
>>> The freezer has *caused* those deadlocks (eg by stopping threads that were needed for the suspend writeouts to succeed!), not solved them.
>>
>> I can't remember anything like this, but I believe you have a specific test case in mind.
>
> Ehh.. Why do you think we _have_ that PF_NOFREEZE thing in the first place?

Well, I don't know why exactly it had been originally introduced. Currently, it is used by the threads that should be running after the snapshot is done (they are not only I/O threads).

> Rafael, you really don't know what you're talking about, do you?

I think I know.

> Just _look_ at them. It's the IO threads etc that shouldn't be frozen, exactly *because* they do IO. You claim that kernel threads shouldn't do IO, but that's the point: if you cannot do IO when snapshotting to disk, here's a damn big clue for you: how do you think that snapshot is going to get written?

OK, more precisely: fs-related threads should not try to process their queues, etc., after the snapshot is done, because that may cause some fs data to be written at that time and then the fs in question may be corrupted after the restore. Not all of the I/O in general, fs data.

Still, that alone probably is not a good enough reason for freezing all kernel threads.

> I *guarantee* you that we've had a lot more problems with threads that should *not* have been frozen than with those hypothetical threads that you think should have been frozen.

Well, I'm not sure whether or not that still would have been the case if we had stopped freezing kernel threads for the hibernation/suspend. I just see potential problems that I've mentioned in the previous message and I don't see any evidence that they cannot occur.
Greetings,
Rafael
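[Editorial illustration] The corruption scenario described in this message (fs data written after the snapshot, then an older memory image restored on top of the newer disk) can be shown with a deliberately tiny model. Everything here is hypothetical: the "filesystem" is a dict, the cached free-block set stands in for whatever in-memory state a real filesystem keeps, and make_system and fs_write are invented names.

```python
import copy


def make_system():
    """Toy disk plus the in-memory state the filesystem caches about it."""
    disk = {"free": {3, 4, 5}, "files": {"a": [1, 2]}}
    memory = {"cached_free": set(disk["free"])}
    return disk, memory


def fs_write(disk, memory, name):
    """Allocate one block for a new file, trusting the in-memory cache."""
    block = min(memory["cached_free"])
    memory["cached_free"].discard(block)
    disk["free"].discard(block)
    disk["files"][name] = [block]
    return block


disk, memory = make_system()
snapshot = copy.deepcopy(memory)         # atomic copy of memory

block_b = fs_write(disk, memory, "b")    # fs keeps writing after the snapshot

memory = copy.deepcopy(snapshot)         # restore: old memory, newer disk
block_c = fs_write(disk, memory, "c")    # stale cache hands out the same block

print(block_b == block_c)                # True: two files now share a block
```

The restored cache still believes the block is free, so the next allocation reuses it. That is the kind of inconsistency freezing (or checkpointing the filesystem) is meant to rule out in this model.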
Re: Back to the future.
On Sun, 29 Apr 2007, Rafael J. Wysocki wrote:
> OK, more precisely: fs-related threads should not try to process their queues, etc., after the snapshot is done, because that may cause some fs data to be written at that time and then the fs in question may be corrupted after the restore. Not all of the I/O in general, fs data.

But that's not true _either_. That's only true because right now I think we cannot even suspend to a swapfile (I might be wrong). If you have a swapfile on a filesystem, you'd need those fs queues running!

> Well, I'm not sure whether or not that still would have been the case if we had stopped freezing kernel threads for the hibernation/suspend.

Did you miss the email where Paul pointed out that Mac/PowerPC didn't use to do any of this? And apparently never had any issues with it? And probably worked more reliably several years ago than suspend/hibernation does _today_?

Ie we do have history of _not_ freezing things. The freezing came later, and came with the subsystem that had more problems..

Linus
Re: Back to the future.
Hi.

On Sat, 2007-04-28 at 16:45 -0700, Linus Torvalds wrote:
> On Sun, 29 Apr 2007, Rafael J. Wysocki wrote:
>> OK, more precisely: fs-related threads should not try to process their queues, etc., after the snapshot is done, because that may cause some fs data to be written at that time and then the fs in question may be corrupted after the restore. Not all of the I/O in general, fs data.
>
> But that's not true _either_. That's only true because right now I think we cannot even suspend to a swapfile (I might be wrong). If you have a swapfile on a filesystem, you'd need those fs queues running!

For Suspend2, and I think for swsusp too, we bmap the locations when allocating the storage, and then submit our own bios. Even if swsusp isn't using this method, I'm pretty sure the swap code does bmapping at swapon time to avoid raciness later.

>> Well, I'm not sure whether or not that still would have been the case if we had stopped freezing kernel threads for the hibernation/suspend.
>
> Did you miss the email where Paul pointed out that Mac/PowerPC didn't use to do any of this? And apparently never had any issues with it? And probably worked more reliably several years ago than suspend/hibernation does _today_?
>
> Ie we do have history of _not_ freezing things. The freezing came later, and came with the subsystem that had more problems..

It also came because of problems. Not working perfectly isn't necessarily a sign of a faulty reason for being added in the first place.

I should also add, not freezing things is fine if you're happy with getting half an image at most. If you want a full just-as-if-I'd-never-turned-the-power-off image, you need freezing so that you can have some pages which can be saved before others are atomically copied, to ensure the whole image is consistent.

Nigel
Re: Back to the future.
On Apr 28, 2007, at 19:45:01, Linus Torvalds wrote:
> On Sun, 29 Apr 2007, Rafael J. Wysocki wrote:
>> Well, I'm not sure whether or not that still would have been the case if we had stopped freezing kernel threads for the hibernation/suspend.
>
> Did you miss the email where Paul pointed out that Mac/PowerPC didn't use to do any of this? And apparently never had any issues with it? And probably worked more reliably several years ago than suspend/hibernation does _today_?

Still works pretty reliably; the last time my PowerBook G4 was rebooted was 6 weeks ago. Once every 60 suspends or so the kernel USB driver gets really confused and doesn't wake up the USB controller properly, leading to dead keyboard/mouse, but other than that I never have problems. I wouldn't be surprised if I could comment out 90% of the suspend code and still have it work, the hardware in it is incredibly robust. I can even swap batteries while it's in suspend-to-RAM, as long as I do it in less than 45 sec or so; I get around 6-7 days of suspend-to-RAM time on a full charge.

Cheers,
Kyle Moffett