Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri 2015-07-10 01:22:55, Rafael J. Wysocki wrote: > On Thursday, July 09, 2015 09:32:51 AM Oliver Neukum wrote: > > On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote: > > > > > Nothing and I'm not discussing that (I've said that already at least once > > > in > > > this thread). > > > > > > What I'm questioning is the "why" of calling sys_sync() from the kernel. > > > > That's strictly speaking two questions > > > > 1. Why do it in the kernel > > > > That is easy. It closes the window of a race condition. > > > > 2. Why do it at all > > > > In essence because the system becomes inactive. For example we say that > > data hits the disk after 30s maximum. We cannot meet such a limit unless > > we sync. > > Absolute deadlines are not guaranteed to be met at all in general when > system suspend is used, at least from the user space perspective, so the > above is quite a bit of an overstretch. Well, this particular deadline _was_ guaranteed before Len's patch, and it is useful one, too: "if notebook is not on, it is ok to pull the USB stick". Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri 2015-07-10 01:22:55, Rafael J. Wysocki wrote: On Thursday, July 09, 2015 09:32:51 AM Oliver Neukum wrote: On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote: Nothing and I'm not discussing that (I've said that already at least once in this thread). What I'm questioning is the why of calling sys_sync() from the kernel. That's strictly speaking two questions 1. Why do it in the kernel That is easy. It closes the window of a race condition. 2. Why do it at all In essence because the system becomes inactive. For example we say that data hits the disk after 30s maximum. We cannot meet such a limit unless we sync. Absolute deadlines are not guaranteed to be met at all in general when system suspend is used, at least from the user space perspective, so the above is quite a bit of an overstretch. Well, this particular deadline _was_ guaranteed before Len's patch, and it is useful one, too: if notebook is not on, it is ok to pull the USB stick. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thursday, July 09, 2015 09:32:51 AM Oliver Neukum wrote: > On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote: > > > Nothing and I'm not discussing that (I've said that already at least once in > > this thread). > > > > What I'm questioning is the "why" of calling sys_sync() from the kernel. > > That's strictly speaking two questions > > 1. Why do it in the kernel > > That is easy. It closes the window of a race condition. > > 2. Why do it at all > > In essence because the system becomes inactive. For example we say that > data hits the disk after 30s maximum. We cannot meet such a limit unless > we sync. Absolute deadlines are not guaranteed to be met at all in general when system suspend is used, at least from the user space perspective, so the above is quite a bit of an overstretch. > There are additional issues, such as a system appearing inactive during > suspension and frankly the far greater likelihood of a crash. I prefer the Alan's point, to be honest: it's a tradeoff between some extra safety and some extra suspend overhead. Unfortunately, though, we don't have any numbers allowing us to estimate how much extra safety it provides and so to justify it quantitatively, so to speak. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote: > Nothing and I'm not discussing that (I've said that already at least once in > this thread). > > What I'm questioning is the "why" of calling sys_sync() from the kernel. That's strictly speaking two questions 1. Why do it in the kernel That is easy. It closes the window of a race condition. 2. Why do it at all In essence because the system becomes inactive. For example we say that data hits the disk after 30s maximum. We cannot meet such a limit unless we sync. There are additional issues, such as a system appearing inactive during suspension and frankly the far greater likelihood of a crash. Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote: Nothing and I'm not discussing that (I've said that already at least once in this thread). What I'm questioning is the why of calling sys_sync() from the kernel. That's strictly speaking two questions 1. Why do it in the kernel That is easy. It closes the window of a race condition. 2. Why do it at all In essence because the system becomes inactive. For example we say that data hits the disk after 30s maximum. We cannot meet such a limit unless we sync. There are additional issues, such as a system appearing inactive during suspension and frankly the far greater likelihood of a crash. Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thursday, July 09, 2015 09:32:51 AM Oliver Neukum wrote: On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote: Nothing and I'm not discussing that (I've said that already at least once in this thread). What I'm questioning is the why of calling sys_sync() from the kernel. That's strictly speaking two questions 1. Why do it in the kernel That is easy. It closes the window of a race condition. 2. Why do it at all In essence because the system becomes inactive. For example we say that data hits the disk after 30s maximum. We cannot meet such a limit unless we sync. Absolute deadlines are not guaranteed to be met at all in general when system suspend is used, at least from the user space perspective, so the above is quite a bit of an overstretch. There are additional issues, such as a system appearing inactive during suspension and frankly the far greater likelihood of a crash. I prefer the Alan's point, to be honest: it's a tradeoff between some extra safety and some extra suspend overhead. Unfortunately, though, we don't have any numbers allowing us to estimate how much extra safety it provides and so to justify it quantitatively, so to speak. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wednesday, July 08, 2015 10:40:00 AM Alan Stern wrote: > On Wed, 8 Jul 2015, Pavel Machek wrote: > > > > well, that depends on what the purpose of the sync is supposed to be. > > > > > > If it is there to prevent users from corrupting their filesystems as a > > > result > > > of a mistake, it is insufficient. If it's there for other reasons, I'm > > > wondering > > > what those reasons are (on systems that suspend and resume reliably, > > > because the > > > original reason to put it in there was to reduce the damage from > > > suspend/resume > > > crashes). > > > > I put it there, and there were more reasons than "crashes" to put it > > there. > > > > 1) crashes. > > > > 2) battery is quite likely to run out in suspended machine. > > > > 3) if someone pulls the stick and puts it in other machine, I wanted > > consistent filesystem at least after journal replay. > > I was going to make the same points. > > From my point of view, whether to issue a sync is a tradeoff. I can't > remember any time in the last several years where lack of a sync would > have caused a problem for my computers, but the possibility still > exists. > > So on one hand, issuing the sync can help prevent a low-probability > problem. On the other hand, issuing the sync takes a small amount of > time (negligible for my purposes but not negligible for Len and > others). > > I prefer to pay a very small cost to prevent a low-probability problem. > Others may not want to pay, because to them the cost is larger or the > probability is lower. > > _That_ is the justification for not eliminating the sync completely but > making it optional. Agreed. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wednesday, July 08, 2015 09:51:13 AM Oliver Neukum wrote: > On Wed, 2015-07-08 at 00:11 +0200, Rafael J. Wysocki wrote: > > On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote: > > > > > That is a tough nut. But that's not a reason to make it worse. > > > I'd say there's no reason not to use a secondary interface to > > > suspend without syncing or to extend or introduce such an interface > > > if the API is deficient. > > > > Well, the point here is that the sync we have doesn't prevent all > > potentially > > possible bad things from happening. It's a partial measure at best in that > > respect. > > Well, removed hardware doesn't work. That is a very basic limitation. > But can we guarantee that any returned syscall actually wrote to disk? > Yes, but it must be done in kernel space. So doing a sync has a true > benefit. > I don't see why you would want to generally remove it. What is wrong > with an interface allowing a selection there to those who don't care > about additional guarantees? Nothing and I'm not discussing that (I've said that already at least once in this thread). What I'm questioning is the "why" of calling sys_sync() from the kernel. If we had a good answer to why we do that to start with, the whole discussion wouldn't be necessary. So the answer I'm getting from this thread so far is something like "It is a safety measure to prevent users from losing data if they pull removable storage out of the system while suspended". Your point about the returned syscall guaranees is a good one, but still. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wed, 8 Jul 2015, Pavel Machek wrote: > > well, that depends on what the purpose of the sync is supposed to be. > > > > If it is there to prevent users from corrupting their filesystems as a > > result > > of a mistake, it is insufficient. If it's there for other reasons, I'm > > wondering > > what those reasons are (on systems that suspend and resume reliably, > > because the > > original reason to put it in there was to reduce the damage from > > suspend/resume > > crashes). > > I put it there, and there were more reasons than "crashes" to put it > there. > > 1) crashes. > > 2) battery is quite likely to run out in suspended machine. > > 3) if someone pulls the stick and puts it in other machine, I wanted > consistent filesystem at least after journal replay. I was going to make the same points. >From my point of view, whether to issue a sync is a tradeoff. I can't remember any time in the last several years where lack of a sync would have caused a problem for my computers, but the possibility still exists. So on one hand, issuing the sync can help prevent a low-probability problem. On the other hand, issuing the sync takes a small amount of time (negligible for my purposes but not negligible for Len and others). I prefer to pay a very small cost to prevent a low-probability problem. Others may not want to pay, because to them the cost is larger or the probability is lower. _That_ is the justification for not eliminating the sync completely but making it optional. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wed 2015-07-08 00:20:05, Rafael J. Wysocki wrote: > On Tuesday, July 07, 2015 11:03:20 AM Alan Stern wrote: > > On Tue, 7 Jul 2015, Oliver Neukum wrote: > > > > > > he (or she) pulls the storage device out of the system, moves it to > > > > another > > > > system, makes changes (say removes the file written to by the process > > > > above, > > > > so the blocks previously occupied by that file are now used for some > > > > metadata) > > > > and moves the storage back to the suspended system. The system is > > > > resumed > > > > and the writing process continues writing possibly to the wrong blocks > > > > and > > > > corrupts the filesystem. > > > > > > That is a tough nut. But that's not a reason to make it worse. > > > I'd say there's no reason not to use a secondary interface to > > > suspend without syncing or to extend or introduce such an interface > > > if the API is deficient. > > > > Indeed, the problem Rafael outlined always exists whether or not the > > kernel does a sync. Even if no I/O is in progress when the system goes > > to sleep, if the user moves a portable storage device with a mounted > > filesystem to another computer and updates it before waking the system > > up, corruption is highly likely. > > > > In principle this could be solved by adding suspend/resume callbacks to > > filesystems. For example, the resume callback could verify that the > > superblock had not been changed since the suspend occurred. Or there > > could be some other simple way of determining that the filesystem had > > not been remounted and changed. > > > > Either way, this is irrelevant to the question of whether the kernel > > should issue a sync when suspending. > > well, that depends on what the purpose of the sync is supposed to be. > > If it is there to prevent users from corrupting their filesystems as a result > of a mistake, it is insufficient. If it's there for other reasons, I'm > wondering > what those reasons are (on systems that suspend and resume reliably, because > the > original reason to put it in there was to reduce the damage from > suspend/resume > crashes). I put it there, and there were more reasons than "crashes" to put it there. 1) crashes. 2) battery is quite likely to run out in suspended machine. 3) if someone pulls the stick and puts it in other machine, I wanted consistent filesystem at least after journal replay. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue 2015-07-07 16:32:08, Rafael J. Wysocki wrote: > On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote: > > On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: > > > For example, on desktop systems I use user space syncs filesystems > > > before > > > writing to /sys/power/state, so the additional sys_sync() in the > > > kernel doesn't > > > seem to serve any purpose. > > > > There is a race you cannot close in user space. > > Yes, there is, but I'm not sure how much of a help the sync in the kernel > provides here anyway. > > Say this happens. There is a process writing to a file running in parallel > with the suspend process. Suspend starts and that process is frozen. The > sync is called and causes all of the outstanding data to be written back. > The user doesn't realize that the write is technically still in progress, so > he (or she) pulls the storage device out of the system, moves it to another > system, makes changes (say removes the file written to by the process above, > so the blocks previously occupied by that file are now used for some metadata) > and moves the storage back to the suspended system. The system is resumed > and the writing process continues writing possibly to the wrong blocks and > corrupts the filesystem. > > Is this possible? If not, why not? Of course it is possible; don't do it. Some warnings, first. * BIG FAT WARNING * * * If you touch anything on disk between suspend and resume... * ...kiss your data goodbye. Correct option is to set up machine so that USB unplug awakes it. Or assume USB was always removed during suspend (we actually have an option for that). Still, if you don't see difference from pulling USB from running machine, and from "off" machine.. I do. And most users I know do. And some of my machines don't indicate whether they are "off" or "sleeping". Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wed, 2015-07-08 at 00:11 +0200, Rafael J. Wysocki wrote: > On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote: > > > That is a tough nut. But that's not a reason to make it worse. > > I'd say there's no reason not to use a secondary interface to > > suspend without syncing or to extend or introduce such an interface > > if the API is deficient. > > Well, the point here is that the sync we have doesn't prevent all potentially > possible bad things from happening. It's a partial measure at best in that > respect. Well, removed hardware doesn't work. That is a very basic limitation. But can we guarantee that any returned syscall actually wrote to disk? Yes, but it must be done in kernel space. So doing a sync has a true benefit. I don't see why you would want to generally remove it. What is wrong with an interface allowing a selection there to those who don't care about additional guarantees? Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wed, 2015-07-08 at 00:11 +0200, Rafael J. Wysocki wrote: On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote: That is a tough nut. But that's not a reason to make it worse. I'd say there's no reason not to use a secondary interface to suspend without syncing or to extend or introduce such an interface if the API is deficient. Well, the point here is that the sync we have doesn't prevent all potentially possible bad things from happening. It's a partial measure at best in that respect. Well, removed hardware doesn't work. That is a very basic limitation. But can we guarantee that any returned syscall actually wrote to disk? Yes, but it must be done in kernel space. So doing a sync has a true benefit. I don't see why you would want to generally remove it. What is wrong with an interface allowing a selection there to those who don't care about additional guarantees? Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue 2015-07-07 16:32:08, Rafael J. Wysocki wrote: On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote: On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: For example, on desktop systems I use user space syncs filesystems before writing to /sys/power/state, so the additional sys_sync() in the kernel doesn't seem to serve any purpose. There is a race you cannot close in user space. Yes, there is, but I'm not sure how much of a help the sync in the kernel provides here anyway. Say this happens. There is a process writing to a file running in parallel with the suspend process. Suspend starts and that process is frozen. The sync is called and causes all of the outstanding data to be written back. The user doesn't realize that the write is technically still in progress, so he (or she) pulls the storage device out of the system, moves it to another system, makes changes (say removes the file written to by the process above, so the blocks previously occupied by that file are now used for some metadata) and moves the storage back to the suspended system. The system is resumed and the writing process continues writing possibly to the wrong blocks and corrupts the filesystem. Is this possible? If not, why not? Of course it is possible; don't do it. Some warnings, first. * BIG FAT WARNING * * * If you touch anything on disk between suspend and resume... * ...kiss your data goodbye. Correct option is to set up machine so that USB unplug awakes it. Or assume USB was always removed during suspend (we actually have an option for that). Still, if you don't see difference from pulling USB from running machine, and from off machine.. I do. And most users I know do. And some of my machines don't indicate whether they are off or sleeping. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wed 2015-07-08 00:20:05, Rafael J. Wysocki wrote: On Tuesday, July 07, 2015 11:03:20 AM Alan Stern wrote: On Tue, 7 Jul 2015, Oliver Neukum wrote: he (or she) pulls the storage device out of the system, moves it to another system, makes changes (say removes the file written to by the process above, so the blocks previously occupied by that file are now used for some metadata) and moves the storage back to the suspended system. The system is resumed and the writing process continues writing possibly to the wrong blocks and corrupts the filesystem. That is a tough nut. But that's not a reason to make it worse. I'd say there's no reason not to use a secondary interface to suspend without syncing or to extend or introduce such an interface if the API is deficient. Indeed, the problem Rafael outlined always exists whether or not the kernel does a sync. Even if no I/O is in progress when the system goes to sleep, if the user moves a portable storage device with a mounted filesystem to another computer and updates it before waking the system up, corruption is highly likely. In principle this could be solved by adding suspend/resume callbacks to filesystems. For example, the resume callback could verify that the superblock had not been changed since the suspend occurred. Or there could be some other simple way of determining that the filesystem had not been remounted and changed. Either way, this is irrelevant to the question of whether the kernel should issue a sync when suspending. well, that depends on what the purpose of the sync is supposed to be. If it is there to prevent users from corrupting their filesystems as a result of a mistake, it is insufficient. If it's there for other reasons, I'm wondering what those reasons are (on systems that suspend and resume reliably, because the original reason to put it in there was to reduce the damage from suspend/resume crashes). I put it there, and there were more reasons than crashes to put it there. 1) crashes. 2) battery is quite likely to run out in suspended machine. 3) if someone pulls the stick and puts it in other machine, I wanted consistent filesystem at least after journal replay. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wed, 8 Jul 2015, Pavel Machek wrote: well, that depends on what the purpose of the sync is supposed to be. If it is there to prevent users from corrupting their filesystems as a result of a mistake, it is insufficient. If it's there for other reasons, I'm wondering what those reasons are (on systems that suspend and resume reliably, because the original reason to put it in there was to reduce the damage from suspend/resume crashes). I put it there, and there were more reasons than crashes to put it there. 1) crashes. 2) battery is quite likely to run out in suspended machine. 3) if someone pulls the stick and puts it in other machine, I wanted consistent filesystem at least after journal replay. I was going to make the same points. From my point of view, whether to issue a sync is a tradeoff. I can't remember any time in the last several years where lack of a sync would have caused a problem for my computers, but the possibility still exists. So on one hand, issuing the sync can help prevent a low-probability problem. On the other hand, issuing the sync takes a small amount of time (negligible for my purposes but not negligible for Len and others). I prefer to pay a very small cost to prevent a low-probability problem. Others may not want to pay, because to them the cost is larger or the probability is lower. _That_ is the justification for not eliminating the sync completely but making it optional. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wednesday, July 08, 2015 09:51:13 AM Oliver Neukum wrote: On Wed, 2015-07-08 at 00:11 +0200, Rafael J. Wysocki wrote: On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote: That is a tough nut. But that's not a reason to make it worse. I'd say there's no reason not to use a secondary interface to suspend without syncing or to extend or introduce such an interface if the API is deficient. Well, the point here is that the sync we have doesn't prevent all potentially possible bad things from happening. It's a partial measure at best in that respect. Well, removed hardware doesn't work. That is a very basic limitation. But can we guarantee that any returned syscall actually wrote to disk? Yes, but it must be done in kernel space. So doing a sync has a true benefit. I don't see why you would want to generally remove it. What is wrong with an interface allowing a selection there to those who don't care about additional guarantees? Nothing and I'm not discussing that (I've said that already at least once in this thread). What I'm questioning is the why of calling sys_sync() from the kernel. If we had a good answer to why we do that to start with, the whole discussion wouldn't be necessary. So the answer I'm getting from this thread so far is something like It is a safety measure to prevent users from losing data if they pull removable storage out of the system while suspended. Your point about the returned syscall guaranees is a good one, but still. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wednesday, July 08, 2015 10:40:00 AM Alan Stern wrote: On Wed, 8 Jul 2015, Pavel Machek wrote: well, that depends on what the purpose of the sync is supposed to be. If it is there to prevent users from corrupting their filesystems as a result of a mistake, it is insufficient. If it's there for other reasons, I'm wondering what those reasons are (on systems that suspend and resume reliably, because the original reason to put it in there was to reduce the damage from suspend/resume crashes). I put it there, and there were more reasons than crashes to put it there. 1) crashes. 2) battery is quite likely to run out in suspended machine. 3) if someone pulls the stick and puts it in other machine, I wanted consistent filesystem at least after journal replay. I was going to make the same points. From my point of view, whether to issue a sync is a tradeoff. I can't remember any time in the last several years where lack of a sync would have caused a problem for my computers, but the possibility still exists. So on one hand, issuing the sync can help prevent a low-probability problem. On the other hand, issuing the sync takes a small amount of time (negligible for my purposes but not negligible for Len and others). I prefer to pay a very small cost to prevent a low-probability problem. Others may not want to pay, because to them the cost is larger or the probability is lower. _That_ is the justification for not eliminating the sync completely but making it optional. Agreed. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 11:03:20 AM Alan Stern wrote: > On Tue, 7 Jul 2015, Oliver Neukum wrote: > > > > he (or she) pulls the storage device out of the system, moves it to > > > another > > > system, makes changes (say removes the file written to by the process > > > above, > > > so the blocks previously occupied by that file are now used for some > > > metadata) > > > and moves the storage back to the suspended system. The system is resumed > > > and the writing process continues writing possibly to the wrong blocks and > > > corrupts the filesystem. > > > > That is a tough nut. But that's not a reason to make it worse. > > I'd say there's no reason not to use a secondary interface to > > suspend without syncing or to extend or introduce such an interface > > if the API is deficient. > > Indeed, the problem Rafael outlined always exists whether or not the > kernel does a sync. Even if no I/O is in progress when the system goes > to sleep, if the user moves a portable storage device with a mounted > filesystem to another computer and updates it before waking the system > up, corruption is highly likely. > > In principle this could be solved by adding suspend/resume callbacks to > filesystems. For example, the resume callback could verify that the > superblock had not been changed since the suspend occurred. Or there > could be some other simple way of determining that the filesystem had > not been remounted and changed. > > Either way, this is irrelevant to the question of whether the kernel > should issue a sync when suspending. well, that depends on what the purpose of the sync is supposed to be. If it is there to prevent users from corrupting their filesystems as a result of a mistake, it is insufficient. If it's there for other reasons, I'm wondering what those reasons are (on systems that suspend and resume reliably, because the original reason to put it in there was to reduce the damage from suspend/resume crashes). Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote: > On Tue, 2015-07-07 at 16:32 +0200, Rafael J. Wysocki wrote: > > On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote: > > > On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: > > > > For example, on desktop systems I use user space syncs filesystems > > > > before > > > > writing to /sys/power/state, so the additional sys_sync() in the > > > > kernel doesn't > > > > seem to serve any purpose. > > > > > > There is a race you cannot close in user space. > > > > Yes, there is, but I'm not sure how much of a help the sync in the kernel > > provides here anyway. > > > > Say this happens. There is a process writing to a file running in parallel > > with the suspend process. Suspend starts and that process is frozen. The > > sync is called and causes all of the outstanding data to be written back. > > The user doesn't realize that the write is technically still in progress, so > > Well, in that case the user never got the feedback that the write is > finished. That is a race that always exists, like sending SIGKILL to a > running task. > What you describe is in principle unsolvable every time under > any circumstances. > > > he (or she) pulls the storage device out of the system, moves it to another > > system, makes changes (say removes the file written to by the process above, > > so the blocks previously occupied by that file are now used for some > > metadata) > > and moves the storage back to the suspended system. The system is resumed > > and the writing process continues writing possibly to the wrong blocks and > > corrupts the filesystem. > > That is a tough nut. But that's not a reason to make it worse. > I'd say there's no reason not to use a secondary interface to > suspend without syncing or to extend or introduce such an interface > if the API is deficient. Well, the point here is that the sync we have doesn't prevent all potentially possible bad things from happening. It's a partial measure at best in that respect. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue, 7 Jul 2015, Oliver Neukum wrote: > > he (or she) pulls the storage device out of the system, moves it to another > > system, makes changes (say removes the file written to by the process above, > > so the blocks previously occupied by that file are now used for some > > metadata) > > and moves the storage back to the suspended system. The system is resumed > > and the writing process continues writing possibly to the wrong blocks and > > corrupts the filesystem. > > That is a tough nut. But that's not a reason to make it worse. > I'd say there's no reason not to use a secondary interface to > suspend without syncing or to extend or introduce such an interface > if the API is deficient. Indeed, the problem Rafael outlined always exists whether or not the kernel does a sync. Even if no I/O is in progress when the system goes to sleep, if the user moves a portable storage device with a mounted filesystem to another computer and updates it before waking the system up, corruption is highly likely. In principle this could be solved by adding suspend/resume callbacks to filesystems. For example, the resume callback could verify that the superblock had not been changed since the suspend occurred. Or there could be some other simple way of determining that the filesystem had not been remounted and changed. Either way, this is irrelevant to the question of whether the kernel should issue a sync when suspending. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue, 2015-07-07 at 16:32 +0200, Rafael J. Wysocki wrote: > On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote: > > On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: > > > For example, on desktop systems I use user space syncs filesystems > > > before > > > writing to /sys/power/state, so the additional sys_sync() in the > > > kernel doesn't > > > seem to serve any purpose. > > > > There is a race you cannot close in user space. > > Yes, there is, but I'm not sure how much of a help the sync in the kernel > provides here anyway. > > Say this happens. There is a process writing to a file running in parallel > with the suspend process. Suspend starts and that process is frozen. The > sync is called and causes all of the outstanding data to be written back. > The user doesn't realize that the write is technically still in progress, so Well, in that case the user never got the feedback that the write is finished. That is a race that always exists, like sending SIGKILL to a running task. What you describe is in principle unsolvable every time under any circumstances. > he (or she) pulls the storage device out of the system, moves it to another > system, makes changes (say removes the file written to by the process above, > so the blocks previously occupied by that file are now used for some metadata) > and moves the storage back to the suspended system. The system is resumed > and the writing process continues writing possibly to the wrong blocks and > corrupts the filesystem. That is a tough nut. But that's not a reason to make it worse. I'd say there's no reason not to use a secondary interface to suspend without syncing or to extend or introduce such an interface if the API is deficient. Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote: > On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: > > For example, on desktop systems I use user space syncs filesystems > > before > > writing to /sys/power/state, so the additional sys_sync() in the > > kernel doesn't > > seem to serve any purpose. > > There is a race you cannot close in user space. Yes, there is, but I'm not sure how much of a help the sync in the kernel provides here anyway. Say this happens. There is a process writing to a file running in parallel with the suspend process. Suspend starts and that process is frozen. The sync is called and causes all of the outstanding data to be written back. The user doesn't realize that the write is technically still in progress, so he (or she) pulls the storage device out of the system, moves it to another system, makes changes (say removes the file written to by the process above, so the blocks previously occupied by that file are now used for some metadata) and moves the storage back to the suspended system. The system is resumed and the writing process continues writing possibly to the wrong blocks and corrupts the filesystem. Is this possible? If not, why not? Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
At Tue, 7 Jul 2015 11:17:56 +1000, Dave Chinner wrote: > > > > > Moreover, question is if we really need to carry out the sync on *every* > > > > suspend even if it is not pointless overall. That shouldn't really be > > > > necessary if we suspend and resume often enough or if we resume only for > > > > a while and then suspend again. Maybe it should be rate limited somehow > > > > at least? > > > > > > If you suspend and resume frequently, then the cost of the sync > > > shoul dbe negliable because the amount of data dirtied between > > > resume/suspend shoul dbe negliable. hence my questions about where > > > sync is spending too much time, and whether we've actually fixed > > > those problems or not. If sync speed on clean filesystems is a > > > problem then we need to fix sync, not work around it. > > > > Well, say you never suspend, but you also only shut down the system when you > > need to replace the kernel on it. How often would you invoke global sync on > > that system? > > Never, because: > > - the kernel does background writeback of dirty data so you > don't need to run sync while the system is running; and > - unmount during shutdown runs sync_filesystem() internally > (just like sys_sync does) to ensure the filesystem is > clean and no data is lost. > > Seriously, stop being making ignorant arguments to justify removing > sys_sync(). *If* there's a problem sys_sync() we need to *fix it*, > not ignore it. This thread remembered me of an old problem I faced: suspend fails while copying a large file. So I tried again with 4.2-rc1. I put a slow USB stick with ext4, started copying a 4GB ISO file, and triggered the suspend. It failed. Then removed sys_sync(), and retested. It still failed. The log shows like: PM: Preparing system for sleep (freeze) Freezing user space processes ... (elapsed 0.002 seconds) done. Freezing remaining freezable tasks ... Freezing of tasks failed after 20.003 seconds (0 tasks refusing to freeze, wq_busy=1): Restarting kernel threads ... done. Restarting tasks ... done. So, removing sys_sync() will save a small amount of time, but it won't help always... Takashi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: > For example, on desktop systems I use user space syncs filesystems > before > writing to /sys/power/state, so the additional sys_sync() in the > kernel doesn't > seem to serve any purpose. There is a race you cannot close in user space. Regards Oliver -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 12:25:07 PM Pavel Machek wrote: > On Mon 2015-07-06 15:59:15, Rafael J. Wysocki wrote: > > On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote: > > > On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote: > > > > On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: > > > > > On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: > > > > > > > > > > > The only argument against dropping sys_sync() from the suspend code > > > > > > path > > > > > > I've seen in this thread that I entirely agree with is that it may > > > > > > lead to > > > > > > regressions, because we've done it practically forever and it may > > > > > > hide latent > > > > > > bugs somewhere in block drivers etc. Dropping it, though, is the > > > > > > only way > > > > > > to see those bugs, if any, and if we want to ever fix them, we need > > > > > > to see > > > > > > them. That's why I think that it may be a good idea to allow > > > > > > people to > > > > > > drop it if they are willing to accept some extra risk (via the > > > > > > kernel > > > > > > command line, for example). > > > > > > > > > > I'd be perfectly happy to have the sync selectable at runtime, one > > > > > way > > > > > or another. The three most reasonable options seem to be: > > > > > > > > > > kernel command line > > > > > > > > > > sysfs file > > > > > > > > > > sysctl setting > > > > > > > > > > The command line is less flexible (it can't be changed after > > > > > booting). > > > > > Either of the other two would be fine with me. > > > > > > > > We'll probably use a sysfs file (possibly plus a Kconfig option to set > > > > the > > > > boot time default). > > > > > > Android people can already do sync-less s2ram using existing > > > interface. IMO they should just do it. > > > > > > In any case, sysfs file + Kconfig is an overkill. We already have too > > > many Kconfig options. > > > > I don't think we can reach a general agreement on what's the *right* > > approach > > with respect to the sys_sync() in the suspend code path, so the only way out > > of this situation I can see is to make it configurable. > > So first: not having general agreement does not mean we should > introduce Kconfig + sysfs file. Second: your proposal of "lets sync if > runtime was shorter than xxx" is over complex, but at least should not > need Kconfig support... The Kconfig part is only about setting the default at compile time, so I'm not sure why you have any problems with it. > Third: we have ioctl() based interface, and I guess android should use that > one; it already has "s2ram without sync" method. And that is tied to hibernation and therefore useless to people who don't build that in. > > > There's not a single Android phone supported by mainline > > > kernel. I'm sure they have bigger problems than Android setting > > > default sysfs values... > > > > But perhaps we'd like to change that? > > We'd like to, but lets start with the real hard stuff (merging support > for Qualcomm chipsets) that is 100 LoC+, not with trivial tweaks > that would be 1-line change, but we pollute code with Kconfig+sysfs > making it 100.. It is suboptimal, but this is the only way I can see allowing us to make any forward progress here. Evidently, some people are not happy with the status quo and they also should be able to cover their use cases with the mainline kernel, shouldn't they? Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 11:17:56 AM Dave Chinner wrote: > On Mon, Jul 06, 2015 at 03:52:25PM +0200, Rafael J. Wysocki wrote: > > On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote: > > > On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote: > > > > > No, your observation was that "sync is slow". Your *solution* is "we > > > > > need to remove sync". > > > > > > > > Not only slow, but pointless too. The argument goes: "It is slow and > > > > pointless and so it may be dropped." > > > > > > > > Now, I can agree that it wasn't clearly demonstrated that the > > > > unconditional > > > > sys_sync() in the suspend code path was pointless, but it also has never > > > > been clearly shown why it is not pointless on systems that suspend and > > > > resume > > > > reliably. > > > > > > I just gave you an example of why sync is needed: Suspend, pull out > > > USB drive when putting laptop in bag. > > > > Exactly the same thing can happen while not suspended. What does make > > suspend > > special with respect to that? > > Stop being obtuse. Stop being rude. > If you remember that you need to pull the USB > stick before you suspend, then you damn well remember to "safely > eject" the USB stick and that syncs, unmounts and flushes the drive > caches before telling you it is safe to pull the stick. All of that only means "the kernel needs to protect users from consequences of silly mistakes" which I'm not sure I agree with. For example, on desktop systems I use user space syncs filesystems before writing to /sys/power/state, so the additional sys_sync() in the kernel doesn't seem to serve any purpose. > > > > Moreover, question is if we really need to carry out the sync on *every* > > > > suspend even if it is not pointless overall. That shouldn't really be > > > > necessary if we suspend and resume often enough or if we resume only for > > > > a while and then suspend again. Maybe it should be rate limited somehow > > > > at least? > > > > > > If you suspend and resume frequently, then the cost of the sync > > > shoul dbe negliable because the amount of data dirtied between > > > resume/suspend shoul dbe negliable. hence my questions about where > > > sync is spending too much time, and whether we've actually fixed > > > those problems or not. If sync speed on clean filesystems is a > > > problem then we need to fix sync, not work around it. > > > > Well, say you never suspend, but you also only shut down the system when you > > need to replace the kernel on it. How often would you invoke global sync on > > that system? > > Never, because: > > - the kernel does background writeback of dirty data so you > don't need to run sync while the system is running; and OK, and that surely happens between suspend-resume cycles too, so why does the sync in the suspend code path make a difference, really (except for avoiding the consequences of possible user mistakes)? > - unmount during shutdown runs sync_filesystem() internally > (just like sys_sync does) to ensure the filesystem is > clean and no data is lost. And that still will be the case regardless of whether or not suspend is used. > Seriously, stop being making ignorant arguments to justify removing > sys_sync(). *If* there's a problem sys_sync() we need to *fix it*, > not ignore it. I'm not advocating for dropping the sys_sync(). I'd just have applied the Len's patch had I thought that had been the way to go. I'm not agreeing with your argumentation about why the sync is needed, though (except for the one point that removing it might uncover some latent bugs). For now, I've seen two arguments: the one about the possible latent bug (that I agree with) and the one about how users may be hurt if they unplug storage devices holding unmounted filesystems while suspended (which may be addressed by a sync in user space just fine). Are there any other reasons? Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Mon 2015-07-06 15:59:15, Rafael J. Wysocki wrote: > On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote: > > On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote: > > > On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: > > > > On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: > > > > > > > > > The only argument against dropping sys_sync() from the suspend code > > > > > path > > > > > I've seen in this thread that I entirely agree with is that it may > > > > > lead to > > > > > regressions, because we've done it practically forever and it may > > > > > hide latent > > > > > bugs somewhere in block drivers etc. Dropping it, though, is the > > > > > only way > > > > > to see those bugs, if any, and if we want to ever fix them, we need > > > > > to see > > > > > them. That's why I think that it may be a good idea to allow people > > > > > to > > > > > drop it if they are willing to accept some extra risk (via the kernel > > > > > command line, for example). > > > > > > > > I'd be perfectly happy to have the sync selectable at runtime, one way > > > > or another. The three most reasonable options seem to be: > > > > > > > > kernel command line > > > > > > > > sysfs file > > > > > > > > sysctl setting > > > > > > > > The command line is less flexible (it can't be changed after booting). > > > > Either of the other two would be fine with me. > > > > > > We'll probably use a sysfs file (possibly plus a Kconfig option to set the > > > boot time default). > > > > Android people can already do sync-less s2ram using existing > > interface. IMO they should just do it. > > > > In any case, sysfs file + Kconfig is an overkill. We already have too > > many Kconfig options. > > I don't think we can reach a general agreement on what's the *right* approach > with respect to the sys_sync() in the suspend code path, so the only way out > of this situation I can see is to make it configurable. So first: not having general agreement does not mean we should introduce Kconfig + sysfs file. Second: your proposal of "lets sync if runtime was shorter than xxx" is over complex, but at least should not need Kconfig support... Third: we have ioctl() based interface, and I guess android should use that one; it already has "s2ram without sync" method. > > There's not a single Android phone supported by mainline > > kernel. I'm sure they have bigger problems than Android setting > > default sysfs values... > > But perhaps we'd like to change that? We'd like to, but lets start with the real hard stuff (merging support for Qualcomm chipsets) that is 100 LoC+, not with trivial tweaks that would be 1-line change, but we pollute code with Kconfig+sysfs making it 100.. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 11:03:20 AM Alan Stern wrote: On Tue, 7 Jul 2015, Oliver Neukum wrote: he (or she) pulls the storage device out of the system, moves it to another system, makes changes (say removes the file written to by the process above, so the blocks previously occupied by that file are now used for some metadata) and moves the storage back to the suspended system. The system is resumed and the writing process continues writing possibly to the wrong blocks and corrupts the filesystem. That is a tough nut. But that's not a reason to make it worse. I'd say there's no reason not to use a secondary interface to suspend without syncing or to extend or introduce such an interface if the API is deficient. Indeed, the problem Rafael outlined always exists whether or not the kernel does a sync. Even if no I/O is in progress when the system goes to sleep, if the user moves a portable storage device with a mounted filesystem to another computer and updates it before waking the system up, corruption is highly likely. In principle this could be solved by adding suspend/resume callbacks to filesystems. For example, the resume callback could verify that the superblock had not been changed since the suspend occurred. Or there could be some other simple way of determining that the filesystem had not been remounted and changed. Either way, this is irrelevant to the question of whether the kernel should issue a sync when suspending. well, that depends on what the purpose of the sync is supposed to be. If it is there to prevent users from corrupting their filesystems as a result of a mistake, it is insufficient. If it's there for other reasons, I'm wondering what those reasons are (on systems that suspend and resume reliably, because the original reason to put it in there was to reduce the damage from suspend/resume crashes). Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote: On Tue, 2015-07-07 at 16:32 +0200, Rafael J. Wysocki wrote: On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote: On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: For example, on desktop systems I use user space syncs filesystems before writing to /sys/power/state, so the additional sys_sync() in the kernel doesn't seem to serve any purpose. There is a race you cannot close in user space. Yes, there is, but I'm not sure how much of a help the sync in the kernel provides here anyway. Say this happens. There is a process writing to a file running in parallel with the suspend process. Suspend starts and that process is frozen. The sync is called and causes all of the outstanding data to be written back. The user doesn't realize that the write is technically still in progress, so Well, in that case the user never got the feedback that the write is finished. That is a race that always exists, like sending SIGKILL to a running task. What you describe is in principle unsolvable every time under any circumstances. he (or she) pulls the storage device out of the system, moves it to another system, makes changes (say removes the file written to by the process above, so the blocks previously occupied by that file are now used for some metadata) and moves the storage back to the suspended system. The system is resumed and the writing process continues writing possibly to the wrong blocks and corrupts the filesystem. That is a tough nut. But that's not a reason to make it worse. I'd say there's no reason not to use a secondary interface to suspend without syncing or to extend or introduce such an interface if the API is deficient. Well, the point here is that the sync we have doesn't prevent all potentially possible bad things from happening. It's a partial measure at best in that respect. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 11:17:56 AM Dave Chinner wrote: On Mon, Jul 06, 2015 at 03:52:25PM +0200, Rafael J. Wysocki wrote: On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote: On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote: No, your observation was that sync is slow. Your *solution* is we need to remove sync. Not only slow, but pointless too. The argument goes: It is slow and pointless and so it may be dropped. Now, I can agree that it wasn't clearly demonstrated that the unconditional sys_sync() in the suspend code path was pointless, but it also has never been clearly shown why it is not pointless on systems that suspend and resume reliably. I just gave you an example of why sync is needed: Suspend, pull out USB drive when putting laptop in bag. Exactly the same thing can happen while not suspended. What does make suspend special with respect to that? Stop being obtuse. Stop being rude. If you remember that you need to pull the USB stick before you suspend, then you damn well remember to safely eject the USB stick and that syncs, unmounts and flushes the drive caches before telling you it is safe to pull the stick. All of that only means the kernel needs to protect users from consequences of silly mistakes which I'm not sure I agree with. For example, on desktop systems I use user space syncs filesystems before writing to /sys/power/state, so the additional sys_sync() in the kernel doesn't seem to serve any purpose. Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? If you suspend and resume frequently, then the cost of the sync shoul dbe negliable because the amount of data dirtied between resume/suspend shoul dbe negliable. hence my questions about where sync is spending too much time, and whether we've actually fixed those problems or not. If sync speed on clean filesystems is a problem then we need to fix sync, not work around it. Well, say you never suspend, but you also only shut down the system when you need to replace the kernel on it. How often would you invoke global sync on that system? Never, because: - the kernel does background writeback of dirty data so you don't need to run sync while the system is running; and OK, and that surely happens between suspend-resume cycles too, so why does the sync in the suspend code path make a difference, really (except for avoiding the consequences of possible user mistakes)? - unmount during shutdown runs sync_filesystem() internally (just like sys_sync does) to ensure the filesystem is clean and no data is lost. And that still will be the case regardless of whether or not suspend is used. Seriously, stop being making ignorant arguments to justify removing sys_sync(). *If* there's a problem sys_sync() we need to *fix it*, not ignore it. I'm not advocating for dropping the sys_sync(). I'd just have applied the Len's patch had I thought that had been the way to go. I'm not agreeing with your argumentation about why the sync is needed, though (except for the one point that removing it might uncover some latent bugs). For now, I've seen two arguments: the one about the possible latent bug (that I agree with) and the one about how users may be hurt if they unplug storage devices holding unmounted filesystems while suspended (which may be addressed by a sync in user space just fine). Are there any other reasons? Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Mon 2015-07-06 15:59:15, Rafael J. Wysocki wrote: On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote: On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote: On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: The only argument against dropping sys_sync() from the suspend code path I've seen in this thread that I entirely agree with is that it may lead to regressions, because we've done it practically forever and it may hide latent bugs somewhere in block drivers etc. Dropping it, though, is the only way to see those bugs, if any, and if we want to ever fix them, we need to see them. That's why I think that it may be a good idea to allow people to drop it if they are willing to accept some extra risk (via the kernel command line, for example). I'd be perfectly happy to have the sync selectable at runtime, one way or another. The three most reasonable options seem to be: kernel command line sysfs file sysctl setting The command line is less flexible (it can't be changed after booting). Either of the other two would be fine with me. We'll probably use a sysfs file (possibly plus a Kconfig option to set the boot time default). Android people can already do sync-less s2ram using existing interface. IMO they should just do it. In any case, sysfs file + Kconfig is an overkill. We already have too many Kconfig options. I don't think we can reach a general agreement on what's the *right* approach with respect to the sys_sync() in the suspend code path, so the only way out of this situation I can see is to make it configurable. So first: not having general agreement does not mean we should introduce Kconfig + sysfs file. Second: your proposal of lets sync if runtime was shorter than xxx is over complex, but at least should not need Kconfig support... Third: we have ioctl() based interface, and I guess android should use that one; it already has s2ram without sync method. There's not a single Android phone supported by mainline kernel. I'm sure they have bigger problems than Android setting default sysfs values... But perhaps we'd like to change that? We'd like to, but lets start with the real hard stuff (merging support for Qualcomm chipsets) that is 100 LoC+, not with trivial tweaks that would be 1-line change, but we pollute code with Kconfig+sysfs making it 100.. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 12:25:07 PM Pavel Machek wrote: On Mon 2015-07-06 15:59:15, Rafael J. Wysocki wrote: On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote: On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote: On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: The only argument against dropping sys_sync() from the suspend code path I've seen in this thread that I entirely agree with is that it may lead to regressions, because we've done it practically forever and it may hide latent bugs somewhere in block drivers etc. Dropping it, though, is the only way to see those bugs, if any, and if we want to ever fix them, we need to see them. That's why I think that it may be a good idea to allow people to drop it if they are willing to accept some extra risk (via the kernel command line, for example). I'd be perfectly happy to have the sync selectable at runtime, one way or another. The three most reasonable options seem to be: kernel command line sysfs file sysctl setting The command line is less flexible (it can't be changed after booting). Either of the other two would be fine with me. We'll probably use a sysfs file (possibly plus a Kconfig option to set the boot time default). Android people can already do sync-less s2ram using existing interface. IMO they should just do it. In any case, sysfs file + Kconfig is an overkill. We already have too many Kconfig options. I don't think we can reach a general agreement on what's the *right* approach with respect to the sys_sync() in the suspend code path, so the only way out of this situation I can see is to make it configurable. So first: not having general agreement does not mean we should introduce Kconfig + sysfs file. Second: your proposal of lets sync if runtime was shorter than xxx is over complex, but at least should not need Kconfig support... The Kconfig part is only about setting the default at compile time, so I'm not sure why you have any problems with it. Third: we have ioctl() based interface, and I guess android should use that one; it already has s2ram without sync method. And that is tied to hibernation and therefore useless to people who don't build that in. There's not a single Android phone supported by mainline kernel. I'm sure they have bigger problems than Android setting default sysfs values... But perhaps we'd like to change that? We'd like to, but lets start with the real hard stuff (merging support for Qualcomm chipsets) that is 100 LoC+, not with trivial tweaks that would be 1-line change, but we pollute code with Kconfig+sysfs making it 100.. It is suboptimal, but this is the only way I can see allowing us to make any forward progress here. Evidently, some people are not happy with the status quo and they also should be able to cover their use cases with the mainline kernel, shouldn't they? Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote: On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: For example, on desktop systems I use user space syncs filesystems before writing to /sys/power/state, so the additional sys_sync() in the kernel doesn't seem to serve any purpose. There is a race you cannot close in user space. Yes, there is, but I'm not sure how much of a help the sync in the kernel provides here anyway. Say this happens. There is a process writing to a file running in parallel with the suspend process. Suspend starts and that process is frozen. The sync is called and causes all of the outstanding data to be written back. The user doesn't realize that the write is technically still in progress, so he (or she) pulls the storage device out of the system, moves it to another system, makes changes (say removes the file written to by the process above, so the blocks previously occupied by that file are now used for some metadata) and moves the storage back to the suspended system. The system is resumed and the writing process continues writing possibly to the wrong blocks and corrupts the filesystem. Is this possible? If not, why not? Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
At Tue, 7 Jul 2015 11:17:56 +1000, Dave Chinner wrote: Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? If you suspend and resume frequently, then the cost of the sync shoul dbe negliable because the amount of data dirtied between resume/suspend shoul dbe negliable. hence my questions about where sync is spending too much time, and whether we've actually fixed those problems or not. If sync speed on clean filesystems is a problem then we need to fix sync, not work around it. Well, say you never suspend, but you also only shut down the system when you need to replace the kernel on it. How often would you invoke global sync on that system? Never, because: - the kernel does background writeback of dirty data so you don't need to run sync while the system is running; and - unmount during shutdown runs sync_filesystem() internally (just like sys_sync does) to ensure the filesystem is clean and no data is lost. Seriously, stop being making ignorant arguments to justify removing sys_sync(). *If* there's a problem sys_sync() we need to *fix it*, not ignore it. This thread remembered me of an old problem I faced: suspend fails while copying a large file. So I tried again with 4.2-rc1. I put a slow USB stick with ext4, started copying a 4GB ISO file, and triggered the suspend. It failed. Then removed sys_sync(), and retested. It still failed. The log shows like: PM: Preparing system for sleep (freeze) Freezing user space processes ... (elapsed 0.002 seconds) done. Freezing remaining freezable tasks ... Freezing of tasks failed after 20.003 seconds (0 tasks refusing to freeze, wq_busy=1): Restarting kernel threads ... done. Restarting tasks ... done. So, removing sys_sync() will save a small amount of time, but it won't help always... Takashi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: For example, on desktop systems I use user space syncs filesystems before writing to /sys/power/state, so the additional sys_sync() in the kernel doesn't seem to serve any purpose. There is a race you cannot close in user space. Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue, 2015-07-07 at 16:32 +0200, Rafael J. Wysocki wrote: On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote: On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote: For example, on desktop systems I use user space syncs filesystems before writing to /sys/power/state, so the additional sys_sync() in the kernel doesn't seem to serve any purpose. There is a race you cannot close in user space. Yes, there is, but I'm not sure how much of a help the sync in the kernel provides here anyway. Say this happens. There is a process writing to a file running in parallel with the suspend process. Suspend starts and that process is frozen. The sync is called and causes all of the outstanding data to be written back. The user doesn't realize that the write is technically still in progress, so Well, in that case the user never got the feedback that the write is finished. That is a race that always exists, like sending SIGKILL to a running task. What you describe is in principle unsolvable every time under any circumstances. he (or she) pulls the storage device out of the system, moves it to another system, makes changes (say removes the file written to by the process above, so the blocks previously occupied by that file are now used for some metadata) and moves the storage back to the suspended system. The system is resumed and the writing process continues writing possibly to the wrong blocks and corrupts the filesystem. That is a tough nut. But that's not a reason to make it worse. I'd say there's no reason not to use a secondary interface to suspend without syncing or to extend or introduce such an interface if the API is deficient. Regards Oliver -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue, 7 Jul 2015, Oliver Neukum wrote: he (or she) pulls the storage device out of the system, moves it to another system, makes changes (say removes the file written to by the process above, so the blocks previously occupied by that file are now used for some metadata) and moves the storage back to the suspended system. The system is resumed and the writing process continues writing possibly to the wrong blocks and corrupts the filesystem. That is a tough nut. But that's not a reason to make it worse. I'd say there's no reason not to use a secondary interface to suspend without syncing or to extend or introduce such an interface if the API is deficient. Indeed, the problem Rafael outlined always exists whether or not the kernel does a sync. Even if no I/O is in progress when the system goes to sleep, if the user moves a portable storage device with a mounted filesystem to another computer and updates it before waking the system up, corruption is highly likely. In principle this could be solved by adding suspend/resume callbacks to filesystems. For example, the resume callback could verify that the superblock had not been changed since the suspend occurred. Or there could be some other simple way of determining that the filesystem had not been remounted and changed. Either way, this is irrelevant to the question of whether the kernel should issue a sync when suspending. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Mon, Jul 06, 2015 at 03:52:25PM +0200, Rafael J. Wysocki wrote: > On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote: > > On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote: > > > > No, your observation was that "sync is slow". Your *solution* is "we > > > > need to remove sync". > > > > > > Not only slow, but pointless too. The argument goes: "It is slow and > > > pointless and so it may be dropped." > > > > > > Now, I can agree that it wasn't clearly demonstrated that the > > > unconditional > > > sys_sync() in the suspend code path was pointless, but it also has never > > > been clearly shown why it is not pointless on systems that suspend and > > > resume > > > reliably. > > > > I just gave you an example of why sync is needed: Suspend, pull out > > USB drive when putting laptop in bag. > > Exactly the same thing can happen while not suspended. What does make suspend > special with respect to that? Stop being obtuse. If you remember that you need to pull the USB stick before you suspend, then you damn well remember to "safely eject" the USB stick and that syncs, unmounts and flushes the drive caches before telling you it is safe to pull the stick. > > > Moreover, question is if we really need to carry out the sync on *every* > > > suspend even if it is not pointless overall. That shouldn't really be > > > necessary if we suspend and resume often enough or if we resume only for > > > a while and then suspend again. Maybe it should be rate limited somehow > > > at least? > > > > If you suspend and resume frequently, then the cost of the sync > > shoul dbe negliable because the amount of data dirtied between > > resume/suspend shoul dbe negliable. hence my questions about where > > sync is spending too much time, and whether we've actually fixed > > those problems or not. If sync speed on clean filesystems is a > > problem then we need to fix sync, not work around it. > > Well, say you never suspend, but you also only shut down the system when you > need to replace the kernel on it. How often would you invoke global sync on > that system? Never, because: - the kernel does background writeback of dirty data so you don't need to run sync while the system is running; and - unmount during shutdown runs sync_filesystem() internally (just like sys_sync does) to ensure the filesystem is clean and no data is lost. Seriously, stop being making ignorant arguments to justify removing sys_sync(). *If* there's a problem sys_sync() we need to *fix it*, not ignore it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote: > On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote: > > On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: > > > On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: > > > > > > > The only argument against dropping sys_sync() from the suspend code path > > > > I've seen in this thread that I entirely agree with is that it may lead > > > > to > > > > regressions, because we've done it practically forever and it may hide > > > > latent > > > > bugs somewhere in block drivers etc. Dropping it, though, is the only > > > > way > > > > to see those bugs, if any, and if we want to ever fix them, we need to > > > > see > > > > them. That's why I think that it may be a good idea to allow people to > > > > drop it if they are willing to accept some extra risk (via the kernel > > > > command line, for example). > > > > > > I'd be perfectly happy to have the sync selectable at runtime, one way > > > or another. The three most reasonable options seem to be: > > > > > > kernel command line > > > > > > sysfs file > > > > > > sysctl setting > > > > > > The command line is less flexible (it can't be changed after booting). > > > Either of the other two would be fine with me. > > > > We'll probably use a sysfs file (possibly plus a Kconfig option to set the > > boot time default). > > Android people can already do sync-less s2ram using existing > interface. IMO they should just do it. > > In any case, sysfs file + Kconfig is an overkill. We already have too > many Kconfig options. I don't think we can reach a general agreement on what's the *right* approach with respect to the sys_sync() in the suspend code path, so the only way out of this situation I can see is to make it configurable. > There's not a single Android phone supported by mainline > kernel. I'm sure they have bigger problems than Android setting > default sysfs values... But perhaps we'd like to change that? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote: > On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote: > > On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote: > > > On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: > > > > >> The _vast_ majority of systems using Linux suspend today are under > > > > >> an Android user-space. Android has no assumption that that suspend > > > > >> to > > > > >> mem will necessarily stay suspended for a long time. > > > > > > > > > > Indeed, however your change was not android-specific, and it is not > > > > > "comfortable" on x86-style hardware and usage patterns. > > > > > > > > "comfortable on x86-style and usage patterns"? > > > > If you mean "traditional" instead of "comfortable", > > > > where "tradition" is based on 10-year old systems, then sure. > > > > > > Even if this were true(*) we don't break things that currently work > > > just because something different is "just around the corner". e.g. > > > if you shut the lid on your laptop and it suspends to RAM, you can > > > pull the USB drive out that you just copied stuff to and plug it > > > into another machine and find all the data you copied there is > > > present. > > > > > > Remove the sync() from the freeze code, and this isn't guaranteed to > > > work anymore. It is now dependent on userspace implementations for > > > this to work, and we know what userspace developers will choose in > > > this situation. i.e. fast and "works for me", not "safe for > > > everyone". > > > > > > (*) Which it clearly isn't true because, as this example shows, my > > > shiny new laptop still has exactly the same data integrity > > > requirements as the laptop I was using 10 years ago. > > > > > > Just because there are lots of Android or Chrome out there it > > > doesn't mean we can just ignore the requirements of everything > > > else... > > > > > > > > That said, as long as x86 will still try to safeguard my data during > > > > > mem > > > > > sleep/resume as it does today, I have no strong feelings about > > > > > light/heavy-weight "mem" sleep being strictly a compile-time > > > > > selectable > > > > > thing, or a more flexible runtime-selectable behavior. > > > > > > > > The observation here is that the kernel should not force every system > > > > to sys_sync() on every suspend. The only question is how to best > > > > implement that. > > > > > > No, your observation was that "sync is slow". Your *solution* is "we > > > need to remove sync". > > > > Not only slow, but pointless too. The argument goes: "It is slow and > > pointless and so it may be dropped." > > > > Now, I can agree that it wasn't clearly demonstrated that the unconditional > > sys_sync() in the suspend code path was pointless, but it also has never > > been clearly shown why it is not pointless on systems that suspend and > > resume > > reliably. > > I just gave you an example of why sync is needed: Suspend, pull out > USB drive when putting laptop in bag. Exactly the same thing can happen while not suspended. What does make suspend special with respect to that? > > [The argument that the user can pull removable storage devices out of the > > system while suspended doesn't hold any water to me, because the user can > > pull them out of the system when not suspended just as well and cause the > > same kind of damage to happen.] > > How many times have you forgotten to pull a usb stick out of a > laptop before putting it away? I've done that several times in the > past few months, and so I've *personally* pulled the USB sticks out > of suspended machines. This is a *common occurrence* and it > currently works just fine, so changing sync behaviour is something > that will directly impact me *as a user*. > > Not to mention hybrid suspend (i.e write suspend image to disk, then > suspend to RAM for fast resume, but if power is lost resume from > disk image) both images have exactly the same filesystem state in > them and that is an absolute requirement for a hybrid suspend... For that you need to sync. Let's not confuse what we're talking about. The code path in question is suspend-only, no hibernation involved in any form. > > Moreover, question is if we really need to carry out the sync on *every* > > suspend even if it is not pointless overall. That shouldn't really be > > necessary if we suspend and resume often enough or if we resume only for > > a while and then suspend again. Maybe it should be rate limited somehow > > at least? > > If you suspend and resume frequently, then the cost of the sync > shoul dbe negliable because the amount of data dirtied between > resume/suspend shoul dbe negliable. hence my questions about where > sync is spending too much time, and whether we've actually fixed > those problems or not. If sync speed on clean filesystems is a > problem then we need to fix sync, not work around it. Well, say you never suspend, but you also only shut down the system when you
Re: [PATCH 1/1] suspend: delete sys_sync()
Hi! > > Moreover, question is if we really need to carry out the sync on *every* > > suspend even if it is not pointless overall. That shouldn't really be > > necessary if we suspend and resume often enough or if we resume only for > > a while and then suspend again. Maybe it should be rate limited somehow > > at least? > > If you suspend and resume frequently, then the cost of the sync > shoul dbe negliable because the amount of data dirtied between > resume/suspend shoul dbe negliable. hence my questions about where > sync is spending too much time, and whether we've actually fixed > those problems or not. If sync speed on clean filesystems is a > problem then we need to fix sync, not work around it. And yes, that's solution I'd really prefer over adding knobs to suspend. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote: > On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: > > On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: > > > > > The only argument against dropping sys_sync() from the suspend code path > > > I've seen in this thread that I entirely agree with is that it may lead to > > > regressions, because we've done it practically forever and it may hide > > > latent > > > bugs somewhere in block drivers etc. Dropping it, though, is the only way > > > to see those bugs, if any, and if we want to ever fix them, we need to see > > > them. That's why I think that it may be a good idea to allow people to > > > drop it if they are willing to accept some extra risk (via the kernel > > > command line, for example). > > > > I'd be perfectly happy to have the sync selectable at runtime, one way > > or another. The three most reasonable options seem to be: > > > > kernel command line > > > > sysfs file > > > > sysctl setting > > > > The command line is less flexible (it can't be changed after booting). > > Either of the other two would be fine with me. > > We'll probably use a sysfs file (possibly plus a Kconfig option to set the > boot time default). Android people can already do sync-less s2ram using existing interface. IMO they should just do it. In any case, sysfs file + Kconfig is an overkill. We already have too many Kconfig options. There's not a single Android phone supported by mainline kernel. I'm sure they have bigger problems than Android setting default sysfs values... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Sat, Jul 4, 2015 at 9:03 AM, Rafael J. Wysocki wrote: > On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote: >> On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: >> > >> The _vast_ majority of systems using Linux suspend today are under >> > >> an Android user-space. Android has no assumption that that suspend to >> > >> mem will necessarily stay suspended for a long time. >> > > >> > > Indeed, however your change was not android-specific, and it is not >> > > "comfortable" on x86-style hardware and usage patterns. >> > >> > "comfortable on x86-style and usage patterns"? >> > If you mean "traditional" instead of "comfortable", >> > where "tradition" is based on 10-year old systems, then sure. >> >> Even if this were true(*) we don't break things that currently work >> just because something different is "just around the corner". e.g. >> if you shut the lid on your laptop and it suspends to RAM, you can >> pull the USB drive out that you just copied stuff to and plug it >> into another machine and find all the data you copied there is >> present. >> >> Remove the sync() from the freeze code, and this isn't guaranteed to >> work anymore. It is now dependent on userspace implementations for >> this to work, and we know what userspace developers will choose in >> this situation. i.e. fast and "works for me", not "safe for >> everyone". >> >> (*) Which it clearly isn't true because, as this example shows, my >> shiny new laptop still has exactly the same data integrity >> requirements as the laptop I was using 10 years ago. >> >> Just because there are lots of Android or Chrome out there it >> doesn't mean we can just ignore the requirements of everything >> else... >> >> > > That said, as long as x86 will still try to safeguard my data during mem >> > > sleep/resume as it does today, I have no strong feelings about >> > > light/heavy-weight "mem" sleep being strictly a compile-time selectable >> > > thing, or a more flexible runtime-selectable behavior. >> > >> > The observation here is that the kernel should not force every system >> > to sys_sync() on every suspend. The only question is how to best >> > implement that. >> >> No, your observation was that "sync is slow". Your *solution* is "we >> need to remove sync". > > Not only slow, but pointless too. The argument goes: "It is slow and > pointless and so it may be dropped." > > Now, I can agree that it wasn't clearly demonstrated that the unconditional > sys_sync() in the suspend code path was pointless, but it also has never > been clearly shown why it is not pointless on systems that suspend and resume > reliably. > > [The argument that the user can pull removable storage devices out of the > system while suspended doesn't hold any water to me, because the user can > pull them out of the system when not suspended just as well and cause the > same kind of damage to happen.] I think common users do understand the difference between suspend and running, don't they? So they may not complain anyone if they plug off USB disk when system is running, but they will complain the patch and the regression if their data is lost because doing that during suspend. Not to mention, it is very common to plug off USB disk after suspend laptop. Also there are network based storage(iSCSI, NBD, ...), NFS/CIFS, and is it safe for these cases? -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
Hi! > conditions, or s2disk fails because a kernel upgrade was done before > the s2disk and so can't be resumed. With your change, users lose all s2disk should be able to survive kernel change on x86-64 for long time now, and I have patches for x86, too. [But yes, I think sync() should be kept.] Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
Hi! > > Understand, however, there are systems which suspend/resume reliably > > many times per second, making policy choice of having the kernel hard-code > > a sys_sync() into the suspend path a bad idea. > > My current view on that is that whether or not to do a sync() before > suspending > ultimately is a policy decision and should belong to user space as such > (modulo > the autosleep situation when user space may not know when the suspend is going > to happen). > > Moreover, user space is free to do as many sync()s before suspending as it > wants to and the question here is whether or not the *kernel* should sync() > in the suspend code path. sync()s from userspace do not work, as userspace is still running. sync() from kernel happens with tasks stopped. ... so it should really get consistent image on disk. And there are already interfaces that can s2ram without sync, just use uswsusp ioctls, not the sysfs writes. If you are doing multiple suspends per second, a) you are doing something wrong and b) you'd better use ioctl anyway. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
Hi! Understand, however, there are systems which suspend/resume reliably many times per second, making policy choice of having the kernel hard-code a sys_sync() into the suspend path a bad idea. My current view on that is that whether or not to do a sync() before suspending ultimately is a policy decision and should belong to user space as such (modulo the autosleep situation when user space may not know when the suspend is going to happen). Moreover, user space is free to do as many sync()s before suspending as it wants to and the question here is whether or not the *kernel* should sync() in the suspend code path. sync()s from userspace do not work, as userspace is still running. sync() from kernel happens with tasks stopped. ... so it should really get consistent image on disk. And there are already interfaces that can s2ram without sync, just use uswsusp ioctls, not the sysfs writes. If you are doing multiple suspends per second, a) you are doing something wrong and b) you'd better use ioctl anyway. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
Hi! conditions, or s2disk fails because a kernel upgrade was done before the s2disk and so can't be resumed. With your change, users lose all s2disk should be able to survive kernel change on x86-64 for long time now, and I have patches for x86, too. [But yes, I think sync() should be kept.] Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote: On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: The only argument against dropping sys_sync() from the suspend code path I've seen in this thread that I entirely agree with is that it may lead to regressions, because we've done it practically forever and it may hide latent bugs somewhere in block drivers etc. Dropping it, though, is the only way to see those bugs, if any, and if we want to ever fix them, we need to see them. That's why I think that it may be a good idea to allow people to drop it if they are willing to accept some extra risk (via the kernel command line, for example). I'd be perfectly happy to have the sync selectable at runtime, one way or another. The three most reasonable options seem to be: kernel command line sysfs file sysctl setting The command line is less flexible (it can't be changed after booting). Either of the other two would be fine with me. We'll probably use a sysfs file (possibly plus a Kconfig option to set the boot time default). Android people can already do sync-less s2ram using existing interface. IMO they should just do it. In any case, sysfs file + Kconfig is an overkill. We already have too many Kconfig options. There's not a single Android phone supported by mainline kernel. I'm sure they have bigger problems than Android setting default sysfs values... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Sat, Jul 4, 2015 at 9:03 AM, Rafael J. Wysocki r...@rjwysocki.net wrote: On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote: On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: The _vast_ majority of systems using Linux suspend today are under an Android user-space. Android has no assumption that that suspend to mem will necessarily stay suspended for a long time. Indeed, however your change was not android-specific, and it is not comfortable on x86-style hardware and usage patterns. comfortable on x86-style and usage patterns? If you mean traditional instead of comfortable, where tradition is based on 10-year old systems, then sure. Even if this were true(*) we don't break things that currently work just because something different is just around the corner. e.g. if you shut the lid on your laptop and it suspends to RAM, you can pull the USB drive out that you just copied stuff to and plug it into another machine and find all the data you copied there is present. Remove the sync() from the freeze code, and this isn't guaranteed to work anymore. It is now dependent on userspace implementations for this to work, and we know what userspace developers will choose in this situation. i.e. fast and works for me, not safe for everyone. (*) Which it clearly isn't true because, as this example shows, my shiny new laptop still has exactly the same data integrity requirements as the laptop I was using 10 years ago. Just because there are lots of Android or Chrome out there it doesn't mean we can just ignore the requirements of everything else... That said, as long as x86 will still try to safeguard my data during mem sleep/resume as it does today, I have no strong feelings about light/heavy-weight mem sleep being strictly a compile-time selectable thing, or a more flexible runtime-selectable behavior. The observation here is that the kernel should not force every system to sys_sync() on every suspend. The only question is how to best implement that. No, your observation was that sync is slow. Your *solution* is we need to remove sync. Not only slow, but pointless too. The argument goes: It is slow and pointless and so it may be dropped. Now, I can agree that it wasn't clearly demonstrated that the unconditional sys_sync() in the suspend code path was pointless, but it also has never been clearly shown why it is not pointless on systems that suspend and resume reliably. [The argument that the user can pull removable storage devices out of the system while suspended doesn't hold any water to me, because the user can pull them out of the system when not suspended just as well and cause the same kind of damage to happen.] I think common users do understand the difference between suspend and running, don't they? So they may not complain anyone if they plug off USB disk when system is running, but they will complain the patch and the regression if their data is lost because doing that during suspend. Not to mention, it is very common to plug off USB disk after suspend laptop. Also there are network based storage(iSCSI, NBD, ...), NFS/CIFS, and is it safe for these cases? -- Ming Lei -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
Hi! Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? If you suspend and resume frequently, then the cost of the sync shoul dbe negliable because the amount of data dirtied between resume/suspend shoul dbe negliable. hence my questions about where sync is spending too much time, and whether we've actually fixed those problems or not. If sync speed on clean filesystems is a problem then we need to fix sync, not work around it. And yes, that's solution I'd really prefer over adding knobs to suspend. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote: On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote: On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote: On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: The _vast_ majority of systems using Linux suspend today are under an Android user-space. Android has no assumption that that suspend to mem will necessarily stay suspended for a long time. Indeed, however your change was not android-specific, and it is not comfortable on x86-style hardware and usage patterns. comfortable on x86-style and usage patterns? If you mean traditional instead of comfortable, where tradition is based on 10-year old systems, then sure. Even if this were true(*) we don't break things that currently work just because something different is just around the corner. e.g. if you shut the lid on your laptop and it suspends to RAM, you can pull the USB drive out that you just copied stuff to and plug it into another machine and find all the data you copied there is present. Remove the sync() from the freeze code, and this isn't guaranteed to work anymore. It is now dependent on userspace implementations for this to work, and we know what userspace developers will choose in this situation. i.e. fast and works for me, not safe for everyone. (*) Which it clearly isn't true because, as this example shows, my shiny new laptop still has exactly the same data integrity requirements as the laptop I was using 10 years ago. Just because there are lots of Android or Chrome out there it doesn't mean we can just ignore the requirements of everything else... That said, as long as x86 will still try to safeguard my data during mem sleep/resume as it does today, I have no strong feelings about light/heavy-weight mem sleep being strictly a compile-time selectable thing, or a more flexible runtime-selectable behavior. The observation here is that the kernel should not force every system to sys_sync() on every suspend. The only question is how to best implement that. No, your observation was that sync is slow. Your *solution* is we need to remove sync. Not only slow, but pointless too. The argument goes: It is slow and pointless and so it may be dropped. Now, I can agree that it wasn't clearly demonstrated that the unconditional sys_sync() in the suspend code path was pointless, but it also has never been clearly shown why it is not pointless on systems that suspend and resume reliably. I just gave you an example of why sync is needed: Suspend, pull out USB drive when putting laptop in bag. Exactly the same thing can happen while not suspended. What does make suspend special with respect to that? [The argument that the user can pull removable storage devices out of the system while suspended doesn't hold any water to me, because the user can pull them out of the system when not suspended just as well and cause the same kind of damage to happen.] How many times have you forgotten to pull a usb stick out of a laptop before putting it away? I've done that several times in the past few months, and so I've *personally* pulled the USB sticks out of suspended machines. This is a *common occurrence* and it currently works just fine, so changing sync behaviour is something that will directly impact me *as a user*. Not to mention hybrid suspend (i.e write suspend image to disk, then suspend to RAM for fast resume, but if power is lost resume from disk image) both images have exactly the same filesystem state in them and that is an absolute requirement for a hybrid suspend... For that you need to sync. Let's not confuse what we're talking about. The code path in question is suspend-only, no hibernation involved in any form. Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? If you suspend and resume frequently, then the cost of the sync shoul dbe negliable because the amount of data dirtied between resume/suspend shoul dbe negliable. hence my questions about where sync is spending too much time, and whether we've actually fixed those problems or not. If sync speed on clean filesystems is a problem then we need to fix sync, not work around it. Well, say you never suspend, but you also only shut down the system when you need to replace the kernel on it. How often would you invoke global sync on that system? Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at
Re: [PATCH 1/1] suspend: delete sys_sync()
On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote: On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote: On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: The only argument against dropping sys_sync() from the suspend code path I've seen in this thread that I entirely agree with is that it may lead to regressions, because we've done it practically forever and it may hide latent bugs somewhere in block drivers etc. Dropping it, though, is the only way to see those bugs, if any, and if we want to ever fix them, we need to see them. That's why I think that it may be a good idea to allow people to drop it if they are willing to accept some extra risk (via the kernel command line, for example). I'd be perfectly happy to have the sync selectable at runtime, one way or another. The three most reasonable options seem to be: kernel command line sysfs file sysctl setting The command line is less flexible (it can't be changed after booting). Either of the other two would be fine with me. We'll probably use a sysfs file (possibly plus a Kconfig option to set the boot time default). Android people can already do sync-less s2ram using existing interface. IMO they should just do it. In any case, sysfs file + Kconfig is an overkill. We already have too many Kconfig options. I don't think we can reach a general agreement on what's the *right* approach with respect to the sys_sync() in the suspend code path, so the only way out of this situation I can see is to make it configurable. There's not a single Android phone supported by mainline kernel. I'm sure they have bigger problems than Android setting default sysfs values... But perhaps we'd like to change that? Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Mon, Jul 06, 2015 at 03:52:25PM +0200, Rafael J. Wysocki wrote: On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote: On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote: No, your observation was that sync is slow. Your *solution* is we need to remove sync. Not only slow, but pointless too. The argument goes: It is slow and pointless and so it may be dropped. Now, I can agree that it wasn't clearly demonstrated that the unconditional sys_sync() in the suspend code path was pointless, but it also has never been clearly shown why it is not pointless on systems that suspend and resume reliably. I just gave you an example of why sync is needed: Suspend, pull out USB drive when putting laptop in bag. Exactly the same thing can happen while not suspended. What does make suspend special with respect to that? Stop being obtuse. If you remember that you need to pull the USB stick before you suspend, then you damn well remember to safely eject the USB stick and that syncs, unmounts and flushes the drive caches before telling you it is safe to pull the stick. Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? If you suspend and resume frequently, then the cost of the sync shoul dbe negliable because the amount of data dirtied between resume/suspend shoul dbe negliable. hence my questions about where sync is spending too much time, and whether we've actually fixed those problems or not. If sync speed on clean filesystems is a problem then we need to fix sync, not work around it. Well, say you never suspend, but you also only shut down the system when you need to replace the kernel on it. How often would you invoke global sync on that system? Never, because: - the kernel does background writeback of dirty data so you don't need to run sync while the system is running; and - unmount during shutdown runs sync_filesystem() internally (just like sys_sync does) to ensure the filesystem is clean and no data is lost. Seriously, stop being making ignorant arguments to justify removing sys_sync(). *If* there's a problem sys_sync() we need to *fix it*, not ignore it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote: > On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote: > > On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: > > > >> The _vast_ majority of systems using Linux suspend today are under > > > >> an Android user-space. Android has no assumption that that suspend to > > > >> mem will necessarily stay suspended for a long time. > > > > > > > > Indeed, however your change was not android-specific, and it is not > > > > "comfortable" on x86-style hardware and usage patterns. > > > > > > "comfortable on x86-style and usage patterns"? > > > If you mean "traditional" instead of "comfortable", > > > where "tradition" is based on 10-year old systems, then sure. > > > > Even if this were true(*) we don't break things that currently work > > just because something different is "just around the corner". e.g. > > if you shut the lid on your laptop and it suspends to RAM, you can > > pull the USB drive out that you just copied stuff to and plug it > > into another machine and find all the data you copied there is > > present. > > > > Remove the sync() from the freeze code, and this isn't guaranteed to > > work anymore. It is now dependent on userspace implementations for > > this to work, and we know what userspace developers will choose in > > this situation. i.e. fast and "works for me", not "safe for > > everyone". > > > > (*) Which it clearly isn't true because, as this example shows, my > > shiny new laptop still has exactly the same data integrity > > requirements as the laptop I was using 10 years ago. > > > > Just because there are lots of Android or Chrome out there it > > doesn't mean we can just ignore the requirements of everything > > else... > > > > > > That said, as long as x86 will still try to safeguard my data during mem > > > > sleep/resume as it does today, I have no strong feelings about > > > > light/heavy-weight "mem" sleep being strictly a compile-time selectable > > > > thing, or a more flexible runtime-selectable behavior. > > > > > > The observation here is that the kernel should not force every system > > > to sys_sync() on every suspend. The only question is how to best > > > implement that. > > > > No, your observation was that "sync is slow". Your *solution* is "we > > need to remove sync". > > Not only slow, but pointless too. The argument goes: "It is slow and > pointless and so it may be dropped." > > Now, I can agree that it wasn't clearly demonstrated that the unconditional > sys_sync() in the suspend code path was pointless, but it also has never > been clearly shown why it is not pointless on systems that suspend and resume > reliably. I just gave you an example of why sync is needed: Suspend, pull out USB drive when putting laptop in bag. > [The argument that the user can pull removable storage devices out of the > system while suspended doesn't hold any water to me, because the user can > pull them out of the system when not suspended just as well and cause the > same kind of damage to happen.] How many times have you forgotten to pull a usb stick out of a laptop before putting it away? I've done that several times in the past few months, and so I've *personally* pulled the USB sticks out of suspended machines. This is a *common occurrence* and it currently works just fine, so changing sync behaviour is something that will directly impact me *as a user*. Not to mention hybrid suspend (i.e write suspend image to disk, then suspend to RAM for fast resume, but if power is lost resume from disk image) both images have exactly the same filesystem state in them and that is an absolute requirement for a hybrid suspend... > Moreover, question is if we really need to carry out the sync on *every* > suspend even if it is not pointless overall. That shouldn't really be > necessary if we suspend and resume often enough or if we resume only for > a while and then suspend again. Maybe it should be rate limited somehow > at least? If you suspend and resume frequently, then the cost of the sync shoul dbe negliable because the amount of data dirtied between resume/suspend shoul dbe negliable. hence my questions about where sync is spending too much time, and whether we've actually fixed those problems or not. If sync speed on clean filesystems is a problem then we need to fix sync, not work around it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: > On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: > > > The only argument against dropping sys_sync() from the suspend code path > > I've seen in this thread that I entirely agree with is that it may lead to > > regressions, because we've done it practically forever and it may hide > > latent > > bugs somewhere in block drivers etc. Dropping it, though, is the only way > > to see those bugs, if any, and if we want to ever fix them, we need to see > > them. That's why I think that it may be a good idea to allow people to > > drop it if they are willing to accept some extra risk (via the kernel > > command line, for example). > > I'd be perfectly happy to have the sync selectable at runtime, one way > or another. The three most reasonable options seem to be: > > kernel command line > > sysfs file > > sysctl setting > > The command line is less flexible (it can't be changed after booting). > Either of the other two would be fine with me. We'll probably use a sysfs file (possibly plus a Kconfig option to set the boot time default). > > Moreover, question is if we really need to carry out the sync on *every* > > suspend even if it is not pointless overall. That shouldn't really be > > necessary if we suspend and resume often enough or if we resume only for > > a while and then suspend again. Maybe it should be rate limited somehow > > at least? > > For example, skip the sync if the system has been awake for < 100 ms? Yes, something like that. > The cutoff time could also be controlled by the sysfs file: -1 => > never sync, 0 => always sync, > 0 => sync if the system has been awake > longer than the value. That sounds like a good idea to me. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Saturday, July 04, 2015 10:50:36 AM Geert Uytterhoeven wrote: > On Sat, Jul 4, 2015 at 3:03 AM, Rafael J. Wysocki wrote: > > [The argument that the user can pull removable storage devices out of the > > system while suspended doesn't hold any water to me, because the user can > > pull them out of the system when not suspended just as well and cause the > > same kind of damage to happen.] > > The user may not notice the difference between a suspended and powered > off machine, and thus think it's safe to unplug the USB memory stick while > the machine appears off. > > If that's considered a valid use case, this would be a regression. Well, if user space configures the system to use suspend as a "power off flavor" and doesn't even try to ensure the filesystems to be synced before suspending (in that mode), that's on the edge of insanity. And it falls into the "latent bugs that may be there because we've always synced before suspending" category in my view. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Sat, Jul 4, 2015 at 3:03 AM, Rafael J. Wysocki wrote: > [The argument that the user can pull removable storage devices out of the > system while suspended doesn't hold any water to me, because the user can > pull them out of the system when not suspended just as well and cause the > same kind of damage to happen.] The user may not notice the difference between a suspended and powered off machine, and thus think it's safe to unplug the USB memory stick while the machine appears off. If that's considered a valid use case, this would be a regression. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: > The only argument against dropping sys_sync() from the suspend code path > I've seen in this thread that I entirely agree with is that it may lead to > regressions, because we've done it practically forever and it may hide latent > bugs somewhere in block drivers etc. Dropping it, though, is the only way > to see those bugs, if any, and if we want to ever fix them, we need to see > them. That's why I think that it may be a good idea to allow people to > drop it if they are willing to accept some extra risk (via the kernel > command line, for example). I'd be perfectly happy to have the sync selectable at runtime, one way or another. The three most reasonable options seem to be: kernel command line sysfs file sysctl setting The command line is less flexible (it can't be changed after booting). Either of the other two would be fine with me. > Moreover, question is if we really need to carry out the sync on *every* > suspend even if it is not pointless overall. That shouldn't really be > necessary if we suspend and resume often enough or if we resume only for > a while and then suspend again. Maybe it should be rate limited somehow > at least? For example, skip the sync if the system has been awake for < 100 ms? The cutoff time could also be controlled by the sysfs file: -1 => never sync, 0 => always sync, > 0 => sync if the system has been awake longer than the value. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Sat, Jul 4, 2015 at 3:03 AM, Rafael J. Wysocki r...@rjwysocki.net wrote: [The argument that the user can pull removable storage devices out of the system while suspended doesn't hold any water to me, because the user can pull them out of the system when not suspended just as well and cause the same kind of damage to happen.] The user may not notice the difference between a suspended and powered off machine, and thus think it's safe to unplug the USB memory stick while the machine appears off. If that's considered a valid use case, this would be a regression. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: The only argument against dropping sys_sync() from the suspend code path I've seen in this thread that I entirely agree with is that it may lead to regressions, because we've done it practically forever and it may hide latent bugs somewhere in block drivers etc. Dropping it, though, is the only way to see those bugs, if any, and if we want to ever fix them, we need to see them. That's why I think that it may be a good idea to allow people to drop it if they are willing to accept some extra risk (via the kernel command line, for example). I'd be perfectly happy to have the sync selectable at runtime, one way or another. The three most reasonable options seem to be: kernel command line sysfs file sysctl setting The command line is less flexible (it can't be changed after booting). Either of the other two would be fine with me. Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? For example, skip the sync if the system has been awake for 100 ms? The cutoff time could also be controlled by the sysfs file: -1 = never sync, 0 = always sync, 0 = sync if the system has been awake longer than the value. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Saturday, July 04, 2015 10:50:36 AM Geert Uytterhoeven wrote: On Sat, Jul 4, 2015 at 3:03 AM, Rafael J. Wysocki r...@rjwysocki.net wrote: [The argument that the user can pull removable storage devices out of the system while suspended doesn't hold any water to me, because the user can pull them out of the system when not suspended just as well and cause the same kind of damage to happen.] The user may not notice the difference between a suspended and powered off machine, and thus think it's safe to unplug the USB memory stick while the machine appears off. If that's considered a valid use case, this would be a regression. Well, if user space configures the system to use suspend as a power off flavor and doesn't even try to ensure the filesystems to be synced before suspending (in that mode), that's on the edge of insanity. And it falls into the latent bugs that may be there because we've always synced before suspending category in my view. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote: On Sat, 4 Jul 2015, Rafael J. Wysocki wrote: The only argument against dropping sys_sync() from the suspend code path I've seen in this thread that I entirely agree with is that it may lead to regressions, because we've done it practically forever and it may hide latent bugs somewhere in block drivers etc. Dropping it, though, is the only way to see those bugs, if any, and if we want to ever fix them, we need to see them. That's why I think that it may be a good idea to allow people to drop it if they are willing to accept some extra risk (via the kernel command line, for example). I'd be perfectly happy to have the sync selectable at runtime, one way or another. The three most reasonable options seem to be: kernel command line sysfs file sysctl setting The command line is less flexible (it can't be changed after booting). Either of the other two would be fine with me. We'll probably use a sysfs file (possibly plus a Kconfig option to set the boot time default). Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? For example, skip the sync if the system has been awake for 100 ms? Yes, something like that. The cutoff time could also be controlled by the sysfs file: -1 = never sync, 0 = always sync, 0 = sync if the system has been awake longer than the value. That sounds like a good idea to me. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote: On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote: On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: The _vast_ majority of systems using Linux suspend today are under an Android user-space. Android has no assumption that that suspend to mem will necessarily stay suspended for a long time. Indeed, however your change was not android-specific, and it is not comfortable on x86-style hardware and usage patterns. comfortable on x86-style and usage patterns? If you mean traditional instead of comfortable, where tradition is based on 10-year old systems, then sure. Even if this were true(*) we don't break things that currently work just because something different is just around the corner. e.g. if you shut the lid on your laptop and it suspends to RAM, you can pull the USB drive out that you just copied stuff to and plug it into another machine and find all the data you copied there is present. Remove the sync() from the freeze code, and this isn't guaranteed to work anymore. It is now dependent on userspace implementations for this to work, and we know what userspace developers will choose in this situation. i.e. fast and works for me, not safe for everyone. (*) Which it clearly isn't true because, as this example shows, my shiny new laptop still has exactly the same data integrity requirements as the laptop I was using 10 years ago. Just because there are lots of Android or Chrome out there it doesn't mean we can just ignore the requirements of everything else... That said, as long as x86 will still try to safeguard my data during mem sleep/resume as it does today, I have no strong feelings about light/heavy-weight mem sleep being strictly a compile-time selectable thing, or a more flexible runtime-selectable behavior. The observation here is that the kernel should not force every system to sys_sync() on every suspend. The only question is how to best implement that. No, your observation was that sync is slow. Your *solution* is we need to remove sync. Not only slow, but pointless too. The argument goes: It is slow and pointless and so it may be dropped. Now, I can agree that it wasn't clearly demonstrated that the unconditional sys_sync() in the suspend code path was pointless, but it also has never been clearly shown why it is not pointless on systems that suspend and resume reliably. I just gave you an example of why sync is needed: Suspend, pull out USB drive when putting laptop in bag. [The argument that the user can pull removable storage devices out of the system while suspended doesn't hold any water to me, because the user can pull them out of the system when not suspended just as well and cause the same kind of damage to happen.] How many times have you forgotten to pull a usb stick out of a laptop before putting it away? I've done that several times in the past few months, and so I've *personally* pulled the USB sticks out of suspended machines. This is a *common occurrence* and it currently works just fine, so changing sync behaviour is something that will directly impact me *as a user*. Not to mention hybrid suspend (i.e write suspend image to disk, then suspend to RAM for fast resume, but if power is lost resume from disk image) both images have exactly the same filesystem state in them and that is an absolute requirement for a hybrid suspend... Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? If you suspend and resume frequently, then the cost of the sync shoul dbe negliable because the amount of data dirtied between resume/suspend shoul dbe negliable. hence my questions about where sync is spending too much time, and whether we've actually fixed those problems or not. If sync speed on clean filesystems is a problem then we need to fix sync, not work around it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote: > On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: > > >> The _vast_ majority of systems using Linux suspend today are under > > >> an Android user-space. Android has no assumption that that suspend to > > >> mem will necessarily stay suspended for a long time. > > > > > > Indeed, however your change was not android-specific, and it is not > > > "comfortable" on x86-style hardware and usage patterns. > > > > "comfortable on x86-style and usage patterns"? > > If you mean "traditional" instead of "comfortable", > > where "tradition" is based on 10-year old systems, then sure. > > Even if this were true(*) we don't break things that currently work > just because something different is "just around the corner". e.g. > if you shut the lid on your laptop and it suspends to RAM, you can > pull the USB drive out that you just copied stuff to and plug it > into another machine and find all the data you copied there is > present. > > Remove the sync() from the freeze code, and this isn't guaranteed to > work anymore. It is now dependent on userspace implementations for > this to work, and we know what userspace developers will choose in > this situation. i.e. fast and "works for me", not "safe for > everyone". > > (*) Which it clearly isn't true because, as this example shows, my > shiny new laptop still has exactly the same data integrity > requirements as the laptop I was using 10 years ago. > > Just because there are lots of Android or Chrome out there it > doesn't mean we can just ignore the requirements of everything > else... > > > > That said, as long as x86 will still try to safeguard my data during mem > > > sleep/resume as it does today, I have no strong feelings about > > > light/heavy-weight "mem" sleep being strictly a compile-time selectable > > > thing, or a more flexible runtime-selectable behavior. > > > > The observation here is that the kernel should not force every system > > to sys_sync() on every suspend. The only question is how to best > > implement that. > > No, your observation was that "sync is slow". Your *solution* is "we > need to remove sync". Not only slow, but pointless too. The argument goes: "It is slow and pointless and so it may be dropped." Now, I can agree that it wasn't clearly demonstrated that the unconditional sys_sync() in the suspend code path was pointless, but it also has never been clearly shown why it is not pointless on systems that suspend and resume reliably. [The argument that the user can pull removable storage devices out of the system while suspended doesn't hold any water to me, because the user can pull them out of the system when not suspended just as well and cause the same kind of damage to happen.] When we were adding it, the thinking was along the lines of "Well, suspend isn't too reliable, so let's put sys_sync() in there to possibly reduce the damage from suspend/resume crashes and suspend is slow anyway, so the possible effect on performance from that shouldn't be noticeable". Clearly, the world has changed since then and suspend is far more reliable than it used to be in general and it is not that slow too at least on some systems (especially the suspend-to-idle flavor). The only argument against dropping sys_sync() from the suspend code path I've seen in this thread that I entirely agree with is that it may lead to regressions, because we've done it practically forever and it may hide latent bugs somewhere in block drivers etc. Dropping it, though, is the only way to see those bugs, if any, and if we want to ever fix them, we need to see them. That's why I think that it may be a good idea to allow people to drop it if they are willing to accept some extra risk (via the kernel command line, for example). Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote: On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: The _vast_ majority of systems using Linux suspend today are under an Android user-space. Android has no assumption that that suspend to mem will necessarily stay suspended for a long time. Indeed, however your change was not android-specific, and it is not comfortable on x86-style hardware and usage patterns. comfortable on x86-style and usage patterns? If you mean traditional instead of comfortable, where tradition is based on 10-year old systems, then sure. Even if this were true(*) we don't break things that currently work just because something different is just around the corner. e.g. if you shut the lid on your laptop and it suspends to RAM, you can pull the USB drive out that you just copied stuff to and plug it into another machine and find all the data you copied there is present. Remove the sync() from the freeze code, and this isn't guaranteed to work anymore. It is now dependent on userspace implementations for this to work, and we know what userspace developers will choose in this situation. i.e. fast and works for me, not safe for everyone. (*) Which it clearly isn't true because, as this example shows, my shiny new laptop still has exactly the same data integrity requirements as the laptop I was using 10 years ago. Just because there are lots of Android or Chrome out there it doesn't mean we can just ignore the requirements of everything else... That said, as long as x86 will still try to safeguard my data during mem sleep/resume as it does today, I have no strong feelings about light/heavy-weight mem sleep being strictly a compile-time selectable thing, or a more flexible runtime-selectable behavior. The observation here is that the kernel should not force every system to sys_sync() on every suspend. The only question is how to best implement that. No, your observation was that sync is slow. Your *solution* is we need to remove sync. Not only slow, but pointless too. The argument goes: It is slow and pointless and so it may be dropped. Now, I can agree that it wasn't clearly demonstrated that the unconditional sys_sync() in the suspend code path was pointless, but it also has never been clearly shown why it is not pointless on systems that suspend and resume reliably. [The argument that the user can pull removable storage devices out of the system while suspended doesn't hold any water to me, because the user can pull them out of the system when not suspended just as well and cause the same kind of damage to happen.] When we were adding it, the thinking was along the lines of Well, suspend isn't too reliable, so let's put sys_sync() in there to possibly reduce the damage from suspend/resume crashes and suspend is slow anyway, so the possible effect on performance from that shouldn't be noticeable. Clearly, the world has changed since then and suspend is far more reliable than it used to be in general and it is not that slow too at least on some systems (especially the suspend-to-idle flavor). The only argument against dropping sys_sync() from the suspend code path I've seen in this thread that I entirely agree with is that it may lead to regressions, because we've done it practically forever and it may hide latent bugs somewhere in block drivers etc. Dropping it, though, is the only way to see those bugs, if any, and if we want to ever fix them, we need to see them. That's why I think that it may be a good idea to allow people to drop it if they are willing to accept some extra risk (via the kernel command line, for example). Moreover, question is if we really need to carry out the sync on *every* suspend even if it is not pointless overall. That shouldn't really be necessary if we suspend and resume often enough or if we resume only for a while and then suspend again. Maybe it should be rate limited somehow at least? Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: > >> The _vast_ majority of systems using Linux suspend today are under > >> an Android user-space. Android has no assumption that that suspend to > >> mem will necessarily stay suspended for a long time. > > > > Indeed, however your change was not android-specific, and it is not > > "comfortable" on x86-style hardware and usage patterns. > > "comfortable on x86-style and usage patterns"? > If you mean "traditional" instead of "comfortable", > where "tradition" is based on 10-year old systems, then sure. Even if this were true(*) we don't break things that currently work just because something different is "just around the corner". e.g. if you shut the lid on your laptop and it suspends to RAM, you can pull the USB drive out that you just copied stuff to and plug it into another machine and find all the data you copied there is present. Remove the sync() from the freeze code, and this isn't guaranteed to work anymore. It is now dependent on userspace implementations for this to work, and we know what userspace developers will choose in this situation. i.e. fast and "works for me", not "safe for everyone". (*) Which it clearly isn't true because, as this example shows, my shiny new laptop still has exactly the same data integrity requirements as the laptop I was using 10 years ago. Just because there are lots of Android or Chrome out there it doesn't mean we can just ignore the requirements of everything else... > > That said, as long as x86 will still try to safeguard my data during mem > > sleep/resume as it does today, I have no strong feelings about > > light/heavy-weight "mem" sleep being strictly a compile-time selectable > > thing, or a more flexible runtime-selectable behavior. > > The observation here is that the kernel should not force every system > to sys_sync() on every suspend. The only question is how to best > implement that. No, your observation was that "sync is slow". Your *solution* is "we need to remove sync". However, your arguement so far has these problems: - repeated sync from outside the suspend context is does not demonstrate the problem you are seeing during suspend, and you have not yet identified why this is the case. - it has been demonstrated that inode cache size plays a significant role in sync latency, but you haven't provided any information to tell us what the cache sizes were when you see large latencies. - it has been demonstrated that there are patches pending that improve clean filesystem sync speed, but you have not produced numbers to demonstrate that those patches do not meet your requirements. - In several tests your "sync latency" monitoring was dirtying the filesystem and hence *causing* the repeated syncs to be slow. - you have not told us whether your suspend monitoring was the cause of the suspend sync latency or not. - you have been testing on hardware with questionable power management behaviour. IOWs, you have not yet identified the root cause of the slow sync behaviour on suspend, you have not determined if pending work fixes the latency problems, and you have not reproduced your results after fixing the flaws in your testing methodology. > The obvious solution was to delete this forced policy > from the kernel, and let user-space handle it. > Rafael has not agreed to push that obvious, though less-than-gentle > solution upstream, and so I'll re-send the historic patch > that allows distros to still sync like it is 1999, if they want to:-) Please stop shouting about "obvious" solutions until you've actually proved there is a problem and that problems you find aren't already fixed by the pending sync changes Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote: The _vast_ majority of systems using Linux suspend today are under an Android user-space. Android has no assumption that that suspend to mem will necessarily stay suspended for a long time. Indeed, however your change was not android-specific, and it is not comfortable on x86-style hardware and usage patterns. comfortable on x86-style and usage patterns? If you mean traditional instead of comfortable, where tradition is based on 10-year old systems, then sure. Even if this were true(*) we don't break things that currently work just because something different is just around the corner. e.g. if you shut the lid on your laptop and it suspends to RAM, you can pull the USB drive out that you just copied stuff to and plug it into another machine and find all the data you copied there is present. Remove the sync() from the freeze code, and this isn't guaranteed to work anymore. It is now dependent on userspace implementations for this to work, and we know what userspace developers will choose in this situation. i.e. fast and works for me, not safe for everyone. (*) Which it clearly isn't true because, as this example shows, my shiny new laptop still has exactly the same data integrity requirements as the laptop I was using 10 years ago. Just because there are lots of Android or Chrome out there it doesn't mean we can just ignore the requirements of everything else... That said, as long as x86 will still try to safeguard my data during mem sleep/resume as it does today, I have no strong feelings about light/heavy-weight mem sleep being strictly a compile-time selectable thing, or a more flexible runtime-selectable behavior. The observation here is that the kernel should not force every system to sys_sync() on every suspend. The only question is how to best implement that. No, your observation was that sync is slow. Your *solution* is we need to remove sync. However, your arguement so far has these problems: - repeated sync from outside the suspend context is does not demonstrate the problem you are seeing during suspend, and you have not yet identified why this is the case. - it has been demonstrated that inode cache size plays a significant role in sync latency, but you haven't provided any information to tell us what the cache sizes were when you see large latencies. - it has been demonstrated that there are patches pending that improve clean filesystem sync speed, but you have not produced numbers to demonstrate that those patches do not meet your requirements. - In several tests your sync latency monitoring was dirtying the filesystem and hence *causing* the repeated syncs to be slow. - you have not told us whether your suspend monitoring was the cause of the suspend sync latency or not. - you have been testing on hardware with questionable power management behaviour. IOWs, you have not yet identified the root cause of the slow sync behaviour on suspend, you have not determined if pending work fixes the latency problems, and you have not reproduced your results after fixing the flaws in your testing methodology. The obvious solution was to delete this forced policy from the kernel, and let user-space handle it. Rafael has not agreed to push that obvious, though less-than-gentle solution upstream, and so I'll re-send the historic patch that allows distros to still sync like it is 1999, if they want to:-) Please stop shouting about obvious solutions until you've actually proved there is a problem and that problems you find aren't already fixed by the pending sync changes Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
>> The _vast_ majority of systems using Linux suspend today are under >> an Android user-space. Android has no assumption that that suspend to >> mem will necessarily stay suspended for a long time. > > Indeed, however your change was not android-specific, and it is not > "comfortable" on x86-style hardware and usage patterns. "comfortable on x86-style and usage patterns"? If you mean "traditional" instead of "comfortable", where "tradition" is based on 10-year old systems, then sure. But today, my x86 Android tablet is quite "comfortable" without a sys_sync() in the kernel suspend path. No, this isn't Android specific, Android is just the highest-volume demand, making it an obvious example. Chrome is the #1 selling "x86-style" clamshell laptop. Chrome is not only "comfortable" with fast suspend/resume, the Chrome developers demand it. > That said, as long as x86 will still try to safeguard my data during mem > sleep/resume as it does today, I have no strong feelings about > light/heavy-weight "mem" sleep being strictly a compile-time selectable > thing, or a more flexible runtime-selectable behavior. The observation here is that the kernel should not force every system to sys_sync() on every suspend. The only question is how to best implement that. The obvious solution was to delete this forced policy from the kernel, and let user-space handle it. Rafael has not agreed to push that obvious, though less-than-gentle solution upstream, and so I'll re-send the historic patch that allows distros to still sync like it is 1999, if they want to:-) thanks, Len Brown, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue, Jun 30, 2015, at 17:04, Len Brown wrote: > > Entering "mem" suspend mode through sysfs currently has the implied meaning > > of "prepare the *entire* system to stay on a powered down state for > > pontentially a _long_ time", where long means "certainly more than 10 > > seconds" ;-) This is unlikely to be written anywhere, of course, that's just > > how it was used by the vast majority for years, at least on traditional > > server/desktop/laptop platforms such as x86. > > The _vast_ majority of systems using Linux suspend today are under > an Android user-space. Android has no assumption that that suspend to > mem will necessarily stay suspended for a long time. Indeed, however your change was not android-specific, and it is not "comfortable" on x86-style hardware and usage patterns. > > IMO, we would actually benefit from *adding* new system-wide sleep/suspend > > modes that are optimized for oportunistic, short-lived system-wide sleep > > cycles (aka "catnap") that is fast to enter and exit from, and which will be > > triggered very frequently, instead of trying to change the assumptions and > > expected behavior of the current "deep-sleep" mode... > > Thank you for sharing your opinion. > > I am going to give up trying to change your mind, and those who share > your view. I plan to revive my patch from 2014 which > makes sys_sync() optional. That will not change the historic behavior, > and will still allow everybody to do what they want. > Rafael has said that he can live with the resulting kernel clutter. ... > BTW. the answer does not appear to be creating a new system sleep state. > Android invokes "mem", and they don't seem excited about teaching > user-space that runs on multiple platforms that what used to be a "mem" and > no > "freeze" could be a "mem" plus a "freeze", or a "freeze" and no "mem". Hmm, maybe we could: 1. Make the behavior of "mem" configurable (select default at compile time, allow it to be changed at runtime). 2. Add a way to always enter the "heavyweight" (x86-style) mem sleep in platforms where it exists. 3. Add a way to always enter the "light" (android-style) mem sleep in platforms where it exists. And make (2) and (3) optional (as in: you can compile out the clutter). That at least provides a way forward for userspace, at the price of more gunk on the kernel side. We already have a lot of stuff that works that way (idle and freq governors, io elevators, etc). That said, as long as x86 will still try to safeguard my data during mem sleep/resume as it does today, I have no strong feelings about light/heavy-weight "mem" sleep being strictly a compile-time selectable thing, or a more flexible runtime-selectable behavior. -- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot Henrique Holschuh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Tue, Jun 30, 2015, at 17:04, Len Brown wrote: Entering mem suspend mode through sysfs currently has the implied meaning of prepare the *entire* system to stay on a powered down state for pontentially a _long_ time, where long means certainly more than 10 seconds ;-) This is unlikely to be written anywhere, of course, that's just how it was used by the vast majority for years, at least on traditional server/desktop/laptop platforms such as x86. The _vast_ majority of systems using Linux suspend today are under an Android user-space. Android has no assumption that that suspend to mem will necessarily stay suspended for a long time. Indeed, however your change was not android-specific, and it is not comfortable on x86-style hardware and usage patterns. IMO, we would actually benefit from *adding* new system-wide sleep/suspend modes that are optimized for oportunistic, short-lived system-wide sleep cycles (aka catnap) that is fast to enter and exit from, and which will be triggered very frequently, instead of trying to change the assumptions and expected behavior of the current deep-sleep mode... Thank you for sharing your opinion. I am going to give up trying to change your mind, and those who share your view. I plan to revive my patch from 2014 which makes sys_sync() optional. That will not change the historic behavior, and will still allow everybody to do what they want. Rafael has said that he can live with the resulting kernel clutter. ... BTW. the answer does not appear to be creating a new system sleep state. Android invokes mem, and they don't seem excited about teaching user-space that runs on multiple platforms that what used to be a mem and no freeze could be a mem plus a freeze, or a freeze and no mem. Hmm, maybe we could: 1. Make the behavior of mem configurable (select default at compile time, allow it to be changed at runtime). 2. Add a way to always enter the heavyweight (x86-style) mem sleep in platforms where it exists. 3. Add a way to always enter the light (android-style) mem sleep in platforms where it exists. And make (2) and (3) optional (as in: you can compile out the clutter). That at least provides a way forward for userspace, at the price of more gunk on the kernel side. We already have a lot of stuff that works that way (idle and freq governors, io elevators, etc). That said, as long as x86 will still try to safeguard my data during mem sleep/resume as it does today, I have no strong feelings about light/heavy-weight mem sleep being strictly a compile-time selectable thing, or a more flexible runtime-selectable behavior. -- One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot Henrique Holschuh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
The _vast_ majority of systems using Linux suspend today are under an Android user-space. Android has no assumption that that suspend to mem will necessarily stay suspended for a long time. Indeed, however your change was not android-specific, and it is not comfortable on x86-style hardware and usage patterns. comfortable on x86-style and usage patterns? If you mean traditional instead of comfortable, where tradition is based on 10-year old systems, then sure. But today, my x86 Android tablet is quite comfortable without a sys_sync() in the kernel suspend path. No, this isn't Android specific, Android is just the highest-volume demand, making it an obvious example. Chrome is the #1 selling x86-style clamshell laptop. Chrome is not only comfortable with fast suspend/resume, the Chrome developers demand it. That said, as long as x86 will still try to safeguard my data during mem sleep/resume as it does today, I have no strong feelings about light/heavy-weight mem sleep being strictly a compile-time selectable thing, or a more flexible runtime-selectable behavior. The observation here is that the kernel should not force every system to sys_sync() on every suspend. The only question is how to best implement that. The obvious solution was to delete this forced policy from the kernel, and let user-space handle it. Rafael has not agreed to push that obvious, though less-than-gentle solution upstream, and so I'll re-send the historic patch that allows distros to still sync like it is 1999, if they want to:-) thanks, Len Brown, Intel Open Source Technology Center -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, Jun 25, 2015 at 1:11 PM, Henrique de Moraes Holschuh wrote: > On Mon, 11 May 2015, Len Brown wrote: >> On Sat, May 9, 2015 at 4:25 PM, Henrique de Moraes Holschuh >> wrote: >> > On Sat, 09 May 2015, Alan Stern wrote: >> >> On Fri, 8 May 2015, Rafael J. Wysocki wrote: >> >> > My current view on that is that whether or not to do a sync() before >> >> > suspending >> >> > ultimately is a policy decision and should belong to user space as such >> >> > (modulo >> >> > the autosleep situation when user space may not know when the suspend >> >> > is going >> >> > to happen). >> >> > >> >> > Moreover, user space is free to do as many sync()s before suspending as >> >> > it >> >> > wants to and the question here is whether or not the *kernel* should >> >> > sync() >> >> > in the suspend code path. >> >> > >> >> > Since we pretty much can demonstrate that having just one sync() in >> >> > there is >> >> > not sufficient in general, should we put two of them in there? Or just >> >> > remove the existing one and leave it to user space entirely? >> >> >> >> I don't know about the advantages of one sync over two. But how about >> >> adding a "syncs_before_suspend" (or just "syncs") sysfs attribute that >> >> takes a small numeric value? The default can be 0, and the user could >> >> set it to 1 or 2 (or higher). >> > >> > IMO it would be much safer to both have that knob, and to set it to keep >> > the >> > current behavior as the default. Userspace will adapt and change that knob >> > to whatever is sufficient based on what it does before signaling the kernel >> > to suspend. >> > >> > A regression in sync-before-suspend is sure to cause data loss episodes, >> > after all. And, as far as bikeshedding goes, IMHO syncs_before_suspend is >> > self-explanatory, which would be a very good reason to use it instead of >> > the >> > shorter requires-you-to-know-what-it-is-about "syncs". >> >> When I first thought about this, I had a similar view: >> https://lkml.org/lkml/2014/1/23/45 >> >> But upon reflection, I do not believe that the kernel is adding value >> here, instead it is imposing a policy, and that policy decision is >> sometimes prohibitively expensive. User-space can do this for itself (and >> in the case of desktop distros, already does), and so the kernel should >> butt-out. > > There is a lot of added value in my filesystems not being trashed by > sleep/resume issues on laptops, IMHO, and the reason why we need the kernel > itself to take care of syncing and freezing filesystems has been explained > elsewhere in this thread. > I thoght for a while before replying, and I think the real issue behind this > thread is the want of a change of expected-but-implied semanthics and > behavior for the system-wide sleep-to-memory trigger, to adequate it to a > new reality for newer classes of devices. > > Entering "mem" suspend mode through sysfs currently has the implied meaning > of "prepare the *entire* system to stay on a powered down state for > pontentially a _long_ time", where long means "certainly more than 10 > seconds" ;-) This is unlikely to be written anywhere, of course, that's just > how it was used by the vast majority for years, at least on traditional > server/desktop/laptop platforms such as x86. The _vast_ majority of systems using Linux suspend today are under an Android user-space. Android has no assumption that that suspend to mem will necessarily stay suspended for a long time. The wake-on-packet use-case is generally a much shorter suspend duration, and more frequent number of suspend/resume cycles than the "wake on lid-open or button-press" use case. > On those platforms, we have to assume the user might plug/unplug devices, > that the power supply might shut down while we're sleeping, that the entire > process is not painless and has a reasonable chance of misbehaving (crashes > on sleep/resume are _really_ common), etc. The fact is that sys_sync() in the kernel suspend to mem path is too expensive for devices which suspend/resume quickly, routinely, and reliably. The reality that somebody can stick a broken device into their desktop or laptop and break suspend doesn't change that. > What is the safe and proper thing to do in that situation is not necessarily > the best way to go about it when you actually want a somewhat different > behavior, i.e. to "prepare the system to stay on a powered down state for a > short while, and be very fast because this could happen at a very high > frequency"... > > IMO, we would actually benefit from *adding* new system-wide sleep/suspend > modes that are optimized for oportunistic, short-lived system-wide sleep > cycles (aka "catnap") that is fast to enter and exit from, and which will be > triggered very frequently, instead of trying to change the assumptions and > expected behavior of the current "deep-sleep" mode... Thank you for sharing your opinion. I am going to give up trying to change your mind, and those who share your
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, Jun 25, 2015 at 1:11 PM, Henrique de Moraes Holschuh h...@hmh.eng.br wrote: On Mon, 11 May 2015, Len Brown wrote: On Sat, May 9, 2015 at 4:25 PM, Henrique de Moraes Holschuh h...@hmh.eng.br wrote: On Sat, 09 May 2015, Alan Stern wrote: On Fri, 8 May 2015, Rafael J. Wysocki wrote: My current view on that is that whether or not to do a sync() before suspending ultimately is a policy decision and should belong to user space as such (modulo the autosleep situation when user space may not know when the suspend is going to happen). Moreover, user space is free to do as many sync()s before suspending as it wants to and the question here is whether or not the *kernel* should sync() in the suspend code path. Since we pretty much can demonstrate that having just one sync() in there is not sufficient in general, should we put two of them in there? Or just remove the existing one and leave it to user space entirely? I don't know about the advantages of one sync over two. But how about adding a syncs_before_suspend (or just syncs) sysfs attribute that takes a small numeric value? The default can be 0, and the user could set it to 1 or 2 (or higher). IMO it would be much safer to both have that knob, and to set it to keep the current behavior as the default. Userspace will adapt and change that knob to whatever is sufficient based on what it does before signaling the kernel to suspend. A regression in sync-before-suspend is sure to cause data loss episodes, after all. And, as far as bikeshedding goes, IMHO syncs_before_suspend is self-explanatory, which would be a very good reason to use it instead of the shorter requires-you-to-know-what-it-is-about syncs. When I first thought about this, I had a similar view: https://lkml.org/lkml/2014/1/23/45 But upon reflection, I do not believe that the kernel is adding value here, instead it is imposing a policy, and that policy decision is sometimes prohibitively expensive. User-space can do this for itself (and in the case of desktop distros, already does), and so the kernel should butt-out. There is a lot of added value in my filesystems not being trashed by sleep/resume issues on laptops, IMHO, and the reason why we need the kernel itself to take care of syncing and freezing filesystems has been explained elsewhere in this thread. I thoght for a while before replying, and I think the real issue behind this thread is the want of a change of expected-but-implied semanthics and behavior for the system-wide sleep-to-memory trigger, to adequate it to a new reality for newer classes of devices. Entering mem suspend mode through sysfs currently has the implied meaning of prepare the *entire* system to stay on a powered down state for pontentially a _long_ time, where long means certainly more than 10 seconds ;-) This is unlikely to be written anywhere, of course, that's just how it was used by the vast majority for years, at least on traditional server/desktop/laptop platforms such as x86. The _vast_ majority of systems using Linux suspend today are under an Android user-space. Android has no assumption that that suspend to mem will necessarily stay suspended for a long time. The wake-on-packet use-case is generally a much shorter suspend duration, and more frequent number of suspend/resume cycles than the wake on lid-open or button-press use case. On those platforms, we have to assume the user might plug/unplug devices, that the power supply might shut down while we're sleeping, that the entire process is not painless and has a reasonable chance of misbehaving (crashes on sleep/resume are _really_ common), etc. The fact is that sys_sync() in the kernel suspend to mem path is too expensive for devices which suspend/resume quickly, routinely, and reliably. The reality that somebody can stick a broken device into their desktop or laptop and break suspend doesn't change that. What is the safe and proper thing to do in that situation is not necessarily the best way to go about it when you actually want a somewhat different behavior, i.e. to prepare the system to stay on a powered down state for a short while, and be very fast because this could happen at a very high frequency... IMO, we would actually benefit from *adding* new system-wide sleep/suspend modes that are optimized for oportunistic, short-lived system-wide sleep cycles (aka catnap) that is fast to enter and exit from, and which will be triggered very frequently, instead of trying to change the assumptions and expected behavior of the current deep-sleep mode... Thank you for sharing your opinion. I am going to give up trying to change your mind, and those who share your view. I plan to revive my patch from 2014 which makes sys_sync() optional. That will not change the historic behavior, and will still allow everybody to do what they want. Rafael has said that
Re: [PATCH 1/1] suspend: delete sys_sync()
On Mon, 11 May 2015, Len Brown wrote: > On Sat, May 9, 2015 at 4:25 PM, Henrique de Moraes Holschuh > wrote: > > On Sat, 09 May 2015, Alan Stern wrote: > >> On Fri, 8 May 2015, Rafael J. Wysocki wrote: > >> > My current view on that is that whether or not to do a sync() before > >> > suspending > >> > ultimately is a policy decision and should belong to user space as such > >> > (modulo > >> > the autosleep situation when user space may not know when the suspend is > >> > going > >> > to happen). > >> > > >> > Moreover, user space is free to do as many sync()s before suspending as > >> > it > >> > wants to and the question here is whether or not the *kernel* should > >> > sync() > >> > in the suspend code path. > >> > > >> > Since we pretty much can demonstrate that having just one sync() in > >> > there is > >> > not sufficient in general, should we put two of them in there? Or just > >> > remove the existing one and leave it to user space entirely? > >> > >> I don't know about the advantages of one sync over two. But how about > >> adding a "syncs_before_suspend" (or just "syncs") sysfs attribute that > >> takes a small numeric value? The default can be 0, and the user could > >> set it to 1 or 2 (or higher). > > > > IMO it would be much safer to both have that knob, and to set it to keep the > > current behavior as the default. Userspace will adapt and change that knob > > to whatever is sufficient based on what it does before signaling the kernel > > to suspend. > > > > A regression in sync-before-suspend is sure to cause data loss episodes, > > after all. And, as far as bikeshedding goes, IMHO syncs_before_suspend is > > self-explanatory, which would be a very good reason to use it instead of the > > shorter requires-you-to-know-what-it-is-about "syncs". > > When I first thought about this, I had a similar view: > https://lkml.org/lkml/2014/1/23/45 > > But upon reflection, I do not believe that the kernel is adding value > here, instead it is imposing a policy, and that policy decision is > sometimes prohibitively expensive. User-space can do this for itself (and > in the case of desktop distros, already does), and so the kernel should > butt-out. There is a lot of added value in my filesystems not being trashed by sleep/resume issues on laptops, IMHO, and the reason why we need the kernel itself to take care of syncing and freezing filesystems has been explained elsewhere in this thread. I thoght for a while before replying, and I think the real issue behind this thread is the want of a change of expected-but-implied semanthics and behavior for the system-wide sleep-to-memory trigger, to adequate it to a new reality for newer classes of devices. Entering "mem" suspend mode through sysfs currently has the implied meaning of "prepare the *entire* system to stay on a powered down state for pontentially a _long_ time", where long means "certainly more than 10 seconds" ;-) This is unlikely to be written anywhere, of course, that's just how it was used by the vast majority for years, at least on traditional server/desktop/laptop platforms such as x86. On those platforms, we have to assume the user might plug/unplug devices, that the power supply might shut down while we're sleeping, that the entire process is not painless and has a reasonable chance of misbehaving (crashes on sleep/resume are _really_ common), etc. What is the safe and proper thing to do in that situation is not necessarily the best way to go about it when you actually want a somewhat different behavior, i.e. to "prepare the system to stay on a powered down state for a short while, and be very fast because this could happen at a very high frequency"... IMO, we would actually benefit from *adding* new system-wide sleep/suspend modes that are optimized for oportunistic, short-lived system-wide sleep cycles (aka "catnap") that is fast to enter and exit from, and which will be triggered very frequently, instead of trying to change the assumptions and expected behavior of the current "deep-sleep" mode... -- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot Henrique Holschuh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Mon, 11 May 2015, Len Brown wrote: On Sat, May 9, 2015 at 4:25 PM, Henrique de Moraes Holschuh h...@hmh.eng.br wrote: On Sat, 09 May 2015, Alan Stern wrote: On Fri, 8 May 2015, Rafael J. Wysocki wrote: My current view on that is that whether or not to do a sync() before suspending ultimately is a policy decision and should belong to user space as such (modulo the autosleep situation when user space may not know when the suspend is going to happen). Moreover, user space is free to do as many sync()s before suspending as it wants to and the question here is whether or not the *kernel* should sync() in the suspend code path. Since we pretty much can demonstrate that having just one sync() in there is not sufficient in general, should we put two of them in there? Or just remove the existing one and leave it to user space entirely? I don't know about the advantages of one sync over two. But how about adding a syncs_before_suspend (or just syncs) sysfs attribute that takes a small numeric value? The default can be 0, and the user could set it to 1 or 2 (or higher). IMO it would be much safer to both have that knob, and to set it to keep the current behavior as the default. Userspace will adapt and change that knob to whatever is sufficient based on what it does before signaling the kernel to suspend. A regression in sync-before-suspend is sure to cause data loss episodes, after all. And, as far as bikeshedding goes, IMHO syncs_before_suspend is self-explanatory, which would be a very good reason to use it instead of the shorter requires-you-to-know-what-it-is-about syncs. When I first thought about this, I had a similar view: https://lkml.org/lkml/2014/1/23/45 But upon reflection, I do not believe that the kernel is adding value here, instead it is imposing a policy, and that policy decision is sometimes prohibitively expensive. User-space can do this for itself (and in the case of desktop distros, already does), and so the kernel should butt-out. There is a lot of added value in my filesystems not being trashed by sleep/resume issues on laptops, IMHO, and the reason why we need the kernel itself to take care of syncing and freezing filesystems has been explained elsewhere in this thread. I thoght for a while before replying, and I think the real issue behind this thread is the want of a change of expected-but-implied semanthics and behavior for the system-wide sleep-to-memory trigger, to adequate it to a new reality for newer classes of devices. Entering mem suspend mode through sysfs currently has the implied meaning of prepare the *entire* system to stay on a powered down state for pontentially a _long_ time, where long means certainly more than 10 seconds ;-) This is unlikely to be written anywhere, of course, that's just how it was used by the vast majority for years, at least on traditional server/desktop/laptop platforms such as x86. On those platforms, we have to assume the user might plug/unplug devices, that the power supply might shut down while we're sleeping, that the entire process is not painless and has a reasonable chance of misbehaving (crashes on sleep/resume are _really_ common), etc. What is the safe and proper thing to do in that situation is not necessarily the best way to go about it when you actually want a somewhat different behavior, i.e. to prepare the system to stay on a powered down state for a short while, and be very fast because this could happen at a very high frequency... IMO, we would actually benefit from *adding* new system-wide sleep/suspend modes that are optimized for oportunistic, short-lived system-wide sleep cycles (aka catnap) that is fast to enter and exit from, and which will be triggered very frequently, instead of trying to change the assumptions and expected behavior of the current deep-sleep mode... -- One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot Henrique Holschuh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
>> Putting a function trace on sys_sync and executing sync manually, >> I was able to see it take 100ms, >> though function trace itself could be contributing to that... > > It would seem that way - you need to get the traces to dump to > something that has no sync overhead I don't think that ftrace buffers have sync overhead -- at least not until they are copied into files. >> Curiously, in another run, sync ran at 15ms, but sd suspend exploded to >> 300ms. >> I've seen that in some other results. Sometimes sync if fast, but sd >> then more than makes up for it by being slow:-( > > Oh, I see that too. Normally That's because the filesystem hasn't > been told to enter an idle state and so is doing metadata writeback > IO after the sync. When that happens the sd suspend has wait for > request queues to drain, IO to complete and device caches to flush. > This simply cannot be avoided because suspend never tells the > filesytems to enter an idle state I captured a trace of a slow sd suspend. Apparently, it does two operations -- first a sync_cache, then a stop operation. The sync was fast. The stop command was where all the time went. I'll look at a more modern drive on the same system next week, just for comparison. > i.e. remember what I said initially in this thread about suspend > actually needing to freeze filesystems, not just sync them? I think with the complexity of file systems and the underlying devices, yes, we need to think about how to efficiently and safely suspend/resume them. But sys_sync is too expensive to have hard-coded in the kernel suspend path. Some machines can suspend and resume in under 10ms -- they absolutely do not want sys_sync() hard-coded in the suspend path. >> FYI, >> I ran analyze_suspend.py -x2 >> from current directory /tmp, which is mounted on tmpfs, >> but still found the 2nd sync was very slow -- 200ms >> vs 6 - 20 ms for the sync preceding the 1st suspend. > > So where did that time go? As I pointed out previously, function > trace will only tell us if the delay is data writeback or not. We > seem to have confirmed that the delay is, indeed, writeback of dirty > data. Now we need to identify what the dirty data belongs to: we > need to trace individual writeback events to see what dirty inodes > are actually being written. I expect that analyze_suspend.py is moving data around between the back-to-back suspends when the -x2 option is used -- will look into it. thanks, Len Brown, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, Jun 19, 2015 at 02:34:37AM -0400, Len Brown wrote: > > Can you repeat this test on your system, so that we can determine if > > the 5ms ""sync time" is actually just the overhead of inode cache > > traversal? If that is the case, the speed of sync on a clean > > filesystem is already a solved problem - the patchset should be > > merged in the 4.2 cycle > > Yes, drop_caches does seem to help repeated sync on this system: > Exactly what patch series does this? I'm running ext4 (the default, > not btrfs) None. It's the current behaviour of sync that is ends up walking the inode cache in it's entirity to find dirty inodes that need to be waited on. That's what the sync scalability patch series I pointed you at fixes - sync then keeps a "dirty inodes that need to be waited on list" instead of doing a cache traversal to find them. i.e. the "no cache" results you see will soon be the behaviour sync has regardless of the size of the inode cache. > [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo > ext4_inode_cache3536 3536 1008 164 : tunables00 > 0 : slabdata221221 0 That's actually a really small cache to begin with. > > This is the problem we really need to reproduce and track down. > > Putting a function trace on sys_sync and executing sync manually, > I was able to see it take 100ms, > though function trace itself could be contributing to that... It would seem that way - you need to get the traces to dump to something that has no sync overhead > running analyze_suspend.py after the slab tweak above didn't change much. > in one run sync was 20ms (out of a total suspend time of 60ms). Which may be because the inode cache was larger? > Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms. > I've seen that in some other results. Sometimes sync if fast, but sd > then more than makes up for it by being slow:-( Oh, I see that too. Normally That's because the filesystem hasn't been told to enter an idle state and so is doing metadata writeback IO after the sync. When that happens the sd suspend has wait for request queues to drain, IO to complete and device caches to flush. This simply cannot be avoided because suspend never tells the filesytems to enter an idle state i.e. remember what I said initially in this thread about suspend actually needing to freeze filesystems, not just sync them? > FYI, > I ran analyze_suspend.py -x2 > from current directory /tmp, which is mounted on tmpfs, > but still found the 2nd sync was very slow -- 200ms > vs 6 - 20 ms for the sync preceding the 1st suspend. So where did that time go? As I pointed out previously, function trace will only tell us if the delay is data writeback or not. We seem to have confirmed that the delay is, indeed, writeback of dirty data. Now we need to identify what the dirty data belongs to: we need to trace individual writeback events to see what dirty inodes are actually being written. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
> Can you repeat this test on your system, so that we can determine if > the 5ms ""sync time" is actually just the overhead of inode cache > traversal? If that is the case, the speed of sync on a clean > filesystem is already a solved problem - the patchset should be > merged in the 4.2 cycle Yes, drop_caches does seem to help repeated sync on this system: Exactly what patch series does this? I'm running ext4 (the default, not btrfs) [lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sleep 0 ; done real0m0.002s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.001s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.000s sys 0m0.001s [lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sync ; done real0m0.004s user0m0.000s sys 0m0.003s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.003s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo ext4_inode_cache3536 3536 1008 164 : tunables00 0 : slabdata221221 0 [lenb@d975xbx ~]$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches " [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo ext4_inode_cache 553 1680 1008 164 : tunables00 0 : slabdata105105 0 [lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sync ; done real0m0.002s user0m0.000s sys 0m0.001s real0m0.002s user0m0.000s sys 0m0.002s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.002s >> While they were all to slow, none of them were >> O(500ms), so yes, there >> does seem to be some state change >> that causes the 2nd sync after a resume to be especially slow. >> >> Unfortunately, I've not got an ftrace on the 500ms flavor yet. > > This is the problem we really need to reproduce and track down. Putting a function trace on sys_sync and executing sync manually, I was able to see it take 100ms, though function trace itself could be contributing to that... [lenb@d975xbx ~]$ grep \# trace.20150619_013702 # tracer: function_graph # # TIMECPU DURATION FUNCTION CALLS # | | | | | | | | 374.665063 | 0) # 2229.612 us |} /* __schedule */ 374.665064 | 0) # 2230.571 us | } /* schedule */ 374.665064 | 0) # 2231.494 us |} /* schedule_timeout */ 374.665065 | 0) # 2235.937 us | } /* wait_for_completion */ 374.745616 | 0) # 80518.73 us | } /* __schedule */ 374.745616 | 0) # 80519.47 us |} /* schedule */ 374.745617 | 0) # 80520.28 us | } /* schedule_timeout */ 374.745621 | 0) # 80526.38 us |} /* io_schedule_timeout */ 374.745621 | 0) # 80527.23 us | } /* bit_wait_io */ 374.745622 | 0) # 80531.04 us |} /* __wait_on_bit */ 374.745623 | 0) # 80531.95 us | } /* wait_on_page_bit */ 374.745644 | 0) # 80555.58 us |} /* filemap_fdatawait_range */ 374.745644 | 0) # 80556.36 us | } /* filemap_fdatawait */ 374.748029 | 0) # 1300.848 us | } /* __schedule */ 374.748029 | 0) # 1301.376 us |} /* schedule */ 374.748029 | 0) # 1301.923 us | } /* schedule_timeout */ 374.748032 | 0) # 1306.133 us |} /* io_schedule_timeout */ 374.748032 | 0) # 1306.651 us | } /* bit_wait_io */ 374.748033 | 0) # 1309.298 us |} /* __wait_on_bit */ 374.748033 | 0) # 1309.838 us | } /* wait_on_page_bit */ 374.750502 | 0) # 1099.379 us | } /*
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, Jun 19, 2015 at 02:34:37AM -0400, Len Brown wrote: Can you repeat this test on your system, so that we can determine if the 5ms sync time is actually just the overhead of inode cache traversal? If that is the case, the speed of sync on a clean filesystem is already a solved problem - the patchset should be merged in the 4.2 cycle Yes, drop_caches does seem to help repeated sync on this system: Exactly what patch series does this? I'm running ext4 (the default, not btrfs) None. It's the current behaviour of sync that is ends up walking the inode cache in it's entirity to find dirty inodes that need to be waited on. That's what the sync scalability patch series I pointed you at fixes - sync then keeps a dirty inodes that need to be waited on list instead of doing a cache traversal to find them. i.e. the no cache results you see will soon be the behaviour sync has regardless of the size of the inode cache. [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo ext4_inode_cache3536 3536 1008 164 : tunables00 0 : slabdata221221 0 That's actually a really small cache to begin with. This is the problem we really need to reproduce and track down. Putting a function trace on sys_sync and executing sync manually, I was able to see it take 100ms, though function trace itself could be contributing to that... It would seem that way - you need to get the traces to dump to something that has no sync overhead running analyze_suspend.py after the slab tweak above didn't change much. in one run sync was 20ms (out of a total suspend time of 60ms). Which may be because the inode cache was larger? Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms. I've seen that in some other results. Sometimes sync if fast, but sd then more than makes up for it by being slow:-( Oh, I see that too. Normally That's because the filesystem hasn't been told to enter an idle state and so is doing metadata writeback IO after the sync. When that happens the sd suspend has wait for request queues to drain, IO to complete and device caches to flush. This simply cannot be avoided because suspend never tells the filesytems to enter an idle state i.e. remember what I said initially in this thread about suspend actually needing to freeze filesystems, not just sync them? FYI, I ran analyze_suspend.py -x2 from current directory /tmp, which is mounted on tmpfs, but still found the 2nd sync was very slow -- 200ms vs 6 - 20 ms for the sync preceding the 1st suspend. So where did that time go? As I pointed out previously, function trace will only tell us if the delay is data writeback or not. We seem to have confirmed that the delay is, indeed, writeback of dirty data. Now we need to identify what the dirty data belongs to: we need to trace individual writeback events to see what dirty inodes are actually being written. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
Putting a function trace on sys_sync and executing sync manually, I was able to see it take 100ms, though function trace itself could be contributing to that... It would seem that way - you need to get the traces to dump to something that has no sync overhead I don't think that ftrace buffers have sync overhead -- at least not until they are copied into files. Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms. I've seen that in some other results. Sometimes sync if fast, but sd then more than makes up for it by being slow:-( Oh, I see that too. Normally That's because the filesystem hasn't been told to enter an idle state and so is doing metadata writeback IO after the sync. When that happens the sd suspend has wait for request queues to drain, IO to complete and device caches to flush. This simply cannot be avoided because suspend never tells the filesytems to enter an idle state I captured a trace of a slow sd suspend. Apparently, it does two operations -- first a sync_cache, then a stop operation. The sync was fast. The stop command was where all the time went. I'll look at a more modern drive on the same system next week, just for comparison. i.e. remember what I said initially in this thread about suspend actually needing to freeze filesystems, not just sync them? I think with the complexity of file systems and the underlying devices, yes, we need to think about how to efficiently and safely suspend/resume them. But sys_sync is too expensive to have hard-coded in the kernel suspend path. Some machines can suspend and resume in under 10ms -- they absolutely do not want sys_sync() hard-coded in the suspend path. FYI, I ran analyze_suspend.py -x2 from current directory /tmp, which is mounted on tmpfs, but still found the 2nd sync was very slow -- 200ms vs 6 - 20 ms for the sync preceding the 1st suspend. So where did that time go? As I pointed out previously, function trace will only tell us if the delay is data writeback or not. We seem to have confirmed that the delay is, indeed, writeback of dirty data. Now we need to identify what the dirty data belongs to: we need to trace individual writeback events to see what dirty inodes are actually being written. I expect that analyze_suspend.py is moving data around between the back-to-back suspends when the -x2 option is used -- will look into it. thanks, Len Brown, Intel Open Source Technology Center -- To unsubscribe from this list: send the line unsubscribe linux-kernel in Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
Can you repeat this test on your system, so that we can determine if the 5ms sync time is actually just the overhead of inode cache traversal? If that is the case, the speed of sync on a clean filesystem is already a solved problem - the patchset should be merged in the 4.2 cycle Yes, drop_caches does seem to help repeated sync on this system: Exactly what patch series does this? I'm running ext4 (the default, not btrfs) [lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sleep 0 ; done real0m0.002s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.001s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.001s sys 0m0.000s real0m0.001s user0m0.000s sys 0m0.001s [lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sync ; done real0m0.004s user0m0.000s sys 0m0.003s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.003s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s real0m0.002s user0m0.000s sys 0m0.002s [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo ext4_inode_cache3536 3536 1008 164 : tunables00 0 : slabdata221221 0 [lenb@d975xbx ~]$ sudo sh -c echo 3 /proc/sys/vm/drop_caches [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo ext4_inode_cache 553 1680 1008 164 : tunables00 0 : slabdata105105 0 [lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sync ; done real0m0.002s user0m0.000s sys 0m0.001s real0m0.002s user0m0.000s sys 0m0.002s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.001s real0m0.001s user0m0.000s sys 0m0.002s While they were all to slow, none of them were O(500ms), so yes, there does seem to be some state change that causes the 2nd sync after a resume to be especially slow. Unfortunately, I've not got an ftrace on the 500ms flavor yet. This is the problem we really need to reproduce and track down. Putting a function trace on sys_sync and executing sync manually, I was able to see it take 100ms, though function trace itself could be contributing to that... [lenb@d975xbx ~]$ grep \# trace.20150619_013702 # tracer: function_graph # # TIMECPU DURATION FUNCTION CALLS # | | | | | | | | 374.665063 | 0) # 2229.612 us |} /* __schedule */ 374.665064 | 0) # 2230.571 us | } /* schedule */ 374.665064 | 0) # 2231.494 us |} /* schedule_timeout */ 374.665065 | 0) # 2235.937 us | } /* wait_for_completion */ 374.745616 | 0) # 80518.73 us | } /* __schedule */ 374.745616 | 0) # 80519.47 us |} /* schedule */ 374.745617 | 0) # 80520.28 us | } /* schedule_timeout */ 374.745621 | 0) # 80526.38 us |} /* io_schedule_timeout */ 374.745621 | 0) # 80527.23 us | } /* bit_wait_io */ 374.745622 | 0) # 80531.04 us |} /* __wait_on_bit */ 374.745623 | 0) # 80531.95 us | } /* wait_on_page_bit */ 374.745644 | 0) # 80555.58 us |} /* filemap_fdatawait_range */ 374.745644 | 0) # 80556.36 us | } /* filemap_fdatawait */ 374.748029 | 0) # 1300.848 us | } /* __schedule */ 374.748029 | 0) # 1301.376 us |} /* schedule */ 374.748029 | 0) # 1301.923 us | } /* schedule_timeout */ 374.748032 | 0) # 1306.133 us |} /* io_schedule_timeout */ 374.748032 | 0) # 1306.651 us | } /* bit_wait_io */ 374.748033 | 0) # 1309.298 us |} /* __wait_on_bit */ 374.748033 | 0) # 1309.838 us | } /* wait_on_page_bit */ 374.750502 | 0) # 1099.379 us | } /* __schedule */
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, Jun 18, 2015 at 10:35:44PM -0400, Len Brown wrote: > On Thu, Jun 18, 2015 at 9:09 PM, Dave Chinner wrote: > > On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote: > >> On Sun, May 17, 2015 at 9:57 PM, NeilBrown wrote: > >> > 1/ Len has seen enough delays to bother sending a patch. Twice. Lots of > >> >other people claim there shouldn't be much delay. > >> > > >> >Len: can you provide numbers? How long does the sys_sync take > >> >(min/max/mean). I think an interesting number would be in a quick > >> >"resume, do something, suspend" cycle, what percent of the time is > >> > spent > >> >in sys_sync. > >> >Maybe also report what filesystems are in use, and whether you expect > >> >there to be any dirty data. > >> > > >> >If I run "time sync; time sync; time sync" it reports about 50msec for > >> >each sync. If I run "time sleep 0; time sleep 0; time sleep 0" it > >> > reports > >> >about 2 msec. So sys_sync is taking at least 50msec. > >> >Almost all of that is in sync_inodes_sb() and much of that is in > >> >_raw_spin_lock though I might be misinterpreting perf. It seems > >> > to > >> >wait for a BDI flusher thread to go off and do nothing. > >> > > >> >Len: is 50msec enough to bother you? > >> > >> 50ms is not acceptable. > >> 5ms is also not acceptable. > >> > >> If it were guaranteed to be under 1ms, it would not behigh on my > >> "performance pain" list, but I would still require the option > >> to delete it entirely. > >> > >> But sys_sync time is random on many systems, > >> with a very large maximum duration. > >> > >> Attached is the output from analyze_suspend.py -x2 > >> on a desktop machine, which has just an inexpensive SSD for storage. > >> It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel > >> with no tweaks. > > > > More information required. Mounted filesystems, mount options, etc > > at minimum so we have some idea of what is being synced. We don't > > even know what filesystem you are using > > In this case, I am using Fedora 22 defaults on a single SSD: > > ext4 (rw,relatime,seclabel,data=ordered) OK, nothing unusual. > [lenb@d975xbx ~]$ mount > /dev/sda3 on / type ext4 (rw,relatime,seclabel,data=ordered) > /dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered) And so effectively only a single filesystem that would do IO when sync is called. > > The graph posted just has a single box labelled "sync_filesystems" > > with no other information about what is actually taking that time. > > We need to know what is consuming all that time *inside* > > sync_filesystems() to make any progress here. > > Just for grins, did 100 sys_sync instead of one. > They all ran about the same 5ms. Ok, so what's the overhead? On one of my test VMs, running on spinning disk backed devices and a single ext3 root filesystem and running that sync scalability patch series: > > git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git > > superblock-scaling $ for i in `seq 0 1 100`; do time sleep 0 ; done 2>&1 | stats real0m0.002s user0m0.00-0.004000(0.000168317+/-0.00058)s sys 0m0.00-0.003000(0.00154455+/-0.00074)s $ for i in `seq 0 1 100`; do time sync ; done 2>&1 | stats real0m0.002s user0m0.00-0.002000(0.000118812+/-0.00041)s sys 0m0.00-0.003000(0.00151485+/-0.0007)s $ The sync runtime is exactly the same as running "sleep 0" 100 times. Without the sync scalability patchset: $ for i in `seq 0 1 100`; do time sync ; done 2>&1 | stats real 0m0.007s user 0m0.00-0.004000(0.000316832+/-0.00096)s sys 0m0.003000-0.008000(0.00690099+/-0.00099)s $ There's 5ms extra time on every sync, and it is entirely system CPU time. That will be the inode cache traversal overhead. To confirm, drop the cache and rerun: $ sudo grep ext3_inode /proc/slabinfo ext3_inode_cache 23086 23086960 174 : tunables000 : slabdata 1358 1358 0 $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches " $ sudo grep ext3_inode /proc/slabinfo ext3_inode_cache3152 4488960 174 : tunables000 : slabdata264264 0 $ for i in `seq 0 1 100`; do time sync ; done 2>&1 | stats real 0m0.002s user 0m0.00-0.005000(0.000227723+/-0.00074)s sys 0m0.00-0.002000(0.00182178+/-0.00057)s Dropping the inode cache (from 23,000 inodes to 3,000) makes the 5ms sync runtime disappear. Can you repeat this test on your system, so that we can determine if the 5ms ""sync time" is actually just the overhead of inode cache traversal? If that is the case, the speed of sync on a clean filesystem is already a solved problem - the patchset should be merged in the 4.2 cycle > While they were all to slow, none of them were > O(500ms), so yes, there > does seem to be some state change > that causes the 2nd sync after a resume to be especially slow. > > Unfortunately, I've not got an ftrace on the 500ms
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, Jun 18, 2015 at 9:09 PM, Dave Chinner wrote: > On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote: >> On Sun, May 17, 2015 at 9:57 PM, NeilBrown wrote: >> > On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes >> > wrote: >> > >> >> > > Data loss may be caused for hotplug storage(like USB), or all storage >> >> > > when power is exhausted during suspend. >> >> > >> >> > Which also may very well happen at run time, right? >> >> >> >> Intuitively users treat "suspended" as a bit like off and do remove >> >> devices they've "finished with". >> >> >> >> Technically yes the instant off/on on a phone is the same as the suspend >> >> to RAM on a laptop but it doesn't mean the model in people's heads is the >> >> same... not yet anyway. >> >> >> >> > > Is there obvious advantage to remove sys_sync() in the case? >> >> > >> >> > Yes, there is. It is not necessary to sync() every time you suspend >> >> > if you do that very often. >> >> >> >> But if you do it very often you won't have any dirty pages to flush so it >> >> will be very fast. >> > >> > Several people have said things like this, and yet Len clearly saw a >> > problem >> > so sometimes it isn't as fast as you think it should be. I guess we need >> > some data. >> > >> > In fact there seem to be a number of open questions: >> > >> > 1/ Len has seen enough delays to bother sending a patch. Twice. Lots of >> >other people claim there shouldn't be much delay. >> > >> >Len: can you provide numbers? How long does the sys_sync take >> >(min/max/mean). I think an interesting number would be in a quick >> >"resume, do something, suspend" cycle, what percent of the time is spent >> >in sys_sync. >> >Maybe also report what filesystems are in use, and whether you expect >> >there to be any dirty data. >> > >> >If I run "time sync; time sync; time sync" it reports about 50msec for >> >each sync. If I run "time sleep 0; time sleep 0; time sleep 0" it >> > reports >> >about 2 msec. So sys_sync is taking at least 50msec. >> >Almost all of that is in sync_inodes_sb() and much of that is in >> >_raw_spin_lock though I might be misinterpreting perf. It seems to >> >wait for a BDI flusher thread to go off and do nothing. >> > >> >Len: is 50msec enough to bother you? >> >> 50ms is not acceptable. >> 5ms is also not acceptable. >> >> If it were guaranteed to be under 1ms, it would not behigh on my >> "performance pain" list, but I would still require the option >> to delete it entirely. >> >> But sys_sync time is random on many systems, >> with a very large maximum duration. >> >> Attached is the output from analyze_suspend.py -x2 >> on a desktop machine, which has just an inexpensive SSD for storage. >> It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel >> with no tweaks. > > More information required. Mounted filesystems, mount options, etc > at minimum so we have some idea of what is being synced. We don't > even know what filesystem you are using In this case, I am using Fedora 22 defaults on a single SSD: ext4 (rw,relatime,seclabel,data=ordered) [lenb@d975xbx ~]$ mount /dev/sda3 on / type ext4 (rw,relatime,seclabel,data=ordered) devtmpfs on /dev type devtmpfs (rw,relatime,seclabel,size=3053436k,nr_inodes=76) sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,rela) selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=6) tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755) tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,x) pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime,seclabel) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cp) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatim) cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,r) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blk) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,h) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relati) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,me) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,d) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,f) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=21,pgrp=1,tim) debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) tmpfs on /tmp type tmpfs
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote: > On Sun, May 17, 2015 at 9:57 PM, NeilBrown wrote: > > On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes > > wrote: > > > >> > > Data loss may be caused for hotplug storage(like USB), or all storage > >> > > when power is exhausted during suspend. > >> > > >> > Which also may very well happen at run time, right? > >> > >> Intuitively users treat "suspended" as a bit like off and do remove > >> devices they've "finished with". > >> > >> Technically yes the instant off/on on a phone is the same as the suspend > >> to RAM on a laptop but it doesn't mean the model in people's heads is the > >> same... not yet anyway. > >> > >> > > Is there obvious advantage to remove sys_sync() in the case? > >> > > >> > Yes, there is. It is not necessary to sync() every time you suspend > >> > if you do that very often. > >> > >> But if you do it very often you won't have any dirty pages to flush so it > >> will be very fast. > > > > Several people have said things like this, and yet Len clearly saw a problem > > so sometimes it isn't as fast as you think it should be. I guess we need > > some data. > > > > In fact there seem to be a number of open questions: > > > > 1/ Len has seen enough delays to bother sending a patch. Twice. Lots of > >other people claim there shouldn't be much delay. > > > >Len: can you provide numbers? How long does the sys_sync take > >(min/max/mean). I think an interesting number would be in a quick > >"resume, do something, suspend" cycle, what percent of the time is spent > >in sys_sync. > >Maybe also report what filesystems are in use, and whether you expect > >there to be any dirty data. > > > >If I run "time sync; time sync; time sync" it reports about 50msec for > >each sync. If I run "time sleep 0; time sleep 0; time sleep 0" it > > reports > >about 2 msec. So sys_sync is taking at least 50msec. > >Almost all of that is in sync_inodes_sb() and much of that is in > >_raw_spin_lock though I might be misinterpreting perf. It seems to > >wait for a BDI flusher thread to go off and do nothing. > > > >Len: is 50msec enough to bother you? > > 50ms is not acceptable. > 5ms is also not acceptable. > > If it were guaranteed to be under 1ms, it would not behigh on my > "performance pain" list, but I would still require the option > to delete it entirely. > > But sys_sync time is random on many systems, > with a very large maximum duration. > > Attached is the output from analyze_suspend.py -x2 > on a desktop machine, which has just an inexpensive SSD for storage. > It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel > with no tweaks. More information required. Mounted filesystems, mount options, etc at minimum so we have some idea of what is being synced. We don't even know what filesystem you are using > The 1st suspend starts with a sync that takes 6.6ms > But the 2nd suspend starts with a sync that takes 451.8ms The graph posted just has a single box labelled "sync_filesystems" with no other information about what is actually taking that time. We need to know what is consuming all that time *inside* sync_filesystems() to make any progress here. > So the reasoning that subsequent sys_sync's should be instantly quick > because they are following a previous sys_sync is found to be simply > erroneous by observations such as this. Which indicates that either the system is not idle as expected, or there's some kind of bug in one of the filesystems that is being synced. But we can't do any analysis of the problem you are seeing if the only information you give us "it takes too long". Something like a function trace that tells use exactly what functions are being called and how long they take to execute would be really handy here. Alternatively, you don't need to suspend to test this - just run a test program on an idle system that does two sync() calls 50ms apart and see what happens. If you can't reproduce it outside of a suspend operation, then we need to look at what suspend is actually causing to be dirtied in those 50ms. If you can reproduce it, you can then even narrow down the filesystem that is causing the problem by using syncfs() instead of sync() to test the filesystems one at a time, and once you've done that take a writeback event trace so we can see exactly what inodes/data is being written back, and hence identify what the first sync is not cleaning. FWIW, I'd also suggest trying the changes to the sync code here: git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git superblock-scaling As they avoid the need for sync to walk all the cached inodes in the system needlessly and so should reduce the CPU overhead of sys_sync() quite substantially. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, Jun 18, 2015 at 10:35:44PM -0400, Len Brown wrote: On Thu, Jun 18, 2015 at 9:09 PM, Dave Chinner da...@fromorbit.com wrote: On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote: On Sun, May 17, 2015 at 9:57 PM, NeilBrown ne...@suse.de wrote: 1/ Len has seen enough delays to bother sending a patch. Twice. Lots of other people claim there shouldn't be much delay. Len: can you provide numbers? How long does the sys_sync take (min/max/mean). I think an interesting number would be in a quick resume, do something, suspend cycle, what percent of the time is spent in sys_sync. Maybe also report what filesystems are in use, and whether you expect there to be any dirty data. If I run time sync; time sync; time sync it reports about 50msec for each sync. If I run time sleep 0; time sleep 0; time sleep 0 it reports about 2 msec. So sys_sync is taking at least 50msec. Almost all of that is in sync_inodes_sb() and much of that is in _raw_spin_lock though I might be misinterpreting perf. It seems to wait for a BDI flusher thread to go off and do nothing. Len: is 50msec enough to bother you? 50ms is not acceptable. 5ms is also not acceptable. If it were guaranteed to be under 1ms, it would not behigh on my performance pain list, but I would still require the option to delete it entirely. But sys_sync time is random on many systems, with a very large maximum duration. Attached is the output from analyze_suspend.py -x2 on a desktop machine, which has just an inexpensive SSD for storage. It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel with no tweaks. More information required. Mounted filesystems, mount options, etc at minimum so we have some idea of what is being synced. We don't even know what filesystem you are using In this case, I am using Fedora 22 defaults on a single SSD: ext4 (rw,relatime,seclabel,data=ordered) OK, nothing unusual. [lenb@d975xbx ~]$ mount /dev/sda3 on / type ext4 (rw,relatime,seclabel,data=ordered) /dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered) And so effectively only a single filesystem that would do IO when sync is called. The graph posted just has a single box labelled sync_filesystems with no other information about what is actually taking that time. We need to know what is consuming all that time *inside* sync_filesystems() to make any progress here. Just for grins, did 100 sys_sync instead of one. They all ran about the same 5ms. Ok, so what's the overhead? On one of my test VMs, running on spinning disk backed devices and a single ext3 root filesystem and running that sync scalability patch series: git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git superblock-scaling $ for i in `seq 0 1 100`; do time sleep 0 ; done 21 | stats real0m0.002s user0m0.00-0.004000(0.000168317+/-0.00058)s sys 0m0.00-0.003000(0.00154455+/-0.00074)s $ for i in `seq 0 1 100`; do time sync ; done 21 | stats real0m0.002s user0m0.00-0.002000(0.000118812+/-0.00041)s sys 0m0.00-0.003000(0.00151485+/-0.0007)s $ The sync runtime is exactly the same as running sleep 0 100 times. Without the sync scalability patchset: $ for i in `seq 0 1 100`; do time sync ; done 21 | stats real 0m0.007s user 0m0.00-0.004000(0.000316832+/-0.00096)s sys 0m0.003000-0.008000(0.00690099+/-0.00099)s $ There's 5ms extra time on every sync, and it is entirely system CPU time. That will be the inode cache traversal overhead. To confirm, drop the cache and rerun: $ sudo grep ext3_inode /proc/slabinfo ext3_inode_cache 23086 23086960 174 : tunables000 : slabdata 1358 1358 0 $ sudo sh -c echo 3 /proc/sys/vm/drop_caches $ sudo grep ext3_inode /proc/slabinfo ext3_inode_cache3152 4488960 174 : tunables000 : slabdata264264 0 $ for i in `seq 0 1 100`; do time sync ; done 21 | stats real 0m0.002s user 0m0.00-0.005000(0.000227723+/-0.00074)s sys 0m0.00-0.002000(0.00182178+/-0.00057)s Dropping the inode cache (from 23,000 inodes to 3,000) makes the 5ms sync runtime disappear. Can you repeat this test on your system, so that we can determine if the 5ms sync time is actually just the overhead of inode cache traversal? If that is the case, the speed of sync on a clean filesystem is already a solved problem - the patchset should be merged in the 4.2 cycle While they were all to slow, none of them were O(500ms), so yes, there does seem to be some state change that causes the 2nd sync after a resume to be especially slow. Unfortunately, I've not got an ftrace on the 500ms flavor yet. This is the problem we really need to reproduce and track down. So the reasoning that subsequent sys_sync's should be instantly quick because they
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote: On Sun, May 17, 2015 at 9:57 PM, NeilBrown ne...@suse.de wrote: On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes gno...@lxorguk.ukuu.org.uk wrote: Data loss may be caused for hotplug storage(like USB), or all storage when power is exhausted during suspend. Which also may very well happen at run time, right? Intuitively users treat suspended as a bit like off and do remove devices they've finished with. Technically yes the instant off/on on a phone is the same as the suspend to RAM on a laptop but it doesn't mean the model in people's heads is the same... not yet anyway. Is there obvious advantage to remove sys_sync() in the case? Yes, there is. It is not necessary to sync() every time you suspend if you do that very often. But if you do it very often you won't have any dirty pages to flush so it will be very fast. Several people have said things like this, and yet Len clearly saw a problem so sometimes it isn't as fast as you think it should be. I guess we need some data. In fact there seem to be a number of open questions: 1/ Len has seen enough delays to bother sending a patch. Twice. Lots of other people claim there shouldn't be much delay. Len: can you provide numbers? How long does the sys_sync take (min/max/mean). I think an interesting number would be in a quick resume, do something, suspend cycle, what percent of the time is spent in sys_sync. Maybe also report what filesystems are in use, and whether you expect there to be any dirty data. If I run time sync; time sync; time sync it reports about 50msec for each sync. If I run time sleep 0; time sleep 0; time sleep 0 it reports about 2 msec. So sys_sync is taking at least 50msec. Almost all of that is in sync_inodes_sb() and much of that is in _raw_spin_lock though I might be misinterpreting perf. It seems to wait for a BDI flusher thread to go off and do nothing. Len: is 50msec enough to bother you? 50ms is not acceptable. 5ms is also not acceptable. If it were guaranteed to be under 1ms, it would not behigh on my performance pain list, but I would still require the option to delete it entirely. But sys_sync time is random on many systems, with a very large maximum duration. Attached is the output from analyze_suspend.py -x2 on a desktop machine, which has just an inexpensive SSD for storage. It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel with no tweaks. More information required. Mounted filesystems, mount options, etc at minimum so we have some idea of what is being synced. We don't even know what filesystem you are using The 1st suspend starts with a sync that takes 6.6ms But the 2nd suspend starts with a sync that takes 451.8ms The graph posted just has a single box labelled sync_filesystems with no other information about what is actually taking that time. We need to know what is consuming all that time *inside* sync_filesystems() to make any progress here. So the reasoning that subsequent sys_sync's should be instantly quick because they are following a previous sys_sync is found to be simply erroneous by observations such as this. Which indicates that either the system is not idle as expected, or there's some kind of bug in one of the filesystems that is being synced. But we can't do any analysis of the problem you are seeing if the only information you give us it takes too long. Something like a function trace that tells use exactly what functions are being called and how long they take to execute would be really handy here. Alternatively, you don't need to suspend to test this - just run a test program on an idle system that does two sync() calls 50ms apart and see what happens. If you can't reproduce it outside of a suspend operation, then we need to look at what suspend is actually causing to be dirtied in those 50ms. If you can reproduce it, you can then even narrow down the filesystem that is causing the problem by using syncfs() instead of sync() to test the filesystems one at a time, and once you've done that take a writeback event trace so we can see exactly what inodes/data is being written back, and hence identify what the first sync is not cleaning. FWIW, I'd also suggest trying the changes to the sync code here: git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git superblock-scaling As they avoid the need for sync to walk all the cached inodes in the system needlessly and so should reduce the CPU overhead of sys_sync() quite substantially. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: [PATCH 1/1] suspend: delete sys_sync()
On Thu, Jun 18, 2015 at 9:09 PM, Dave Chinner da...@fromorbit.com wrote: On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote: On Sun, May 17, 2015 at 9:57 PM, NeilBrown ne...@suse.de wrote: On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes gno...@lxorguk.ukuu.org.uk wrote: Data loss may be caused for hotplug storage(like USB), or all storage when power is exhausted during suspend. Which also may very well happen at run time, right? Intuitively users treat suspended as a bit like off and do remove devices they've finished with. Technically yes the instant off/on on a phone is the same as the suspend to RAM on a laptop but it doesn't mean the model in people's heads is the same... not yet anyway. Is there obvious advantage to remove sys_sync() in the case? Yes, there is. It is not necessary to sync() every time you suspend if you do that very often. But if you do it very often you won't have any dirty pages to flush so it will be very fast. Several people have said things like this, and yet Len clearly saw a problem so sometimes it isn't as fast as you think it should be. I guess we need some data. In fact there seem to be a number of open questions: 1/ Len has seen enough delays to bother sending a patch. Twice. Lots of other people claim there shouldn't be much delay. Len: can you provide numbers? How long does the sys_sync take (min/max/mean). I think an interesting number would be in a quick resume, do something, suspend cycle, what percent of the time is spent in sys_sync. Maybe also report what filesystems are in use, and whether you expect there to be any dirty data. If I run time sync; time sync; time sync it reports about 50msec for each sync. If I run time sleep 0; time sleep 0; time sleep 0 it reports about 2 msec. So sys_sync is taking at least 50msec. Almost all of that is in sync_inodes_sb() and much of that is in _raw_spin_lock though I might be misinterpreting perf. It seems to wait for a BDI flusher thread to go off and do nothing. Len: is 50msec enough to bother you? 50ms is not acceptable. 5ms is also not acceptable. If it were guaranteed to be under 1ms, it would not behigh on my performance pain list, but I would still require the option to delete it entirely. But sys_sync time is random on many systems, with a very large maximum duration. Attached is the output from analyze_suspend.py -x2 on a desktop machine, which has just an inexpensive SSD for storage. It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel with no tweaks. More information required. Mounted filesystems, mount options, etc at minimum so we have some idea of what is being synced. We don't even know what filesystem you are using In this case, I am using Fedora 22 defaults on a single SSD: ext4 (rw,relatime,seclabel,data=ordered) [lenb@d975xbx ~]$ mount /dev/sda3 on / type ext4 (rw,relatime,seclabel,data=ordered) devtmpfs on /dev type devtmpfs (rw,relatime,seclabel,size=3053436k,nr_inodes=76) sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,rela) selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel) devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=6) tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755) tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,x) pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime,seclabel) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cp) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatim) cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,r) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blk) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,h) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relati) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,me) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,d) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,f) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=21,pgrp=1,tim) debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel) hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel) tmpfs on /tmp type tmpfs (rw,seclabel) mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel) configfs on /sys/kernel/config type configfs (rw,relatime) sunrpc on /var/lib/nfs/rpc_pipefs type
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes wrote: > > > Data loss may be caused for hotplug storage(like USB), or all storage > > > when power is exhausted during suspend. > > > > Which also may very well happen at run time, right? > > Intuitively users treat "suspended" as a bit like off and do remove > devices they've "finished with". > > Technically yes the instant off/on on a phone is the same as the suspend > to RAM on a laptop but it doesn't mean the model in people's heads is the > same... not yet anyway. > > > > Is there obvious advantage to remove sys_sync() in the case? > > > > Yes, there is. It is not necessary to sync() every time you suspend > > if you do that very often. > > But if you do it very often you won't have any dirty pages to flush so it > will be very fast. Several people have said things like this, and yet Len clearly saw a problem so sometimes it isn't as fast as you think it should be. I guess we need some data. In fact there seem to be a number of open questions: 1/ Len has seen enough delays to bother sending a patch. Twice. Lots of other people claim there shouldn't be much delay. Len: can you provide numbers? How long does the sys_sync take (min/max/mean). I think an interesting number would be in a quick "resume, do something, suspend" cycle, what percent of the time is spent in sys_sync. Maybe also report what filesystems are in use, and whether you expect there to be any dirty data. If I run "time sync; time sync; time sync" it reports about 50msec for each sync. If I run "time sleep 0; time sleep 0; time sleep 0" it reports about 2 msec. So sys_sync is taking at least 50msec. Almost all of that is in sync_inodes_sb() and much of that is in _raw_spin_lock though I might be misinterpreting perf. It seems to wait for a BDI flusher thread to go off and do nothing. Len: is 50msec enough to bother you? 2/ Is a 'sys_sync' really necessary. People claim that "users might lose data". I claim that some user data could be still in the application buffers so they would lose that anyway, and applications call fsync after writing everything out, so sys_sync isn't necessary. Does anyone have an example of an application which writes important user data, but doesn't call "fsync" shortly after all data has been written out of the application's internal buffers? 3/ Is a 'sys_sync' sufficient. Len claims that in some cases, when running sync; sync the second sync takes longer, suggesting that the first sync didn't do everything that was expected. Prior to Linux 1.3.20, sys_sync didn't wait for everything. Since then it seems to be designed to, but maybe something isn't right there. In that case, having sys_sync may not be helping anyway. Len: can you reliably reproduce this? Can you provide a recipe? 4/ Which part of 'sync' is really important? sys_sync does two things. It initiates writeout on dirty data and it wait for writeout to complete. Even if the former isn't necessary, the latter probably is. Do we need to reliably wait for all output queues to flush? Where is the best place to do that? Thanks, NeilBrown > > > And it is done in such a place that everything needs to wait for it to > > complete. > > Only because the code deciding to trigger any automated suspend doesn't > do a sync a few seconds before. In the case the user goes to the menus > and does power->suspend then yes it's a delay. In the case where the OS > at some level has decided that it's 10 seconds from automatically > suspending to something the user space can issue a pre-emptive sync to > get the queue size down. > > Alan pgpIOVBdOCU28.pgp Description: OpenPGP digital signature
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes gno...@lxorguk.ukuu.org.uk wrote: Data loss may be caused for hotplug storage(like USB), or all storage when power is exhausted during suspend. Which also may very well happen at run time, right? Intuitively users treat suspended as a bit like off and do remove devices they've finished with. Technically yes the instant off/on on a phone is the same as the suspend to RAM on a laptop but it doesn't mean the model in people's heads is the same... not yet anyway. Is there obvious advantage to remove sys_sync() in the case? Yes, there is. It is not necessary to sync() every time you suspend if you do that very often. But if you do it very often you won't have any dirty pages to flush so it will be very fast. Several people have said things like this, and yet Len clearly saw a problem so sometimes it isn't as fast as you think it should be. I guess we need some data. In fact there seem to be a number of open questions: 1/ Len has seen enough delays to bother sending a patch. Twice. Lots of other people claim there shouldn't be much delay. Len: can you provide numbers? How long does the sys_sync take (min/max/mean). I think an interesting number would be in a quick resume, do something, suspend cycle, what percent of the time is spent in sys_sync. Maybe also report what filesystems are in use, and whether you expect there to be any dirty data. If I run time sync; time sync; time sync it reports about 50msec for each sync. If I run time sleep 0; time sleep 0; time sleep 0 it reports about 2 msec. So sys_sync is taking at least 50msec. Almost all of that is in sync_inodes_sb() and much of that is in _raw_spin_lock though I might be misinterpreting perf. It seems to wait for a BDI flusher thread to go off and do nothing. Len: is 50msec enough to bother you? 2/ Is a 'sys_sync' really necessary. People claim that users might lose data. I claim that some user data could be still in the application buffers so they would lose that anyway, and applications call fsync after writing everything out, so sys_sync isn't necessary. Does anyone have an example of an application which writes important user data, but doesn't call fsync shortly after all data has been written out of the application's internal buffers? 3/ Is a 'sys_sync' sufficient. Len claims that in some cases, when running sync; sync the second sync takes longer, suggesting that the first sync didn't do everything that was expected. Prior to Linux 1.3.20, sys_sync didn't wait for everything. Since then it seems to be designed to, but maybe something isn't right there. In that case, having sys_sync may not be helping anyway. Len: can you reliably reproduce this? Can you provide a recipe? 4/ Which part of 'sync' is really important? sys_sync does two things. It initiates writeout on dirty data and it wait for writeout to complete. Even if the former isn't necessary, the latter probably is. Do we need to reliably wait for all output queues to flush? Where is the best place to do that? Thanks, NeilBrown And it is done in such a place that everything needs to wait for it to complete. Only because the code deciding to trigger any automated suspend doesn't do a sync a few seconds before. In the case the user goes to the menus and does power-suspend then yes it's a delay. In the case where the OS at some level has decided that it's 10 seconds from automatically suspending to something the user space can issue a pre-emptive sync to get the queue size down. Alan pgpIOVBdOCU28.pgp Description: OpenPGP digital signature
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, 15 May 2015, Alan Stern wrote: > On Fri, 15 May 2015, NeilBrown wrote: > > > Some storage devices don't handle suspend as well as they should and lose > > requests resulting in corruption. They should obviously be fixed, but it is > > you who gets the problem reports and you are not in a position to fix them. > > So you want a general solution that hides those problems. > > sys_sync at suspend time is a sort-of solution because it flushes and waits > > so there is less in-flight IO immediately after a sys_sync and so less > > opportunity for a bad device to stuff up. > > But you seem to suggest that sys_sync isn't a complete solution and it > > doesn't guarantee that xfs is not doing some background metadata IO. > > > > Maybe a sensible thing to do would be to hook the "disk" devices into > > suspend > > and have them flush their queue and possibly send a CACHE_FLUSH command. > > That would provide more of a guarantee for you, and less of a cost for Len, > > would it not? > > The sd driver already does this. Sorry -- I meant that it does send a SYNCHRONIZE CACHE command. It doesn't flush the I/O queue. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, 15 May 2015, NeilBrown wrote: > Some storage devices don't handle suspend as well as they should and lose > requests resulting in corruption. They should obviously be fixed, but it is > you who gets the problem reports and you are not in a position to fix them. > So you want a general solution that hides those problems. > sys_sync at suspend time is a sort-of solution because it flushes and waits > so there is less in-flight IO immediately after a sys_sync and so less > opportunity for a bad device to stuff up. > But you seem to suggest that sys_sync isn't a complete solution and it > doesn't guarantee that xfs is not doing some background metadata IO. > > Maybe a sensible thing to do would be to hook the "disk" devices into suspend > and have them flush their queue and possibly send a CACHE_FLUSH command. > That would provide more of a guarantee for you, and less of a cost for Len, > would it not? The sd driver already does this. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, 15 May 2015, Dave Chinner wrote: > It has nothing to do with journalling, and everything to do with > bring filesystems to an *idle state* before suspend runs. We have a > long history of bug reports with XFS that go: suspend, resume, XFS > almost immediately detects corruption, shuts down. > > The problem is that "sync" doesn't make the filesystem idle - XFs > has *lots* of background work going on, and if we aren't *real > careful* the filesystem is still doing work while the hardware gets > powerd down and the suspend image is being taken. the result is on > resume that the on-disk filesystem state does not match the memory > image pulled back from resume, and we get shutdowns. > > sys_sync() does not guarantee a filesystem is idle - it guarantees > the data in memory is recoverable, butit doesn't stop the filesystem > from doing things like writing back metadata or running background > cleaup tasks. If those aren't stopped properly, then we get into > the state where in-memory and on-disk state get out of whack. And > s2ram can have these problems too, because if there is IO in flight > when the hardware is powered down, that IO is lost > > Every time some piece of generic infrastructure changes behaviour > w.r.t. suspend/resume, we get a new set of problems being reported > by users. It's extremely hard to test for these problems and it > might take months of occasional corruption reports from a user to > isolate it to being a suspend/resume problem. It's a game of > whack-a-mole, because quite often they come down to the fact that > something changed and nobody in the XFS world knew they had to now > set an different initialisation flag on some structure or workqueue > to make it work the way it needed to work. > > Go back an look at the history of sys_sync() in suspend discussions > over the past 10 years. You'll find me saying exactly the same > thing again and again about sys_sync(): it does not guarantee the > filesystem is in an idle or coherent, unchanging state, and nothing > in the suspend code tells the filesystem to enter an idle or frozen > state. We actually have mechanisms for doing this - we use it in the > storage layers to idle the filesystem while we do things like *take > a snapshot*. > > What is the mechanism suspend to disk uses? It *takes a snapshot* of > system state, written to disk. It's supposed to be consistent, and > the only way you can guarantee the state of an active, mounted > filesystem has consistent in-memory state and on-disk state and > that it won't get changed is to *freeze the filesystem*. > > Removing the sync is only going to make this problem worse because > the delta between on-disk and in-memory state is going to be much, > much larger. There is also likely to be significant filesystem > activity occurring when the filesystem has all it's background > threads and work queues abruptly frozen with no warning or > co-ordination, which makes it impossible for anyone to test > suspend/resume reliably. > > Sorry for the long rant, but I've been saying the same thing for 10 > years, which is abotu as long as I've been dealing with filesystem > corruptions that have resulted from suspend/resume. Are you suggesting that filesystems should register a "freeze the filesystem" routine, and the PM core should call this routine during hibernation (aka suspend-to-disk)? With a corresponding "unfreeze" hook for resume, of course. That seems like a good idea to me. It would also present a logical place to add some simple checks for changes to the on-disk image caused by the user moving the storage device to another system, modifying the contents, and then moving it back all while the current system was hibernating (or suspended). Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
> > Data loss may be caused for hotplug storage(like USB), or all storage > > when power is exhausted during suspend. > > Which also may very well happen at run time, right? Intuitively users treat "suspended" as a bit like off and do remove devices they've "finished with". Technically yes the instant off/on on a phone is the same as the suspend to RAM on a laptop but it doesn't mean the model in people's heads is the same... not yet anyway. > > Is there obvious advantage to remove sys_sync() in the case? > > Yes, there is. It is not necessary to sync() every time you suspend > if you do that very often. But if you do it very often you won't have any dirty pages to flush so it will be very fast. > And it is done in such a place that everything needs to wait for it to > complete. Only because the code deciding to trigger any automated suspend doesn't do a sync a few seconds before. In the case the user goes to the menus and does power->suspend then yes it's a delay. In the case where the OS at some level has decided that it's 10 seconds from automatically suspending to something the user space can issue a pre-emptive sync to get the queue size down. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
Data loss may be caused for hotplug storage(like USB), or all storage when power is exhausted during suspend. Which also may very well happen at run time, right? Intuitively users treat suspended as a bit like off and do remove devices they've finished with. Technically yes the instant off/on on a phone is the same as the suspend to RAM on a laptop but it doesn't mean the model in people's heads is the same... not yet anyway. Is there obvious advantage to remove sys_sync() in the case? Yes, there is. It is not necessary to sync() every time you suspend if you do that very often. But if you do it very often you won't have any dirty pages to flush so it will be very fast. And it is done in such a place that everything needs to wait for it to complete. Only because the code deciding to trigger any automated suspend doesn't do a sync a few seconds before. In the case the user goes to the menus and does power-suspend then yes it's a delay. In the case where the OS at some level has decided that it's 10 seconds from automatically suspending to something the user space can issue a pre-emptive sync to get the queue size down. Alan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, 15 May 2015, NeilBrown wrote: Some storage devices don't handle suspend as well as they should and lose requests resulting in corruption. They should obviously be fixed, but it is you who gets the problem reports and you are not in a position to fix them. So you want a general solution that hides those problems. sys_sync at suspend time is a sort-of solution because it flushes and waits so there is less in-flight IO immediately after a sys_sync and so less opportunity for a bad device to stuff up. But you seem to suggest that sys_sync isn't a complete solution and it doesn't guarantee that xfs is not doing some background metadata IO. Maybe a sensible thing to do would be to hook the disk devices into suspend and have them flush their queue and possibly send a CACHE_FLUSH command. That would provide more of a guarantee for you, and less of a cost for Len, would it not? The sd driver already does this. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, 15 May 2015, Dave Chinner wrote: It has nothing to do with journalling, and everything to do with bring filesystems to an *idle state* before suspend runs. We have a long history of bug reports with XFS that go: suspend, resume, XFS almost immediately detects corruption, shuts down. The problem is that sync doesn't make the filesystem idle - XFs has *lots* of background work going on, and if we aren't *real careful* the filesystem is still doing work while the hardware gets powerd down and the suspend image is being taken. the result is on resume that the on-disk filesystem state does not match the memory image pulled back from resume, and we get shutdowns. sys_sync() does not guarantee a filesystem is idle - it guarantees the data in memory is recoverable, butit doesn't stop the filesystem from doing things like writing back metadata or running background cleaup tasks. If those aren't stopped properly, then we get into the state where in-memory and on-disk state get out of whack. And s2ram can have these problems too, because if there is IO in flight when the hardware is powered down, that IO is lost Every time some piece of generic infrastructure changes behaviour w.r.t. suspend/resume, we get a new set of problems being reported by users. It's extremely hard to test for these problems and it might take months of occasional corruption reports from a user to isolate it to being a suspend/resume problem. It's a game of whack-a-mole, because quite often they come down to the fact that something changed and nobody in the XFS world knew they had to now set an different initialisation flag on some structure or workqueue to make it work the way it needed to work. Go back an look at the history of sys_sync() in suspend discussions over the past 10 years. You'll find me saying exactly the same thing again and again about sys_sync(): it does not guarantee the filesystem is in an idle or coherent, unchanging state, and nothing in the suspend code tells the filesystem to enter an idle or frozen state. We actually have mechanisms for doing this - we use it in the storage layers to idle the filesystem while we do things like *take a snapshot*. What is the mechanism suspend to disk uses? It *takes a snapshot* of system state, written to disk. It's supposed to be consistent, and the only way you can guarantee the state of an active, mounted filesystem has consistent in-memory state and on-disk state and that it won't get changed is to *freeze the filesystem*. Removing the sync is only going to make this problem worse because the delta between on-disk and in-memory state is going to be much, much larger. There is also likely to be significant filesystem activity occurring when the filesystem has all it's background threads and work queues abruptly frozen with no warning or co-ordination, which makes it impossible for anyone to test suspend/resume reliably. Sorry for the long rant, but I've been saying the same thing for 10 years, which is abotu as long as I've been dealing with filesystem corruptions that have resulted from suspend/resume. Are you suggesting that filesystems should register a freeze the filesystem routine, and the PM core should call this routine during hibernation (aka suspend-to-disk)? With a corresponding unfreeze hook for resume, of course. That seems like a good idea to me. It would also present a logical place to add some simple checks for changes to the on-disk image caused by the user moving the storage device to another system, modifying the contents, and then moving it back all while the current system was hibernating (or suspended). Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, 15 May 2015, Alan Stern wrote: On Fri, 15 May 2015, NeilBrown wrote: Some storage devices don't handle suspend as well as they should and lose requests resulting in corruption. They should obviously be fixed, but it is you who gets the problem reports and you are not in a position to fix them. So you want a general solution that hides those problems. sys_sync at suspend time is a sort-of solution because it flushes and waits so there is less in-flight IO immediately after a sys_sync and so less opportunity for a bad device to stuff up. But you seem to suggest that sys_sync isn't a complete solution and it doesn't guarantee that xfs is not doing some background metadata IO. Maybe a sensible thing to do would be to hook the disk devices into suspend and have them flush their queue and possibly send a CACHE_FLUSH command. That would provide more of a guarantee for you, and less of a cost for Len, would it not? The sd driver already does this. Sorry -- I meant that it does send a SYNCHRONIZE CACHE command. It doesn't flush the I/O queue. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, May 15, 2015 at 8:59 AM, Rafael J. Wysocki wrote: > On Fri, May 15, 2015 at 2:40 AM, Ming Lei wrote: >> On Fri, May 15, 2015 at 8:34 AM, Rafael J. Wysocki >> wrote: >>> On Friday, May 15, 2015 09:54:26 AM Dave Chinner wrote: ng back On Thu, May 14, 2015 at 09:22:51AM +1000, NeilBrown wrote: > On Mon, 11 May 2015 11:44:28 +1000 Dave Chinner > wrote: > > > On Fri, May 08, 2015 at 03:08:43AM -0400, Len Brown wrote: > > > From: Len Brown > > > > > > Remove sys_sync() from the kernel's suspend flow. > > > > > > sys_sync() is extremely expensive in some configurations, > > > and so the kernel should not force users to pay this cost > > > on every suspend. > > > > Since when? Please explain what your use case is that makes this > > so prohibitively expensive it needs to be removed. > > > > > > > > The user-space utilities s2ram and s2disk choose to invoke sync() > > > today. > > > A user can invoke suspend directly via /sys/power/state to skip that > > > cost. > > > > So, you want to have s2disk write all the dirty pages in memory to > > the suspend image, rather than to the filesystem? > > > > Either way you have to write that dirty data to disk, but if you > > write it to the suspend image, it then has to be loaded again on > > resume, and then written again to the filesystem the system has > > resumed. This doesn't seem very efficient to me > > > > And, quite frankly, machines fail to resume from suspne dall the > > time. e.g. run out of batteries when they are under s2ram > > conditions, or s2disk fails because a kernel upgrade was done before > > the s2disk and so can't be resumed. With your change, users lose all > > the data that was buffered in memory before suspend, whereas right > > now it is written to disk and so nothing is lost if the resume from > > suspend fails for whatever reason. > > > > IOWs, I can see several good reasons why the sys_sync() needs to > > remain in the suspend code. User data safety and filesystem > > integrity is far, far more important than a couple of seconds > > improvement in suspend speed > > To be honest, this sounds like superstition and fear, not science and > fact. > > "filesystem integrity" is not an issue for the fast majority of > filesystems > which use journalling to ensure continued integrity even after a crash. > I > think even XFS does that :-) It has nothing to do with journalling, and everything to do with bring filesystems to an *idle state* before suspend runs. We have a long history of bug reports with XFS that go: suspend, resume, XFS almost immediately detects corruption, shuts down. The problem is that "sync" doesn't make the filesystem idle - XFs has *lots* of background work going on, and if we aren't *real careful* the filesystem is still doing work while the hardware gets powerd down and the suspend image is being taken. the result is on resume that the on-disk filesystem state does not match the memory image pulled back from resume, and we get shutdowns. sys_sync() does not guarantee a filesystem is idle - it guarantees the data in memory is recoverable, butit doesn't stop the filesystem from doing things like writing back metadata or running background cleaup tasks. If those aren't stopped properly, then we get into the state where in-memory and on-disk state get out of whack. And s2ram can have these problems too, because if there is IO in flight when the hardware is powered down, that IO is lost Every time some piece of generic infrastructure changes behaviour w.r.t. suspend/resume, we get a new set of problems being reported by users. It's extremely hard to test for these problems and it might take months of occasional corruption reports from a user to isolate it to being a suspend/resume problem. It's a game of whack-a-mole, because quite often they come down to the fact that something changed and nobody in the XFS world knew they had to now set an different initialisation flag on some structure or workqueue to make it work the way it needed to work. Go back an look at the history of sys_sync() in suspend discussions over the past 10 years. You'll find me saying exactly the same thing again and again about sys_sync(): it does not guarantee the filesystem is in an idle or coherent, unchanging state, and nothing in the suspend code tells the filesystem to enter an idle or frozen state. We actually have mechanisms for doing this - we use it in the storage layers to idle the filesystem while we do things like *take a snapshot*. What is the mechanism suspend to
Re: [PATCH 1/1] suspend: delete sys_sync()
On Fri, 15 May 2015 09:54:26 +1000 Dave Chinner wrote: > ng back On Thu, May 14, 2015 at 09:22:51AM +1000, NeilBrown wrote: > > On Mon, 11 May 2015 11:44:28 +1000 Dave Chinner wrote: > > > > > On Fri, May 08, 2015 at 03:08:43AM -0400, Len Brown wrote: > > > > From: Len Brown > > > > > > > > Remove sys_sync() from the kernel's suspend flow. > > > > > > > > sys_sync() is extremely expensive in some configurations, > > > > and so the kernel should not force users to pay this cost > > > > on every suspend. > > > > > > Since when? Please explain what your use case is that makes this > > > so prohibitively expensive it needs to be removed. > > > > > > > > > > > The user-space utilities s2ram and s2disk choose to invoke sync() today. > > > > A user can invoke suspend directly via /sys/power/state to skip that > > > > cost. > > > > > > So, you want to have s2disk write all the dirty pages in memory to > > > the suspend image, rather than to the filesystem? > > > > > > Either way you have to write that dirty data to disk, but if you > > > write it to the suspend image, it then has to be loaded again on > > > resume, and then written again to the filesystem the system has > > > resumed. This doesn't seem very efficient to me > > > > > > And, quite frankly, machines fail to resume from suspne dall the > > > time. e.g. run out of batteries when they are under s2ram > > > conditions, or s2disk fails because a kernel upgrade was done before > > > the s2disk and so can't be resumed. With your change, users lose all > > > the data that was buffered in memory before suspend, whereas right > > > now it is written to disk and so nothing is lost if the resume from > > > suspend fails for whatever reason. > > > > > > IOWs, I can see several good reasons why the sys_sync() needs to > > > remain in the suspend code. User data safety and filesystem > > > integrity is far, far more important than a couple of seconds > > > improvement in suspend speed > > > > To be honest, this sounds like superstition and fear, not science and fact. > > > > "filesystem integrity" is not an issue for the fast majority of filesystems > > which use journalling to ensure continued integrity even after a crash. I > > think even XFS does that :-) > > It has nothing to do with journalling, and everything to do with > bring filesystems to an *idle state* before suspend runs. We have a > long history of bug reports with XFS that go: suspend, resume, XFS > almost immediately detects corruption, shuts down. > > The problem is that "sync" doesn't make the filesystem idle - XFs > has *lots* of background work going on, and if we aren't *real > careful* the filesystem is still doing work while the hardware gets > powerd down and the suspend image is being taken. the result is on > resume that the on-disk filesystem state does not match the memory > image pulled back from resume, and we get shutdowns. > > sys_sync() does not guarantee a filesystem is idle - it guarantees > the data in memory is recoverable, butit doesn't stop the filesystem > from doing things like writing back metadata or running background > cleaup tasks. If those aren't stopped properly, then we get into > the state where in-memory and on-disk state get out of whack. And > s2ram can have these problems too, because if there is IO in flight > when the hardware is powered down, that IO is lost This seems to be the nub of your complaint - yes? Some storage devices don't handle suspend as well as they should and lose requests resulting in corruption. They should obviously be fixed, but it is you who gets the problem reports and you are not in a position to fix them. So you want a general solution that hides those problems. sys_sync at suspend time is a sort-of solution because it flushes and waits so there is less in-flight IO immediately after a sys_sync and so less opportunity for a bad device to stuff up. But you seem to suggest that sys_sync isn't a complete solution and it doesn't guarantee that xfs is not doing some background metadata IO. Maybe a sensible thing to do would be to hook the "disk" devices into suspend and have them flush their queue and possibly send a CACHE_FLUSH command. That would provide more of a guarantee for you, and less of a cost for Len, would it not? Thanks, NeilBrown > > Every time some piece of generic infrastructure changes behaviour > w.r.t. suspend/resume, we get a new set of problems being reported > by users. It's extremely hard to test for these problems and it > might take months of occasional corruption reports from a user to > isolate it to being a suspend/resume problem. It's a game of > whack-a-mole, because quite often they come down to the fact that > something changed and nobody in the XFS world knew they had to now > set an different initialisation flag on some structure or workqueue > to make it work the way it needed to work. > > Go back an look at the history of sys_sync() in suspend discussions >