subject:"\[PATCH 1\/1\] suspend\: delete sys

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-08-04 Thread Pavel Machek

On Fri 2015-07-10 01:22:55, Rafael J. Wysocki wrote:
> On Thursday, July 09, 2015 09:32:51 AM Oliver Neukum wrote:
> > On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote:
> > 
> > > Nothing and I'm not discussing that (I've said that already at least once 
> > > in
> > > this thread).
> > > 
> > > What I'm questioning is the "why" of calling sys_sync() from the kernel.
> > 
> > That's strictly speaking two questions
> > 
> > 1. Why do it in the kernel
> > 
> > That is easy. It closes the window of a race condition.
> > 
> > 2. Why do it at all
> > 
> > In essence because the system becomes inactive. For example we say that
> > data hits the disk after 30s maximum. We cannot meet such a limit unless
> > we sync.
> 
> Absolute deadlines are not guaranteed to be met at all in general when
> system suspend is used, at least from the user space perspective, so the
> above is quite a bit of an overstretch.

Well, this particular deadline _was_ guaranteed before Len's patch,
and it is useful one, too:

"if notebook is not on, it is ok to pull the USB stick".

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-08-04 Thread Pavel Machek

On Fri 2015-07-10 01:22:55, Rafael J. Wysocki wrote:
 On Thursday, July 09, 2015 09:32:51 AM Oliver Neukum wrote:
  On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote:
  
   Nothing and I'm not discussing that (I've said that already at least once 
   in
   this thread).
   
   What I'm questioning is the why of calling sys_sync() from the kernel.
  
  That's strictly speaking two questions
  
  1. Why do it in the kernel
  
  That is easy. It closes the window of a race condition.
  
  2. Why do it at all
  
  In essence because the system becomes inactive. For example we say that
  data hits the disk after 30s maximum. We cannot meet such a limit unless
  we sync.
 
 Absolute deadlines are not guaranteed to be met at all in general when
 system suspend is used, at least from the user space perspective, so the
 above is quite a bit of an overstretch.

Well, this particular deadline _was_ guaranteed before Len's patch,
and it is useful one, too:

if notebook is not on, it is ok to pull the USB stick.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-09 Thread Rafael J. Wysocki

On Thursday, July 09, 2015 09:32:51 AM Oliver Neukum wrote:
> On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote:
> 
> > Nothing and I'm not discussing that (I've said that already at least once in
> > this thread).
> > 
> > What I'm questioning is the "why" of calling sys_sync() from the kernel.
> 
> That's strictly speaking two questions
> 
> 1. Why do it in the kernel
> 
> That is easy. It closes the window of a race condition.
> 
> 2. Why do it at all
> 
> In essence because the system becomes inactive. For example we say that
> data hits the disk after 30s maximum. We cannot meet such a limit unless
> we sync.

Absolute deadlines are not guaranteed to be met at all in general when
system suspend is used, at least from the user space perspective, so the
above is quite a bit of an overstretch.

> There are additional issues, such as a system appearing inactive during
> suspension and frankly the far greater likelihood of a crash.

I prefer the Alan's point, to be honest: it's a tradeoff between some extra
safety and some extra suspend overhead.  Unfortunately, though, we don't have
any numbers allowing us to estimate how much extra safety it provides and so
to justify it quantitatively, so to speak.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-09 Thread Oliver Neukum

On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote:

> Nothing and I'm not discussing that (I've said that already at least once in
> this thread).
> 
> What I'm questioning is the "why" of calling sys_sync() from the kernel.

That's strictly speaking two questions

1. Why do it in the kernel

That is easy. It closes the window of a race condition.

2. Why do it at all

In essence because the system becomes inactive. For example we say that
data hits the disk after 30s maximum. We cannot meet such a limit unless
we sync.
There are additional issues, such as a system appearing inactive during
suspension and frankly the far greater likelihood of a crash.

Regards
Oliver

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-09 Thread Oliver Neukum

On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote:

 Nothing and I'm not discussing that (I've said that already at least once in
 this thread).
 
 What I'm questioning is the why of calling sys_sync() from the kernel.

That's strictly speaking two questions

1. Why do it in the kernel

That is easy. It closes the window of a race condition.

2. Why do it at all

In essence because the system becomes inactive. For example we say that
data hits the disk after 30s maximum. We cannot meet such a limit unless
we sync.
There are additional issues, such as a system appearing inactive during
suspension and frankly the far greater likelihood of a crash.

Regards
Oliver


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-09 Thread Rafael J. Wysocki

On Thursday, July 09, 2015 09:32:51 AM Oliver Neukum wrote:
 On Thu, 2015-07-09 at 00:03 +0200, Rafael J. Wysocki wrote:
 
  Nothing and I'm not discussing that (I've said that already at least once in
  this thread).
  
  What I'm questioning is the why of calling sys_sync() from the kernel.
 
 That's strictly speaking two questions
 
 1. Why do it in the kernel
 
 That is easy. It closes the window of a race condition.
 
 2. Why do it at all
 
 In essence because the system becomes inactive. For example we say that
 data hits the disk after 30s maximum. We cannot meet such a limit unless
 we sync.

Absolute deadlines are not guaranteed to be met at all in general when
system suspend is used, at least from the user space perspective, so the
above is quite a bit of an overstretch.

 There are additional issues, such as a system appearing inactive during
 suspension and frankly the far greater likelihood of a crash.

I prefer the Alan's point, to be honest: it's a tradeoff between some extra
safety and some extra suspend overhead.  Unfortunately, though, we don't have
any numbers allowing us to estimate how much extra safety it provides and so
to justify it quantitatively, so to speak.

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Rafael J. Wysocki

On Wednesday, July 08, 2015 10:40:00 AM Alan Stern wrote:
> On Wed, 8 Jul 2015, Pavel Machek wrote:
> 
> > > well, that depends on what the purpose of the sync is supposed to be.
> > > 
> > > If it is there to prevent users from corrupting their filesystems as a 
> > > result
> > > of a mistake, it is insufficient.  If it's there for other reasons, I'm 
> > > wondering
> > > what those reasons are (on systems that suspend and resume reliably, 
> > > because the
> > > original reason to put it in there was to reduce the damage from 
> > > suspend/resume
> > > crashes).
> > 
> > I put it there, and there were more reasons than "crashes" to put it
> > there.
> > 
> > 1) crashes.
> > 
> > 2) battery is quite likely to run out in suspended machine.
> > 
> > 3) if someone pulls the stick and puts it in other machine, I wanted
> > consistent filesystem at least after journal replay.
> 
> I was going to make the same points.
> 
> From my point of view, whether to issue a sync is a tradeoff.  I can't
> remember any time in the last several years where lack of a sync would
> have caused a problem for my computers, but the possibility still
> exists.
> 
> So on one hand, issuing the sync can help prevent a low-probability 
> problem.  On the other hand, issuing the sync takes a small amount of 
> time (negligible for my purposes but not negligible for Len and 
> others).
> 
> I prefer to pay a very small cost to prevent a low-probability problem.  
> Others may not want to pay, because to them the cost is larger or the 
> probability is lower.
> 
> _That_ is the justification for not eliminating the sync completely but 
> making it optional.

Agreed.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Rafael J. Wysocki

On Wednesday, July 08, 2015 09:51:13 AM Oliver Neukum wrote:
> On Wed, 2015-07-08 at 00:11 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote:
> >  
> > > That is a tough nut. But that's not a reason to make it worse.
> > > I'd say there's no reason not to use a secondary interface to
> > > suspend without syncing or to extend or introduce such an interface
> > > if the API is deficient.
> > 
> > Well, the point here is that the sync we have doesn't prevent all 
> > potentially
> > possible bad things from happening.  It's a partial measure at best in that
> > respect.
> 
> Well, removed hardware doesn't work. That is a very basic limitation.
> But can we guarantee that any returned syscall actually wrote to disk?
> Yes, but it must be done in kernel space. So doing a sync has a true
> benefit.
> I don't see why you would want to generally remove it. What is wrong
> with an interface allowing a selection there to those who don't care
> about additional guarantees?

Nothing and I'm not discussing that (I've said that already at least once in
this thread).

What I'm questioning is the "why" of calling sys_sync() from the kernel.

If we had a good answer to why we do that to start with, the whole discussion
wouldn't be necessary.

So the answer I'm getting from this thread so far is something like "It is a
safety measure to prevent users from losing data if they pull removable storage
out of the system while suspended".

Your point about the returned syscall guaranees is a good one, but still.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Alan Stern

On Wed, 8 Jul 2015, Pavel Machek wrote:

> > well, that depends on what the purpose of the sync is supposed to be.
> > 
> > If it is there to prevent users from corrupting their filesystems as a 
> > result
> > of a mistake, it is insufficient.  If it's there for other reasons, I'm 
> > wondering
> > what those reasons are (on systems that suspend and resume reliably, 
> > because the
> > original reason to put it in there was to reduce the damage from 
> > suspend/resume
> > crashes).
> 
> I put it there, and there were more reasons than "crashes" to put it
> there.
> 
> 1) crashes.
> 
> 2) battery is quite likely to run out in suspended machine.
> 
> 3) if someone pulls the stick and puts it in other machine, I wanted
> consistent filesystem at least after journal replay.

I was going to make the same points.

>From my point of view, whether to issue a sync is a tradeoff.  I can't
remember any time in the last several years where lack of a sync would
have caused a problem for my computers, but the possibility still
exists.

So on one hand, issuing the sync can help prevent a low-probability 
problem.  On the other hand, issuing the sync takes a small amount of 
time (negligible for my purposes but not negligible for Len and 
others).

I prefer to pay a very small cost to prevent a low-probability problem.  
Others may not want to pay, because to them the cost is larger or the 
probability is lower.

_That_ is the justification for not eliminating the sync completely but 
making it optional.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Pavel Machek

On Wed 2015-07-08 00:20:05, Rafael J. Wysocki wrote:
> On Tuesday, July 07, 2015 11:03:20 AM Alan Stern wrote:
> > On Tue, 7 Jul 2015, Oliver Neukum wrote:
> > 
> > > > he (or she) pulls the storage device out of the system, moves it to 
> > > > another
> > > > system, makes changes (say removes the file written to by the process 
> > > > above,
> > > > so the blocks previously occupied by that file are now used for some 
> > > > metadata)
> > > > and moves the storage back to the suspended system.  The system is 
> > > > resumed
> > > > and the writing process continues writing possibly to the wrong blocks 
> > > > and
> > > > corrupts the filesystem.
> > > 
> > > That is a tough nut. But that's not a reason to make it worse.
> > > I'd say there's no reason not to use a secondary interface to
> > > suspend without syncing or to extend or introduce such an interface
> > > if the API is deficient.
> > 
> > Indeed, the problem Rafael outlined always exists whether or not the
> > kernel does a sync.  Even if no I/O is in progress when the system goes
> > to sleep, if the user moves a portable storage device with a mounted
> > filesystem to another computer and updates it before waking the system
> > up, corruption is highly likely.
> > 
> > In principle this could be solved by adding suspend/resume callbacks to
> > filesystems.  For example, the resume callback could verify that the
> > superblock had not been changed since the suspend occurred.  Or there 
> > could be some other simple way of determining that the filesystem had 
> > not been remounted and changed.
> > 
> > Either way, this is irrelevant to the question of whether the kernel 
> > should issue a sync when suspending.
> 
> well, that depends on what the purpose of the sync is supposed to be.
> 
> If it is there to prevent users from corrupting their filesystems as a result
> of a mistake, it is insufficient.  If it's there for other reasons, I'm 
> wondering
> what those reasons are (on systems that suspend and resume reliably, because 
> the
> original reason to put it in there was to reduce the damage from 
> suspend/resume
> crashes).

I put it there, and there were more reasons than "crashes" to put it
there.

1) crashes.

2) battery is quite likely to run out in suspended machine.

3) if someone pulls the stick and puts it in other machine, I wanted
consistent filesystem at least after journal replay.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Pavel Machek

On Tue 2015-07-07 16:32:08, Rafael J. Wysocki wrote:
> On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote:
> > On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
> > > For example, on desktop systems I use user space syncs filesystems
> > > before
> > > writing to /sys/power/state, so the additional sys_sync() in the
> > > kernel doesn't
> > > seem to serve any purpose.
> > 
> > There is a race you cannot close in user space.
> 
> Yes, there is, but I'm not sure how much of a help the sync in the kernel
> provides here anyway.
> 
> Say this happens.  There is a process writing to a file running in parallel
> with the suspend process.  Suspend starts and that process is frozen.  The
> sync is called and causes all of the outstanding data to be written back.
> The user doesn't realize that the write is technically still in progress, so
> he (or she) pulls the storage device out of the system, moves it to another
> system, makes changes (say removes the file written to by the process above,
> so the blocks previously occupied by that file are now used for some metadata)
> and moves the storage back to the suspended system.  The system is resumed
> and the writing process continues writing possibly to the wrong blocks and
> corrupts the filesystem.
> 
> Is this possible?  If not, why not?

Of course it is possible; don't do it.

Some warnings, first.

 * BIG FAT WARNING
 *
 *
 * If you touch anything on disk between suspend and resume...
 *  ...kiss your data goodbye.

Correct option is to set up machine so that USB unplug awakes it. Or
assume USB was always removed during suspend (we actually have an
option for that).

Still, if you don't see difference from pulling USB from running
machine, and from "off" machine.. I do. And most users I know do. And
some of my machines don't indicate whether they are "off" or "sleeping".

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Oliver Neukum

On Wed, 2015-07-08 at 00:11 +0200, Rafael J. Wysocki wrote:
> On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote:
>  
> > That is a tough nut. But that's not a reason to make it worse.
> > I'd say there's no reason not to use a secondary interface to
> > suspend without syncing or to extend or introduce such an interface
> > if the API is deficient.
> 
> Well, the point here is that the sync we have doesn't prevent all potentially
> possible bad things from happening.  It's a partial measure at best in that
> respect.

Well, removed hardware doesn't work. That is a very basic limitation.
But can we guarantee that any returned syscall actually wrote to disk?
Yes, but it must be done in kernel space. So doing a sync has a true
benefit.
I don't see why you would want to generally remove it. What is wrong
with an interface allowing a selection there to those who don't care
about additional guarantees?

Regards
Oliver

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Oliver Neukum

On Wed, 2015-07-08 at 00:11 +0200, Rafael J. Wysocki wrote:
 On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote:
  
  That is a tough nut. But that's not a reason to make it worse.
  I'd say there's no reason not to use a secondary interface to
  suspend without syncing or to extend or introduce such an interface
  if the API is deficient.
 
 Well, the point here is that the sync we have doesn't prevent all potentially
 possible bad things from happening.  It's a partial measure at best in that
 respect.

Well, removed hardware doesn't work. That is a very basic limitation.
But can we guarantee that any returned syscall actually wrote to disk?
Yes, but it must be done in kernel space. So doing a sync has a true
benefit.
I don't see why you would want to generally remove it. What is wrong
with an interface allowing a selection there to those who don't care
about additional guarantees?

Regards
Oliver


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Pavel Machek

On Tue 2015-07-07 16:32:08, Rafael J. Wysocki wrote:
 On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote:
  On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
   For example, on desktop systems I use user space syncs filesystems
   before
   writing to /sys/power/state, so the additional sys_sync() in the
   kernel doesn't
   seem to serve any purpose.
  
  There is a race you cannot close in user space.
 
 Yes, there is, but I'm not sure how much of a help the sync in the kernel
 provides here anyway.
 
 Say this happens.  There is a process writing to a file running in parallel
 with the suspend process.  Suspend starts and that process is frozen.  The
 sync is called and causes all of the outstanding data to be written back.
 The user doesn't realize that the write is technically still in progress, so
 he (or she) pulls the storage device out of the system, moves it to another
 system, makes changes (say removes the file written to by the process above,
 so the blocks previously occupied by that file are now used for some metadata)
 and moves the storage back to the suspended system.  The system is resumed
 and the writing process continues writing possibly to the wrong blocks and
 corrupts the filesystem.
 
 Is this possible?  If not, why not?

Of course it is possible; don't do it.

Some warnings, first.

 * BIG FAT WARNING
 *
 *
 * If you touch anything on disk between suspend and resume...
 *  ...kiss your data goodbye.

Correct option is to set up machine so that USB unplug awakes it. Or
assume USB was always removed during suspend (we actually have an
option for that).

Still, if you don't see difference from pulling USB from running
machine, and from off machine.. I do. And most users I know do. And
some of my machines don't indicate whether they are off or sleeping.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Pavel Machek

On Wed 2015-07-08 00:20:05, Rafael J. Wysocki wrote:
 On Tuesday, July 07, 2015 11:03:20 AM Alan Stern wrote:
  On Tue, 7 Jul 2015, Oliver Neukum wrote:
  
he (or she) pulls the storage device out of the system, moves it to 
another
system, makes changes (say removes the file written to by the process 
above,
so the blocks previously occupied by that file are now used for some 
metadata)
and moves the storage back to the suspended system.  The system is 
resumed
and the writing process continues writing possibly to the wrong blocks 
and
corrupts the filesystem.
   
   That is a tough nut. But that's not a reason to make it worse.
   I'd say there's no reason not to use a secondary interface to
   suspend without syncing or to extend or introduce such an interface
   if the API is deficient.
  
  Indeed, the problem Rafael outlined always exists whether or not the
  kernel does a sync.  Even if no I/O is in progress when the system goes
  to sleep, if the user moves a portable storage device with a mounted
  filesystem to another computer and updates it before waking the system
  up, corruption is highly likely.
  
  In principle this could be solved by adding suspend/resume callbacks to
  filesystems.  For example, the resume callback could verify that the
  superblock had not been changed since the suspend occurred.  Or there 
  could be some other simple way of determining that the filesystem had 
  not been remounted and changed.
  
  Either way, this is irrelevant to the question of whether the kernel 
  should issue a sync when suspending.
 
 well, that depends on what the purpose of the sync is supposed to be.
 
 If it is there to prevent users from corrupting their filesystems as a result
 of a mistake, it is insufficient.  If it's there for other reasons, I'm 
 wondering
 what those reasons are (on systems that suspend and resume reliably, because 
 the
 original reason to put it in there was to reduce the damage from 
 suspend/resume
 crashes).

I put it there, and there were more reasons than crashes to put it
there.

1) crashes.

2) battery is quite likely to run out in suspended machine.

3) if someone pulls the stick and puts it in other machine, I wanted
consistent filesystem at least after journal replay.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Alan Stern

On Wed, 8 Jul 2015, Pavel Machek wrote:

  well, that depends on what the purpose of the sync is supposed to be.
  
  If it is there to prevent users from corrupting their filesystems as a 
  result
  of a mistake, it is insufficient.  If it's there for other reasons, I'm 
  wondering
  what those reasons are (on systems that suspend and resume reliably, 
  because the
  original reason to put it in there was to reduce the damage from 
  suspend/resume
  crashes).
 
 I put it there, and there were more reasons than crashes to put it
 there.
 
 1) crashes.
 
 2) battery is quite likely to run out in suspended machine.
 
 3) if someone pulls the stick and puts it in other machine, I wanted
 consistent filesystem at least after journal replay.

I was going to make the same points.

From my point of view, whether to issue a sync is a tradeoff.  I can't
remember any time in the last several years where lack of a sync would
have caused a problem for my computers, but the possibility still
exists.

So on one hand, issuing the sync can help prevent a low-probability 
problem.  On the other hand, issuing the sync takes a small amount of 
time (negligible for my purposes but not negligible for Len and 
others).

I prefer to pay a very small cost to prevent a low-probability problem.  
Others may not want to pay, because to them the cost is larger or the 
probability is lower.

_That_ is the justification for not eliminating the sync completely but 
making it optional.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Rafael J. Wysocki

On Wednesday, July 08, 2015 09:51:13 AM Oliver Neukum wrote:
 On Wed, 2015-07-08 at 00:11 +0200, Rafael J. Wysocki wrote:
  On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote:
   
   That is a tough nut. But that's not a reason to make it worse.
   I'd say there's no reason not to use a secondary interface to
   suspend without syncing or to extend or introduce such an interface
   if the API is deficient.
  
  Well, the point here is that the sync we have doesn't prevent all 
  potentially
  possible bad things from happening.  It's a partial measure at best in that
  respect.
 
 Well, removed hardware doesn't work. That is a very basic limitation.
 But can we guarantee that any returned syscall actually wrote to disk?
 Yes, but it must be done in kernel space. So doing a sync has a true
 benefit.
 I don't see why you would want to generally remove it. What is wrong
 with an interface allowing a selection there to those who don't care
 about additional guarantees?

Nothing and I'm not discussing that (I've said that already at least once in
this thread).

What I'm questioning is the why of calling sys_sync() from the kernel.

If we had a good answer to why we do that to start with, the whole discussion
wouldn't be necessary.

So the answer I'm getting from this thread so far is something like It is a
safety measure to prevent users from losing data if they pull removable storage
out of the system while suspended.

Your point about the returned syscall guaranees is a good one, but still.

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-08 Thread Rafael J. Wysocki

On Wednesday, July 08, 2015 10:40:00 AM Alan Stern wrote:
 On Wed, 8 Jul 2015, Pavel Machek wrote:
 
   well, that depends on what the purpose of the sync is supposed to be.
   
   If it is there to prevent users from corrupting their filesystems as a 
   result
   of a mistake, it is insufficient.  If it's there for other reasons, I'm 
   wondering
   what those reasons are (on systems that suspend and resume reliably, 
   because the
   original reason to put it in there was to reduce the damage from 
   suspend/resume
   crashes).
  
  I put it there, and there were more reasons than crashes to put it
  there.
  
  1) crashes.
  
  2) battery is quite likely to run out in suspended machine.
  
  3) if someone pulls the stick and puts it in other machine, I wanted
  consistent filesystem at least after journal replay.
 
 I was going to make the same points.
 
 From my point of view, whether to issue a sync is a tradeoff.  I can't
 remember any time in the last several years where lack of a sync would
 have caused a problem for my computers, but the possibility still
 exists.
 
 So on one hand, issuing the sync can help prevent a low-probability 
 problem.  On the other hand, issuing the sync takes a small amount of 
 time (negligible for my purposes but not negligible for Len and 
 others).
 
 I prefer to pay a very small cost to prevent a low-probability problem.  
 Others may not want to pay, because to them the cost is larger or the 
 probability is lower.
 
 _That_ is the justification for not eliminating the sync completely but 
 making it optional.

Agreed.

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 11:03:20 AM Alan Stern wrote:
> On Tue, 7 Jul 2015, Oliver Neukum wrote:
> 
> > > he (or she) pulls the storage device out of the system, moves it to 
> > > another
> > > system, makes changes (say removes the file written to by the process 
> > > above,
> > > so the blocks previously occupied by that file are now used for some 
> > > metadata)
> > > and moves the storage back to the suspended system.  The system is resumed
> > > and the writing process continues writing possibly to the wrong blocks and
> > > corrupts the filesystem.
> > 
> > That is a tough nut. But that's not a reason to make it worse.
> > I'd say there's no reason not to use a secondary interface to
> > suspend without syncing or to extend or introduce such an interface
> > if the API is deficient.
> 
> Indeed, the problem Rafael outlined always exists whether or not the
> kernel does a sync.  Even if no I/O is in progress when the system goes
> to sleep, if the user moves a portable storage device with a mounted
> filesystem to another computer and updates it before waking the system
> up, corruption is highly likely.
> 
> In principle this could be solved by adding suspend/resume callbacks to
> filesystems.  For example, the resume callback could verify that the
> superblock had not been changed since the suspend occurred.  Or there 
> could be some other simple way of determining that the filesystem had 
> not been remounted and changed.
> 
> Either way, this is irrelevant to the question of whether the kernel 
> should issue a sync when suspending.

well, that depends on what the purpose of the sync is supposed to be.

If it is there to prevent users from corrupting their filesystems as a result
of a mistake, it is insufficient.  If it's there for other reasons, I'm 
wondering
what those reasons are (on systems that suspend and resume reliably, because the
original reason to put it in there was to reduce the damage from suspend/resume
crashes).

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote:
> On Tue, 2015-07-07 at 16:32 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote:
> > > On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
> > > > For example, on desktop systems I use user space syncs filesystems
> > > > before
> > > > writing to /sys/power/state, so the additional sys_sync() in the
> > > > kernel doesn't
> > > > seem to serve any purpose.
> > > 
> > > There is a race you cannot close in user space.
> > 
> > Yes, there is, but I'm not sure how much of a help the sync in the kernel
> > provides here anyway.
> > 
> > Say this happens.  There is a process writing to a file running in parallel
> > with the suspend process.  Suspend starts and that process is frozen.  The
> > sync is called and causes all of the outstanding data to be written back.
> > The user doesn't realize that the write is technically still in progress, so
> 
> Well, in that case the user never got the feedback that the write is
> finished. That is a race that always exists, like sending SIGKILL to a
> running task.
> What you describe is in principle unsolvable every time under
> any circumstances.
> 
> > he (or she) pulls the storage device out of the system, moves it to another
> > system, makes changes (say removes the file written to by the process above,
> > so the blocks previously occupied by that file are now used for some 
> > metadata)
> > and moves the storage back to the suspended system.  The system is resumed
> > and the writing process continues writing possibly to the wrong blocks and
> > corrupts the filesystem.
> 
> That is a tough nut. But that's not a reason to make it worse.
> I'd say there's no reason not to use a secondary interface to
> suspend without syncing or to extend or introduce such an interface
> if the API is deficient.

Well, the point here is that the sync we have doesn't prevent all potentially
possible bad things from happening.  It's a partial measure at best in that
respect.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Alan Stern

On Tue, 7 Jul 2015, Oliver Neukum wrote:

> > he (or she) pulls the storage device out of the system, moves it to another
> > system, makes changes (say removes the file written to by the process above,
> > so the blocks previously occupied by that file are now used for some 
> > metadata)
> > and moves the storage back to the suspended system.  The system is resumed
> > and the writing process continues writing possibly to the wrong blocks and
> > corrupts the filesystem.
> 
> That is a tough nut. But that's not a reason to make it worse.
> I'd say there's no reason not to use a secondary interface to
> suspend without syncing or to extend or introduce such an interface
> if the API is deficient.

Indeed, the problem Rafael outlined always exists whether or not the
kernel does a sync.  Even if no I/O is in progress when the system goes
to sleep, if the user moves a portable storage device with a mounted
filesystem to another computer and updates it before waking the system
up, corruption is highly likely.

In principle this could be solved by adding suspend/resume callbacks to
filesystems.  For example, the resume callback could verify that the
superblock had not been changed since the suspend occurred.  Or there 
could be some other simple way of determining that the filesystem had 
not been remounted and changed.

Either way, this is irrelevant to the question of whether the kernel 
should issue a sync when suspending.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Oliver Neukum

On Tue, 2015-07-07 at 16:32 +0200, Rafael J. Wysocki wrote:
> On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote:
> > On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
> > > For example, on desktop systems I use user space syncs filesystems
> > > before
> > > writing to /sys/power/state, so the additional sys_sync() in the
> > > kernel doesn't
> > > seem to serve any purpose.
> > 
> > There is a race you cannot close in user space.
> 
> Yes, there is, but I'm not sure how much of a help the sync in the kernel
> provides here anyway.
> 
> Say this happens.  There is a process writing to a file running in parallel
> with the suspend process.  Suspend starts and that process is frozen.  The
> sync is called and causes all of the outstanding data to be written back.
> The user doesn't realize that the write is technically still in progress, so

Well, in that case the user never got the feedback that the write is
finished. That is a race that always exists, like sending SIGKILL to a
running task.
What you describe is in principle unsolvable every time under
any circumstances.

> he (or she) pulls the storage device out of the system, moves it to another
> system, makes changes (say removes the file written to by the process above,
> so the blocks previously occupied by that file are now used for some metadata)
> and moves the storage back to the suspended system.  The system is resumed
> and the writing process continues writing possibly to the wrong blocks and
> corrupts the filesystem.

That is a tough nut. But that's not a reason to make it worse.
I'd say there's no reason not to use a secondary interface to
suspend without syncing or to extend or introduce such an interface
if the API is deficient.

Regards
Oliver


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote:
> On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
> > For example, on desktop systems I use user space syncs filesystems
> > before
> > writing to /sys/power/state, so the additional sys_sync() in the
> > kernel doesn't
> > seem to serve any purpose.
> 
> There is a race you cannot close in user space.

Yes, there is, but I'm not sure how much of a help the sync in the kernel
provides here anyway.

Say this happens.  There is a process writing to a file running in parallel
with the suspend process.  Suspend starts and that process is frozen.  The
sync is called and causes all of the outstanding data to be written back.
The user doesn't realize that the write is technically still in progress, so
he (or she) pulls the storage device out of the system, moves it to another
system, makes changes (say removes the file written to by the process above,
so the blocks previously occupied by that file are now used for some metadata)
and moves the storage back to the suspended system.  The system is resumed
and the writing process continues writing possibly to the wrong blocks and
corrupts the filesystem.

Is this possible?  If not, why not?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Takashi Iwai

At Tue, 7 Jul 2015 11:17:56 +1000,
Dave Chinner wrote:
> 
> > > > Moreover, question is if we really need to carry out the sync on *every*
> > > > suspend even if it is not pointless overall.  That shouldn't really be
> > > > necessary if we suspend and resume often enough or if we resume only for
> > > > a while and then suspend again.  Maybe it should be rate limited somehow
> > > > at least?
> > > 
> > > If you suspend and resume frequently, then the cost of the sync
> > > shoul dbe negliable because the amount of data dirtied between
> > > resume/suspend shoul dbe negliable. hence my questions about where
> > > sync is spending too much time, and whether we've actually fixed
> > > those problems or not. If sync speed on clean filesystems is a
> > > problem then we need to fix sync, not work around it.
> > 
> > Well, say you never suspend, but you also only shut down the system when you
> > need to replace the kernel on it.  How often would you invoke global sync on
> > that system?
> 
> Never, because:
> 
>   - the kernel does background writeback of dirty data so you
> don't need to run sync while the system is running; and
>   - unmount during shutdown runs sync_filesystem() internally
> (just like sys_sync does) to ensure the filesystem is
> clean and no data is lost.
> 
> Seriously, stop being making ignorant arguments to justify removing
> sys_sync(). *If* there's a problem sys_sync() we need to *fix it*,
> not ignore it.

This thread remembered me of an old problem I faced: suspend fails
while copying a large file.  So I tried again with 4.2-rc1.

I put a slow USB stick with ext4, started copying a 4GB ISO file, and
triggered the suspend.  It failed.

Then removed sys_sync(), and retested.  It still failed.

The log shows like:

 PM: Preparing system for sleep (freeze)
 Freezing user space processes ... (elapsed 0.002 seconds) done.
 Freezing remaining freezable tasks ... 
 Freezing of tasks failed after 20.003 seconds (0 tasks refusing to freeze, 
wq_busy=1):
 Restarting kernel threads ... done.
 Restarting tasks ... done.

So, removing sys_sync() will save a small amount of time, but it won't
help always...


Takashi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Oliver Neukum

On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
> For example, on desktop systems I use user space syncs filesystems
> before
> writing to /sys/power/state, so the additional sys_sync() in the
> kernel doesn't
> seem to serve any purpose.

There is a race you cannot close in user space.

Regards
Oliver


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 12:25:07 PM Pavel Machek wrote:
> On Mon 2015-07-06 15:59:15, Rafael J. Wysocki wrote:
> > On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote:
> > > On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote:
> > > > On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
> > > > > On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:
> > > > > 
> > > > > > The only argument against dropping sys_sync() from the suspend code 
> > > > > > path
> > > > > > I've seen in this thread that I entirely agree with is that it may 
> > > > > > lead to
> > > > > > regressions, because we've done it practically forever and it may 
> > > > > > hide latent
> > > > > > bugs somewhere in block drivers etc.  Dropping it, though, is the 
> > > > > > only way
> > > > > > to see those bugs, if any, and if we want to ever fix them, we need 
> > > > > > to see
> > > > > > them.  That's why I think that it may be a good idea to allow 
> > > > > > people to
> > > > > > drop it if they are willing to accept some extra risk (via the 
> > > > > > kernel
> > > > > > command line, for example).
> > > > > 
> > > > > I'd be perfectly happy to have the sync selectable at runtime, one 
> > > > > way 
> > > > > or another.  The three most reasonable options seem to be:
> > > > > 
> > > > >   kernel command line
> > > > > 
> > > > >   sysfs file
> > > > > 
> > > > >   sysctl setting
> > > > > 
> > > > > The command line is less flexible (it can't be changed after 
> > > > > booting).  
> > > > > Either of the other two would be fine with me.
> > > > 
> > > > We'll probably use a sysfs file (possibly plus a Kconfig option to set 
> > > > the
> > > > boot time default).
> > > 
> > > Android people can already do sync-less s2ram using existing
> > > interface. IMO they should just do it.
> > > 
> > > In any case, sysfs file + Kconfig is an overkill. We already have too
> > > many Kconfig options.
> > 
> > I don't think we can reach a general agreement on what's the *right* 
> > approach
> > with respect to the sys_sync() in the suspend code path, so the only way out
> > of this situation I can see is to make it configurable.
> 
> So first: not having general agreement does not mean we should
> introduce Kconfig + sysfs file. Second: your proposal of "lets sync if
> runtime was shorter than xxx" is over complex, but at least should not
> need Kconfig support...

The Kconfig part is only about setting the default at compile time, so I'm
not sure why you have any problems with it.


> Third: we have ioctl() based interface, and I guess android should use that
> one; it already has "s2ram without sync" method.

And that is tied to hibernation and therefore useless to people who don't
build that in.


> > > There's not a single Android phone supported by mainline
> > > kernel. I'm sure they have bigger problems than Android setting
> > > default sysfs values...
> > 
> > But perhaps we'd like to change that?
> 
> We'd like to, but lets start with the real hard stuff (merging support
> for Qualcomm chipsets) that is 100 LoC+, not with trivial tweaks
> that would be 1-line change, but we pollute code with Kconfig+sysfs
> making it 100..

It is suboptimal, but this is the only way I can see allowing us to make
any forward progress here.

Evidently, some people are not happy with the status quo and they also should
be able to cover their use cases with the mainline kernel, shouldn't they?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 11:17:56 AM Dave Chinner wrote:
> On Mon, Jul 06, 2015 at 03:52:25PM +0200, Rafael J. Wysocki wrote:
> > On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote:
> > > On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote:
> > > > > No, your observation was that "sync is slow". Your *solution* is "we
> > > > > need to remove sync".
> > > > 
> > > > Not only slow, but pointless too.  The argument goes: "It is slow and
> > > > pointless and so it may be dropped."
> > > > 
> > > > Now, I can agree that it wasn't clearly demonstrated that the 
> > > > unconditional
> > > > sys_sync() in the suspend code path was pointless, but it also has never
> > > > been clearly shown why it is not pointless on systems that suspend and 
> > > > resume
> > > > reliably.
> > > 
> > > I just gave you an example of why sync is needed: Suspend, pull out
> > > USB drive when putting laptop in bag.
> > 
> > Exactly the same thing can happen while not suspended.  What does make 
> > suspend
> > special with respect to that?
> 
> Stop being obtuse.

Stop being rude.

> If you remember that you need to pull the USB
> stick before you suspend, then you damn well remember to "safely
> eject" the USB stick and that syncs, unmounts and flushes the drive
> caches before telling you it is safe to pull the stick.

All of that only means "the kernel needs to protect users from consequences
of silly mistakes" which I'm not sure I agree with.

For example, on desktop systems I use user space syncs filesystems before
writing to /sys/power/state, so the additional sys_sync() in the kernel doesn't
seem to serve any purpose.

> > > > Moreover, question is if we really need to carry out the sync on *every*
> > > > suspend even if it is not pointless overall.  That shouldn't really be
> > > > necessary if we suspend and resume often enough or if we resume only for
> > > > a while and then suspend again.  Maybe it should be rate limited somehow
> > > > at least?
> > > 
> > > If you suspend and resume frequently, then the cost of the sync
> > > shoul dbe negliable because the amount of data dirtied between
> > > resume/suspend shoul dbe negliable. hence my questions about where
> > > sync is spending too much time, and whether we've actually fixed
> > > those problems or not. If sync speed on clean filesystems is a
> > > problem then we need to fix sync, not work around it.
> > 
> > Well, say you never suspend, but you also only shut down the system when you
> > need to replace the kernel on it.  How often would you invoke global sync on
> > that system?
> 
> Never, because:
> 
>   - the kernel does background writeback of dirty data so you
> don't need to run sync while the system is running; and

OK, and that surely happens between suspend-resume cycles too, so why does
the sync in the suspend code path make a difference, really (except for
avoiding the consequences of possible user mistakes)?

>   - unmount during shutdown runs sync_filesystem() internally
> (just like sys_sync does) to ensure the filesystem is
> clean and no data is lost.

And that still will be the case regardless of whether or not suspend is used.

> Seriously, stop being making ignorant arguments to justify removing
> sys_sync(). *If* there's a problem sys_sync() we need to *fix it*,
> not ignore it.

I'm not advocating for dropping the sys_sync().  I'd just have applied the Len's
patch had I thought that had been the way to go.

I'm not agreeing with your argumentation about why the sync is needed, though
(except for the one point that removing it might uncover some latent bugs).

For now, I've seen two arguments: the one about the possible latent bug (that
I agree with) and the one about how users may be hurt if they unplug storage
devices holding unmounted filesystems while suspended (which may be addressed
by a sync in user space just fine).  Are there any other reasons?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Pavel Machek

On Mon 2015-07-06 15:59:15, Rafael J. Wysocki wrote:
> On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote:
> > On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote:
> > > On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
> > > > On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:
> > > > 
> > > > > The only argument against dropping sys_sync() from the suspend code 
> > > > > path
> > > > > I've seen in this thread that I entirely agree with is that it may 
> > > > > lead to
> > > > > regressions, because we've done it practically forever and it may 
> > > > > hide latent
> > > > > bugs somewhere in block drivers etc.  Dropping it, though, is the 
> > > > > only way
> > > > > to see those bugs, if any, and if we want to ever fix them, we need 
> > > > > to see
> > > > > them.  That's why I think that it may be a good idea to allow people 
> > > > > to
> > > > > drop it if they are willing to accept some extra risk (via the kernel
> > > > > command line, for example).
> > > > 
> > > > I'd be perfectly happy to have the sync selectable at runtime, one way 
> > > > or another.  The three most reasonable options seem to be:
> > > > 
> > > > kernel command line
> > > > 
> > > > sysfs file
> > > > 
> > > > sysctl setting
> > > > 
> > > > The command line is less flexible (it can't be changed after booting).  
> > > > Either of the other two would be fine with me.
> > > 
> > > We'll probably use a sysfs file (possibly plus a Kconfig option to set the
> > > boot time default).
> > 
> > Android people can already do sync-less s2ram using existing
> > interface. IMO they should just do it.
> > 
> > In any case, sysfs file + Kconfig is an overkill. We already have too
> > many Kconfig options.
> 
> I don't think we can reach a general agreement on what's the *right* approach
> with respect to the sys_sync() in the suspend code path, so the only way out
> of this situation I can see is to make it configurable.

So first: not having general agreement does not mean we should
introduce Kconfig + sysfs file. Second: your proposal of "lets sync if
runtime was shorter than xxx" is over complex, but at least should not
need Kconfig support... Third: we have ioctl() based interface, and I
guess android should use that one; it already has "s2ram without sync"
method.

> > There's not a single Android phone supported by mainline
> > kernel. I'm sure they have bigger problems than Android setting
> > default sysfs values...
> 
> But perhaps we'd like to change that?

We'd like to, but lets start with the real hard stuff (merging support
for Qualcomm chipsets) that is 100 LoC+, not with trivial tweaks
that would be 1-line change, but we pollute code with Kconfig+sysfs
making it 100..

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 11:03:20 AM Alan Stern wrote:
 On Tue, 7 Jul 2015, Oliver Neukum wrote:
 
   he (or she) pulls the storage device out of the system, moves it to 
   another
   system, makes changes (say removes the file written to by the process 
   above,
   so the blocks previously occupied by that file are now used for some 
   metadata)
   and moves the storage back to the suspended system.  The system is resumed
   and the writing process continues writing possibly to the wrong blocks and
   corrupts the filesystem.
  
  That is a tough nut. But that's not a reason to make it worse.
  I'd say there's no reason not to use a secondary interface to
  suspend without syncing or to extend or introduce such an interface
  if the API is deficient.
 
 Indeed, the problem Rafael outlined always exists whether or not the
 kernel does a sync.  Even if no I/O is in progress when the system goes
 to sleep, if the user moves a portable storage device with a mounted
 filesystem to another computer and updates it before waking the system
 up, corruption is highly likely.
 
 In principle this could be solved by adding suspend/resume callbacks to
 filesystems.  For example, the resume callback could verify that the
 superblock had not been changed since the suspend occurred.  Or there 
 could be some other simple way of determining that the filesystem had 
 not been remounted and changed.
 
 Either way, this is irrelevant to the question of whether the kernel 
 should issue a sync when suspending.

well, that depends on what the purpose of the sync is supposed to be.

If it is there to prevent users from corrupting their filesystems as a result
of a mistake, it is insufficient.  If it's there for other reasons, I'm 
wondering
what those reasons are (on systems that suspend and resume reliably, because the
original reason to put it in there was to reduce the damage from suspend/resume
crashes).

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 04:38:26 PM Oliver Neukum wrote:
 On Tue, 2015-07-07 at 16:32 +0200, Rafael J. Wysocki wrote:
  On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote:
   On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
For example, on desktop systems I use user space syncs filesystems
before
writing to /sys/power/state, so the additional sys_sync() in the
kernel doesn't
seem to serve any purpose.
   
   There is a race you cannot close in user space.
  
  Yes, there is, but I'm not sure how much of a help the sync in the kernel
  provides here anyway.
  
  Say this happens.  There is a process writing to a file running in parallel
  with the suspend process.  Suspend starts and that process is frozen.  The
  sync is called and causes all of the outstanding data to be written back.
  The user doesn't realize that the write is technically still in progress, so
 
 Well, in that case the user never got the feedback that the write is
 finished. That is a race that always exists, like sending SIGKILL to a
 running task.
 What you describe is in principle unsolvable every time under
 any circumstances.
 
  he (or she) pulls the storage device out of the system, moves it to another
  system, makes changes (say removes the file written to by the process above,
  so the blocks previously occupied by that file are now used for some 
  metadata)
  and moves the storage back to the suspended system.  The system is resumed
  and the writing process continues writing possibly to the wrong blocks and
  corrupts the filesystem.
 
 That is a tough nut. But that's not a reason to make it worse.
 I'd say there's no reason not to use a secondary interface to
 suspend without syncing or to extend or introduce such an interface
 if the API is deficient.

Well, the point here is that the sync we have doesn't prevent all potentially
possible bad things from happening.  It's a partial measure at best in that
respect.

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 11:17:56 AM Dave Chinner wrote:
 On Mon, Jul 06, 2015 at 03:52:25PM +0200, Rafael J. Wysocki wrote:
  On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote:
   On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote:
 No, your observation was that sync is slow. Your *solution* is we
 need to remove sync.

Not only slow, but pointless too.  The argument goes: It is slow and
pointless and so it may be dropped.

Now, I can agree that it wasn't clearly demonstrated that the 
unconditional
sys_sync() in the suspend code path was pointless, but it also has never
been clearly shown why it is not pointless on systems that suspend and 
resume
reliably.
   
   I just gave you an example of why sync is needed: Suspend, pull out
   USB drive when putting laptop in bag.
  
  Exactly the same thing can happen while not suspended.  What does make 
  suspend
  special with respect to that?
 
 Stop being obtuse.

Stop being rude.

 If you remember that you need to pull the USB
 stick before you suspend, then you damn well remember to safely
 eject the USB stick and that syncs, unmounts and flushes the drive
 caches before telling you it is safe to pull the stick.

All of that only means the kernel needs to protect users from consequences
of silly mistakes which I'm not sure I agree with.

For example, on desktop systems I use user space syncs filesystems before
writing to /sys/power/state, so the additional sys_sync() in the kernel doesn't
seem to serve any purpose.

Moreover, question is if we really need to carry out the sync on *every*
suspend even if it is not pointless overall.  That shouldn't really be
necessary if we suspend and resume often enough or if we resume only for
a while and then suspend again.  Maybe it should be rate limited somehow
at least?
   
   If you suspend and resume frequently, then the cost of the sync
   shoul dbe negliable because the amount of data dirtied between
   resume/suspend shoul dbe negliable. hence my questions about where
   sync is spending too much time, and whether we've actually fixed
   those problems or not. If sync speed on clean filesystems is a
   problem then we need to fix sync, not work around it.
  
  Well, say you never suspend, but you also only shut down the system when you
  need to replace the kernel on it.  How often would you invoke global sync on
  that system?
 
 Never, because:
 
   - the kernel does background writeback of dirty data so you
 don't need to run sync while the system is running; and

OK, and that surely happens between suspend-resume cycles too, so why does
the sync in the suspend code path make a difference, really (except for
avoiding the consequences of possible user mistakes)?

   - unmount during shutdown runs sync_filesystem() internally
 (just like sys_sync does) to ensure the filesystem is
 clean and no data is lost.

And that still will be the case regardless of whether or not suspend is used.

 Seriously, stop being making ignorant arguments to justify removing
 sys_sync(). *If* there's a problem sys_sync() we need to *fix it*,
 not ignore it.

I'm not advocating for dropping the sys_sync().  I'd just have applied the Len's
patch had I thought that had been the way to go.

I'm not agreeing with your argumentation about why the sync is needed, though
(except for the one point that removing it might uncover some latent bugs).

For now, I've seen two arguments: the one about the possible latent bug (that
I agree with) and the one about how users may be hurt if they unplug storage
devices holding unmounted filesystems while suspended (which may be addressed
by a sync in user space just fine).  Are there any other reasons?

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Pavel Machek

On Mon 2015-07-06 15:59:15, Rafael J. Wysocki wrote:
 On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote:
  On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote:
   On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:

 The only argument against dropping sys_sync() from the suspend code 
 path
 I've seen in this thread that I entirely agree with is that it may 
 lead to
 regressions, because we've done it practically forever and it may 
 hide latent
 bugs somewhere in block drivers etc.  Dropping it, though, is the 
 only way
 to see those bugs, if any, and if we want to ever fix them, we need 
 to see
 them.  That's why I think that it may be a good idea to allow people 
 to
 drop it if they are willing to accept some extra risk (via the kernel
 command line, for example).

I'd be perfectly happy to have the sync selectable at runtime, one way 
or another.  The three most reasonable options seem to be:

kernel command line

sysfs file

sysctl setting

The command line is less flexible (it can't be changed after booting).  
Either of the other two would be fine with me.
   
   We'll probably use a sysfs file (possibly plus a Kconfig option to set the
   boot time default).
  
  Android people can already do sync-less s2ram using existing
  interface. IMO they should just do it.
  
  In any case, sysfs file + Kconfig is an overkill. We already have too
  many Kconfig options.
 
 I don't think we can reach a general agreement on what's the *right* approach
 with respect to the sys_sync() in the suspend code path, so the only way out
 of this situation I can see is to make it configurable.

So first: not having general agreement does not mean we should
introduce Kconfig + sysfs file. Second: your proposal of lets sync if
runtime was shorter than xxx is over complex, but at least should not
need Kconfig support... Third: we have ioctl() based interface, and I
guess android should use that one; it already has s2ram without sync
method.

  There's not a single Android phone supported by mainline
  kernel. I'm sure they have bigger problems than Android setting
  default sysfs values...
 
 But perhaps we'd like to change that?

We'd like to, but lets start with the real hard stuff (merging support
for Qualcomm chipsets) that is 100 LoC+, not with trivial tweaks
that would be 1-line change, but we pollute code with Kconfig+sysfs
making it 100..

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 12:25:07 PM Pavel Machek wrote:
 On Mon 2015-07-06 15:59:15, Rafael J. Wysocki wrote:
  On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote:
   On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote:
On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
 On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:
 
  The only argument against dropping sys_sync() from the suspend code 
  path
  I've seen in this thread that I entirely agree with is that it may 
  lead to
  regressions, because we've done it practically forever and it may 
  hide latent
  bugs somewhere in block drivers etc.  Dropping it, though, is the 
  only way
  to see those bugs, if any, and if we want to ever fix them, we need 
  to see
  them.  That's why I think that it may be a good idea to allow 
  people to
  drop it if they are willing to accept some extra risk (via the 
  kernel
  command line, for example).
 
 I'd be perfectly happy to have the sync selectable at runtime, one 
 way 
 or another.  The three most reasonable options seem to be:
 
   kernel command line
 
   sysfs file
 
   sysctl setting
 
 The command line is less flexible (it can't be changed after 
 booting).  
 Either of the other two would be fine with me.

We'll probably use a sysfs file (possibly plus a Kconfig option to set 
the
boot time default).
   
   Android people can already do sync-less s2ram using existing
   interface. IMO they should just do it.
   
   In any case, sysfs file + Kconfig is an overkill. We already have too
   many Kconfig options.
  
  I don't think we can reach a general agreement on what's the *right* 
  approach
  with respect to the sys_sync() in the suspend code path, so the only way out
  of this situation I can see is to make it configurable.
 
 So first: not having general agreement does not mean we should
 introduce Kconfig + sysfs file. Second: your proposal of lets sync if
 runtime was shorter than xxx is over complex, but at least should not
 need Kconfig support...

The Kconfig part is only about setting the default at compile time, so I'm
not sure why you have any problems with it.


 Third: we have ioctl() based interface, and I guess android should use that
 one; it already has s2ram without sync method.

And that is tied to hibernation and therefore useless to people who don't
build that in.


   There's not a single Android phone supported by mainline
   kernel. I'm sure they have bigger problems than Android setting
   default sysfs values...
  
  But perhaps we'd like to change that?
 
 We'd like to, but lets start with the real hard stuff (merging support
 for Qualcomm chipsets) that is 100 LoC+, not with trivial tweaks
 that would be 1-line change, but we pollute code with Kconfig+sysfs
 making it 100..

It is suboptimal, but this is the only way I can see allowing us to make
any forward progress here.

Evidently, some people are not happy with the status quo and they also should
be able to cover their use cases with the mainline kernel, shouldn't they?

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Rafael J. Wysocki

On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote:
 On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
  For example, on desktop systems I use user space syncs filesystems
  before
  writing to /sys/power/state, so the additional sys_sync() in the
  kernel doesn't
  seem to serve any purpose.
 
 There is a race you cannot close in user space.

Yes, there is, but I'm not sure how much of a help the sync in the kernel
provides here anyway.

Say this happens.  There is a process writing to a file running in parallel
with the suspend process.  Suspend starts and that process is frozen.  The
sync is called and causes all of the outstanding data to be written back.
The user doesn't realize that the write is technically still in progress, so
he (or she) pulls the storage device out of the system, moves it to another
system, makes changes (say removes the file written to by the process above,
so the blocks previously occupied by that file are now used for some metadata)
and moves the storage back to the suspended system.  The system is resumed
and the writing process continues writing possibly to the wrong blocks and
corrupts the filesystem.

Is this possible?  If not, why not?

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Takashi Iwai

At Tue, 7 Jul 2015 11:17:56 +1000,
Dave Chinner wrote:
 
Moreover, question is if we really need to carry out the sync on *every*
suspend even if it is not pointless overall.  That shouldn't really be
necessary if we suspend and resume often enough or if we resume only for
a while and then suspend again.  Maybe it should be rate limited somehow
at least?
   
   If you suspend and resume frequently, then the cost of the sync
   shoul dbe negliable because the amount of data dirtied between
   resume/suspend shoul dbe negliable. hence my questions about where
   sync is spending too much time, and whether we've actually fixed
   those problems or not. If sync speed on clean filesystems is a
   problem then we need to fix sync, not work around it.
  
  Well, say you never suspend, but you also only shut down the system when you
  need to replace the kernel on it.  How often would you invoke global sync on
  that system?
 
 Never, because:
 
   - the kernel does background writeback of dirty data so you
 don't need to run sync while the system is running; and
   - unmount during shutdown runs sync_filesystem() internally
 (just like sys_sync does) to ensure the filesystem is
 clean and no data is lost.
 
 Seriously, stop being making ignorant arguments to justify removing
 sys_sync(). *If* there's a problem sys_sync() we need to *fix it*,
 not ignore it.

This thread remembered me of an old problem I faced: suspend fails
while copying a large file.  So I tried again with 4.2-rc1.

I put a slow USB stick with ext4, started copying a 4GB ISO file, and
triggered the suspend.  It failed.

Then removed sys_sync(), and retested.  It still failed.

The log shows like:

 PM: Preparing system for sleep (freeze)
 Freezing user space processes ... (elapsed 0.002 seconds) done.
 Freezing remaining freezable tasks ... 
 Freezing of tasks failed after 20.003 seconds (0 tasks refusing to freeze, 
wq_busy=1):
 Restarting kernel threads ... done.
 Restarting tasks ... done.

So, removing sys_sync() will save a small amount of time, but it won't
help always...


Takashi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Oliver Neukum

On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
 For example, on desktop systems I use user space syncs filesystems
 before
 writing to /sys/power/state, so the additional sys_sync() in the
 kernel doesn't
 seem to serve any purpose.

There is a race you cannot close in user space.

Regards
Oliver


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Oliver Neukum

On Tue, 2015-07-07 at 16:32 +0200, Rafael J. Wysocki wrote:
 On Tuesday, July 07, 2015 03:16:48 PM Oliver Neukum wrote:
  On Tue, 2015-07-07 at 14:14 +0200, Rafael J. Wysocki wrote:
   For example, on desktop systems I use user space syncs filesystems
   before
   writing to /sys/power/state, so the additional sys_sync() in the
   kernel doesn't
   seem to serve any purpose.
  
  There is a race you cannot close in user space.
 
 Yes, there is, but I'm not sure how much of a help the sync in the kernel
 provides here anyway.
 
 Say this happens.  There is a process writing to a file running in parallel
 with the suspend process.  Suspend starts and that process is frozen.  The
 sync is called and causes all of the outstanding data to be written back.
 The user doesn't realize that the write is technically still in progress, so

Well, in that case the user never got the feedback that the write is
finished. That is a race that always exists, like sending SIGKILL to a
running task.
What you describe is in principle unsolvable every time under
any circumstances.

 he (or she) pulls the storage device out of the system, moves it to another
 system, makes changes (say removes the file written to by the process above,
 so the blocks previously occupied by that file are now used for some metadata)
 and moves the storage back to the suspended system.  The system is resumed
 and the writing process continues writing possibly to the wrong blocks and
 corrupts the filesystem.

That is a tough nut. But that's not a reason to make it worse.
I'd say there's no reason not to use a secondary interface to
suspend without syncing or to extend or introduce such an interface
if the API is deficient.

Regards
Oliver


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-07 Thread Alan Stern

On Tue, 7 Jul 2015, Oliver Neukum wrote:

  he (or she) pulls the storage device out of the system, moves it to another
  system, makes changes (say removes the file written to by the process above,
  so the blocks previously occupied by that file are now used for some 
  metadata)
  and moves the storage back to the suspended system.  The system is resumed
  and the writing process continues writing possibly to the wrong blocks and
  corrupts the filesystem.
 
 That is a tough nut. But that's not a reason to make it worse.
 I'd say there's no reason not to use a secondary interface to
 suspend without syncing or to extend or introduce such an interface
 if the API is deficient.

Indeed, the problem Rafael outlined always exists whether or not the
kernel does a sync.  Even if no I/O is in progress when the system goes
to sleep, if the user moves a portable storage device with a mounted
filesystem to another computer and updates it before waking the system
up, corruption is highly likely.

In principle this could be solved by adding suspend/resume callbacks to
filesystems.  For example, the resume callback could verify that the
superblock had not been changed since the suspend occurred.  Or there 
could be some other simple way of determining that the filesystem had 
not been remounted and changed.

Either way, this is irrelevant to the question of whether the kernel 
should issue a sync when suspending.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Dave Chinner

On Mon, Jul 06, 2015 at 03:52:25PM +0200, Rafael J. Wysocki wrote:
> On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote:
> > On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote:
> > > > No, your observation was that "sync is slow". Your *solution* is "we
> > > > need to remove sync".
> > > 
> > > Not only slow, but pointless too.  The argument goes: "It is slow and
> > > pointless and so it may be dropped."
> > > 
> > > Now, I can agree that it wasn't clearly demonstrated that the 
> > > unconditional
> > > sys_sync() in the suspend code path was pointless, but it also has never
> > > been clearly shown why it is not pointless on systems that suspend and 
> > > resume
> > > reliably.
> > 
> > I just gave you an example of why sync is needed: Suspend, pull out
> > USB drive when putting laptop in bag.
> 
> Exactly the same thing can happen while not suspended.  What does make suspend
> special with respect to that?

Stop being obtuse. If you remember that you need to pull the USB
stick before you suspend, then you damn well remember to "safely
eject" the USB stick and that syncs, unmounts and flushes the drive
caches before telling you it is safe to pull the stick.

> > > Moreover, question is if we really need to carry out the sync on *every*
> > > suspend even if it is not pointless overall.  That shouldn't really be
> > > necessary if we suspend and resume often enough or if we resume only for
> > > a while and then suspend again.  Maybe it should be rate limited somehow
> > > at least?
> > 
> > If you suspend and resume frequently, then the cost of the sync
> > shoul dbe negliable because the amount of data dirtied between
> > resume/suspend shoul dbe negliable. hence my questions about where
> > sync is spending too much time, and whether we've actually fixed
> > those problems or not. If sync speed on clean filesystems is a
> > problem then we need to fix sync, not work around it.
> 
> Well, say you never suspend, but you also only shut down the system when you
> need to replace the kernel on it.  How often would you invoke global sync on
> that system?

Never, because:

- the kernel does background writeback of dirty data so you
  don't need to run sync while the system is running; and
- unmount during shutdown runs sync_filesystem() internally
  (just like sys_sync does) to ensure the filesystem is
  clean and no data is lost.

Seriously, stop being making ignorant arguments to justify removing
sys_sync(). *If* there's a problem sys_sync() we need to *fix it*,
not ignore it.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Rafael J. Wysocki

On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote:
> On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote:
> > On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
> > > On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:
> > > 
> > > > The only argument against dropping sys_sync() from the suspend code path
> > > > I've seen in this thread that I entirely agree with is that it may lead 
> > > > to
> > > > regressions, because we've done it practically forever and it may hide 
> > > > latent
> > > > bugs somewhere in block drivers etc.  Dropping it, though, is the only 
> > > > way
> > > > to see those bugs, if any, and if we want to ever fix them, we need to 
> > > > see
> > > > them.  That's why I think that it may be a good idea to allow people to
> > > > drop it if they are willing to accept some extra risk (via the kernel
> > > > command line, for example).
> > > 
> > > I'd be perfectly happy to have the sync selectable at runtime, one way 
> > > or another.  The three most reasonable options seem to be:
> > > 
> > >   kernel command line
> > > 
> > >   sysfs file
> > > 
> > >   sysctl setting
> > > 
> > > The command line is less flexible (it can't be changed after booting).  
> > > Either of the other two would be fine with me.
> > 
> > We'll probably use a sysfs file (possibly plus a Kconfig option to set the
> > boot time default).
> 
> Android people can already do sync-less s2ram using existing
> interface. IMO they should just do it.
> 
> In any case, sysfs file + Kconfig is an overkill. We already have too
> many Kconfig options.

I don't think we can reach a general agreement on what's the *right* approach
with respect to the sys_sync() in the suspend code path, so the only way out
of this situation I can see is to make it configurable.

> There's not a single Android phone supported by mainline
> kernel. I'm sure they have bigger problems than Android setting
> default sysfs values...

But perhaps we'd like to change that?

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Rafael J. Wysocki

On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote:
> On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote:
> > On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote:
> > > On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
> > > > >> The _vast_ majority of systems using Linux suspend today are under
> > > > >> an Android user-space.  Android has no assumption that that suspend 
> > > > >> to
> > > > >> mem will necessarily stay suspended for a long time.
> > > > >
> > > > > Indeed, however your change was not android-specific, and it is not
> > > > > "comfortable" on x86-style hardware and usage patterns.
> > > > 
> > > > "comfortable on x86-style and usage patterns"?
> > > > If you mean "traditional" instead of "comfortable",
> > > > where "tradition" is based on 10-year old systems, then sure.
> > > 
> > > Even if this were true(*) we don't break things that currently work
> > > just because something different is "just around the corner". e.g.
> > > if you shut the lid on your laptop and it suspends to RAM, you can
> > > pull the USB drive out that you just copied stuff to and plug it
> > > into another machine and find all the data you copied there is
> > > present.
> > > 
> > > Remove the sync() from the freeze code, and this isn't guaranteed to
> > > work anymore. It is now dependent on userspace implementations for
> > > this to work, and we know what userspace developers will choose in
> > > this situation. i.e. fast and "works for me", not "safe for
> > > everyone".
> > > 
> > > (*) Which it clearly isn't true because, as this example shows, my
> > > shiny new laptop still has exactly the same data integrity
> > > requirements as the laptop I was using 10 years ago.
> > > 
> > > Just because there are lots of Android or Chrome out there it
> > > doesn't mean we can just ignore the requirements of everything
> > > else...
> > > 
> > > > > That said, as long as x86 will still try to safeguard my data during 
> > > > > mem
> > > > > sleep/resume as it does today, I have no strong feelings about
> > > > > light/heavy-weight "mem" sleep being strictly a compile-time 
> > > > > selectable
> > > > > thing, or a more flexible runtime-selectable behavior.
> > > > 
> > > > The observation here is that the kernel should not force every system
> > > > to sys_sync() on every suspend.  The only question is how to best
> > > > implement that.
> > > 
> > > No, your observation was that "sync is slow". Your *solution* is "we
> > > need to remove sync".
> > 
> > Not only slow, but pointless too.  The argument goes: "It is slow and
> > pointless and so it may be dropped."
> > 
> > Now, I can agree that it wasn't clearly demonstrated that the unconditional
> > sys_sync() in the suspend code path was pointless, but it also has never
> > been clearly shown why it is not pointless on systems that suspend and 
> > resume
> > reliably.
> 
> I just gave you an example of why sync is needed: Suspend, pull out
> USB drive when putting laptop in bag.

Exactly the same thing can happen while not suspended.  What does make suspend
special with respect to that?

> > [The argument that the user can pull removable storage devices out of the
> > system while suspended doesn't hold any water to me, because the user can
> > pull them out of the system when not suspended just as well and cause the
> > same kind of damage to happen.]
> 
> How many times have you forgotten to pull a usb stick out of a
> laptop before putting it away? I've done that several times in the
> past few months, and so I've *personally* pulled the USB sticks out
> of suspended machines. This is a *common occurrence* and it
> currently works just fine, so changing sync behaviour is something
> that will directly impact me *as a user*.
> 
> Not to mention hybrid suspend (i.e write suspend image to disk, then
> suspend to RAM for fast resume, but if power is lost resume from
> disk image) both images have exactly the same filesystem state in
> them and that is an absolute requirement for a hybrid suspend...

For that you need to sync.  Let's not confuse what we're talking about.

The code path in question is suspend-only, no hibernation involved in any form.

> > Moreover, question is if we really need to carry out the sync on *every*
> > suspend even if it is not pointless overall.  That shouldn't really be
> > necessary if we suspend and resume often enough or if we resume only for
> > a while and then suspend again.  Maybe it should be rate limited somehow
> > at least?
> 
> If you suspend and resume frequently, then the cost of the sync
> shoul dbe negliable because the amount of data dirtied between
> resume/suspend shoul dbe negliable. hence my questions about where
> sync is spending too much time, and whether we've actually fixed
> those problems or not. If sync speed on clean filesystems is a
> problem then we need to fix sync, not work around it.

Well, say you never suspend, but you also only shut down the system when you

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Pavel Machek

Hi!

> > Moreover, question is if we really need to carry out the sync on *every*
> > suspend even if it is not pointless overall.  That shouldn't really be
> > necessary if we suspend and resume often enough or if we resume only for
> > a while and then suspend again.  Maybe it should be rate limited somehow
> > at least?
> 
> If you suspend and resume frequently, then the cost of the sync
> shoul dbe negliable because the amount of data dirtied between
> resume/suspend shoul dbe negliable. hence my questions about where
> sync is spending too much time, and whether we've actually fixed
> those problems or not. If sync speed on clean filesystems is a
> problem then we need to fix sync, not work around it.

And yes, that's solution I'd really prefer over adding knobs to
suspend.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Pavel Machek

On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote:
> On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
> > On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:
> > 
> > > The only argument against dropping sys_sync() from the suspend code path
> > > I've seen in this thread that I entirely agree with is that it may lead to
> > > regressions, because we've done it practically forever and it may hide 
> > > latent
> > > bugs somewhere in block drivers etc.  Dropping it, though, is the only way
> > > to see those bugs, if any, and if we want to ever fix them, we need to see
> > > them.  That's why I think that it may be a good idea to allow people to
> > > drop it if they are willing to accept some extra risk (via the kernel
> > > command line, for example).
> > 
> > I'd be perfectly happy to have the sync selectable at runtime, one way 
> > or another.  The three most reasonable options seem to be:
> > 
> > kernel command line
> > 
> > sysfs file
> > 
> > sysctl setting
> > 
> > The command line is less flexible (it can't be changed after booting).  
> > Either of the other two would be fine with me.
> 
> We'll probably use a sysfs file (possibly plus a Kconfig option to set the
> boot time default).

Android people can already do sync-less s2ram using existing
interface. IMO they should just do it.

In any case, sysfs file + Kconfig is an overkill. We already have too
many Kconfig options.

There's not a single Android phone supported by mainline
kernel. I'm sure they have bigger problems than Android setting
default sysfs values...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Ming Lei

On Sat, Jul 4, 2015 at 9:03 AM, Rafael J. Wysocki  wrote:
> On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote:
>> On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
>> > >> The _vast_ majority of systems using Linux suspend today are under
>> > >> an Android user-space.  Android has no assumption that that suspend to
>> > >> mem will necessarily stay suspended for a long time.
>> > >
>> > > Indeed, however your change was not android-specific, and it is not
>> > > "comfortable" on x86-style hardware and usage patterns.
>> >
>> > "comfortable on x86-style and usage patterns"?
>> > If you mean "traditional" instead of "comfortable",
>> > where "tradition" is based on 10-year old systems, then sure.
>>
>> Even if this were true(*) we don't break things that currently work
>> just because something different is "just around the corner". e.g.
>> if you shut the lid on your laptop and it suspends to RAM, you can
>> pull the USB drive out that you just copied stuff to and plug it
>> into another machine and find all the data you copied there is
>> present.
>>
>> Remove the sync() from the freeze code, and this isn't guaranteed to
>> work anymore. It is now dependent on userspace implementations for
>> this to work, and we know what userspace developers will choose in
>> this situation. i.e. fast and "works for me", not "safe for
>> everyone".
>>
>> (*) Which it clearly isn't true because, as this example shows, my
>> shiny new laptop still has exactly the same data integrity
>> requirements as the laptop I was using 10 years ago.
>>
>> Just because there are lots of Android or Chrome out there it
>> doesn't mean we can just ignore the requirements of everything
>> else...
>>
>> > > That said, as long as x86 will still try to safeguard my data during mem
>> > > sleep/resume as it does today, I have no strong feelings about
>> > > light/heavy-weight "mem" sleep being strictly a compile-time selectable
>> > > thing, or a more flexible runtime-selectable behavior.
>> >
>> > The observation here is that the kernel should not force every system
>> > to sys_sync() on every suspend.  The only question is how to best
>> > implement that.
>>
>> No, your observation was that "sync is slow". Your *solution* is "we
>> need to remove sync".
>
> Not only slow, but pointless too.  The argument goes: "It is slow and
> pointless and so it may be dropped."
>
> Now, I can agree that it wasn't clearly demonstrated that the unconditional
> sys_sync() in the suspend code path was pointless, but it also has never
> been clearly shown why it is not pointless on systems that suspend and resume
> reliably.
>
> [The argument that the user can pull removable storage devices out of the
> system while suspended doesn't hold any water to me, because the user can
> pull them out of the system when not suspended just as well and cause the
> same kind of damage to happen.]

I think common users do understand the difference between
suspend and running, don't they?  So they may not complain
anyone if they plug off USB disk when system is running, but they
will complain the patch and the regression if their data is lost because
doing that during suspend. Not to mention, it is very common to
plug off USB disk after suspend laptop.

Also there are network based storage(iSCSI, NBD, ...), NFS/CIFS, and
is it safe for these cases?


-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Pavel Machek

Hi!

> conditions, or s2disk fails because a kernel upgrade was done before
> the s2disk and so can't be resumed. With your change, users lose all

s2disk should be able to survive kernel change on x86-64 for long time
now, and I have patches for x86, too.

[But yes, I think sync() should be kept.]
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Pavel Machek

Hi!

> > Understand, however, there are systems which suspend/resume reliably
> > many times per second, making policy choice of having the kernel hard-code
> > a sys_sync() into the suspend path a bad idea.
> 
> My current view on that is that whether or not to do a sync() before 
> suspending
> ultimately is a policy decision and should belong to user space as such 
> (modulo
> the autosleep situation when user space may not know when the suspend is going
> to happen).
> 
> Moreover, user space is free to do as many sync()s before suspending as it
> wants to and the question here is whether or not the *kernel* should sync()
> in the suspend code path.

sync()s from userspace do not work, as userspace is still running.

sync() from kernel happens with tasks stopped. ... so it should really
get consistent image on disk.

And there are already interfaces that can s2ram without sync, just use
uswsusp ioctls, not the sysfs writes.

If you are doing multiple suspends per second, a) you are doing
something wrong and b) you'd better use ioctl anyway.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Pavel Machek

Hi!

  Understand, however, there are systems which suspend/resume reliably
  many times per second, making policy choice of having the kernel hard-code
  a sys_sync() into the suspend path a bad idea.
 
 My current view on that is that whether or not to do a sync() before 
 suspending
 ultimately is a policy decision and should belong to user space as such 
 (modulo
 the autosleep situation when user space may not know when the suspend is going
 to happen).
 
 Moreover, user space is free to do as many sync()s before suspending as it
 wants to and the question here is whether or not the *kernel* should sync()
 in the suspend code path.

sync()s from userspace do not work, as userspace is still running.

sync() from kernel happens with tasks stopped. ... so it should really
get consistent image on disk.

And there are already interfaces that can s2ram without sync, just use
uswsusp ioctls, not the sysfs writes.

If you are doing multiple suspends per second, a) you are doing
something wrong and b) you'd better use ioctl anyway.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Pavel Machek

Hi!

 conditions, or s2disk fails because a kernel upgrade was done before
 the s2disk and so can't be resumed. With your change, users lose all

s2disk should be able to survive kernel change on x86-64 for long time
now, and I have patches for x86, too.

[But yes, I think sync() should be kept.]
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Pavel Machek

On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote:
 On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
  On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:
  
   The only argument against dropping sys_sync() from the suspend code path
   I've seen in this thread that I entirely agree with is that it may lead to
   regressions, because we've done it practically forever and it may hide 
   latent
   bugs somewhere in block drivers etc.  Dropping it, though, is the only way
   to see those bugs, if any, and if we want to ever fix them, we need to see
   them.  That's why I think that it may be a good idea to allow people to
   drop it if they are willing to accept some extra risk (via the kernel
   command line, for example).
  
  I'd be perfectly happy to have the sync selectable at runtime, one way 
  or another.  The three most reasonable options seem to be:
  
  kernel command line
  
  sysfs file
  
  sysctl setting
  
  The command line is less flexible (it can't be changed after booting).  
  Either of the other two would be fine with me.
 
 We'll probably use a sysfs file (possibly plus a Kconfig option to set the
 boot time default).

Android people can already do sync-less s2ram using existing
interface. IMO they should just do it.

In any case, sysfs file + Kconfig is an overkill. We already have too
many Kconfig options.

There's not a single Android phone supported by mainline
kernel. I'm sure they have bigger problems than Android setting
default sysfs values...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Ming Lei

On Sat, Jul 4, 2015 at 9:03 AM, Rafael J. Wysocki r...@rjwysocki.net wrote:
 On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote:
 On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
   The _vast_ majority of systems using Linux suspend today are under
   an Android user-space.  Android has no assumption that that suspend to
   mem will necessarily stay suspended for a long time.
  
   Indeed, however your change was not android-specific, and it is not
   comfortable on x86-style hardware and usage patterns.
 
  comfortable on x86-style and usage patterns?
  If you mean traditional instead of comfortable,
  where tradition is based on 10-year old systems, then sure.

 Even if this were true(*) we don't break things that currently work
 just because something different is just around the corner. e.g.
 if you shut the lid on your laptop and it suspends to RAM, you can
 pull the USB drive out that you just copied stuff to and plug it
 into another machine and find all the data you copied there is
 present.

 Remove the sync() from the freeze code, and this isn't guaranteed to
 work anymore. It is now dependent on userspace implementations for
 this to work, and we know what userspace developers will choose in
 this situation. i.e. fast and works for me, not safe for
 everyone.

 (*) Which it clearly isn't true because, as this example shows, my
 shiny new laptop still has exactly the same data integrity
 requirements as the laptop I was using 10 years ago.

 Just because there are lots of Android or Chrome out there it
 doesn't mean we can just ignore the requirements of everything
 else...

   That said, as long as x86 will still try to safeguard my data during mem
   sleep/resume as it does today, I have no strong feelings about
   light/heavy-weight mem sleep being strictly a compile-time selectable
   thing, or a more flexible runtime-selectable behavior.
 
  The observation here is that the kernel should not force every system
  to sys_sync() on every suspend.  The only question is how to best
  implement that.

 No, your observation was that sync is slow. Your *solution* is we
 need to remove sync.

 Not only slow, but pointless too.  The argument goes: It is slow and
 pointless and so it may be dropped.

 Now, I can agree that it wasn't clearly demonstrated that the unconditional
 sys_sync() in the suspend code path was pointless, but it also has never
 been clearly shown why it is not pointless on systems that suspend and resume
 reliably.

 [The argument that the user can pull removable storage devices out of the
 system while suspended doesn't hold any water to me, because the user can
 pull them out of the system when not suspended just as well and cause the
 same kind of damage to happen.]

I think common users do understand the difference between
suspend and running, don't they?  So they may not complain
anyone if they plug off USB disk when system is running, but they
will complain the patch and the regression if their data is lost because
doing that during suspend. Not to mention, it is very common to
plug off USB disk after suspend laptop.

Also there are network based storage(iSCSI, NBD, ...), NFS/CIFS, and
is it safe for these cases?


-- 
Ming Lei
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Pavel Machek

Hi!

  Moreover, question is if we really need to carry out the sync on *every*
  suspend even if it is not pointless overall.  That shouldn't really be
  necessary if we suspend and resume often enough or if we resume only for
  a while and then suspend again.  Maybe it should be rate limited somehow
  at least?
 
 If you suspend and resume frequently, then the cost of the sync
 shoul dbe negliable because the amount of data dirtied between
 resume/suspend shoul dbe negliable. hence my questions about where
 sync is spending too much time, and whether we've actually fixed
 those problems or not. If sync speed on clean filesystems is a
 problem then we need to fix sync, not work around it.

And yes, that's solution I'd really prefer over adding knobs to
suspend.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Rafael J. Wysocki

On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote:
 On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote:
  On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote:
   On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
 The _vast_ majority of systems using Linux suspend today are under
 an Android user-space.  Android has no assumption that that suspend 
 to
 mem will necessarily stay suspended for a long time.

 Indeed, however your change was not android-specific, and it is not
 comfortable on x86-style hardware and usage patterns.

comfortable on x86-style and usage patterns?
If you mean traditional instead of comfortable,
where tradition is based on 10-year old systems, then sure.
   
   Even if this were true(*) we don't break things that currently work
   just because something different is just around the corner. e.g.
   if you shut the lid on your laptop and it suspends to RAM, you can
   pull the USB drive out that you just copied stuff to and plug it
   into another machine and find all the data you copied there is
   present.
   
   Remove the sync() from the freeze code, and this isn't guaranteed to
   work anymore. It is now dependent on userspace implementations for
   this to work, and we know what userspace developers will choose in
   this situation. i.e. fast and works for me, not safe for
   everyone.
   
   (*) Which it clearly isn't true because, as this example shows, my
   shiny new laptop still has exactly the same data integrity
   requirements as the laptop I was using 10 years ago.
   
   Just because there are lots of Android or Chrome out there it
   doesn't mean we can just ignore the requirements of everything
   else...
   
 That said, as long as x86 will still try to safeguard my data during 
 mem
 sleep/resume as it does today, I have no strong feelings about
 light/heavy-weight mem sleep being strictly a compile-time 
 selectable
 thing, or a more flexible runtime-selectable behavior.

The observation here is that the kernel should not force every system
to sys_sync() on every suspend.  The only question is how to best
implement that.
   
   No, your observation was that sync is slow. Your *solution* is we
   need to remove sync.
  
  Not only slow, but pointless too.  The argument goes: It is slow and
  pointless and so it may be dropped.
  
  Now, I can agree that it wasn't clearly demonstrated that the unconditional
  sys_sync() in the suspend code path was pointless, but it also has never
  been clearly shown why it is not pointless on systems that suspend and 
  resume
  reliably.
 
 I just gave you an example of why sync is needed: Suspend, pull out
 USB drive when putting laptop in bag.

Exactly the same thing can happen while not suspended.  What does make suspend
special with respect to that?

  [The argument that the user can pull removable storage devices out of the
  system while suspended doesn't hold any water to me, because the user can
  pull them out of the system when not suspended just as well and cause the
  same kind of damage to happen.]
 
 How many times have you forgotten to pull a usb stick out of a
 laptop before putting it away? I've done that several times in the
 past few months, and so I've *personally* pulled the USB sticks out
 of suspended machines. This is a *common occurrence* and it
 currently works just fine, so changing sync behaviour is something
 that will directly impact me *as a user*.
 
 Not to mention hybrid suspend (i.e write suspend image to disk, then
 suspend to RAM for fast resume, but if power is lost resume from
 disk image) both images have exactly the same filesystem state in
 them and that is an absolute requirement for a hybrid suspend...

For that you need to sync.  Let's not confuse what we're talking about.

The code path in question is suspend-only, no hibernation involved in any form.

  Moreover, question is if we really need to carry out the sync on *every*
  suspend even if it is not pointless overall.  That shouldn't really be
  necessary if we suspend and resume often enough or if we resume only for
  a while and then suspend again.  Maybe it should be rate limited somehow
  at least?
 
 If you suspend and resume frequently, then the cost of the sync
 shoul dbe negliable because the amount of data dirtied between
 resume/suspend shoul dbe negliable. hence my questions about where
 sync is spending too much time, and whether we've actually fixed
 those problems or not. If sync speed on clean filesystems is a
 problem then we need to fix sync, not work around it.

Well, say you never suspend, but you also only shut down the system when you
need to replace the kernel on it.  How often would you invoke global sync on
that system?

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Rafael J. Wysocki

On Monday, July 06, 2015 01:06:45 PM Pavel Machek wrote:
 On Mon 2015-07-06 01:28:20, Rafael J. Wysocki wrote:
  On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
   On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:
   
The only argument against dropping sys_sync() from the suspend code path
I've seen in this thread that I entirely agree with is that it may lead 
to
regressions, because we've done it practically forever and it may hide 
latent
bugs somewhere in block drivers etc.  Dropping it, though, is the only 
way
to see those bugs, if any, and if we want to ever fix them, we need to 
see
them.  That's why I think that it may be a good idea to allow people to
drop it if they are willing to accept some extra risk (via the kernel
command line, for example).
   
   I'd be perfectly happy to have the sync selectable at runtime, one way 
   or another.  The three most reasonable options seem to be:
   
 kernel command line
   
 sysfs file
   
 sysctl setting
   
   The command line is less flexible (it can't be changed after booting).  
   Either of the other two would be fine with me.
  
  We'll probably use a sysfs file (possibly plus a Kconfig option to set the
  boot time default).
 
 Android people can already do sync-less s2ram using existing
 interface. IMO they should just do it.
 
 In any case, sysfs file + Kconfig is an overkill. We already have too
 many Kconfig options.

I don't think we can reach a general agreement on what's the *right* approach
with respect to the sys_sync() in the suspend code path, so the only way out
of this situation I can see is to make it configurable.

 There's not a single Android phone supported by mainline
 kernel. I'm sure they have bigger problems than Android setting
 default sysfs values...

But perhaps we'd like to change that?

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-06 Thread Dave Chinner

On Mon, Jul 06, 2015 at 03:52:25PM +0200, Rafael J. Wysocki wrote:
 On Monday, July 06, 2015 10:06:14 AM Dave Chinner wrote:
  On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote:
No, your observation was that sync is slow. Your *solution* is we
need to remove sync.
   
   Not only slow, but pointless too.  The argument goes: It is slow and
   pointless and so it may be dropped.
   
   Now, I can agree that it wasn't clearly demonstrated that the 
   unconditional
   sys_sync() in the suspend code path was pointless, but it also has never
   been clearly shown why it is not pointless on systems that suspend and 
   resume
   reliably.
  
  I just gave you an example of why sync is needed: Suspend, pull out
  USB drive when putting laptop in bag.
 
 Exactly the same thing can happen while not suspended.  What does make suspend
 special with respect to that?

Stop being obtuse. If you remember that you need to pull the USB
stick before you suspend, then you damn well remember to safely
eject the USB stick and that syncs, unmounts and flushes the drive
caches before telling you it is safe to pull the stick.

   Moreover, question is if we really need to carry out the sync on *every*
   suspend even if it is not pointless overall.  That shouldn't really be
   necessary if we suspend and resume often enough or if we resume only for
   a while and then suspend again.  Maybe it should be rate limited somehow
   at least?
  
  If you suspend and resume frequently, then the cost of the sync
  shoul dbe negliable because the amount of data dirtied between
  resume/suspend shoul dbe negliable. hence my questions about where
  sync is spending too much time, and whether we've actually fixed
  those problems or not. If sync speed on clean filesystems is a
  problem then we need to fix sync, not work around it.
 
 Well, say you never suspend, but you also only shut down the system when you
 need to replace the kernel on it.  How often would you invoke global sync on
 that system?

Never, because:

- the kernel does background writeback of dirty data so you
  don't need to run sync while the system is running; and
- unmount during shutdown runs sync_filesystem() internally
  (just like sys_sync does) to ensure the filesystem is
  clean and no data is lost.

Seriously, stop being making ignorant arguments to justify removing
sys_sync(). *If* there's a problem sys_sync() we need to *fix it*,
not ignore it.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Dave Chinner

On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote:
> On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote:
> > On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
> > > >> The _vast_ majority of systems using Linux suspend today are under
> > > >> an Android user-space.  Android has no assumption that that suspend to
> > > >> mem will necessarily stay suspended for a long time.
> > > >
> > > > Indeed, however your change was not android-specific, and it is not
> > > > "comfortable" on x86-style hardware and usage patterns.
> > > 
> > > "comfortable on x86-style and usage patterns"?
> > > If you mean "traditional" instead of "comfortable",
> > > where "tradition" is based on 10-year old systems, then sure.
> > 
> > Even if this were true(*) we don't break things that currently work
> > just because something different is "just around the corner". e.g.
> > if you shut the lid on your laptop and it suspends to RAM, you can
> > pull the USB drive out that you just copied stuff to and plug it
> > into another machine and find all the data you copied there is
> > present.
> > 
> > Remove the sync() from the freeze code, and this isn't guaranteed to
> > work anymore. It is now dependent on userspace implementations for
> > this to work, and we know what userspace developers will choose in
> > this situation. i.e. fast and "works for me", not "safe for
> > everyone".
> > 
> > (*) Which it clearly isn't true because, as this example shows, my
> > shiny new laptop still has exactly the same data integrity
> > requirements as the laptop I was using 10 years ago.
> > 
> > Just because there are lots of Android or Chrome out there it
> > doesn't mean we can just ignore the requirements of everything
> > else...
> > 
> > > > That said, as long as x86 will still try to safeguard my data during mem
> > > > sleep/resume as it does today, I have no strong feelings about
> > > > light/heavy-weight "mem" sleep being strictly a compile-time selectable
> > > > thing, or a more flexible runtime-selectable behavior.
> > > 
> > > The observation here is that the kernel should not force every system
> > > to sys_sync() on every suspend.  The only question is how to best
> > > implement that.
> > 
> > No, your observation was that "sync is slow". Your *solution* is "we
> > need to remove sync".
> 
> Not only slow, but pointless too.  The argument goes: "It is slow and
> pointless and so it may be dropped."
> 
> Now, I can agree that it wasn't clearly demonstrated that the unconditional
> sys_sync() in the suspend code path was pointless, but it also has never
> been clearly shown why it is not pointless on systems that suspend and resume
> reliably.

I just gave you an example of why sync is needed: Suspend, pull out
USB drive when putting laptop in bag.

> [The argument that the user can pull removable storage devices out of the
> system while suspended doesn't hold any water to me, because the user can
> pull them out of the system when not suspended just as well and cause the
> same kind of damage to happen.]

How many times have you forgotten to pull a usb stick out of a
laptop before putting it away? I've done that several times in the
past few months, and so I've *personally* pulled the USB sticks out
of suspended machines. This is a *common occurrence* and it
currently works just fine, so changing sync behaviour is something
that will directly impact me *as a user*.

Not to mention hybrid suspend (i.e write suspend image to disk, then
suspend to RAM for fast resume, but if power is lost resume from
disk image) both images have exactly the same filesystem state in
them and that is an absolute requirement for a hybrid suspend...

> Moreover, question is if we really need to carry out the sync on *every*
> suspend even if it is not pointless overall.  That shouldn't really be
> necessary if we suspend and resume often enough or if we resume only for
> a while and then suspend again.  Maybe it should be rate limited somehow
> at least?

If you suspend and resume frequently, then the cost of the sync
shoul dbe negliable because the amount of data dirtied between
resume/suspend shoul dbe negliable. hence my questions about where
sync is spending too much time, and whether we've actually fixed
those problems or not. If sync speed on clean filesystems is a
problem then we need to fix sync, not work around it.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Rafael J. Wysocki

On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
> On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:
> 
> > The only argument against dropping sys_sync() from the suspend code path
> > I've seen in this thread that I entirely agree with is that it may lead to
> > regressions, because we've done it practically forever and it may hide 
> > latent
> > bugs somewhere in block drivers etc.  Dropping it, though, is the only way
> > to see those bugs, if any, and if we want to ever fix them, we need to see
> > them.  That's why I think that it may be a good idea to allow people to
> > drop it if they are willing to accept some extra risk (via the kernel
> > command line, for example).
> 
> I'd be perfectly happy to have the sync selectable at runtime, one way 
> or another.  The three most reasonable options seem to be:
> 
>   kernel command line
> 
>   sysfs file
> 
>   sysctl setting
> 
> The command line is less flexible (it can't be changed after booting).  
> Either of the other two would be fine with me.

We'll probably use a sysfs file (possibly plus a Kconfig option to set the
boot time default).

> > Moreover, question is if we really need to carry out the sync on *every*
> > suspend even if it is not pointless overall.  That shouldn't really be
> > necessary if we suspend and resume often enough or if we resume only for
> > a while and then suspend again.  Maybe it should be rate limited somehow
> > at least?
> 
> For example, skip the sync if the system has been awake for < 100 ms?

Yes, something like that.

> The cutoff time could also be controlled by the sysfs file: -1 =>
> never sync, 0 => always sync, > 0 => sync if the system has been awake 
> longer than the value.

That sounds like a good idea to me.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Rafael J. Wysocki

On Saturday, July 04, 2015 10:50:36 AM Geert Uytterhoeven wrote:
> On Sat, Jul 4, 2015 at 3:03 AM, Rafael J. Wysocki  wrote:
> > [The argument that the user can pull removable storage devices out of the
> > system while suspended doesn't hold any water to me, because the user can
> > pull them out of the system when not suspended just as well and cause the
> > same kind of damage to happen.]
> 
> The user may not notice the difference between a suspended and powered
> off machine, and thus think it's safe to unplug the USB memory stick while
> the machine appears off.
> 
> If that's considered a valid use case, this would be a regression.

Well, if user space configures the system to use suspend as a "power off
flavor" and doesn't even try to ensure the filesystems to be synced before
suspending (in that mode), that's on the edge of insanity.

And it falls into the "latent bugs that may be there because we've always
synced before suspending" category in my view.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Geert Uytterhoeven

On Sat, Jul 4, 2015 at 3:03 AM, Rafael J. Wysocki  wrote:
> [The argument that the user can pull removable storage devices out of the
> system while suspended doesn't hold any water to me, because the user can
> pull them out of the system when not suspended just as well and cause the
> same kind of damage to happen.]

The user may not notice the difference between a suspended and powered
off machine, and thus think it's safe to unplug the USB memory stick while
the machine appears off.

If that's considered a valid use case, this would be a regression.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Alan Stern

On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:

> The only argument against dropping sys_sync() from the suspend code path
> I've seen in this thread that I entirely agree with is that it may lead to
> regressions, because we've done it practically forever and it may hide latent
> bugs somewhere in block drivers etc.  Dropping it, though, is the only way
> to see those bugs, if any, and if we want to ever fix them, we need to see
> them.  That's why I think that it may be a good idea to allow people to
> drop it if they are willing to accept some extra risk (via the kernel
> command line, for example).

I'd be perfectly happy to have the sync selectable at runtime, one way 
or another.  The three most reasonable options seem to be:

kernel command line

sysfs file

sysctl setting

The command line is less flexible (it can't be changed after booting).  
Either of the other two would be fine with me.

> Moreover, question is if we really need to carry out the sync on *every*
> suspend even if it is not pointless overall.  That shouldn't really be
> necessary if we suspend and resume often enough or if we resume only for
> a while and then suspend again.  Maybe it should be rate limited somehow
> at least?

For example, skip the sync if the system has been awake for < 100 ms?  
The cutoff time could also be controlled by the sysfs file: -1 =>
never sync, 0 => always sync, > 0 => sync if the system has been awake 
longer than the value.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Geert Uytterhoeven

On Sat, Jul 4, 2015 at 3:03 AM, Rafael J. Wysocki r...@rjwysocki.net wrote:
 [The argument that the user can pull removable storage devices out of the
 system while suspended doesn't hold any water to me, because the user can
 pull them out of the system when not suspended just as well and cause the
 same kind of damage to happen.]

The user may not notice the difference between a suspended and powered
off machine, and thus think it's safe to unplug the USB memory stick while
the machine appears off.

If that's considered a valid use case, this would be a regression.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Alan Stern

On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:

 The only argument against dropping sys_sync() from the suspend code path
 I've seen in this thread that I entirely agree with is that it may lead to
 regressions, because we've done it practically forever and it may hide latent
 bugs somewhere in block drivers etc.  Dropping it, though, is the only way
 to see those bugs, if any, and if we want to ever fix them, we need to see
 them.  That's why I think that it may be a good idea to allow people to
 drop it if they are willing to accept some extra risk (via the kernel
 command line, for example).

I'd be perfectly happy to have the sync selectable at runtime, one way 
or another.  The three most reasonable options seem to be:

kernel command line

sysfs file

sysctl setting

The command line is less flexible (it can't be changed after booting).  
Either of the other two would be fine with me.

 Moreover, question is if we really need to carry out the sync on *every*
 suspend even if it is not pointless overall.  That shouldn't really be
 necessary if we suspend and resume often enough or if we resume only for
 a while and then suspend again.  Maybe it should be rate limited somehow
 at least?

For example, skip the sync if the system has been awake for  100 ms?  
The cutoff time could also be controlled by the sysfs file: -1 =
never sync, 0 = always sync,  0 = sync if the system has been awake 
longer than the value.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Rafael J. Wysocki

On Saturday, July 04, 2015 10:50:36 AM Geert Uytterhoeven wrote:
 On Sat, Jul 4, 2015 at 3:03 AM, Rafael J. Wysocki r...@rjwysocki.net wrote:
  [The argument that the user can pull removable storage devices out of the
  system while suspended doesn't hold any water to me, because the user can
  pull them out of the system when not suspended just as well and cause the
  same kind of damage to happen.]
 
 The user may not notice the difference between a suspended and powered
 off machine, and thus think it's safe to unplug the USB memory stick while
 the machine appears off.
 
 If that's considered a valid use case, this would be a regression.

Well, if user space configures the system to use suspend as a power off
flavor and doesn't even try to ensure the filesystems to be synced before
suspending (in that mode), that's on the edge of insanity.

And it falls into the latent bugs that may be there because we've always
synced before suspending category in my view.

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Rafael J. Wysocki

On Saturday, July 04, 2015 10:19:55 AM Alan Stern wrote:
 On Sat, 4 Jul 2015, Rafael J. Wysocki wrote:
 
  The only argument against dropping sys_sync() from the suspend code path
  I've seen in this thread that I entirely agree with is that it may lead to
  regressions, because we've done it practically forever and it may hide 
  latent
  bugs somewhere in block drivers etc.  Dropping it, though, is the only way
  to see those bugs, if any, and if we want to ever fix them, we need to see
  them.  That's why I think that it may be a good idea to allow people to
  drop it if they are willing to accept some extra risk (via the kernel
  command line, for example).
 
 I'd be perfectly happy to have the sync selectable at runtime, one way 
 or another.  The three most reasonable options seem to be:
 
   kernel command line
 
   sysfs file
 
   sysctl setting
 
 The command line is less flexible (it can't be changed after booting).  
 Either of the other two would be fine with me.

We'll probably use a sysfs file (possibly plus a Kconfig option to set the
boot time default).

  Moreover, question is if we really need to carry out the sync on *every*
  suspend even if it is not pointless overall.  That shouldn't really be
  necessary if we suspend and resume often enough or if we resume only for
  a while and then suspend again.  Maybe it should be rate limited somehow
  at least?
 
 For example, skip the sync if the system has been awake for  100 ms?

Yes, something like that.

 The cutoff time could also be controlled by the sysfs file: -1 =
 never sync, 0 = always sync,  0 = sync if the system has been awake 
 longer than the value.

That sounds like a good idea to me.

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-05 Thread Dave Chinner

On Sat, Jul 04, 2015 at 03:03:46AM +0200, Rafael J. Wysocki wrote:
 On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote:
  On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
The _vast_ majority of systems using Linux suspend today are under
an Android user-space.  Android has no assumption that that suspend to
mem will necessarily stay suspended for a long time.
   
Indeed, however your change was not android-specific, and it is not
comfortable on x86-style hardware and usage patterns.
   
   comfortable on x86-style and usage patterns?
   If you mean traditional instead of comfortable,
   where tradition is based on 10-year old systems, then sure.
  
  Even if this were true(*) we don't break things that currently work
  just because something different is just around the corner. e.g.
  if you shut the lid on your laptop and it suspends to RAM, you can
  pull the USB drive out that you just copied stuff to and plug it
  into another machine and find all the data you copied there is
  present.
  
  Remove the sync() from the freeze code, and this isn't guaranteed to
  work anymore. It is now dependent on userspace implementations for
  this to work, and we know what userspace developers will choose in
  this situation. i.e. fast and works for me, not safe for
  everyone.
  
  (*) Which it clearly isn't true because, as this example shows, my
  shiny new laptop still has exactly the same data integrity
  requirements as the laptop I was using 10 years ago.
  
  Just because there are lots of Android or Chrome out there it
  doesn't mean we can just ignore the requirements of everything
  else...
  
That said, as long as x86 will still try to safeguard my data during mem
sleep/resume as it does today, I have no strong feelings about
light/heavy-weight mem sleep being strictly a compile-time selectable
thing, or a more flexible runtime-selectable behavior.
   
   The observation here is that the kernel should not force every system
   to sys_sync() on every suspend.  The only question is how to best
   implement that.
  
  No, your observation was that sync is slow. Your *solution* is we
  need to remove sync.
 
 Not only slow, but pointless too.  The argument goes: It is slow and
 pointless and so it may be dropped.
 
 Now, I can agree that it wasn't clearly demonstrated that the unconditional
 sys_sync() in the suspend code path was pointless, but it also has never
 been clearly shown why it is not pointless on systems that suspend and resume
 reliably.

I just gave you an example of why sync is needed: Suspend, pull out
USB drive when putting laptop in bag.

 [The argument that the user can pull removable storage devices out of the
 system while suspended doesn't hold any water to me, because the user can
 pull them out of the system when not suspended just as well and cause the
 same kind of damage to happen.]

How many times have you forgotten to pull a usb stick out of a
laptop before putting it away? I've done that several times in the
past few months, and so I've *personally* pulled the USB sticks out
of suspended machines. This is a *common occurrence* and it
currently works just fine, so changing sync behaviour is something
that will directly impact me *as a user*.

Not to mention hybrid suspend (i.e write suspend image to disk, then
suspend to RAM for fast resume, but if power is lost resume from
disk image) both images have exactly the same filesystem state in
them and that is an absolute requirement for a hybrid suspend...

 Moreover, question is if we really need to carry out the sync on *every*
 suspend even if it is not pointless overall.  That shouldn't really be
 necessary if we suspend and resume often enough or if we resume only for
 a while and then suspend again.  Maybe it should be rate limited somehow
 at least?

If you suspend and resume frequently, then the cost of the sync
shoul dbe negliable because the amount of data dirtied between
resume/suspend shoul dbe negliable. hence my questions about where
sync is spending too much time, and whether we've actually fixed
those problems or not. If sync speed on clean filesystems is a
problem then we need to fix sync, not work around it.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-03 Thread Rafael J. Wysocki

On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote:
> On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
> > >> The _vast_ majority of systems using Linux suspend today are under
> > >> an Android user-space.  Android has no assumption that that suspend to
> > >> mem will necessarily stay suspended for a long time.
> > >
> > > Indeed, however your change was not android-specific, and it is not
> > > "comfortable" on x86-style hardware and usage patterns.
> > 
> > "comfortable on x86-style and usage patterns"?
> > If you mean "traditional" instead of "comfortable",
> > where "tradition" is based on 10-year old systems, then sure.
> 
> Even if this were true(*) we don't break things that currently work
> just because something different is "just around the corner". e.g.
> if you shut the lid on your laptop and it suspends to RAM, you can
> pull the USB drive out that you just copied stuff to and plug it
> into another machine and find all the data you copied there is
> present.
> 
> Remove the sync() from the freeze code, and this isn't guaranteed to
> work anymore. It is now dependent on userspace implementations for
> this to work, and we know what userspace developers will choose in
> this situation. i.e. fast and "works for me", not "safe for
> everyone".
> 
> (*) Which it clearly isn't true because, as this example shows, my
> shiny new laptop still has exactly the same data integrity
> requirements as the laptop I was using 10 years ago.
> 
> Just because there are lots of Android or Chrome out there it
> doesn't mean we can just ignore the requirements of everything
> else...
> 
> > > That said, as long as x86 will still try to safeguard my data during mem
> > > sleep/resume as it does today, I have no strong feelings about
> > > light/heavy-weight "mem" sleep being strictly a compile-time selectable
> > > thing, or a more flexible runtime-selectable behavior.
> > 
> > The observation here is that the kernel should not force every system
> > to sys_sync() on every suspend.  The only question is how to best
> > implement that.
> 
> No, your observation was that "sync is slow". Your *solution* is "we
> need to remove sync".

Not only slow, but pointless too.  The argument goes: "It is slow and
pointless and so it may be dropped."

Now, I can agree that it wasn't clearly demonstrated that the unconditional
sys_sync() in the suspend code path was pointless, but it also has never
been clearly shown why it is not pointless on systems that suspend and resume
reliably.

[The argument that the user can pull removable storage devices out of the
system while suspended doesn't hold any water to me, because the user can
pull them out of the system when not suspended just as well and cause the
same kind of damage to happen.]

When we were adding it, the thinking was along the lines of "Well, suspend
isn't too reliable, so let's put sys_sync() in there to possibly reduce the
damage from suspend/resume crashes and suspend is slow anyway, so the possible
effect on performance from that shouldn't be noticeable".  Clearly, the world
has changed since then and suspend is far more reliable than it used to be in
general and it is not that slow too at least on some systems (especially the
suspend-to-idle flavor).

The only argument against dropping sys_sync() from the suspend code path
I've seen in this thread that I entirely agree with is that it may lead to
regressions, because we've done it practically forever and it may hide latent
bugs somewhere in block drivers etc.  Dropping it, though, is the only way
to see those bugs, if any, and if we want to ever fix them, we need to see
them.  That's why I think that it may be a good idea to allow people to
drop it if they are willing to accept some extra risk (via the kernel
command line, for example).

Moreover, question is if we really need to carry out the sync on *every*
suspend even if it is not pointless overall.  That shouldn't really be
necessary if we suspend and resume often enough or if we resume only for
a while and then suspend again.  Maybe it should be rate limited somehow
at least?

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-03 Thread Rafael J. Wysocki

On Friday, July 03, 2015 11:42:50 AM Dave Chinner wrote:
 On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
   The _vast_ majority of systems using Linux suspend today are under
   an Android user-space.  Android has no assumption that that suspend to
   mem will necessarily stay suspended for a long time.
  
   Indeed, however your change was not android-specific, and it is not
   comfortable on x86-style hardware and usage patterns.
  
  comfortable on x86-style and usage patterns?
  If you mean traditional instead of comfortable,
  where tradition is based on 10-year old systems, then sure.
 
 Even if this were true(*) we don't break things that currently work
 just because something different is just around the corner. e.g.
 if you shut the lid on your laptop and it suspends to RAM, you can
 pull the USB drive out that you just copied stuff to and plug it
 into another machine and find all the data you copied there is
 present.
 
 Remove the sync() from the freeze code, and this isn't guaranteed to
 work anymore. It is now dependent on userspace implementations for
 this to work, and we know what userspace developers will choose in
 this situation. i.e. fast and works for me, not safe for
 everyone.
 
 (*) Which it clearly isn't true because, as this example shows, my
 shiny new laptop still has exactly the same data integrity
 requirements as the laptop I was using 10 years ago.
 
 Just because there are lots of Android or Chrome out there it
 doesn't mean we can just ignore the requirements of everything
 else...
 
   That said, as long as x86 will still try to safeguard my data during mem
   sleep/resume as it does today, I have no strong feelings about
   light/heavy-weight mem sleep being strictly a compile-time selectable
   thing, or a more flexible runtime-selectable behavior.
  
  The observation here is that the kernel should not force every system
  to sys_sync() on every suspend.  The only question is how to best
  implement that.
 
 No, your observation was that sync is slow. Your *solution* is we
 need to remove sync.

Not only slow, but pointless too.  The argument goes: It is slow and
pointless and so it may be dropped.

Now, I can agree that it wasn't clearly demonstrated that the unconditional
sys_sync() in the suspend code path was pointless, but it also has never
been clearly shown why it is not pointless on systems that suspend and resume
reliably.

[The argument that the user can pull removable storage devices out of the
system while suspended doesn't hold any water to me, because the user can
pull them out of the system when not suspended just as well and cause the
same kind of damage to happen.]

When we were adding it, the thinking was along the lines of Well, suspend
isn't too reliable, so let's put sys_sync() in there to possibly reduce the
damage from suspend/resume crashes and suspend is slow anyway, so the possible
effect on performance from that shouldn't be noticeable.  Clearly, the world
has changed since then and suspend is far more reliable than it used to be in
general and it is not that slow too at least on some systems (especially the
suspend-to-idle flavor).

The only argument against dropping sys_sync() from the suspend code path
I've seen in this thread that I entirely agree with is that it may lead to
regressions, because we've done it practically forever and it may hide latent
bugs somewhere in block drivers etc.  Dropping it, though, is the only way
to see those bugs, if any, and if we want to ever fix them, we need to see
them.  That's why I think that it may be a good idea to allow people to
drop it if they are willing to accept some extra risk (via the kernel
command line, for example).

Moreover, question is if we really need to carry out the sync on *every*
suspend even if it is not pointless overall.  That shouldn't really be
necessary if we suspend and resume often enough or if we resume only for
a while and then suspend again.  Maybe it should be rate limited somehow
at least?

Thanks,
Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-02 Thread Dave Chinner

On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
> >> The _vast_ majority of systems using Linux suspend today are under
> >> an Android user-space.  Android has no assumption that that suspend to
> >> mem will necessarily stay suspended for a long time.
> >
> > Indeed, however your change was not android-specific, and it is not
> > "comfortable" on x86-style hardware and usage patterns.
> 
> "comfortable on x86-style and usage patterns"?
> If you mean "traditional" instead of "comfortable",
> where "tradition" is based on 10-year old systems, then sure.

Even if this were true(*) we don't break things that currently work
just because something different is "just around the corner". e.g.
if you shut the lid on your laptop and it suspends to RAM, you can
pull the USB drive out that you just copied stuff to and plug it
into another machine and find all the data you copied there is
present.

Remove the sync() from the freeze code, and this isn't guaranteed to
work anymore. It is now dependent on userspace implementations for
this to work, and we know what userspace developers will choose in
this situation. i.e. fast and "works for me", not "safe for
everyone".

(*) Which it clearly isn't true because, as this example shows, my
shiny new laptop still has exactly the same data integrity
requirements as the laptop I was using 10 years ago.

Just because there are lots of Android or Chrome out there it
doesn't mean we can just ignore the requirements of everything
else...

> > That said, as long as x86 will still try to safeguard my data during mem
> > sleep/resume as it does today, I have no strong feelings about
> > light/heavy-weight "mem" sleep being strictly a compile-time selectable
> > thing, or a more flexible runtime-selectable behavior.
> 
> The observation here is that the kernel should not force every system
> to sys_sync() on every suspend.  The only question is how to best
> implement that.

No, your observation was that "sync is slow". Your *solution* is "we
need to remove sync".

However, your arguement so far has these problems:

- repeated sync from outside the suspend context is does not
  demonstrate the problem you are seeing during suspend, and
  you have not yet identified why this is the case.
- it has been demonstrated that inode cache size plays a
  significant role in sync latency, but you haven't provided
  any information to tell us what the cache sizes were when
  you see large latencies.
- it has been demonstrated that there are patches pending
  that improve clean filesystem sync speed, but you have not
  produced numbers to demonstrate that those patches do not
  meet your requirements.
- In several tests your "sync latency" monitoring was
  dirtying the filesystem and hence *causing* the repeated
  syncs to be slow.
- you have not told us whether your suspend monitoring was
  the cause of the suspend sync latency or not.
- you have been testing on hardware with questionable power
  management behaviour.

IOWs, you have not yet identified the root cause of the slow sync
behaviour on suspend, you have not determined if pending work fixes
the latency problems, and you have not reproduced your results after
fixing  the flaws in your testing methodology.

> The obvious solution was to delete this forced policy
> from the kernel, and let user-space handle it.
> Rafael has not agreed to push that obvious, though less-than-gentle
> solution upstream, and so I'll re-send the historic patch
> that allows distros to still sync like it is 1999, if they want to:-)

Please stop shouting about "obvious" solutions until you've actually
proved there is a problem and that problems you find aren't already
fixed by the pending sync changes

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-02 Thread Dave Chinner

On Wed, Jul 01, 2015 at 11:07:29PM -0400, Len Brown wrote:
  The _vast_ majority of systems using Linux suspend today are under
  an Android user-space.  Android has no assumption that that suspend to
  mem will necessarily stay suspended for a long time.
 
  Indeed, however your change was not android-specific, and it is not
  comfortable on x86-style hardware and usage patterns.
 
 comfortable on x86-style and usage patterns?
 If you mean traditional instead of comfortable,
 where tradition is based on 10-year old systems, then sure.

Even if this were true(*) we don't break things that currently work
just because something different is just around the corner. e.g.
if you shut the lid on your laptop and it suspends to RAM, you can
pull the USB drive out that you just copied stuff to and plug it
into another machine and find all the data you copied there is
present.

Remove the sync() from the freeze code, and this isn't guaranteed to
work anymore. It is now dependent on userspace implementations for
this to work, and we know what userspace developers will choose in
this situation. i.e. fast and works for me, not safe for
everyone.

(*) Which it clearly isn't true because, as this example shows, my
shiny new laptop still has exactly the same data integrity
requirements as the laptop I was using 10 years ago.

Just because there are lots of Android or Chrome out there it
doesn't mean we can just ignore the requirements of everything
else...

  That said, as long as x86 will still try to safeguard my data during mem
  sleep/resume as it does today, I have no strong feelings about
  light/heavy-weight mem sleep being strictly a compile-time selectable
  thing, or a more flexible runtime-selectable behavior.
 
 The observation here is that the kernel should not force every system
 to sys_sync() on every suspend.  The only question is how to best
 implement that.

No, your observation was that sync is slow. Your *solution* is we
need to remove sync.

However, your arguement so far has these problems:

- repeated sync from outside the suspend context is does not
  demonstrate the problem you are seeing during suspend, and
  you have not yet identified why this is the case.
- it has been demonstrated that inode cache size plays a
  significant role in sync latency, but you haven't provided
  any information to tell us what the cache sizes were when
  you see large latencies.
- it has been demonstrated that there are patches pending
  that improve clean filesystem sync speed, but you have not
  produced numbers to demonstrate that those patches do not
  meet your requirements.
- In several tests your sync latency monitoring was
  dirtying the filesystem and hence *causing* the repeated
  syncs to be slow.
- you have not told us whether your suspend monitoring was
  the cause of the suspend sync latency or not.
- you have been testing on hardware with questionable power
  management behaviour.

IOWs, you have not yet identified the root cause of the slow sync
behaviour on suspend, you have not determined if pending work fixes
the latency problems, and you have not reproduced your results after
fixing  the flaws in your testing methodology.

 The obvious solution was to delete this forced policy
 from the kernel, and let user-space handle it.
 Rafael has not agreed to push that obvious, though less-than-gentle
 solution upstream, and so I'll re-send the historic patch
 that allows distros to still sync like it is 1999, if they want to:-)

Please stop shouting about obvious solutions until you've actually
proved there is a problem and that problems you find aren't already
fixed by the pending sync changes

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-01 Thread Len Brown

>> The _vast_ majority of systems using Linux suspend today are under
>> an Android user-space.  Android has no assumption that that suspend to
>> mem will necessarily stay suspended for a long time.
>
> Indeed, however your change was not android-specific, and it is not
> "comfortable" on x86-style hardware and usage patterns.

"comfortable on x86-style and usage patterns"?
If you mean "traditional" instead of "comfortable",
where "tradition" is based on 10-year old systems, then sure.

But today, my x86 Android tablet is quite "comfortable"
without a sys_sync() in the kernel suspend path.

No, this isn't Android specific, Android is just the highest-volume demand,
making it an obvious example.

Chrome is the #1 selling "x86-style" clamshell laptop.
Chrome is not only "comfortable" with fast suspend/resume,
the Chrome developers demand it.

> That said, as long as x86 will still try to safeguard my data during mem
> sleep/resume as it does today, I have no strong feelings about
> light/heavy-weight "mem" sleep being strictly a compile-time selectable
> thing, or a more flexible runtime-selectable behavior.

The observation here is that the kernel should not force every system
to sys_sync() on every suspend.  The only question is how to best
implement that.  The obvious solution was to delete this forced policy
from the kernel, and let user-space handle it.
Rafael has not agreed to push that obvious, though less-than-gentle
solution upstream, and so I'll re-send the historic patch
that allows distros to still sync like it is 1999, if they want to:-)

thanks,
Len Brown, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-01 Thread Henrique de Moraes Holschuh

On Tue, Jun 30, 2015, at 17:04, Len Brown wrote:
> > Entering "mem" suspend mode through sysfs currently has the implied meaning
> > of "prepare the *entire* system to stay on a powered down state for
> > pontentially a _long_ time", where long means "certainly more than 10
> > seconds" ;-) This is unlikely to be written anywhere, of course, that's just
> > how it was used by the vast majority for years, at least on traditional
> > server/desktop/laptop platforms such as x86.
> 
> The _vast_ majority of systems using Linux suspend today are under
> an Android user-space.  Android has no assumption that that suspend to
> mem will necessarily stay suspended for a long time.

Indeed, however your change was not android-specific, and it is not
"comfortable" on x86-style hardware and usage patterns.

> > IMO, we would actually benefit from *adding* new system-wide sleep/suspend
> > modes that are optimized for oportunistic, short-lived system-wide sleep
> > cycles (aka "catnap") that is fast to enter and exit from, and which will be
> > triggered very frequently, instead of trying to change the assumptions and
> > expected behavior of the current "deep-sleep" mode...
> 
> Thank you for sharing your opinion.
> 
> I am going to give up trying to change your mind, and those who share
> your view.  I plan to revive my patch from 2014 which
> makes sys_sync() optional.  That will not change the historic behavior,
> and will still allow everybody to do what they want.
> Rafael has said that he can live with the resulting kernel clutter.

...

> BTW. the answer does not appear to be creating a new system sleep state.
> Android invokes "mem", and they don't seem excited about teaching
> user-space that runs on multiple platforms that  what used to be a "mem" and 
> no
> "freeze" could be a "mem" plus a "freeze", or a "freeze" and no "mem".

Hmm, maybe we could:

1. Make the behavior of "mem" configurable (select default at compile
time, allow it to be changed at runtime).

2. Add a way to always enter the "heavyweight" (x86-style) mem sleep in
platforms where it exists.

3. Add a way to always enter the "light" (android-style) mem sleep in
platforms where it exists.

And make (2) and (3) optional (as in: you can compile out the clutter). 
That at least provides a way forward for userspace, at the price of more
gunk on the kernel side.

We already have a lot of stuff that works that way (idle and freq
governors, io elevators, etc).

That said, as long as x86 will still try to safeguard my data during mem
sleep/resume as it does today, I have no strong feelings about
light/heavy-weight "mem" sleep being strictly a compile-time selectable
thing, or a more flexible runtime-selectable behavior.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-01 Thread Henrique de Moraes Holschuh

On Tue, Jun 30, 2015, at 17:04, Len Brown wrote:
  Entering mem suspend mode through sysfs currently has the implied meaning
  of prepare the *entire* system to stay on a powered down state for
  pontentially a _long_ time, where long means certainly more than 10
  seconds ;-) This is unlikely to be written anywhere, of course, that's just
  how it was used by the vast majority for years, at least on traditional
  server/desktop/laptop platforms such as x86.
 
 The _vast_ majority of systems using Linux suspend today are under
 an Android user-space.  Android has no assumption that that suspend to
 mem will necessarily stay suspended for a long time.

Indeed, however your change was not android-specific, and it is not
comfortable on x86-style hardware and usage patterns.

  IMO, we would actually benefit from *adding* new system-wide sleep/suspend
  modes that are optimized for oportunistic, short-lived system-wide sleep
  cycles (aka catnap) that is fast to enter and exit from, and which will be
  triggered very frequently, instead of trying to change the assumptions and
  expected behavior of the current deep-sleep mode...
 
 Thank you for sharing your opinion.
 
 I am going to give up trying to change your mind, and those who share
 your view.  I plan to revive my patch from 2014 which
 makes sys_sync() optional.  That will not change the historic behavior,
 and will still allow everybody to do what they want.
 Rafael has said that he can live with the resulting kernel clutter.

...

 BTW. the answer does not appear to be creating a new system sleep state.
 Android invokes mem, and they don't seem excited about teaching
 user-space that runs on multiple platforms that  what used to be a mem and 
 no
 freeze could be a mem plus a freeze, or a freeze and no mem.

Hmm, maybe we could:

1. Make the behavior of mem configurable (select default at compile
time, allow it to be changed at runtime).

2. Add a way to always enter the heavyweight (x86-style) mem sleep in
platforms where it exists.

3. Add a way to always enter the light (android-style) mem sleep in
platforms where it exists.

And make (2) and (3) optional (as in: you can compile out the clutter). 
That at least provides a way forward for userspace, at the price of more
gunk on the kernel side.

We already have a lot of stuff that works that way (idle and freq
governors, io elevators, etc).

That said, as long as x86 will still try to safeguard my data during mem
sleep/resume as it does today, I have no strong feelings about
light/heavy-weight mem sleep being strictly a compile-time selectable
thing, or a more flexible runtime-selectable behavior.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-07-01 Thread Len Brown

 The _vast_ majority of systems using Linux suspend today are under
 an Android user-space.  Android has no assumption that that suspend to
 mem will necessarily stay suspended for a long time.

 Indeed, however your change was not android-specific, and it is not
 comfortable on x86-style hardware and usage patterns.

comfortable on x86-style and usage patterns?
If you mean traditional instead of comfortable,
where tradition is based on 10-year old systems, then sure.

But today, my x86 Android tablet is quite comfortable
without a sys_sync() in the kernel suspend path.

No, this isn't Android specific, Android is just the highest-volume demand,
making it an obvious example.

Chrome is the #1 selling x86-style clamshell laptop.
Chrome is not only comfortable with fast suspend/resume,
the Chrome developers demand it.

 That said, as long as x86 will still try to safeguard my data during mem
 sleep/resume as it does today, I have no strong feelings about
 light/heavy-weight mem sleep being strictly a compile-time selectable
 thing, or a more flexible runtime-selectable behavior.

The observation here is that the kernel should not force every system
to sys_sync() on every suspend.  The only question is how to best
implement that.  The obvious solution was to delete this forced policy
from the kernel, and let user-space handle it.
Rafael has not agreed to push that obvious, though less-than-gentle
solution upstream, and so I'll re-send the historic patch
that allows distros to still sync like it is 1999, if they want to:-)

thanks,
Len Brown, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-30 Thread Len Brown

On Thu, Jun 25, 2015 at 1:11 PM, Henrique de Moraes Holschuh
 wrote:
> On Mon, 11 May 2015, Len Brown wrote:
>> On Sat, May 9, 2015 at 4:25 PM, Henrique de Moraes Holschuh
>>  wrote:
>> > On Sat, 09 May 2015, Alan Stern wrote:
>> >> On Fri, 8 May 2015, Rafael J. Wysocki wrote:
>> >> > My current view on that is that whether or not to do a sync() before 
>> >> > suspending
>> >> > ultimately is a policy decision and should belong to user space as such 
>> >> > (modulo
>> >> > the autosleep situation when user space may not know when the suspend 
>> >> > is going
>> >> > to happen).
>> >> >
>> >> > Moreover, user space is free to do as many sync()s before suspending as 
>> >> > it
>> >> > wants to and the question here is whether or not the *kernel* should 
>> >> > sync()
>> >> > in the suspend code path.
>> >> >
>> >> > Since we pretty much can demonstrate that having just one sync() in 
>> >> > there is
>> >> > not sufficient in general, should we put two of them in there?  Or just
>> >> > remove the existing one and leave it to user space entirely?
>> >>
>> >> I don't know about the advantages of one sync over two.  But how about
>> >> adding a "syncs_before_suspend" (or just "syncs") sysfs attribute that
>> >> takes a small numeric value?  The default can be 0, and the user could
>> >> set it to 1 or 2 (or higher).
>> >
>> > IMO it would be much safer to both have that knob, and to set it to keep 
>> > the
>> > current behavior as the default.  Userspace will adapt and change that knob
>> > to whatever is sufficient based on what it does before signaling the kernel
>> > to suspend.
>> >
>> > A regression in sync-before-suspend is sure to cause data loss episodes,
>> > after all.  And, as far as bikeshedding goes, IMHO syncs_before_suspend is
>> > self-explanatory, which would be a very good reason to use it instead of 
>> > the
>> > shorter requires-you-to-know-what-it-is-about "syncs".
>>
>> When I first thought about this, I had a similar view:
>> https://lkml.org/lkml/2014/1/23/45
>>
>> But upon reflection, I do not believe that the kernel is adding value
>> here, instead it is imposing a policy, and that policy decision is
>> sometimes prohibitively expensive.  User-space can do this for itself (and
>> in the case of desktop distros, already does), and so the kernel should
>> butt-out.
>
> There is a lot of added value in my filesystems not being trashed by
> sleep/resume issues on laptops, IMHO, and the reason why we need the kernel
> itself to take care of syncing and freezing filesystems has been explained
> elsewhere in this thread.
> I thoght for a while before replying, and I think the real issue behind this
> thread is the want of a change of expected-but-implied semanthics and
> behavior for the system-wide sleep-to-memory trigger, to adequate it to a
> new reality for newer classes of devices.
>
> Entering "mem" suspend mode through sysfs currently has the implied meaning
> of "prepare the *entire* system to stay on a powered down state for
> pontentially a _long_ time", where long means "certainly more than 10
> seconds" ;-) This is unlikely to be written anywhere, of course, that's just
> how it was used by the vast majority for years, at least on traditional
> server/desktop/laptop platforms such as x86.

The _vast_ majority of systems using Linux suspend today are under
an Android user-space.  Android has no assumption that that suspend to mem
will necessarily stay suspended for a long time.

The wake-on-packet use-case is generally a much shorter suspend duration,
and more frequent number of suspend/resume cycles
than the "wake on lid-open or button-press" use case.

> On those platforms, we have to assume the user might plug/unplug devices,
> that the power supply might shut down while we're sleeping, that the entire
> process is not painless and has a reasonable chance of misbehaving (crashes
> on sleep/resume are _really_ common), etc.

The fact is that sys_sync() in the kernel suspend to mem path is too expensive
for devices which suspend/resume quickly, routinely, and reliably.

The reality that somebody can stick a broken device into their desktop
or laptop and break suspend doesn't change that.

> What is the safe and proper thing to do in that situation is not necessarily
> the best way to go about it when you actually want a somewhat different
> behavior, i.e. to "prepare the system to stay on a powered down state for a
> short while, and be very fast because this could happen at a very high
> frequency"...
>
> IMO, we would actually benefit from *adding* new system-wide sleep/suspend
> modes that are optimized for oportunistic, short-lived system-wide sleep
> cycles (aka "catnap") that is fast to enter and exit from, and which will be
> triggered very frequently, instead of trying to change the assumptions and
> expected behavior of the current "deep-sleep" mode...

Thank you for sharing your opinion.

I am going to give up trying to change your mind, and those who share
your

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-30 Thread Len Brown

On Thu, Jun 25, 2015 at 1:11 PM, Henrique de Moraes Holschuh
h...@hmh.eng.br wrote:
 On Mon, 11 May 2015, Len Brown wrote:
 On Sat, May 9, 2015 at 4:25 PM, Henrique de Moraes Holschuh
 h...@hmh.eng.br wrote:
  On Sat, 09 May 2015, Alan Stern wrote:
  On Fri, 8 May 2015, Rafael J. Wysocki wrote:
   My current view on that is that whether or not to do a sync() before 
   suspending
   ultimately is a policy decision and should belong to user space as such 
   (modulo
   the autosleep situation when user space may not know when the suspend 
   is going
   to happen).
  
   Moreover, user space is free to do as many sync()s before suspending as 
   it
   wants to and the question here is whether or not the *kernel* should 
   sync()
   in the suspend code path.
  
   Since we pretty much can demonstrate that having just one sync() in 
   there is
   not sufficient in general, should we put two of them in there?  Or just
   remove the existing one and leave it to user space entirely?
 
  I don't know about the advantages of one sync over two.  But how about
  adding a syncs_before_suspend (or just syncs) sysfs attribute that
  takes a small numeric value?  The default can be 0, and the user could
  set it to 1 or 2 (or higher).
 
  IMO it would be much safer to both have that knob, and to set it to keep 
  the
  current behavior as the default.  Userspace will adapt and change that knob
  to whatever is sufficient based on what it does before signaling the kernel
  to suspend.
 
  A regression in sync-before-suspend is sure to cause data loss episodes,
  after all.  And, as far as bikeshedding goes, IMHO syncs_before_suspend is
  self-explanatory, which would be a very good reason to use it instead of 
  the
  shorter requires-you-to-know-what-it-is-about syncs.

 When I first thought about this, I had a similar view:
 https://lkml.org/lkml/2014/1/23/45

 But upon reflection, I do not believe that the kernel is adding value
 here, instead it is imposing a policy, and that policy decision is
 sometimes prohibitively expensive.  User-space can do this for itself (and
 in the case of desktop distros, already does), and so the kernel should
 butt-out.

 There is a lot of added value in my filesystems not being trashed by
 sleep/resume issues on laptops, IMHO, and the reason why we need the kernel
 itself to take care of syncing and freezing filesystems has been explained
 elsewhere in this thread.
 I thoght for a while before replying, and I think the real issue behind this
 thread is the want of a change of expected-but-implied semanthics and
 behavior for the system-wide sleep-to-memory trigger, to adequate it to a
 new reality for newer classes of devices.

 Entering mem suspend mode through sysfs currently has the implied meaning
 of prepare the *entire* system to stay on a powered down state for
 pontentially a _long_ time, where long means certainly more than 10
 seconds ;-) This is unlikely to be written anywhere, of course, that's just
 how it was used by the vast majority for years, at least on traditional
 server/desktop/laptop platforms such as x86.

The _vast_ majority of systems using Linux suspend today are under
an Android user-space.  Android has no assumption that that suspend to mem
will necessarily stay suspended for a long time.

The wake-on-packet use-case is generally a much shorter suspend duration,
and more frequent number of suspend/resume cycles
than the wake on lid-open or button-press use case.

 On those platforms, we have to assume the user might plug/unplug devices,
 that the power supply might shut down while we're sleeping, that the entire
 process is not painless and has a reasonable chance of misbehaving (crashes
 on sleep/resume are _really_ common), etc.

The fact is that sys_sync() in the kernel suspend to mem path is too expensive
for devices which suspend/resume quickly, routinely, and reliably.

The reality that somebody can stick a broken device into their desktop
or laptop and break suspend doesn't change that.

 What is the safe and proper thing to do in that situation is not necessarily
 the best way to go about it when you actually want a somewhat different
 behavior, i.e. to prepare the system to stay on a powered down state for a
 short while, and be very fast because this could happen at a very high
 frequency...

 IMO, we would actually benefit from *adding* new system-wide sleep/suspend
 modes that are optimized for oportunistic, short-lived system-wide sleep
 cycles (aka catnap) that is fast to enter and exit from, and which will be
 triggered very frequently, instead of trying to change the assumptions and
 expected behavior of the current deep-sleep mode...

Thank you for sharing your opinion.

I am going to give up trying to change your mind, and those who share
your view.  I plan to revive my patch from 2014 which
makes sys_sync() optional.  That will not change the historic behavior,
and will still allow everybody to do what they want.
Rafael has said that

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-25 Thread Henrique de Moraes Holschuh

On Mon, 11 May 2015, Len Brown wrote:
> On Sat, May 9, 2015 at 4:25 PM, Henrique de Moraes Holschuh
>  wrote:
> > On Sat, 09 May 2015, Alan Stern wrote:
> >> On Fri, 8 May 2015, Rafael J. Wysocki wrote:
> >> > My current view on that is that whether or not to do a sync() before 
> >> > suspending
> >> > ultimately is a policy decision and should belong to user space as such 
> >> > (modulo
> >> > the autosleep situation when user space may not know when the suspend is 
> >> > going
> >> > to happen).
> >> >
> >> > Moreover, user space is free to do as many sync()s before suspending as 
> >> > it
> >> > wants to and the question here is whether or not the *kernel* should 
> >> > sync()
> >> > in the suspend code path.
> >> >
> >> > Since we pretty much can demonstrate that having just one sync() in 
> >> > there is
> >> > not sufficient in general, should we put two of them in there?  Or just
> >> > remove the existing one and leave it to user space entirely?
> >>
> >> I don't know about the advantages of one sync over two.  But how about
> >> adding a "syncs_before_suspend" (or just "syncs") sysfs attribute that
> >> takes a small numeric value?  The default can be 0, and the user could
> >> set it to 1 or 2 (or higher).
> >
> > IMO it would be much safer to both have that knob, and to set it to keep the
> > current behavior as the default.  Userspace will adapt and change that knob
> > to whatever is sufficient based on what it does before signaling the kernel
> > to suspend.
> >
> > A regression in sync-before-suspend is sure to cause data loss episodes,
> > after all.  And, as far as bikeshedding goes, IMHO syncs_before_suspend is
> > self-explanatory, which would be a very good reason to use it instead of the
> > shorter requires-you-to-know-what-it-is-about "syncs".
> 
> When I first thought about this, I had a similar view:
> https://lkml.org/lkml/2014/1/23/45
> 
> But upon reflection, I do not believe that the kernel is adding value
> here, instead it is imposing a policy, and that policy decision is
> sometimes prohibitively expensive.  User-space can do this for itself (and
> in the case of desktop distros, already does), and so the kernel should
> butt-out.

There is a lot of added value in my filesystems not being trashed by
sleep/resume issues on laptops, IMHO, and the reason why we need the kernel
itself to take care of syncing and freezing filesystems has been explained
elsewhere in this thread.

I thoght for a while before replying, and I think the real issue behind this
thread is the want of a change of expected-but-implied semanthics and
behavior for the system-wide sleep-to-memory trigger, to adequate it to a
new reality for newer classes of devices.

Entering "mem" suspend mode through sysfs currently has the implied meaning
of "prepare the *entire* system to stay on a powered down state for
pontentially a _long_ time", where long means "certainly more than 10
seconds" ;-) This is unlikely to be written anywhere, of course, that's just
how it was used by the vast majority for years, at least on traditional
server/desktop/laptop platforms such as x86.

On those platforms, we have to assume the user might plug/unplug devices,
that the power supply might shut down while we're sleeping, that the entire
process is not painless and has a reasonable chance of misbehaving (crashes
on sleep/resume are _really_ common), etc.

What is the safe and proper thing to do in that situation is not necessarily
the best way to go about it when you actually want a somewhat different
behavior, i.e. to "prepare the system to stay on a powered down state for a
short while, and be very fast because this could happen at a very high
frequency"...

IMO, we would actually benefit from *adding* new system-wide sleep/suspend
modes that are optimized for oportunistic, short-lived system-wide sleep
cycles (aka "catnap") that is fast to enter and exit from, and which will be
triggered very frequently, instead of trying to change the assumptions and
expected behavior of the current "deep-sleep" mode...

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-25 Thread Henrique de Moraes Holschuh

On Mon, 11 May 2015, Len Brown wrote:
 On Sat, May 9, 2015 at 4:25 PM, Henrique de Moraes Holschuh
 h...@hmh.eng.br wrote:
  On Sat, 09 May 2015, Alan Stern wrote:
  On Fri, 8 May 2015, Rafael J. Wysocki wrote:
   My current view on that is that whether or not to do a sync() before 
   suspending
   ultimately is a policy decision and should belong to user space as such 
   (modulo
   the autosleep situation when user space may not know when the suspend is 
   going
   to happen).
  
   Moreover, user space is free to do as many sync()s before suspending as 
   it
   wants to and the question here is whether or not the *kernel* should 
   sync()
   in the suspend code path.
  
   Since we pretty much can demonstrate that having just one sync() in 
   there is
   not sufficient in general, should we put two of them in there?  Or just
   remove the existing one and leave it to user space entirely?
 
  I don't know about the advantages of one sync over two.  But how about
  adding a syncs_before_suspend (or just syncs) sysfs attribute that
  takes a small numeric value?  The default can be 0, and the user could
  set it to 1 or 2 (or higher).
 
  IMO it would be much safer to both have that knob, and to set it to keep the
  current behavior as the default.  Userspace will adapt and change that knob
  to whatever is sufficient based on what it does before signaling the kernel
  to suspend.
 
  A regression in sync-before-suspend is sure to cause data loss episodes,
  after all.  And, as far as bikeshedding goes, IMHO syncs_before_suspend is
  self-explanatory, which would be a very good reason to use it instead of the
  shorter requires-you-to-know-what-it-is-about syncs.
 
 When I first thought about this, I had a similar view:
 https://lkml.org/lkml/2014/1/23/45
 
 But upon reflection, I do not believe that the kernel is adding value
 here, instead it is imposing a policy, and that policy decision is
 sometimes prohibitively expensive.  User-space can do this for itself (and
 in the case of desktop distros, already does), and so the kernel should
 butt-out.

There is a lot of added value in my filesystems not being trashed by
sleep/resume issues on laptops, IMHO, and the reason why we need the kernel
itself to take care of syncing and freezing filesystems has been explained
elsewhere in this thread.

I thoght for a while before replying, and I think the real issue behind this
thread is the want of a change of expected-but-implied semanthics and
behavior for the system-wide sleep-to-memory trigger, to adequate it to a
new reality for newer classes of devices.

Entering mem suspend mode through sysfs currently has the implied meaning
of prepare the *entire* system to stay on a powered down state for
pontentially a _long_ time, where long means certainly more than 10
seconds ;-) This is unlikely to be written anywhere, of course, that's just
how it was used by the vast majority for years, at least on traditional
server/desktop/laptop platforms such as x86.

On those platforms, we have to assume the user might plug/unplug devices,
that the power supply might shut down while we're sleeping, that the entire
process is not painless and has a reasonable chance of misbehaving (crashes
on sleep/resume are _really_ common), etc.

What is the safe and proper thing to do in that situation is not necessarily
the best way to go about it when you actually want a somewhat different
behavior, i.e. to prepare the system to stay on a powered down state for a
short while, and be very fast because this could happen at a very high
frequency...

IMO, we would actually benefit from *adding* new system-wide sleep/suspend
modes that are optimized for oportunistic, short-lived system-wide sleep
cycles (aka catnap) that is fast to enter and exit from, and which will be
triggered very frequently, instead of trying to change the assumptions and
expected behavior of the current deep-sleep mode...

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-19 Thread Len Brown

>> Putting a function trace on sys_sync and executing sync manually,
>> I was able to see it take 100ms,
>> though function trace itself could be contributing to that...
>
> It would seem that way - you need to get the traces to dump to
> something that has no sync overhead

I don't think that ftrace buffers have sync overhead --
at least not until they are copied into files.

>> Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 
>> 300ms.
>> I've seen that in some other results.  Sometimes sync if fast, but sd
>> then more than makes up for it by being slow:-(
>
> Oh, I see that too. Normally That's because the filesystem hasn't
> been told to enter an idle state and so is doing metadata writeback
> IO after the sync. When that happens the sd suspend has wait for
> request queues to drain, IO to complete and device caches to flush.
> This simply cannot be avoided because suspend never tells the
> filesytems to enter an idle state

I captured a trace of a slow sd suspend.
Apparently, it does two operations -- first  a sync_cache,
then a stop operation.  The sync was fast.   The stop command
was where all the time went.

I'll look at a more modern drive on the same system next week,
just for comparison.

> i.e. remember what I said initially in this thread about suspend
> actually needing to freeze filesystems, not just sync them?

I think with the complexity of file systems and the underlying
devices, yes, we need to think about how to efficiently
and safely suspend/resume them.

But sys_sync is too expensive to have hard-coded in the kernel suspend path.
Some machines can suspend and resume in under 10ms -- they
absolutely do not want sys_sync() hard-coded in the suspend path.

>> FYI,
>> I ran analyze_suspend.py -x2
>> from current directory /tmp, which is mounted on tmpfs,
>> but still found the 2nd sync was very slow -- 200ms
>> vs 6 - 20 ms for the sync preceding the 1st suspend.
>
> So where did that time go? As I pointed out previously, function
> trace will only tell us if the delay is data writeback or not. We
> seem to have confirmed that the delay is, indeed, writeback of dirty
> data. Now we need to identify what the dirty data belongs to: we
> need to trace individual writeback events to see what dirty inodes
> are actually being written.

I expect that analyze_suspend.py is moving data around between
the back-to-back suspends when the -x2 option is used -- will look into it.

thanks,
Len Brown, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-19 Thread Dave Chinner

On Fri, Jun 19, 2015 at 02:34:37AM -0400, Len Brown wrote:
> > Can you repeat this test on your system, so that we can determine if
> > the 5ms ""sync time" is actually just the overhead of inode cache
> > traversal? If that is the case, the speed of sync on a clean
> > filesystem is already a solved problem - the patchset should be
> > merged in the 4.2 cycle
> 
> Yes, drop_caches does seem to help repeated sync on this system:
> Exactly what patch series does this?  I'm running ext4 (the default,
> not btrfs)

None. It's the current behaviour of sync that is ends up walking the
inode cache in it's entirity to find dirty inodes that need to be
waited on. That's what the sync scalability patch series I pointed
you at fixes - sync then keeps a "dirty inodes that need to be
waited on list" instead of doing a cache traversal to find them.
i.e. the "no cache" results you see will soon be the behaviour sync
has regardless of the size of the inode cache.

> [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo
> ext4_inode_cache3536   3536   1008   164 : tunables00
>   0 : slabdata221221  0

That's actually a really small cache to begin with.

> > This is the problem we really need to reproduce and track down.
> 
> Putting a function trace on sys_sync and executing sync manually,
> I was able to see it take 100ms,
> though function trace itself could be contributing to that...

It would seem that way - you need to get the traces to dump to
something that has no sync overhead

> running analyze_suspend.py after the slab tweak above didn't change much.
> in one run sync was 20ms (out of a total suspend time of 60ms).

Which may be because the inode cache was larger?

> Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms.
> I've seen that in some other results.  Sometimes sync if fast, but sd
> then more than makes up for it by being slow:-(

Oh, I see that too. Normally That's because the filesystem hasn't
been told to enter an idle state and so is doing metadata writeback
IO after the sync. When that happens the sd suspend has wait for
request queues to drain, IO to complete and device caches to flush.
This simply cannot be avoided because suspend never tells the
filesytems to enter an idle state

i.e. remember what I said initially in this thread about suspend
actually needing to freeze filesystems, not just sync them?

> FYI,
> I ran analyze_suspend.py -x2
> from current directory /tmp, which is mounted on tmpfs,
> but still found the 2nd sync was very slow -- 200ms
> vs 6 - 20 ms for the sync preceding the 1st suspend.

So where did that time go? As I pointed out previously, function
trace will only tell us if the delay is data writeback or not. We
seem to have confirmed that the delay is, indeed, writeback of dirty
data. Now we need to identify what the dirty data belongs to: we
need to trace individual writeback events to see what dirty inodes
are actually being written.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-19 Thread Len Brown

> Can you repeat this test on your system, so that we can determine if
> the 5ms ""sync time" is actually just the overhead of inode cache
> traversal? If that is the case, the speed of sync on a clean
> filesystem is already a solved problem - the patchset should be
> merged in the 4.2 cycle

Yes, drop_caches does seem to help repeated sync on this system:
Exactly what patch series does this?  I'm running ext4 (the default,
not btrfs)

[lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sleep 0 ; done

real0m0.002s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.001s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.000s
sys 0m0.001s
[lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sync ; done

real0m0.004s
user0m0.000s
sys 0m0.003s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.003s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s
[lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo
ext4_inode_cache3536   3536   1008   164 : tunables00
  0 : slabdata221221  0
[lenb@d975xbx ~]$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches "
[lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo
ext4_inode_cache 553   1680   1008   164 : tunables00
  0 : slabdata105105  0
[lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sync ; done

real0m0.002s
user0m0.000s
sys 0m0.001s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.002s


>> While they were all to slow, none of them were
>>  O(500ms), so yes, there
>> does seem to be some state change
>> that causes the 2nd sync after a resume to be especially slow.
>>
>> Unfortunately, I've not got an ftrace on the 500ms flavor yet.
>
> This is the problem we really need to reproduce and track down.

Putting a function trace on sys_sync and executing sync manually,
I was able to see it take 100ms,
though function trace itself could be contributing to that...

[lenb@d975xbx ~]$ grep \# trace.20150619_013702
# tracer: function_graph
#
# TIMECPU  DURATION  FUNCTION CALLS
#  |  | |   | |   |   |   |
  374.665063 |   0) # 2229.612 us |} /* __schedule */
  374.665064 |   0) # 2230.571 us |  } /* schedule */
  374.665064 |   0) # 2231.494 us |} /* schedule_timeout */
  374.665065 |   0) # 2235.937 us |  } /* wait_for_completion */
  374.745616 |   0) # 80518.73 us |  } /* __schedule */
  374.745616 |   0) # 80519.47 us |} /* schedule */
  374.745617 |   0) # 80520.28 us |  } /*
schedule_timeout */
  374.745621 |   0) # 80526.38 us |} /*
io_schedule_timeout */
  374.745621 |   0) # 80527.23 us |  } /* bit_wait_io */
  374.745622 |   0) # 80531.04 us |} /* __wait_on_bit */
  374.745623 |   0) # 80531.95 us |  } /* wait_on_page_bit */
  374.745644 |   0) # 80555.58 us |} /* filemap_fdatawait_range */
  374.745644 |   0) # 80556.36 us |  } /* filemap_fdatawait */
  374.748029 |   0) # 1300.848 us |  } /* __schedule */
  374.748029 |   0) # 1301.376 us |} /* schedule */
  374.748029 |   0) # 1301.923 us |  } /*
schedule_timeout */
  374.748032 |   0) # 1306.133 us |} /*
io_schedule_timeout */
  374.748032 |   0) # 1306.651 us |  } /* bit_wait_io */
  374.748033 |   0) # 1309.298 us |} /* __wait_on_bit */
  374.748033 |   0) # 1309.838 us |  } /* wait_on_page_bit */
  374.750502 |   0) # 1099.379 us |  } /*

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-19 Thread Dave Chinner

On Fri, Jun 19, 2015 at 02:34:37AM -0400, Len Brown wrote:
  Can you repeat this test on your system, so that we can determine if
  the 5ms sync time is actually just the overhead of inode cache
  traversal? If that is the case, the speed of sync on a clean
  filesystem is already a solved problem - the patchset should be
  merged in the 4.2 cycle
 
 Yes, drop_caches does seem to help repeated sync on this system:
 Exactly what patch series does this?  I'm running ext4 (the default,
 not btrfs)

None. It's the current behaviour of sync that is ends up walking the
inode cache in it's entirity to find dirty inodes that need to be
waited on. That's what the sync scalability patch series I pointed
you at fixes - sync then keeps a dirty inodes that need to be
waited on list instead of doing a cache traversal to find them.
i.e. the no cache results you see will soon be the behaviour sync
has regardless of the size of the inode cache.

 [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo
 ext4_inode_cache3536   3536   1008   164 : tunables00
   0 : slabdata221221  0

That's actually a really small cache to begin with.

  This is the problem we really need to reproduce and track down.
 
 Putting a function trace on sys_sync and executing sync manually,
 I was able to see it take 100ms,
 though function trace itself could be contributing to that...

It would seem that way - you need to get the traces to dump to
something that has no sync overhead

 running analyze_suspend.py after the slab tweak above didn't change much.
 in one run sync was 20ms (out of a total suspend time of 60ms).

Which may be because the inode cache was larger?

 Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms.
 I've seen that in some other results.  Sometimes sync if fast, but sd
 then more than makes up for it by being slow:-(

Oh, I see that too. Normally That's because the filesystem hasn't
been told to enter an idle state and so is doing metadata writeback
IO after the sync. When that happens the sd suspend has wait for
request queues to drain, IO to complete and device caches to flush.
This simply cannot be avoided because suspend never tells the
filesytems to enter an idle state

i.e. remember what I said initially in this thread about suspend
actually needing to freeze filesystems, not just sync them?

 FYI,
 I ran analyze_suspend.py -x2
 from current directory /tmp, which is mounted on tmpfs,
 but still found the 2nd sync was very slow -- 200ms
 vs 6 - 20 ms for the sync preceding the 1st suspend.

So where did that time go? As I pointed out previously, function
trace will only tell us if the delay is data writeback or not. We
seem to have confirmed that the delay is, indeed, writeback of dirty
data. Now we need to identify what the dirty data belongs to: we
need to trace individual writeback events to see what dirty inodes
are actually being written.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-19 Thread Len Brown

 Putting a function trace on sys_sync and executing sync manually,
 I was able to see it take 100ms,
 though function trace itself could be contributing to that...

 It would seem that way - you need to get the traces to dump to
 something that has no sync overhead

I don't think that ftrace buffers have sync overhead --
at least not until they are copied into files.

 Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 
 300ms.
 I've seen that in some other results.  Sometimes sync if fast, but sd
 then more than makes up for it by being slow:-(

 Oh, I see that too. Normally That's because the filesystem hasn't
 been told to enter an idle state and so is doing metadata writeback
 IO after the sync. When that happens the sd suspend has wait for
 request queues to drain, IO to complete and device caches to flush.
 This simply cannot be avoided because suspend never tells the
 filesytems to enter an idle state

I captured a trace of a slow sd suspend.
Apparently, it does two operations -- first  a sync_cache,
then a stop operation.  The sync was fast.   The stop command
was where all the time went.

I'll look at a more modern drive on the same system next week,
just for comparison.

 i.e. remember what I said initially in this thread about suspend
 actually needing to freeze filesystems, not just sync them?

I think with the complexity of file systems and the underlying
devices, yes, we need to think about how to efficiently
and safely suspend/resume them.

But sys_sync is too expensive to have hard-coded in the kernel suspend path.
Some machines can suspend and resume in under 10ms -- they
absolutely do not want sys_sync() hard-coded in the suspend path.

 FYI,
 I ran analyze_suspend.py -x2
 from current directory /tmp, which is mounted on tmpfs,
 but still found the 2nd sync was very slow -- 200ms
 vs 6 - 20 ms for the sync preceding the 1st suspend.

 So where did that time go? As I pointed out previously, function
 trace will only tell us if the delay is data writeback or not. We
 seem to have confirmed that the delay is, indeed, writeback of dirty
 data. Now we need to identify what the dirty data belongs to: we
 need to trace individual writeback events to see what dirty inodes
 are actually being written.

I expect that analyze_suspend.py is moving data around between
the back-to-back suspends when the -x2 option is used -- will look into it.

thanks,
Len Brown, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-19 Thread Len Brown

 Can you repeat this test on your system, so that we can determine if
 the 5ms sync time is actually just the overhead of inode cache
 traversal? If that is the case, the speed of sync on a clean
 filesystem is already a solved problem - the patchset should be
 merged in the 4.2 cycle

Yes, drop_caches does seem to help repeated sync on this system:
Exactly what patch series does this?  I'm running ext4 (the default,
not btrfs)

[lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sleep 0 ; done

real0m0.002s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.001s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.001s
sys 0m0.000s

real0m0.001s
user0m0.000s
sys 0m0.001s
[lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sync ; done

real0m0.004s
user0m0.000s
sys 0m0.003s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.003s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.002s
user0m0.000s
sys 0m0.002s
[lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo
ext4_inode_cache3536   3536   1008   164 : tunables00
  0 : slabdata221221  0
[lenb@d975xbx ~]$ sudo sh -c echo 3  /proc/sys/vm/drop_caches 
[lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo
ext4_inode_cache 553   1680   1008   164 : tunables00
  0 : slabdata105105  0
[lenb@d975xbx ~]$ for i in `seq 0 1 10`; do time sync ; done

real0m0.002s
user0m0.000s
sys 0m0.001s

real0m0.002s
user0m0.000s
sys 0m0.002s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.001s

real0m0.001s
user0m0.000s
sys 0m0.002s


 While they were all to slow, none of them were
  O(500ms), so yes, there
 does seem to be some state change
 that causes the 2nd sync after a resume to be especially slow.

 Unfortunately, I've not got an ftrace on the 500ms flavor yet.

 This is the problem we really need to reproduce and track down.

Putting a function trace on sys_sync and executing sync manually,
I was able to see it take 100ms,
though function trace itself could be contributing to that...

[lenb@d975xbx ~]$ grep \# trace.20150619_013702
# tracer: function_graph
#
# TIMECPU  DURATION  FUNCTION CALLS
#  |  | |   | |   |   |   |
  374.665063 |   0) # 2229.612 us |} /* __schedule */
  374.665064 |   0) # 2230.571 us |  } /* schedule */
  374.665064 |   0) # 2231.494 us |} /* schedule_timeout */
  374.665065 |   0) # 2235.937 us |  } /* wait_for_completion */
  374.745616 |   0) # 80518.73 us |  } /* __schedule */
  374.745616 |   0) # 80519.47 us |} /* schedule */
  374.745617 |   0) # 80520.28 us |  } /*
schedule_timeout */
  374.745621 |   0) # 80526.38 us |} /*
io_schedule_timeout */
  374.745621 |   0) # 80527.23 us |  } /* bit_wait_io */
  374.745622 |   0) # 80531.04 us |} /* __wait_on_bit */
  374.745623 |   0) # 80531.95 us |  } /* wait_on_page_bit */
  374.745644 |   0) # 80555.58 us |} /* filemap_fdatawait_range */
  374.745644 |   0) # 80556.36 us |  } /* filemap_fdatawait */
  374.748029 |   0) # 1300.848 us |  } /* __schedule */
  374.748029 |   0) # 1301.376 us |} /* schedule */
  374.748029 |   0) # 1301.923 us |  } /*
schedule_timeout */
  374.748032 |   0) # 1306.133 us |} /*
io_schedule_timeout */
  374.748032 |   0) # 1306.651 us |  } /* bit_wait_io */
  374.748033 |   0) # 1309.298 us |} /* __wait_on_bit */
  374.748033 |   0) # 1309.838 us |  } /* wait_on_page_bit */
  374.750502 |   0) # 1099.379 us |  } /* __schedule */

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-18 Thread Dave Chinner

On Thu, Jun 18, 2015 at 10:35:44PM -0400, Len Brown wrote:
> On Thu, Jun 18, 2015 at 9:09 PM, Dave Chinner  wrote:
> > On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote:
> >> On Sun, May 17, 2015 at 9:57 PM, NeilBrown  wrote:
> >> > 1/ Len has seen enough delays to bother sending a patch.  Twice.  Lots of
> >> >other people claim there shouldn't be much delay.
> >> >
> >> >Len: can you provide numbers?  How long does the sys_sync take
> >> >(min/max/mean).  I think an interesting number would be in a quick
> >> >"resume, do something, suspend" cycle, what percent of the time is 
> >> > spent
> >> >in sys_sync.
> >> >Maybe also report what filesystems are in use, and whether you expect
> >> >there to be any dirty data.
> >> >
> >> >If I run "time sync; time sync; time sync" it reports about 50msec for
> >> >each sync.  If I run "time sleep 0; time sleep 0; time sleep 0" it 
> >> > reports
> >> >about 2 msec.  So sys_sync is taking at least 50msec.
> >> >Almost all of that is in sync_inodes_sb() and much of that is in
> >> >_raw_spin_lock though I might be misinterpreting perf.  It seems 
> >> > to
> >> >wait for a BDI flusher thread to go off and do nothing.
> >> >
> >> >Len: is 50msec enough to bother you?
> >>
> >> 50ms is not acceptable.
> >> 5ms is also not acceptable.
> >>
> >> If it were guaranteed to be under 1ms, it would not behigh  on my
> >> "performance pain" list, but I would still require the option
> >> to delete it entirely.
> >>
> >> But sys_sync time is random on  many systems,
> >> with a very large maximum duration.
> >>
> >> Attached is the output from analyze_suspend.py -x2
> >> on a desktop machine, which has just an inexpensive SSD for storage.
> >> It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel
> >> with no tweaks.
> >
> > More information required. Mounted filesystems, mount options, etc
> > at minimum so we have some idea of what is being synced. We don't
> > even know what filesystem you are using
> 
> In this case, I am using Fedora 22 defaults on a single SSD:
> 
> ext4 (rw,relatime,seclabel,data=ordered)

OK, nothing unusual.

> [lenb@d975xbx ~]$ mount
> /dev/sda3 on / type ext4 (rw,relatime,seclabel,data=ordered)

> /dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered)


And so effectively only a single filesystem that would do IO when
sync is called.

> > The graph posted just has a single box labelled "sync_filesystems"
> > with no other information about what is actually taking that time.
> > We need to know what is consuming all that time *inside*
> > sync_filesystems() to make any progress here.
> 
> Just for grins, did 100 sys_sync instead of one.
> They all ran about the same 5ms.

Ok, so what's the overhead? On one of my test VMs, running on
spinning disk backed devices and a single ext3 root filesystem and
running that sync scalability patch series:

> > git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
> > superblock-scaling

$ for i in `seq 0 1 100`; do time sleep 0 ; done 2>&1 | stats

real0m0.002s
user0m0.00-0.004000(0.000168317+/-0.00058)s
sys 0m0.00-0.003000(0.00154455+/-0.00074)s
$ for i in `seq 0 1 100`; do time sync ; done 2>&1 | stats

real0m0.002s
user0m0.00-0.002000(0.000118812+/-0.00041)s
sys 0m0.00-0.003000(0.00151485+/-0.0007)s
$

The sync runtime is exactly the same as running "sleep 0" 100 times.
Without the sync scalability patchset:

$ for i in `seq 0 1 100`; do time sync ; done 2>&1 | stats

real  0m0.007s
user  0m0.00-0.004000(0.000316832+/-0.00096)s
sys   0m0.003000-0.008000(0.00690099+/-0.00099)s
$

There's 5ms extra time on every sync, and it is entirely system CPU
time. That will be the inode cache traversal overhead.  To confirm,
drop the cache and rerun:

$ sudo grep ext3_inode /proc/slabinfo
ext3_inode_cache   23086  23086960   174 : tunables000 : 
slabdata   1358   1358  0
$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches "
$ sudo grep ext3_inode /proc/slabinfo
ext3_inode_cache3152   4488960   174 : tunables000 : 
slabdata264264  0
$ for i in `seq 0 1 100`; do time sync ; done 2>&1 | stats

real  0m0.002s
user  0m0.00-0.005000(0.000227723+/-0.00074)s
sys   0m0.00-0.002000(0.00182178+/-0.00057)s

Dropping the inode cache (from 23,000 inodes to 3,000) makes the 5ms
sync runtime disappear.

Can you repeat this test on your system, so that we can determine if
the 5ms ""sync time" is actually just the overhead of inode cache
traversal? If that is the case, the speed of sync on a clean
filesystem is already a solved problem - the patchset should be
merged in the 4.2 cycle

> While they were all to slow, none of them were
>  O(500ms), so yes, there
> does seem to be some state change
> that causes the 2nd sync after a resume to be especially slow.
> 
> Unfortunately, I've not got an ftrace on the 500ms

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-18 Thread Len Brown

On Thu, Jun 18, 2015 at 9:09 PM, Dave Chinner  wrote:
> On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote:
>> On Sun, May 17, 2015 at 9:57 PM, NeilBrown  wrote:
>> > On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes
>> >  wrote:
>> >
>> >> > > Data loss may be caused for hotplug storage(like USB), or all storage
>> >> > > when power is exhausted during suspend.
>> >> >
>> >> > Which also may very well happen at run time, right?
>> >>
>> >> Intuitively users treat "suspended" as a bit like off and do remove
>> >> devices they've "finished with".
>> >>
>> >> Technically yes the instant off/on on a phone is the same as the suspend
>> >> to RAM on a laptop but it doesn't mean the model in people's heads is the
>> >> same... not yet anyway.
>> >>
>> >> > > Is there obvious advantage to remove sys_sync() in the case?
>> >> >
>> >> > Yes, there is.  It is not necessary to sync() every time you suspend
>> >> > if you do that very often.
>> >>
>> >> But if you do it very often you won't have any dirty pages to flush so it
>> >> will be very fast.
>> >
>> > Several people have said things like this, and yet Len clearly saw a 
>> > problem
>> > so sometimes it isn't as fast as you think it should be.  I guess we need
>> > some data.
>> >
>> > In fact there seem to be a number of open questions:
>> >
>> > 1/ Len has seen enough delays to bother sending a patch.  Twice.  Lots of
>> >other people claim there shouldn't be much delay.
>> >
>> >Len: can you provide numbers?  How long does the sys_sync take
>> >(min/max/mean).  I think an interesting number would be in a quick
>> >"resume, do something, suspend" cycle, what percent of the time is spent
>> >in sys_sync.
>> >Maybe also report what filesystems are in use, and whether you expect
>> >there to be any dirty data.
>> >
>> >If I run "time sync; time sync; time sync" it reports about 50msec for
>> >each sync.  If I run "time sleep 0; time sleep 0; time sleep 0" it 
>> > reports
>> >about 2 msec.  So sys_sync is taking at least 50msec.
>> >Almost all of that is in sync_inodes_sb() and much of that is in
>> >_raw_spin_lock though I might be misinterpreting perf.  It seems to
>> >wait for a BDI flusher thread to go off and do nothing.
>> >
>> >Len: is 50msec enough to bother you?
>>
>> 50ms is not acceptable.
>> 5ms is also not acceptable.
>>
>> If it were guaranteed to be under 1ms, it would not behigh  on my
>> "performance pain" list, but I would still require the option
>> to delete it entirely.
>>
>> But sys_sync time is random on  many systems,
>> with a very large maximum duration.
>>
>> Attached is the output from analyze_suspend.py -x2
>> on a desktop machine, which has just an inexpensive SSD for storage.
>> It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel
>> with no tweaks.
>
> More information required. Mounted filesystems, mount options, etc
> at minimum so we have some idea of what is being synced. We don't
> even know what filesystem you are using

In this case, I am using Fedora 22 defaults on a single SSD:

ext4 (rw,relatime,seclabel,data=ordered)


[lenb@d975xbx ~]$ mount
/dev/sda3 on / type ext4 (rw,relatime,seclabel,data=ordered)
devtmpfs on /dev type devtmpfs (rw,relatime,seclabel,size=3053436k,nr_inodes=76)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,rela)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=6)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,x)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime,seclabel)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cp)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatim)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,r)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blk)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,h)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relati)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,me)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,d)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,f)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=21,pgrp=1,tim)
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
tmpfs on /tmp type tmpfs

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-18 Thread Dave Chinner

On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote:
> On Sun, May 17, 2015 at 9:57 PM, NeilBrown  wrote:
> > On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes
> >  wrote:
> >
> >> > > Data loss may be caused for hotplug storage(like USB), or all storage
> >> > > when power is exhausted during suspend.
> >> >
> >> > Which also may very well happen at run time, right?
> >>
> >> Intuitively users treat "suspended" as a bit like off and do remove
> >> devices they've "finished with".
> >>
> >> Technically yes the instant off/on on a phone is the same as the suspend
> >> to RAM on a laptop but it doesn't mean the model in people's heads is the
> >> same... not yet anyway.
> >>
> >> > > Is there obvious advantage to remove sys_sync() in the case?
> >> >
> >> > Yes, there is.  It is not necessary to sync() every time you suspend
> >> > if you do that very often.
> >>
> >> But if you do it very often you won't have any dirty pages to flush so it
> >> will be very fast.
> >
> > Several people have said things like this, and yet Len clearly saw a problem
> > so sometimes it isn't as fast as you think it should be.  I guess we need
> > some data.
> >
> > In fact there seem to be a number of open questions:
> >
> > 1/ Len has seen enough delays to bother sending a patch.  Twice.  Lots of
> >other people claim there shouldn't be much delay.
> >
> >Len: can you provide numbers?  How long does the sys_sync take
> >(min/max/mean).  I think an interesting number would be in a quick
> >"resume, do something, suspend" cycle, what percent of the time is spent
> >in sys_sync.
> >Maybe also report what filesystems are in use, and whether you expect
> >there to be any dirty data.
> >
> >If I run "time sync; time sync; time sync" it reports about 50msec for
> >each sync.  If I run "time sleep 0; time sleep 0; time sleep 0" it 
> > reports
> >about 2 msec.  So sys_sync is taking at least 50msec.
> >Almost all of that is in sync_inodes_sb() and much of that is in
> >_raw_spin_lock though I might be misinterpreting perf.  It seems to
> >wait for a BDI flusher thread to go off and do nothing.
> >
> >Len: is 50msec enough to bother you?
> 
> 50ms is not acceptable.
> 5ms is also not acceptable.
> 
> If it were guaranteed to be under 1ms, it would not behigh  on my
> "performance pain" list, but I would still require the option
> to delete it entirely.
> 
> But sys_sync time is random on  many systems,
> with a very large maximum duration.
> 
> Attached is the output from analyze_suspend.py -x2
> on a desktop machine, which has just an inexpensive SSD for storage.
> It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel
> with no tweaks.

More information required. Mounted filesystems, mount options, etc
at minimum so we have some idea of what is being synced. We don't
even know what filesystem you are using

> The 1st suspend starts with a sync that takes 6.6ms
> But the 2nd suspend starts with a sync that takes 451.8ms

The graph posted just has a single box labelled "sync_filesystems"
with no other information about what is actually taking that time.
We need to know what is consuming all that time *inside*
sync_filesystems() to make any progress here.

> So the reasoning that subsequent sys_sync's should be instantly quick
> because they are following a previous sys_sync is found to be simply
> erroneous by observations such as this.

Which indicates that either the system is not idle as expected, or
there's some kind of bug in one of the filesystems that is being
synced.

But we can't do any analysis of the problem you are seeing if the
only information you give us "it takes too long".  Something like a
function trace that tells use exactly what functions are being
called and how long they take to execute would be really handy
here.

Alternatively, you don't need to suspend to test this - just run a
test program on an idle system that does two sync() calls 50ms apart
and see what happens. If you can't reproduce it outside of a suspend
operation, then we need to look at what suspend is actually causing
to be dirtied in those 50ms.

If you can reproduce it, you can then even narrow down the
filesystem that is causing the problem by using syncfs() instead of
sync() to test the filesystems one at a time, and once you've done
that take a writeback event trace so we can see exactly what
inodes/data is being written back, and hence identify what the first
sync is not cleaning.

FWIW, I'd also suggest trying the changes to the sync code here:

git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
superblock-scaling

As they avoid the need for sync to walk all the cached inodes
in the system needlessly and so should reduce the CPU overhead of
sys_sync() quite substantially.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-18 Thread Dave Chinner

On Thu, Jun 18, 2015 at 10:35:44PM -0400, Len Brown wrote:
 On Thu, Jun 18, 2015 at 9:09 PM, Dave Chinner da...@fromorbit.com wrote:
  On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote:
  On Sun, May 17, 2015 at 9:57 PM, NeilBrown ne...@suse.de wrote:
   1/ Len has seen enough delays to bother sending a patch.  Twice.  Lots of
  other people claim there shouldn't be much delay.
  
  Len: can you provide numbers?  How long does the sys_sync take
  (min/max/mean).  I think an interesting number would be in a quick
  resume, do something, suspend cycle, what percent of the time is 
   spent
  in sys_sync.
  Maybe also report what filesystems are in use, and whether you expect
  there to be any dirty data.
  
  If I run time sync; time sync; time sync it reports about 50msec for
  each sync.  If I run time sleep 0; time sleep 0; time sleep 0 it 
   reports
  about 2 msec.  So sys_sync is taking at least 50msec.
  Almost all of that is in sync_inodes_sb() and much of that is in
  _raw_spin_lock though I might be misinterpreting perf.  It seems 
   to
  wait for a BDI flusher thread to go off and do nothing.
  
  Len: is 50msec enough to bother you?
 
  50ms is not acceptable.
  5ms is also not acceptable.
 
  If it were guaranteed to be under 1ms, it would not behigh  on my
  performance pain list, but I would still require the option
  to delete it entirely.
 
  But sys_sync time is random on  many systems,
  with a very large maximum duration.
 
  Attached is the output from analyze_suspend.py -x2
  on a desktop machine, which has just an inexpensive SSD for storage.
  It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel
  with no tweaks.
 
  More information required. Mounted filesystems, mount options, etc
  at minimum so we have some idea of what is being synced. We don't
  even know what filesystem you are using
 
 In this case, I am using Fedora 22 defaults on a single SSD:
 
 ext4 (rw,relatime,seclabel,data=ordered)

OK, nothing unusual.

 [lenb@d975xbx ~]$ mount
 /dev/sda3 on / type ext4 (rw,relatime,seclabel,data=ordered)

 /dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered)


And so effectively only a single filesystem that would do IO when
sync is called.

  The graph posted just has a single box labelled sync_filesystems
  with no other information about what is actually taking that time.
  We need to know what is consuming all that time *inside*
  sync_filesystems() to make any progress here.
 
 Just for grins, did 100 sys_sync instead of one.
 They all ran about the same 5ms.

Ok, so what's the overhead? On one of my test VMs, running on
spinning disk backed devices and a single ext3 root filesystem and
running that sync scalability patch series:

  git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
  superblock-scaling

$ for i in `seq 0 1 100`; do time sleep 0 ; done 21 | stats

real0m0.002s
user0m0.00-0.004000(0.000168317+/-0.00058)s
sys 0m0.00-0.003000(0.00154455+/-0.00074)s
$ for i in `seq 0 1 100`; do time sync ; done 21 | stats

real0m0.002s
user0m0.00-0.002000(0.000118812+/-0.00041)s
sys 0m0.00-0.003000(0.00151485+/-0.0007)s
$

The sync runtime is exactly the same as running sleep 0 100 times.
Without the sync scalability patchset:

$ for i in `seq 0 1 100`; do time sync ; done 21 | stats

real  0m0.007s
user  0m0.00-0.004000(0.000316832+/-0.00096)s
sys   0m0.003000-0.008000(0.00690099+/-0.00099)s
$

There's 5ms extra time on every sync, and it is entirely system CPU
time. That will be the inode cache traversal overhead.  To confirm,
drop the cache and rerun:

$ sudo grep ext3_inode /proc/slabinfo
ext3_inode_cache   23086  23086960   174 : tunables000 : 
slabdata   1358   1358  0
$ sudo sh -c echo 3  /proc/sys/vm/drop_caches 
$ sudo grep ext3_inode /proc/slabinfo
ext3_inode_cache3152   4488960   174 : tunables000 : 
slabdata264264  0
$ for i in `seq 0 1 100`; do time sync ; done 21 | stats

real  0m0.002s
user  0m0.00-0.005000(0.000227723+/-0.00074)s
sys   0m0.00-0.002000(0.00182178+/-0.00057)s

Dropping the inode cache (from 23,000 inodes to 3,000) makes the 5ms
sync runtime disappear.

Can you repeat this test on your system, so that we can determine if
the 5ms sync time is actually just the overhead of inode cache
traversal? If that is the case, the speed of sync on a clean
filesystem is already a solved problem - the patchset should be
merged in the 4.2 cycle

 While they were all to slow, none of them were
  O(500ms), so yes, there
 does seem to be some state change
 that causes the 2nd sync after a resume to be especially slow.
 
 Unfortunately, I've not got an ftrace on the 500ms flavor yet.

This is the problem we really need to reproduce and track down.

  So the reasoning that subsequent sys_sync's should be instantly quick
  because they

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-18 Thread Dave Chinner

On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote:
 On Sun, May 17, 2015 at 9:57 PM, NeilBrown ne...@suse.de wrote:
  On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes
  gno...@lxorguk.ukuu.org.uk wrote:
 
Data loss may be caused for hotplug storage(like USB), or all storage
when power is exhausted during suspend.
  
   Which also may very well happen at run time, right?
 
  Intuitively users treat suspended as a bit like off and do remove
  devices they've finished with.
 
  Technically yes the instant off/on on a phone is the same as the suspend
  to RAM on a laptop but it doesn't mean the model in people's heads is the
  same... not yet anyway.
 
Is there obvious advantage to remove sys_sync() in the case?
  
   Yes, there is.  It is not necessary to sync() every time you suspend
   if you do that very often.
 
  But if you do it very often you won't have any dirty pages to flush so it
  will be very fast.
 
  Several people have said things like this, and yet Len clearly saw a problem
  so sometimes it isn't as fast as you think it should be.  I guess we need
  some data.
 
  In fact there seem to be a number of open questions:
 
  1/ Len has seen enough delays to bother sending a patch.  Twice.  Lots of
 other people claim there shouldn't be much delay.
 
 Len: can you provide numbers?  How long does the sys_sync take
 (min/max/mean).  I think an interesting number would be in a quick
 resume, do something, suspend cycle, what percent of the time is spent
 in sys_sync.
 Maybe also report what filesystems are in use, and whether you expect
 there to be any dirty data.
 
 If I run time sync; time sync; time sync it reports about 50msec for
 each sync.  If I run time sleep 0; time sleep 0; time sleep 0 it 
  reports
 about 2 msec.  So sys_sync is taking at least 50msec.
 Almost all of that is in sync_inodes_sb() and much of that is in
 _raw_spin_lock though I might be misinterpreting perf.  It seems to
 wait for a BDI flusher thread to go off and do nothing.
 
 Len: is 50msec enough to bother you?
 
 50ms is not acceptable.
 5ms is also not acceptable.
 
 If it were guaranteed to be under 1ms, it would not behigh  on my
 performance pain list, but I would still require the option
 to delete it entirely.
 
 But sys_sync time is random on  many systems,
 with a very large maximum duration.
 
 Attached is the output from analyze_suspend.py -x2
 on a desktop machine, which has just an inexpensive SSD for storage.
 It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel
 with no tweaks.

More information required. Mounted filesystems, mount options, etc
at minimum so we have some idea of what is being synced. We don't
even know what filesystem you are using

 The 1st suspend starts with a sync that takes 6.6ms
 But the 2nd suspend starts with a sync that takes 451.8ms

The graph posted just has a single box labelled sync_filesystems
with no other information about what is actually taking that time.
We need to know what is consuming all that time *inside*
sync_filesystems() to make any progress here.

 So the reasoning that subsequent sys_sync's should be instantly quick
 because they are following a previous sys_sync is found to be simply
 erroneous by observations such as this.

Which indicates that either the system is not idle as expected, or
there's some kind of bug in one of the filesystems that is being
synced.

But we can't do any analysis of the problem you are seeing if the
only information you give us it takes too long.  Something like a
function trace that tells use exactly what functions are being
called and how long they take to execute would be really handy
here.

Alternatively, you don't need to suspend to test this - just run a
test program on an idle system that does two sync() calls 50ms apart
and see what happens. If you can't reproduce it outside of a suspend
operation, then we need to look at what suspend is actually causing
to be dirtied in those 50ms.

If you can reproduce it, you can then even narrow down the
filesystem that is causing the problem by using syncfs() instead of
sync() to test the filesystems one at a time, and once you've done
that take a writeback event trace so we can see exactly what
inodes/data is being written back, and hence identify what the first
sync is not cleaning.

FWIW, I'd also suggest trying the changes to the sync code here:

git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
superblock-scaling

As they avoid the need for sync to walk all the cached inodes
in the system needlessly and so should reduce the CPU overhead of
sys_sync() quite substantially.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-06-18 Thread Len Brown

On Thu, Jun 18, 2015 at 9:09 PM, Dave Chinner da...@fromorbit.com wrote:
 On Thu, Jun 18, 2015 at 02:14:31AM -0400, Len Brown wrote:
 On Sun, May 17, 2015 at 9:57 PM, NeilBrown ne...@suse.de wrote:
  On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes
  gno...@lxorguk.ukuu.org.uk wrote:
 
Data loss may be caused for hotplug storage(like USB), or all storage
when power is exhausted during suspend.
  
   Which also may very well happen at run time, right?
 
  Intuitively users treat suspended as a bit like off and do remove
  devices they've finished with.
 
  Technically yes the instant off/on on a phone is the same as the suspend
  to RAM on a laptop but it doesn't mean the model in people's heads is the
  same... not yet anyway.
 
Is there obvious advantage to remove sys_sync() in the case?
  
   Yes, there is.  It is not necessary to sync() every time you suspend
   if you do that very often.
 
  But if you do it very often you won't have any dirty pages to flush so it
  will be very fast.
 
  Several people have said things like this, and yet Len clearly saw a 
  problem
  so sometimes it isn't as fast as you think it should be.  I guess we need
  some data.
 
  In fact there seem to be a number of open questions:
 
  1/ Len has seen enough delays to bother sending a patch.  Twice.  Lots of
 other people claim there shouldn't be much delay.
 
 Len: can you provide numbers?  How long does the sys_sync take
 (min/max/mean).  I think an interesting number would be in a quick
 resume, do something, suspend cycle, what percent of the time is spent
 in sys_sync.
 Maybe also report what filesystems are in use, and whether you expect
 there to be any dirty data.
 
 If I run time sync; time sync; time sync it reports about 50msec for
 each sync.  If I run time sleep 0; time sleep 0; time sleep 0 it 
  reports
 about 2 msec.  So sys_sync is taking at least 50msec.
 Almost all of that is in sync_inodes_sb() and much of that is in
 _raw_spin_lock though I might be misinterpreting perf.  It seems to
 wait for a BDI flusher thread to go off and do nothing.
 
 Len: is 50msec enough to bother you?

 50ms is not acceptable.
 5ms is also not acceptable.

 If it were guaranteed to be under 1ms, it would not behigh  on my
 performance pain list, but I would still require the option
 to delete it entirely.

 But sys_sync time is random on  many systems,
 with a very large maximum duration.

 Attached is the output from analyze_suspend.py -x2
 on a desktop machine, which has just an inexpensive SSD for storage.
 It is a fresh install of Fedora 22 with a stock 4.0.4-301 kernel
 with no tweaks.

 More information required. Mounted filesystems, mount options, etc
 at minimum so we have some idea of what is being synced. We don't
 even know what filesystem you are using

In this case, I am using Fedora 22 defaults on a single SSD:

ext4 (rw,relatime,seclabel,data=ordered)


[lenb@d975xbx ~]$ mount
/dev/sda3 on / type ext4 (rw,relatime,seclabel,data=ordered)
devtmpfs on /dev type devtmpfs (rw,relatime,seclabel,size=3053436k,nr_inodes=76)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,rela)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=6)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,x)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime,seclabel)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cp)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatim)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,r)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blk)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,h)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relati)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,me)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,d)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,f)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=21,pgrp=1,tim)
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
tmpfs on /tmp type tmpfs (rw,seclabel)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
configfs on /sys/kernel/config type configfs (rw,relatime)
sunrpc on /var/lib/nfs/rpc_pipefs type

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-17 Thread NeilBrown

On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes
 wrote:

> > > Data loss may be caused for hotplug storage(like USB), or all storage
> > > when power is exhausted during suspend.
> > 
> > Which also may very well happen at run time, right?
> 
> Intuitively users treat "suspended" as a bit like off and do remove
> devices they've "finished with".
> 
> Technically yes the instant off/on on a phone is the same as the suspend
> to RAM on a laptop but it doesn't mean the model in people's heads is the
> same... not yet anyway.
> 
> > > Is there obvious advantage to remove sys_sync() in the case?
> > 
> > Yes, there is.  It is not necessary to sync() every time you suspend
> > if you do that very often.
> 
> But if you do it very often you won't have any dirty pages to flush so it
> will be very fast.

Several people have said things like this, and yet Len clearly saw a problem
so sometimes it isn't as fast as you think it should be.  I guess we need
some data.

In fact there seem to be a number of open questions:

1/ Len has seen enough delays to bother sending a patch.  Twice.  Lots of
   other people claim there shouldn't be much delay.

   Len: can you provide numbers?  How long does the sys_sync take
   (min/max/mean).  I think an interesting number would be in a quick
   "resume, do something, suspend" cycle, what percent of the time is spent
   in sys_sync.
   Maybe also report what filesystems are in use, and whether you expect
   there to be any dirty data.

   If I run "time sync; time sync; time sync" it reports about 50msec for
   each sync.  If I run "time sleep 0; time sleep 0; time sleep 0" it reports
   about 2 msec.  So sys_sync is taking at least 50msec.
   Almost all of that is in sync_inodes_sb() and much of that is in
   _raw_spin_lock though I might be misinterpreting perf.  It seems to
   wait for a BDI flusher thread to go off and do nothing.

   Len: is 50msec enough to bother you?

2/ Is a 'sys_sync' really necessary.  People claim that "users might lose
   data".  I claim that some user data could be still in the application
   buffers so they would lose that anyway, and applications call fsync after
   writing everything out, so sys_sync isn't necessary.
   Does anyone have an example of an application which writes important
   user data, but doesn't call "fsync" shortly after all data has been written
   out of the application's internal buffers?

3/ Is a 'sys_sync' sufficient.  Len claims that in some cases, when running
 sync; sync
   the second sync takes longer, suggesting that the first sync didn't do
   everything that was expected.   Prior to Linux 1.3.20, sys_sync didn't
   wait for everything.  Since then it seems to be designed to, but maybe
   something isn't right there.
   In that case, having sys_sync may not be helping anyway.

   Len: can you reliably reproduce this?  Can you provide a recipe?

4/ Which part of 'sync' is really important?
   sys_sync does two things.  It initiates writeout on dirty data and it
   wait for writeout to complete.  Even if the former isn't necessary, the
   latter probably is.  Do we need to reliably wait for all output queues
   to flush?  Where is the best place to do that?

Thanks,
NeilBrown

> 
> > And it is done in such a place that everything needs to wait for it to 
> > complete.
> 
> Only because the code deciding to trigger any automated suspend doesn't
> do a sync a few seconds before. In the case the user goes to the menus
> and does power->suspend then yes it's a delay. In the case where the OS
> at some level has decided that it's 10 seconds from automatically
> suspending to something the user space can issue a pre-emptive sync to
> get the queue size down.
> 
> Alan

pgpIOVBdOCU28.pgp
Description: OpenPGP digital signature

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-17 Thread NeilBrown

On Fri, 15 May 2015 11:35:57 +0100 One Thousand Gnomes
gno...@lxorguk.ukuu.org.uk wrote:

   Data loss may be caused for hotplug storage(like USB), or all storage
   when power is exhausted during suspend.
  
  Which also may very well happen at run time, right?
 
 Intuitively users treat suspended as a bit like off and do remove
 devices they've finished with.
 
 Technically yes the instant off/on on a phone is the same as the suspend
 to RAM on a laptop but it doesn't mean the model in people's heads is the
 same... not yet anyway.
 
   Is there obvious advantage to remove sys_sync() in the case?
  
  Yes, there is.  It is not necessary to sync() every time you suspend
  if you do that very often.
 
 But if you do it very often you won't have any dirty pages to flush so it
 will be very fast.

Several people have said things like this, and yet Len clearly saw a problem
so sometimes it isn't as fast as you think it should be.  I guess we need
some data.

In fact there seem to be a number of open questions:

1/ Len has seen enough delays to bother sending a patch.  Twice.  Lots of
   other people claim there shouldn't be much delay.

   Len: can you provide numbers?  How long does the sys_sync take
   (min/max/mean).  I think an interesting number would be in a quick
   resume, do something, suspend cycle, what percent of the time is spent
   in sys_sync.
   Maybe also report what filesystems are in use, and whether you expect
   there to be any dirty data.

   If I run time sync; time sync; time sync it reports about 50msec for
   each sync.  If I run time sleep 0; time sleep 0; time sleep 0 it reports
   about 2 msec.  So sys_sync is taking at least 50msec.
   Almost all of that is in sync_inodes_sb() and much of that is in
   _raw_spin_lock though I might be misinterpreting perf.  It seems to
   wait for a BDI flusher thread to go off and do nothing.

   Len: is 50msec enough to bother you?


2/ Is a 'sys_sync' really necessary.  People claim that users might lose
   data.  I claim that some user data could be still in the application
   buffers so they would lose that anyway, and applications call fsync after
   writing everything out, so sys_sync isn't necessary.
   Does anyone have an example of an application which writes important
   user data, but doesn't call fsync shortly after all data has been written
   out of the application's internal buffers?

3/ Is a 'sys_sync' sufficient.  Len claims that in some cases, when running
 sync; sync
   the second sync takes longer, suggesting that the first sync didn't do
   everything that was expected.   Prior to Linux 1.3.20, sys_sync didn't
   wait for everything.  Since then it seems to be designed to, but maybe
   something isn't right there.
   In that case, having sys_sync may not be helping anyway.

   Len: can you reliably reproduce this?  Can you provide a recipe?

4/ Which part of 'sync' is really important?
   sys_sync does two things.  It initiates writeout on dirty data and it
   wait for writeout to complete.  Even if the former isn't necessary, the
   latter probably is.  Do we need to reliably wait for all output queues
   to flush?  Where is the best place to do that?

Thanks,
NeilBrown



 
  And it is done in such a place that everything needs to wait for it to 
  complete.
 
 Only because the code deciding to trigger any automated suspend doesn't
 do a sync a few seconds before. In the case the user goes to the menus
 and does power-suspend then yes it's a delay. In the case where the OS
 at some level has decided that it's 10 seconds from automatically
 suspending to something the user space can issue a pre-emptive sync to
 get the queue size down.
 
 Alan



pgpIOVBdOCU28.pgp
Description: OpenPGP digital signature

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-15 Thread Alan Stern

On Fri, 15 May 2015, Alan Stern wrote:

> On Fri, 15 May 2015, NeilBrown wrote:
> 
> > Some storage devices don't handle suspend as well as they should and lose
> > requests resulting in corruption.  They should obviously be fixed, but it is
> > you who gets the problem reports and you are not in a position to fix them.
> > So you want a general solution that hides those problems.
> > sys_sync at suspend time is a sort-of solution because it flushes and waits
> > so there  is less in-flight IO immediately after a sys_sync and so less
> > opportunity for a bad device to stuff up.
> > But you seem to suggest that sys_sync isn't a complete solution and it
> > doesn't guarantee that xfs is not doing some background metadata IO.
> > 
> > Maybe a sensible thing to do would be to hook the "disk" devices into 
> > suspend
> > and have them flush their queue and possibly send a CACHE_FLUSH command.
> > That would provide more of a guarantee for you, and less of a cost for Len,
> > would it not?
> 
> The sd driver already does this.

Sorry -- I meant that it does send a SYNCHRONIZE CACHE command.  It 
doesn't flush the I/O queue.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-15 Thread Alan Stern

On Fri, 15 May 2015, NeilBrown wrote:

> Some storage devices don't handle suspend as well as they should and lose
> requests resulting in corruption.  They should obviously be fixed, but it is
> you who gets the problem reports and you are not in a position to fix them.
> So you want a general solution that hides those problems.
> sys_sync at suspend time is a sort-of solution because it flushes and waits
> so there  is less in-flight IO immediately after a sys_sync and so less
> opportunity for a bad device to stuff up.
> But you seem to suggest that sys_sync isn't a complete solution and it
> doesn't guarantee that xfs is not doing some background metadata IO.
> 
> Maybe a sensible thing to do would be to hook the "disk" devices into suspend
> and have them flush their queue and possibly send a CACHE_FLUSH command.
> That would provide more of a guarantee for you, and less of a cost for Len,
> would it not?

The sd driver already does this.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-15 Thread Alan Stern

On Fri, 15 May 2015, Dave Chinner wrote:

> It has nothing to do with journalling, and everything to do with
> bring filesystems to an *idle state* before suspend runs.  We have a
> long history of bug reports with XFS that go: suspend, resume, XFS
> almost immediately detects corruption, shuts down.
> 
> The problem is that "sync" doesn't make the filesystem idle - XFs
> has *lots* of background work going on, and if we aren't *real
> careful* the filesystem is still doing work while the hardware gets
> powerd down and the suspend image is being taken. the result is on
> resume that the on-disk filesystem state does not match the memory
> image pulled back from resume, and we get shutdowns.
> 
> sys_sync() does not guarantee a filesystem is idle - it guarantees
> the data in memory is recoverable, butit doesn't stop the filesystem
> from doing things like writing back metadata or running background
> cleaup tasks. If those aren't stopped properly, then we get into
> the state where in-memory and on-disk state get out of whack. And
> s2ram can have these problems too, because if there is IO in flight
> when the hardware is powered down, that IO is lost
> 
> Every time some piece of generic infrastructure changes behaviour
> w.r.t. suspend/resume, we get a new set of problems being reported
> by users. It's extremely hard to test for these problems and it
> might take months of occasional corruption reports from a user to
> isolate it to being a suspend/resume problem.  It's a game of
> whack-a-mole, because quite often they come down to the fact that
> something changed and nobody in the XFS world knew they had to now
> set an different initialisation flag on some structure or workqueue
> to make it work the way it needed to work.
> 
> Go back an look at the history of sys_sync() in suspend discussions
> over the past 10 years.  You'll find me saying exactly the same
> thing again and again about sys_sync(): it does not guarantee the
> filesystem is in an idle or coherent, unchanging state, and nothing
> in the suspend code tells the filesystem to enter an idle or frozen
> state. We actually have mechanisms for doing this - we use it in the
> storage layers to idle the filesystem while we do things like *take
> a snapshot*.
> 
> What is the mechanism suspend to disk uses? It *takes a snapshot* of
> system state, written to disk. It's supposed to be consistent, and
> the only way you can guarantee the state of an active, mounted
> filesystem has consistent in-memory state and on-disk state and
> that it won't get changed is to *freeze the filesystem*.
> 
> Removing the sync is only going to make this problem worse because
> the delta between on-disk and in-memory state is going to be much,
> much larger. There is also likely to be significant filesystem
> activity occurring when the filesystem has all it's background
> threads and work queues abruptly frozen with no warning or
> co-ordination, which makes it impossible for anyone to test
> suspend/resume reliably.
> 
> Sorry for the long rant, but I've been saying the same thing for 10
> years, which is abotu as long as I've been dealing with filesystem
> corruptions that have resulted from suspend/resume.

Are you suggesting that filesystems should register a "freeze the
filesystem" routine, and the PM core should call this routine during 
hibernation (aka suspend-to-disk)?  With a corresponding "unfreeze" 
hook for resume, of course.

That seems like a good idea to me.  It would also present a logical
place to add some simple checks for changes to the on-disk image caused
by the user moving the storage device to another system, modifying the
contents, and then moving it back all while the current system was 
hibernating (or suspended).

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-15 Thread One Thousand Gnomes

> > Data loss may be caused for hotplug storage(like USB), or all storage
> > when power is exhausted during suspend.
> 
> Which also may very well happen at run time, right?

Intuitively users treat "suspended" as a bit like off and do remove
devices they've "finished with".

Technically yes the instant off/on on a phone is the same as the suspend
to RAM on a laptop but it doesn't mean the model in people's heads is the
same... not yet anyway.

> > Is there obvious advantage to remove sys_sync() in the case?
> 
> Yes, there is.  It is not necessary to sync() every time you suspend
> if you do that very often.

But if you do it very often you won't have any dirty pages to flush so it
will be very fast.

> And it is done in such a place that everything needs to wait for it to 
> complete.

Only because the code deciding to trigger any automated suspend doesn't
do a sync a few seconds before. In the case the user goes to the menus
and does power->suspend then yes it's a delay. In the case where the OS
at some level has decided that it's 10 seconds from automatically
suspending to something the user space can issue a pre-emptive sync to
get the queue size down.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-15 Thread One Thousand Gnomes

  Data loss may be caused for hotplug storage(like USB), or all storage
  when power is exhausted during suspend.
 
 Which also may very well happen at run time, right?

Intuitively users treat suspended as a bit like off and do remove
devices they've finished with.

Technically yes the instant off/on on a phone is the same as the suspend
to RAM on a laptop but it doesn't mean the model in people's heads is the
same... not yet anyway.

  Is there obvious advantage to remove sys_sync() in the case?
 
 Yes, there is.  It is not necessary to sync() every time you suspend
 if you do that very often.

But if you do it very often you won't have any dirty pages to flush so it
will be very fast.

 And it is done in such a place that everything needs to wait for it to 
 complete.

Only because the code deciding to trigger any automated suspend doesn't
do a sync a few seconds before. In the case the user goes to the menus
and does power-suspend then yes it's a delay. In the case where the OS
at some level has decided that it's 10 seconds from automatically
suspending to something the user space can issue a pre-emptive sync to
get the queue size down.

Alan
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-15 Thread Alan Stern

On Fri, 15 May 2015, NeilBrown wrote:

 Some storage devices don't handle suspend as well as they should and lose
 requests resulting in corruption.  They should obviously be fixed, but it is
 you who gets the problem reports and you are not in a position to fix them.
 So you want a general solution that hides those problems.
 sys_sync at suspend time is a sort-of solution because it flushes and waits
 so there  is less in-flight IO immediately after a sys_sync and so less
 opportunity for a bad device to stuff up.
 But you seem to suggest that sys_sync isn't a complete solution and it
 doesn't guarantee that xfs is not doing some background metadata IO.
 
 Maybe a sensible thing to do would be to hook the disk devices into suspend
 and have them flush their queue and possibly send a CACHE_FLUSH command.
 That would provide more of a guarantee for you, and less of a cost for Len,
 would it not?

The sd driver already does this.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-15 Thread Alan Stern

On Fri, 15 May 2015, Dave Chinner wrote:

 It has nothing to do with journalling, and everything to do with
 bring filesystems to an *idle state* before suspend runs.  We have a
 long history of bug reports with XFS that go: suspend, resume, XFS
 almost immediately detects corruption, shuts down.
 
 The problem is that sync doesn't make the filesystem idle - XFs
 has *lots* of background work going on, and if we aren't *real
 careful* the filesystem is still doing work while the hardware gets
 powerd down and the suspend image is being taken. the result is on
 resume that the on-disk filesystem state does not match the memory
 image pulled back from resume, and we get shutdowns.
 
 sys_sync() does not guarantee a filesystem is idle - it guarantees
 the data in memory is recoverable, butit doesn't stop the filesystem
 from doing things like writing back metadata or running background
 cleaup tasks. If those aren't stopped properly, then we get into
 the state where in-memory and on-disk state get out of whack. And
 s2ram can have these problems too, because if there is IO in flight
 when the hardware is powered down, that IO is lost
 
 Every time some piece of generic infrastructure changes behaviour
 w.r.t. suspend/resume, we get a new set of problems being reported
 by users. It's extremely hard to test for these problems and it
 might take months of occasional corruption reports from a user to
 isolate it to being a suspend/resume problem.  It's a game of
 whack-a-mole, because quite often they come down to the fact that
 something changed and nobody in the XFS world knew they had to now
 set an different initialisation flag on some structure or workqueue
 to make it work the way it needed to work.
 
 Go back an look at the history of sys_sync() in suspend discussions
 over the past 10 years.  You'll find me saying exactly the same
 thing again and again about sys_sync(): it does not guarantee the
 filesystem is in an idle or coherent, unchanging state, and nothing
 in the suspend code tells the filesystem to enter an idle or frozen
 state. We actually have mechanisms for doing this - we use it in the
 storage layers to idle the filesystem while we do things like *take
 a snapshot*.
 
 What is the mechanism suspend to disk uses? It *takes a snapshot* of
 system state, written to disk. It's supposed to be consistent, and
 the only way you can guarantee the state of an active, mounted
 filesystem has consistent in-memory state and on-disk state and
 that it won't get changed is to *freeze the filesystem*.
 
 Removing the sync is only going to make this problem worse because
 the delta between on-disk and in-memory state is going to be much,
 much larger. There is also likely to be significant filesystem
 activity occurring when the filesystem has all it's background
 threads and work queues abruptly frozen with no warning or
 co-ordination, which makes it impossible for anyone to test
 suspend/resume reliably.
 
 Sorry for the long rant, but I've been saying the same thing for 10
 years, which is abotu as long as I've been dealing with filesystem
 corruptions that have resulted from suspend/resume.

Are you suggesting that filesystems should register a freeze the
filesystem routine, and the PM core should call this routine during 
hibernation (aka suspend-to-disk)?  With a corresponding unfreeze 
hook for resume, of course.

That seems like a good idea to me.  It would also present a logical
place to add some simple checks for changes to the on-disk image caused
by the user moving the storage device to another system, modifying the
contents, and then moving it back all while the current system was 
hibernating (or suspended).

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-15 Thread Alan Stern

On Fri, 15 May 2015, Alan Stern wrote:

 On Fri, 15 May 2015, NeilBrown wrote:
 
  Some storage devices don't handle suspend as well as they should and lose
  requests resulting in corruption.  They should obviously be fixed, but it is
  you who gets the problem reports and you are not in a position to fix them.
  So you want a general solution that hides those problems.
  sys_sync at suspend time is a sort-of solution because it flushes and waits
  so there  is less in-flight IO immediately after a sys_sync and so less
  opportunity for a bad device to stuff up.
  But you seem to suggest that sys_sync isn't a complete solution and it
  doesn't guarantee that xfs is not doing some background metadata IO.
  
  Maybe a sensible thing to do would be to hook the disk devices into 
  suspend
  and have them flush their queue and possibly send a CACHE_FLUSH command.
  That would provide more of a guarantee for you, and less of a cost for Len,
  would it not?
 
 The sd driver already does this.

Sorry -- I meant that it does send a SYNCHRONIZE CACHE command.  It 
doesn't flush the I/O queue.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-14 Thread Ming Lei

On Fri, May 15, 2015 at 8:59 AM, Rafael J. Wysocki  wrote:
> On Fri, May 15, 2015 at 2:40 AM, Ming Lei  wrote:
>> On Fri, May 15, 2015 at 8:34 AM, Rafael J. Wysocki  
>> wrote:
>>> On Friday, May 15, 2015 09:54:26 AM Dave Chinner wrote:
 ng back On Thu, May 14, 2015 at 09:22:51AM +1000, NeilBrown wrote:
 > On Mon, 11 May 2015 11:44:28 +1000 Dave Chinner  
 > wrote:
 >
 > > On Fri, May 08, 2015 at 03:08:43AM -0400, Len Brown wrote:
 > > > From: Len Brown 
 > > >
 > > > Remove sys_sync() from the kernel's suspend flow.
 > > >
 > > > sys_sync() is extremely expensive in some configurations,
 > > > and so the kernel should not force users to pay this cost
 > > > on every suspend.
 > >
 > > Since when? Please explain what your use case is that makes this
 > > so prohibitively expensive it needs to be removed.
 > >
 > > >
 > > > The user-space utilities s2ram and s2disk choose to invoke sync() 
 > > > today.
 > > > A user can invoke suspend directly via /sys/power/state to skip that 
 > > > cost.
 > >
 > > So, you want to have s2disk write all the dirty pages in memory to
 > > the suspend image, rather than to the filesystem?
 > >
 > > Either way you have to write that dirty data to disk, but if you
 > > write it to the suspend image, it then has to be loaded again on
 > > resume, and then written again to the filesystem the system has
 > > resumed. This doesn't seem very efficient to me
 > >
 > > And, quite frankly, machines fail to resume from suspne dall the
 > > time. e.g. run out of batteries when they are under s2ram
 > > conditions, or s2disk fails because a kernel upgrade was done before
 > > the s2disk and so can't be resumed. With your change, users lose all
 > > the data that was buffered in memory before suspend, whereas right
 > > now it is written to disk and so nothing is lost if the resume from
 > > suspend fails for whatever reason.
 > >
 > > IOWs, I can see several good reasons why the sys_sync() needs to
 > > remain in the suspend code. User data safety and filesystem
 > > integrity is far, far more important than a couple of seconds
 > > improvement in suspend speed
 >
 > To be honest, this sounds like superstition and fear, not science and 
 > fact.
 >
 > "filesystem integrity" is not an issue for the fast majority of 
 > filesystems
 > which use journalling to ensure continued integrity even after a crash.  
 > I
 > think even XFS does that :-)

 It has nothing to do with journalling, and everything to do with
 bring filesystems to an *idle state* before suspend runs.  We have a
 long history of bug reports with XFS that go: suspend, resume, XFS
 almost immediately detects corruption, shuts down.

 The problem is that "sync" doesn't make the filesystem idle - XFs
 has *lots* of background work going on, and if we aren't *real
 careful* the filesystem is still doing work while the hardware gets
 powerd down and the suspend image is being taken. the result is on
 resume that the on-disk filesystem state does not match the memory
 image pulled back from resume, and we get shutdowns.

 sys_sync() does not guarantee a filesystem is idle - it guarantees
 the data in memory is recoverable, butit doesn't stop the filesystem
 from doing things like writing back metadata or running background
 cleaup tasks. If those aren't stopped properly, then we get into
 the state where in-memory and on-disk state get out of whack. And
 s2ram can have these problems too, because if there is IO in flight
 when the hardware is powered down, that IO is lost

 Every time some piece of generic infrastructure changes behaviour
 w.r.t. suspend/resume, we get a new set of problems being reported
 by users. It's extremely hard to test for these problems and it
 might take months of occasional corruption reports from a user to
 isolate it to being a suspend/resume problem.  It's a game of
 whack-a-mole, because quite often they come down to the fact that
 something changed and nobody in the XFS world knew they had to now
 set an different initialisation flag on some structure or workqueue
 to make it work the way it needed to work.

 Go back an look at the history of sys_sync() in suspend discussions
 over the past 10 years.  You'll find me saying exactly the same
 thing again and again about sys_sync(): it does not guarantee the
 filesystem is in an idle or coherent, unchanging state, and nothing
 in the suspend code tells the filesystem to enter an idle or frozen
 state. We actually have mechanisms for doing this - we use it in the
 storage layers to idle the filesystem while we do things like *take
 a snapshot*.

 What is the mechanism suspend to

Re: [PATCH 1/1] suspend: delete sys_sync()

2015-05-14 Thread NeilBrown

On Fri, 15 May 2015 09:54:26 +1000 Dave Chinner  wrote:

> ng back On Thu, May 14, 2015 at 09:22:51AM +1000, NeilBrown wrote:
> > On Mon, 11 May 2015 11:44:28 +1000 Dave Chinner  wrote:
> > 
> > > On Fri, May 08, 2015 at 03:08:43AM -0400, Len Brown wrote:
> > > > From: Len Brown 
> > > > 
> > > > Remove sys_sync() from the kernel's suspend flow.
> > > > 
> > > > sys_sync() is extremely expensive in some configurations,
> > > > and so the kernel should not force users to pay this cost
> > > > on every suspend.
> > > 
> > > Since when? Please explain what your use case is that makes this
> > > so prohibitively expensive it needs to be removed.
> > > 
> > > > 
> > > > The user-space utilities s2ram and s2disk choose to invoke sync() today.
> > > > A user can invoke suspend directly via /sys/power/state to skip that 
> > > > cost.
> > > 
> > > So, you want to have s2disk write all the dirty pages in memory to
> > > the suspend image, rather than to the filesystem?
> > > 
> > > Either way you have to write that dirty data to disk, but if you
> > > write it to the suspend image, it then has to be loaded again on
> > > resume, and then written again to the filesystem the system has
> > > resumed. This doesn't seem very efficient to me
> > > 
> > > And, quite frankly, machines fail to resume from suspne dall the
> > > time. e.g. run out of batteries when they are under s2ram
> > > conditions, or s2disk fails because a kernel upgrade was done before
> > > the s2disk and so can't be resumed. With your change, users lose all
> > > the data that was buffered in memory before suspend, whereas right
> > > now it is written to disk and so nothing is lost if the resume from
> > > suspend fails for whatever reason.
> > > 
> > > IOWs, I can see several good reasons why the sys_sync() needs to
> > > remain in the suspend code. User data safety and filesystem
> > > integrity is far, far more important than a couple of seconds
> > > improvement in suspend speed
> > 
> > To be honest, this sounds like superstition and fear, not science and fact.
> > 
> > "filesystem integrity" is not an issue for the fast majority of filesystems
> > which use journalling to ensure continued integrity even after a crash.  I
> > think even XFS does that :-)
> 
> It has nothing to do with journalling, and everything to do with
> bring filesystems to an *idle state* before suspend runs.  We have a
> long history of bug reports with XFS that go: suspend, resume, XFS
> almost immediately detects corruption, shuts down.
> 
> The problem is that "sync" doesn't make the filesystem idle - XFs
> has *lots* of background work going on, and if we aren't *real
> careful* the filesystem is still doing work while the hardware gets
> powerd down and the suspend image is being taken. the result is on
> resume that the on-disk filesystem state does not match the memory
> image pulled back from resume, and we get shutdowns.
> 
> sys_sync() does not guarantee a filesystem is idle - it guarantees
> the data in memory is recoverable, butit doesn't stop the filesystem
> from doing things like writing back metadata or running background
> cleaup tasks. If those aren't stopped properly, then we get into
> the state where in-memory and on-disk state get out of whack. And
> s2ram can have these problems too, because if there is IO in flight
> when the hardware is powered down, that IO is lost

This seems to be the nub of your complaint - yes?

Some storage devices don't handle suspend as well as they should and lose
requests resulting in corruption.  They should obviously be fixed, but it is
you who gets the problem reports and you are not in a position to fix them.
So you want a general solution that hides those problems.
sys_sync at suspend time is a sort-of solution because it flushes and waits
so there  is less in-flight IO immediately after a sys_sync and so less
opportunity for a bad device to stuff up.
But you seem to suggest that sys_sync isn't a complete solution and it
doesn't guarantee that xfs is not doing some background metadata IO.

Maybe a sensible thing to do would be to hook the "disk" devices into suspend
and have them flush their queue and possibly send a CACHE_FLUSH command.
That would provide more of a guarantee for you, and less of a cost for Len,
would it not?

Thanks,
NeilBrown



> 
> Every time some piece of generic infrastructure changes behaviour
> w.r.t. suspend/resume, we get a new set of problems being reported
> by users. It's extremely hard to test for these problems and it
> might take months of occasional corruption reports from a user to
> isolate it to being a suspend/resume problem.  It's a game of
> whack-a-mole, because quite often they come down to the fact that
> something changed and nobody in the XFS world knew they had to now
> set an different initialisation flag on some structure or workqueue
> to make it work the way it needed to work.
> 
> Go back an look at the history of sys_sync() in suspend discussions
>

1 2 >

1 - 100 of 142 matches

Mail list logo